I'm in your hypervisor, collecting your evidence
Authored by Erik Schamper
Data acquisition during incident response engagements is always a big exercise, both for us and our clients. It's rarely smooth sailing, and we usually encounter a hiccup or two. Fox-IT's approach to enterprise scale incident response for the past few years has been to collect small forensic artefact packages using our internal data collection utility, "acquire", usually deployed using the clients' preferred method of software deployment. While this method works fine in most cases, we often encounter scenarios where deploying our software is tricky or downright impossible. For example, the client may not have appropriate software deployment methods or has fallen victim to ransomware, leaving the infrastructure in a state where mass software deployment has become impossible.
Many businesses have moved to the cloud, but most of our clients still have an on-premises infrastructure, usually in the form of virtual environments. The entire on-premises infrastructure might be running on a handful of physical machines, yet still restricted by the software deployment methods available within the virtual network. It feels like that should be easier, right? The entire infrastructure is running on one or two physical machines, can't we just collect data straight from there?
Turns out we can.
Setting the stage
Most of our clients who run virtualized environments use either VMware ESXi or Microsoft Hyper-V, with a slight bias towards ESXi. Hyper-V was considered the "easy" one between these two, so let's first focus our attention towards ESXi.
VMware ESXi is one of the more popular virtualization platforms. Without going into too much premature detail on how everything works, it's important to know that there are two primary components that make up an ESXi configuration: the hypervisor that runs virtual machines, and the datastore that stores all the files for virtual machines, like virtual disks. These datastores can be local storage or, more commonly, some form of network attached storage. ESXi datastores use VMware's proprietary VMFS filesystem.
There are several challenges that we need to overcome to make this possible. What those challenges are depends on which concessions we're willing to make with regards to ease of use and flexibility. I'm not one to back down from a challenge and not one to take unnecessary shortcuts that may come back to haunt me. Am I making this unnecessarily hard for myself? Perhaps. Will it pay off? Definitely.
The end goal is obvious: we want to be able to perform data acquisition on, ideally, live running virtual machines. Our internal data collection utility, "Acquire", will play a key part in this. Acquire itself isn't anything special, really. It builds on top of the Dissect framework, which is where all its power and flexibility comes from. Acquire itself is really nothing more than a small script that utilizes Dissect to read some files from a target and write them to someplace else. Ideally, we can utilize all this same tooling at the end of this.
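Conceptually, that "read some files and write them someplace else" loop is very small. The sketch below is illustrative only, not the real Acquire code: the artefact list is hypothetical and the real tool reads through Dissect's target abstraction rather than the host filesystem.

```python
import tarfile
from pathlib import Path

# Hypothetical artefact list; the real tool resolves paths through Dissect's
# target abstraction instead of the local filesystem.
ARTEFACTS = ["C:/Windows/System32/config/SYSTEM", "C:/Windows/System32/config/SOFTWARE"]

def collect(root: Path, artefacts: list[str], output: Path) -> int:
    """Copy each artefact that exists under `root` into a tar archive."""
    collected = 0
    with tarfile.open(output, "w:gz") as tar:
        for name in artefacts:
            # Map the Windows-style artefact path onto the mounted root.
            source = root / name.replace("C:/", "")
            if source.exists():
                tar.add(source, arcname=name)
                collected += 1
    return collected
```

The essence is exactly this: the hard part is not the collection loop, but making "a target" mean anything from a live disk to a hypervisor datastore.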
The first attempts
So why not just run Acquire on the virtual machine files from an ESXi shell? Unfortunately, ESXi locks access to all virtual machine files while that virtual machine is running. You'd have to create full clones of every virtual machine you'd want to acquire, which takes up a lot of time and resources. This may be fine in small environments but becomes troublesome in environments with thousands of virtual machines or limited storage. We need some sort of offline access to these files.
We've already successfully done this in the past. However, those times took considerably more effort, time and resources and had their own set of issues. We would take a separate physical machine or virtual machine that was directly connected to the SAN where the ESXi datastores are located. We'd then use the open-source vmfs-tools or vmfs6-tools to gain access to the files on these datastores. Using this method, we're bypassing any file locks that ESXi or VMFS may impose on us, and we can run acquire on the virtual disks without any issues.
Well, almost without any issues. Unfortunately, vmfs-tools and vmfs6-tools aren't exactly proper VMFS implementations and routinely cause errors or data corruption. Any incident responder using vmfs-tools and vmfs6-tools will run into those issues sooner or later and will have to find a way to deal with them in the context of their investigation. This method also requires a lot of manual effort, resources and coordination. Far from an ideal "fire and forget" data collection solution.
Next steps
We know that acquiring data directly from the datastore is possible, it's just that our methods of accessing these datastores are very cumbersome. Can't we somehow do all of this directly from an ESXi shell?
When using local or iSCSI network storage, ESXi also exposes the block devices of those datastores. While ESXi may put a lock on the files on a datastore, we can still read the on-device filesystem data just fine through these block devices. You can also run arbitrary executables on ESXi through its shell (except when using the execInstalledOnly configuration… or can you?), so this opens some possibilities to run acquisition software directly from the hypervisor.
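As a sketch of what reading those block devices enables: a VMFS volume can be recognized directly from its on-disk header. The offset and magic value below are taken from the vmfs-tools headers (volume info at 1 MiB, magic 0xC001D00D), so treat them as illustrative rather than authoritative:

```python
import struct

VMFS_VOLINFO_OFFSET = 0x100000  # volume info lives 1 MiB into the device (per vmfs-tools)
VMFS_VOLINFO_MAGIC = 0xC001D00D

def is_vmfs(fh) -> bool:
    """Check whether a block device (or disk image) carries a VMFS volume header."""
    fh.seek(VMFS_VOLINFO_OFFSET)
    data = fh.read(4)
    if len(data) < 4:
        return False
    # VMFS originates on x86, so the magic is stored little-endian.
    return struct.unpack("<I", data)[0] == VMFS_VOLINFO_MAGIC
```

Because the function only needs a seekable file-like object, the same code works against a live device on the hypervisor and against a dead disk image on an analysis machine.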
Remember I said I liked a challenge? So far, everything has been relatively straightforward. We can just incorporate vmfs-tools into acquire and call it a day. Acquire and Dissect are pure Python, though, and incorporating some C library could overcomplicate things. We also mentioned the data corruption in vmfs-tools, which is something we ideally avoid. So what's the next logical step? If you guessed "do it yourself" you are correct!
You got something to prove?
While vmfs-tools works for the most part, it lacks a lot of "correctness" with regards to the implementation. Much respect to anyone who has worked on these tools over the years, but it leaves a lot on the table as far as a reference implementation goes. For our purposes we have some higher requirements on the correctness of a filesystem implementation, so it's worth spending some time working on one ourselves.
As part of an upcoming engagement, there just so happened to be some time available to work on this project. I open my trusty IDA Pro and get to work reverse engineering VMFS. I use vmfs-tools as a reference to get an idea of the structure of the filesystem, while reverse engineering everything else completely from scratch.
Simultaneously I work on reconstructing an ESXi system from its "offline" state. With Dissect, our preferred approach is to always work from the cleanest slate possible, even when dealing with a live system. For ESXi, this means that we don't utilize anything from the "live" system, but instead will reconstruct this "live" state within Dissect ourselves from however ESXi stores its files when it's turned off. This requires a higher initial effort but pays off in the end, because we can then interface with ESXi in any possible way with the same codebase: live by reading the block devices, or offline by reading a disk image.
This also brought its own set of implementation and reverse engineering challenges, which include:
- Writing a FAT16 implementation, which ESXi uses for its bootbank filesystem.
- Writing a vmtar implementation, a slightly customized tar file that is used for storing OS files (akin to VIBs).
- Writing an Envelope implementation, a file encryption format that is used to encrypt system configuration on supported systems.
- Figuring out how ESXi mounts and symlinks its βliveβ filesystem together.
- Writing parsers for the various configuration file formats within ESXi.
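The last two points boil down to rebuilding ESXi's "live" root as a virtual filesystem tree. A heavily simplified sketch of that idea (the mount and symlink tables here are hypothetical, and this is not Dissect's actual code): resolve symlinks first, then find which mounted filesystem serves the resulting path.

```python
# Hypothetical reconstruction of ESXi's "live" root: mounts map a path prefix
# to a filesystem label, symlinks map one path prefix to another.
MOUNTS = {"/bootbank": "fat16-bootbank", "/vmfs/volumes/ds1": "vmfs-ds1"}
SYMLINKS = {"/store": "/vmfs/volumes/ds1", "/altbootbank": "/bootbank"}

def resolve(path: str, max_depth: int = 16) -> tuple[str, str]:
    """Return (filesystem label, path within that filesystem) for a live path."""
    for _ in range(max_depth):
        for link, dest in SYMLINKS.items():
            if path == link or path.startswith(link + "/"):
                path = dest + path[len(link):]
                break
        else:
            break  # no symlink matched; path is fully resolved
    # Longest mount prefix wins, as with real mount tables.
    for mount in sorted(MOUNTS, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return MOUNTS[mount], path[len(mount):] or "/"
    raise FileNotFoundError(path)
```

The real work, of course, is in populating those tables from the bootbank, the vmtar archives and the parsed configuration, but the lookup itself stays this simple.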
After two weeks it's time for the first trial run for this engagement. There are some initial missed edge cases, but a few quick iterations later and we've just performed our first live evidence acquisition through the hypervisor!
A short excursion to Hyper-V
Now that we've achieved our goal on ESXi, let's take a quick look to see what we need to do to achieve the same on Hyper-V. I mentioned earlier that Hyper-V was the easy one, and it really was. Hyper-V is just an extension of Windows, and we already know how to deal with Windows in Dissect and Acquire. We only need to figure out how to get and interpret information about virtual machines and we're off to the races.
Hyper-V uses VHD or VHDX as its virtual disks. We already support that in Dissect, so nothing to do there. We need some metadata on where these virtual disks are located, as well as which virtual disks belong to a virtual machine. This is important, because we want a complete picture of a system for data acquisition. Not only to collect all filesystem artefacts (e.g. MFT or UsnJrnl of all filesystems), but also because important artefacts, like the Windows event logs, may be configured to store data on a different filesystem. We also want to know where to find all the registered virtual machines, so that no manual steps are required to run Acquire on all of them.
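Telling the two disk formats apart is straightforward from their on-disk signatures: a VHDX file starts with the ASCII identifier `vhdxfile`, while a VHD carries a `conectix` cookie in its 512-byte footer (dynamic and differencing VHDs also keep a copy of it at offset 0). A small sketch:

```python
def disk_format(fh) -> str:
    """Identify a Hyper-V virtual disk by its on-disk signature."""
    fh.seek(0)
    head = fh.read(8)
    if head == b"vhdxfile":
        return "vhdx"
    if head == b"conectix":  # dynamic/differencing VHD header copy
        return "vhd"
    # Fixed VHDs only carry the cookie in the footer at the end of the file.
    fh.seek(0, 2)
    if fh.tell() >= 512:
        fh.seek(-512, 2)
        if fh.read(8) == b"conectix":
            return "vhd"
    return "unknown"
```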
A little bit of research shows that information about virtual machines was historically stored in XML files, but any recent version of Hyper-V uses VMCX files, a proprietary file format. Never easy! For comparison, VMware stores this virtual machine metadata in a plaintext VMX file. Information about which virtual machines are present is stored in another VMCX file, located at C:\ProgramData\Microsoft\Windows\Hyper-V\data.vmcx. Before we're able to progress, we must parse these VMCX files.
Nothing our trusty IDA Pro can't solve! A few hours later we have a fully featured parser and can easily extract the necessary information out of these VMCX files. Just add a few lines of code in Dissect to interpret this information and we have fully automated virtual machine acquisition capabilities for Hyper-V!
Conclusion
The end result of all this work is that we can add a new capability to our Dissect toolbelt: hypervisor data acquisition. As a bonus, we can now also easily perform investigations on hypervisor systems with the same toolset!
There are of course some limitations to these methods, most of which are related to how the storage is configured. At the time of writing, our approach only works on local or iSCSI-based storage. Usage of vSAN or NFS is currently unsupported. Thankfully most of our clients use the supported storage methods, and research into improvements obviously never stops.
We initially mentioned scale and ease of deployment as a primary motivator, but other important factors are stealth and preservation of evidence. These seem to be often overlooked by recent trends in DFIR, but they're still very important factors for us at Fox-IT. Especially when dealing with advanced threat actors, you want to be as stealthy as possible. Assuming your hypervisor isn't compromised, it doesn't get much stealthier than performing data acquisition from the virtualization layer, while still maintaining some sense of scalability. This also achieves the ultimate preservation of evidence. Any new file you introduce or piece of software you execute contaminates your evidence, while also risking rollovers of evidence.
The last takeaway we want to mention is just how relatively easy all of this was. Sure, there was a lot of reverse engineering and writing filesystem implementations, but those are auxiliary tasks that you would have to perform regardless of the end goal. The important detail is that we only had to add a few lines of code to Dissect to have it all just… work. The immense flexibility of Dissect allows us to add "complex" capabilities like these with ease. All our analysts can continue to use the same tools they're already used to, and we can employ all our existing analysis capabilities on these new platforms.
In the time between when this blog was written and published, Mandiant released excellent blog posts[1][2] about malware targeting VMware ESXi, highlighting the importance of hypervisor forensics and incident response. We will also dive deeper into this topic with future blog posts.
[1] https://www.mandiant.com/resources/blog/esxi-hypervisors-malware-persistence
[2] https://www.mandiant.com/resources/blog/esxi-hypervisors-detection-hardening