RSS Security

🔒
❌ About FreshRSS
There are new articles available, click to refresh the page.
Before yesterdayThreat Research

capa 2.0: Better, Faster, Stronger

19 July 2021 at 18:00

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven’t heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project’s release page and checkout the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

  • New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
  • More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
  • Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
  • Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.


Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.


Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.


Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.


Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process but also eliminates the need to context switch between IDA Pro and your favorite text editor allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.


Figure 5: capa v1.6 results without library code recognition


Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.


Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But, when we tested it against our testbed, the rule matched most modern programs because this instruction is often legitimately used in high-optimized routines, such as memcpy, to opt-in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won’t flag the embedded cpuid instruction, as it's not part of the programmer’s code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license) we have spent significant time to create a comprehensive FLIRT signature set for the common malware analysis use-case. The signatures come included with capa and are also available at our GitHub under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we’ve incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

  • From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
    • C and C++ run-time libraries
    • Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
  • The following open-source projects as compiled with VS2015, VS2017, and VS2019:
    • CryptoPP
    • curl
    • Microsoft Detours
    • Mbed TLS (previously PolarSSL)
    • OpenSSL
    • zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpgk and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

  • Host Interaction describes program functionality to interact with the file system, processes, and the registry
  • Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
  • Collection describes functionality used to steal data such as credentials or credit card information
  • Data Manipulation describes capabilities to encrypt, decrypt, and hash data
  • Communication describes data transfer techniques such as HTTP, DNS, and TCP


Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality while the improved capa explorer plugin helps you to focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPi use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

capa 2.0: Better, Faster, Stronger

19 July 2021 at 18:00

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven’t heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project’s release page and checkout the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

  • New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
  • More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
  • Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
  • Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.


Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.


Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.


Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.


Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process but also eliminates the need to context switch between IDA Pro and your favorite text editor allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.


Figure 5: capa v1.6 results without library code recognition


Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.


Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But, when we tested it against our testbed, the rule matched most modern programs because this instruction is often legitimately used in high-optimized routines, such as memcpy, to opt-in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won’t flag the embedded cpuid instruction, as it's not part of the programmer’s code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license) we have spent significant time to create a comprehensive FLIRT signature set for the common malware analysis use-case. The signatures come included with capa and are also available at our GitHub under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we’ve incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

  • From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
    • C and C++ run-time libraries
    • Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
  • The following open-source projects as compiled with VS2015, VS2017, and VS2019:
    • CryptoPP
    • curl
    • Microsoft Detours
    • Mbed TLS (previously PolarSSL)
    • OpenSSL
    • zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpgk and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

  • Host Interaction describes program functionality to interact with the file system, processes, and the registry
  • Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
  • Collection describes functionality used to steal data such as credentials or credit card information
  • Data Manipulation describes capabilities to encrypt, decrypt, and hash data
  • Communication describes data transfer techniques such as HTTP, DNS, and TCP


Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality while the improved capa explorer plugin helps you to focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPi use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

Incident Response with NTFS INDX Buffers – Part 1: Extracting an INDX Attribute

18 September 2012 at 23:23

By William Ballenthin & Jeff Hamm

On August 30, 2012, we presented a webinar on how to use INDX buffers to assist in an incident response investigation. During the Q&A portion of the webinar we received many questions; however, we were not able to answer all of them. We're going to attempt to answer the remaining questions by posting a four part series on this blog. This series will address:

Part 1: Extracting an INDX Record

An INDX buffer in the NTFS file system tracks the contents of a folder. INDX buffers can be resident in the $MFT (Master File Table) as an index root attribute (attribute type 0x90) or non-resident as an index allocation attribute (attribute 0xA0) (non-resident meaning that the content of the attribute is in the data area on the volume.)

INDX root attributes have a dynamic size in the MFT, so as the contents change, the size of the attributes change. When an INDX root attribute shrinks, the surrounding attributes shift and overwrite any old data. Therefore, it is not possible to recover slack entries from INDX root attributes. On the other hand, the file system driver allocates INDX allocation attributes in multiples of 4096, even though the records may only be 40 bytes. As file system activity adds and removes INDX records from an allocation attribute, old records may still be recoverable in the slack space found between the last valid entry and the end of the 4096 chunk. This is very interesting to a forensic investigator. Fortunately, many forensic tools support extracting the INDX allocation attributes from images of an NTFS file system.

Scenario

Let's say that during your investigation you identified a directory of interest that you want to examine further. In the scenario we used during the webinar, we identified a directory as being of interest because we did a keyword search for "1.rar". The results of the search indicated that the slack space of an INDX attribute contained the suspicious filename "1.rar". The INDX attribute had the $MFT record number 49.

Before we can parse the data, we need to extract the valid index attribute's content. Using various forensic tools, we are capable of this as demonstrated below.

The SleuthKit

We can use the SleuthKit tools to extract both the INDX root and allocation data. To extract the INDX attribute using the SleuthKit, the first step is to identify the $MFT record IDs for the attributes of the inode. We want the content of the index root attribute (attribute type 0x90 or 144d) and the index allocation attribute (attribute type 0xA0 or 160d).

To identify the attribute IDs, run the command:

istat -f ntfs ntfs.dd 49

The istat command returns inode information from the $MFT. In the command we are specifying the NTFS file system with the "-f" switch. The tool reads a raw image named "ntfs.dd" and locates record number 49. The result of our output (truncated) was as follows:

....
Attributes:
Type: $STANDARD_INFORMATION (16-0) Name: Resident size: 72

...

Type: $I30 (144-6) Name: $I30 Resident size: 26

Type: $I30 (160-7) Name: $I30 Non-Resident size: 4096

The information returned for the attribute list includes the index root - $I30 (144-6) - and an index allocation - $I30 (160-7). The attribute identifier is the integer listed after the dash. Therefore, the index root attribute 144 has an identifier of 6, and the index allocation attribute 160 has an identifier of 7.

With this information, we can gather the content of the attributes with the SleuthKit commands:

icat -f ntfs ntfs.dd 49-144-6 > INDX_ROOT.bin

icat -f ntfs ntfs.dd 49-160-7 > INDX_ALLOCATION.bin

The icat command uses the NTFS module to identify the record (49) attribute (144-6 and 144-7), and outputs the attribute data into the respective files INDX_ROOT.bin and INDX_ALLOCATION.bin.

EnCase

We can use EnCase to extract the INDX allocation data. To use EnCase version 6.x to gather the content of the INDX buffers, in the explorer tree, right click the folder icon. The "Copy/UnErase..." option applied to a directory will copy the content of the INDX buffer as a binary file. Specify a location to save the file. Note that the "Copy Folders..." option will copy the directory and its contents and will NOT extract the INDX structure.

FTK

We can use the Forensic Toolkit (FTK) to extract the INDX allocation data. Using FTK or FTK Imager, the INDX allocation attributes appear in the file list pane. These have the name "$I30" because the stream name is identified as $I30 in the index root and index allocation attributes. To extract the content of an index attribute, in the explorer pane, highlight the folder. In the file list pane, right click the relevant $I30 file and choose the option to "export". This will prompt you for a location to save the binary content.

Mandiant Intelligent Response®

The Mandiant Intelligent Response® (MIR) agent v.2.2 has the ability to extract INDX records natively. To generate a list of INDX buffers in MIR, run a RAW file audit. One of the options in the audit is to "Parse NTFS INDX Buffers". You can run this recursively, or you can target specific directories. We recommend the latter because this option will generate numerous entries when done recursively.

To display a list of parsed INDX buffers, you can filter a file listing in MIR by choosing the "FileAttributes" are "like" "*INDX*". The MIR agent recognizes "INDX" as an attribute because the files listed in the indices may or may not be deleted.

Results

Regardless of which method is used, your binary file should begin with the string "INDX" if you grabbed the correct data stream. You can verify the results quickly in a hex editor. Ensure that the first four bytes of the binary data is the string "INDX".

Conclusion

This example demonstrates three ways to use various tools to extract INDX attribute content. Our next post will detail the internal structures of a file name attribute. A file name attribute will exist for each file tracked in a directory. These structures include the MACb (Modified, Accessed, Changed, and birth) times of a file and can be a valuable timeline source in an investigation.

Incident Response with NTFS INDX Buffers – Part 1: Extracting an INDX Attribute

18 September 2012 at 23:23

By William Ballenthin & Jeff Hamm

On August 30, 2012, we presented a webinar on how to use INDX buffers to assist in an incident response investigation. During the Q&A portion of the webinar we received many questions; however, we were not able to answer all of them. We're going to attempt to answer the remaining questions by posting a four part series on this blog. This series will address:

Part 1: Extracting an INDX Record

An INDX buffer in the NTFS file system tracks the contents of a folder. INDX buffers can be resident in the $MFT (Master File Table) as an index root attribute (attribute type 0x90) or non-resident as an index allocation attribute (attribute 0xA0) (non-resident meaning that the content of the attribute is in the data area on the volume.)

INDX root attributes have a dynamic size in the MFT, so as the contents change, the size of the attributes change. When an INDX root attribute shrinks, the surrounding attributes shift and overwrite any old data. Therefore, it is not possible to recover slack entries from INDX root attributes. On the other hand, the file system driver allocates INDX allocation attributes in multiples of 4096, even though the records may only be 40 bytes. As file system activity adds and removes INDX records from an allocation attribute, old records may still be recoverable in the slack space found between the last valid entry and the end of the 4096 chunk. This is very interesting to a forensic investigator. Fortunately, many forensic tools support extracting the INDX allocation attributes from images of an NTFS file system.

Scenario

Let's say that during your investigation you identified a directory of interest that you want to examine further. In the scenario we used during the webinar, we identified a directory as being of interest because we did a keyword search for "1.rar". The results of the search indicated that the slack space of an INDX attribute contained the suspicious filename "1.rar". The INDX attribute had the $MFT record number 49.

Before we can parse the data, we need to extract the valid index attribute's content. Using various forensic tools, we are capable of this as demonstrated below.

The SleuthKit

We can use the SleuthKit tools to extract both the INDX root and allocation data. To extract the INDX attribute using the SleuthKit, the first step is to identify the $MFT record IDs for the attributes of the inode. We want the content of the index root attribute (attribute type 0x90 or 144d) and the index allocation attribute (attribute type 0xA0 or 160d).

To identify the attribute IDs, run the command:

istat -f ntfs ntfs.dd 49

The istat command returns inode information from the $MFT. In the command we are specifying the NTFS file system with the "-f" switch. The tool reads a raw image named "ntfs.dd" and locates record number 49. The result of our output (truncated) was as follows:

....
Attributes:
Type: $STANDARD_INFORMATION (16-0) Name: Resident size: 72

...

Type: $I30 (144-6) Name: $I30 Resident size: 26

Type: $I30 (160-7) Name: $I30 Non-Resident size: 4096

The information returned for the attribute list includes the index root - $I30 (144-6) - and an index allocation - $I30 (160-7). The attribute identifier is the integer listed after the dash. Therefore, the index root attribute 144 has an identifier of 6, and the index allocation attribute 160 has an identifier of 7.

With this information, we can gather the content of the attributes with the SleuthKit commands:

icat -f ntfs ntfs.dd 49-144-6 > INDX_ROOT.bin

icat -f ntfs ntfs.dd 49-160-7 > INDX_ALLOCATION.bin

The icat command uses the NTFS module to identify the record (49) attribute (144-6 and 144-7), and outputs the attribute data into the respective files INDX_ROOT.bin and INDX_ALLOCATION.bin.

EnCase

We can use EnCase to extract the INDX allocation data. To use EnCase version 6.x to gather the content of the INDX buffers, in the explorer tree, right click the folder icon. The "Copy/UnErase..." option applied to a directory will copy the content of the INDX buffer as a binary file. Specify a location to save the file. Note that the "Copy Folders..." option will copy the directory and its contents and will NOT extract the INDX structure.

FTK

We can use the Forensic Toolkit (FTK) to extract the INDX allocation data. Using FTK or FTK Imager, the INDX allocation attributes appear in the file list pane. These have the name "$I30" because the stream name is identified as $I30 in the index root and index allocation attributes. To extract the content of an index attribute, in the explorer pane, highlight the folder. In the file list pane, right click the relevant $I30 file and choose the option to "export". This will prompt you for a location to save the binary content.

Mandiant Intelligent Response®

The Mandiant Intelligent Response® (MIR) agent v.2.2 has the ability to extract INDX records natively. To generate a list of INDX buffers in MIR, run a RAW file audit. One of the options in the audit is to "Parse NTFS INDX Buffers". You can run this recursively, or you can target specific directories. We recommend the latter because this option will generate numerous entries when done recursively.

To display a list of parsed INDX buffers, you can filter a file listing in MIR by choosing the "FileAttributes" are "like" "*INDX*". The MIR agent recognizes "INDX" as an attribute because the files listed in the indices may or may not be deleted.

Results

Regardless of which method is used, your binary file should begin with the string "INDX" if you grabbed the correct data stream. You can verify the results quickly in a hex editor. Ensure that the first four bytes of the binary data is the string "INDX".

Conclusion

This example demonstrates three ways to use various tools to extract INDX attribute content. Our next post will detail the internal structures of a file name attribute. A file name attribute will exist for each file tracked in a directory. These structures include the MACb (Modified, Accessed, Changed, and birth) times of a file and can be a valuable timeline source in an investigation.

❌