Normal view

There are new articles available, click to refresh the page.
Before yesterdayVulnerabily Research

Kernel Reving

31 December 2020 at 05:00
Kernel Stuff So hunting in explorere.exe is all well and good, and I’ve been enjoying it. However, I need to get ready for a course I’m giving on the 31st of January! If you’re not familiar with our HTP green belt course (https://www.hyperiongray.com/htp) we focus heavily on Windows 10 kernel...

Fusssing

28 December 2020 at 05:00
Alright tired today but doing two things. One fuzzing: /* SHSTDAPI SHParseDisplayName( PCWSTR pszName, IBindCtx *pbc, PIDLIST_ABSOLUTE *ppidl, SFGAOF sfgaoIn, SFGAOF *psfgaoOut ); */ #include <shlobj_core.h> #include <shlobj.h> #include <shlwapi.h> #include <iostream> #include <objbase.h> #include <string.h> #include <stdio.h> #include <stdlib.h> int OohBabyIneedSomeFuzz(const uint8_t *Data) { ULONG *ulong; LPCWSTR str2 =...

Moar Fuzz 3 - Electric Tree!!!

17 December 2020 at 05:00
Sorry for the nonsensical title. I’m a little drunk. Anyway, here’s a crash: ==15384==ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x0000004df3e0 #0 0x7ff6c5231fd4 in __sanitizer::BufferedStackTrace::UnwindImpl(unsigned __int64, unsigned __int64, void *, bool, unsigned int) C:\src\llvm_package_1100-final\llvm-project\compiler-rt\lib\asan\asan_stack.cpp:77 #1 0x7ff6c524d646 in __asan::asan_malloc_usable_size(void const *, unsigned __int64, unsigned __int64) C:\src\llvm_package_1100-final\llvm-project\compiler-rt\lib\asan\asan_allocator.cpp:986...

Journal

27 March 2020 at 04:00
layout: post title: Browser Fuzzing tags: [hacking] — Well it fucking happened. I stopped writing to this blog for a while. Who saw that coming? Anyway I’m making a comeback. The delay in posts was caused by 🥁 - me being in the fucking hospital. Some highlights: perfortated intestine, lost...

The Rise of Deep Learning for Detection and Classification of Malware

13 August 2021 at 00:50

Co-written by Catherine Huang, Ph.D. and Abhishek Karnik 

Artificial Intelligence (AI) continues to evolve and has made huge progress over the last decade. AI shapes our daily lives. Deep learning is a subset of techniques in AI that extract patterns from data using neural networks. Deep learning has been applied to image segmentation, protein structure, machine translation, speech recognition and robotics. It has outperformed human champions in the game of Go. In recent years, deep learning has been applied to malware analysis. Different types of deep learning algorithms, such as convolutional neural networks (CNN), recurrent neural networks and Feed-Forward networks, have been applied to a variety of use cases in malware analysis using bytes sequence, gray-scale image, structural entropy, API call sequence, HTTP traffic and network behavior.  

Most traditional machine learning malware classification and detection approaches rely on handcrafted features. These features are selected based on experts with domain knowledge. Feature engineering can be a very time-consuming process, and handcrafted features may not generalize well to novel malware. In this blog, we briefly describe how we apply CNN on raw bytes for malware detection and classification in real-world data. 

  1. CNN on Raw Bytes 

Figure 1: CNNs on raw bytes for malware detection and classification

The motivation for applying deep learning is to identify new patterns in raw bytes. The novelty of this work is threefold. First, there is no domain-specific feature extraction and pre-processing. Second, it is an end-to-end deep learning approach. It can also perform end-to-end classification. And it can be a feature extractor for feature augmentation. Third, the explainable AI (XAI) provides insights on the CNN decisions and help human identify interesting patterns across malware families. As shown in Figure 1, the input is only raw bytes and labels. CNN performs representation learning to automatically learn features and classify malware.  

2. Experimental Results 

For the purposes of our experiments with malware detection, we first gathered 833,000 distinct binary samples (Dirty and Clean) across multiple families, compilers and varying “first-seen” time periods. There were large groups of samples from common families although they did utilize varying packers, obfuscators. Sanity checks were performed to discard samples that were corrupt, too large or too small, based on our experiment. From samples that met our sanity check criteria, we extracted raw bytes from these samples and utilized them for conducting multiple experiments. The data was randomly divided into a training and a test set with an 80% / 20% split. We utilized this data set to run the three experiments.  

In our first experiment, raw bytes from the 833,000 samples were fed to the CNN and the performance accuracy in terms of area under receiver operating curve (ROC) was 0.9953.  

One observation with the initial run was that, after raw byte extraction from the 833,000 unique samples, we did find duplicate raw byte entries. This was primarily due to malware families that utilized hash-busting as an approach to polymorphism. Therefore, in our second experiment, we deduplicated the extracted raw byte entries. This reduced the raw byte input vector count to 262,000 samples. The test area under ROC was 0.9920. 

In our third experiment, we attempted multi-family malware classification. We took a subset of 130,000 samples from the original set and labeled 11 categories – the 0th were bucketed as Clean, 1-9 of which were malware families, and the 10th were bucketed as Others. Again, these 11 buckets contain samples with varying packers and compilers. We performed another 80 / 20% random split for the training set and test set. For this experiment, we achieved a test accuracy of 0.9700. The training and test time on one GPU was 26 minutes.  

3. Visual Explanation 

Figure 2: visual explanation using T-SNE and PCA before and after the CNN training
Figure 2: A visual explanation using T-SNE and PCA before and after the CNN training

To understand the CNN training process, we performed a visual analysis for the CNN training. Figure 2 shows the t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) for before and after CNN training. We can see that after training, CNN is able to extract useful representations to capture characteristics of different types of malware as shown in different clusters. There was a good separation for most categories, lending us to believe that the algorithm was useful as a multi-class classifier. 

We then performed XAI to understand CNN’s decisions. Figure 3 shows XAI heatmaps for one sample of Fareit and one sample of Emotet. The brighter the color is the more important the bytes contributing to the gradient activation in neural networks. Thus, those bytes are important to CNN’s decisions. We were interested in understanding the bytes that weighed in heavily on the decision-making and reviewed some samples manually. 

Figure 3: XAI heatmaps on Fareit (left) and Emotet (right)
Figure 3: XAI heatmaps on Fareit (left) and Emotet (right)

4. Human analysis to understand the ML decision and XAI  

Figure 4: Human analysis on CNN’s predictions
Figure 4: Human analysis on CNN’s predictions

To verify if the CNN can learn new patterns, we fed a few never before seen samples to the CNN, and requested a human expert to verify the CNN’s decision on some random samples. The human analysis verified that the CNN was able to correctly identify many malware familiesIn some cases, it identified samples accurately before the top 15 AV vendors based on our internal tests. Figure 4 shows a subset of samples that belong to the Nabucur family that were correctly categorized by the CNN despite having no vendor detection at that point in timeIt’s also interesting to note that our results showed that the CNN was able to currently categorize malware samples across families utilizing common packers into an accurate family bucket. 

Figure 5: domain analysis on sample compiler
Figure 5: domain analysis on sample compiler

We ran domain analysis on the same sample complier VB files. As shown in Figure 5, CNN was able to identify two samples of a threat family before other vendors. CNN agreed with MSMP/other vendors on two samples. In this experiment, the CNN incorrectly identified one sample as Clean.  

Figure 6: Human analysis on an XAI heatmap. Above is the resulting disassembly of part of the decryption tea algorithm from the Hiew tool.
Figure 6: Human analysis on an XAI heatmap. Above is the resulting disassembly of part of the decryption tea algorithm from the Hiew tool.
Above is XAI heatmap for one sample.
Above is XAI heatmap for one sample.

We asked a human expert to inspect an XAI heatmap and verify if those bytes in bright color are associated with the malware family classification. Figure 6 shows one sample which belongs to the Sodinokibi family. The bytes identified by the XAI (c3 8b 4d 08 03 d1 66 c1) are interesting because the byte sequence belongs to part of the Tea decryption algorithm. This indicates these bytes are associated with the malware classification, which confirms the CNN can learn and help identify useful patterns which humans or other automation may have overlooked. Although these experiments were rudimentary, they were indicative of the effectiveness of the CNN in identifying unknown patterns of interest.  

In summary, the experimental results and visual explanations demonstrate that CNN can automatically learn PE raw byte representations. CNN raw byte model can perform end-to-end malware classification. CNN can be a feature extractor for feature augmentation. The CNN raw byte model has the potential to identify threat families before other vendors and identify novel threats. These initial results indicate that CNN’s can be a very useful tool to assist automation and human researcher in analysis and classification. Although we still need to conduct a broader range of experiments, it is encouraging to know that our findings can already be applied for early threat triage, identification, and categorization which can be very useful for threat prioritization.  

We believe that McAfee’s ongoing AI research, such as deep learning-based approaches, leads the security industry to tackle the evolving threat landscape, and we look forward to continuing to share our findings in this space with the security community. 

The post The Rise of Deep Learning for Detection and Classification of Malware appeared first on McAfee Blog.

XLSM Malware with MacroSheets

6 August 2021 at 20:29

Written by: Lakshya Mathur

Excel-based malware has been around for decades and has been in the limelight in recent years. During the second half of 2020, we saw adversaries using Excel 4.0 macros, an old technology, to deliver payloads to their victims. They were mainly using workbook streams via the XLSX file format. In these streams, adversaries were able to enter code straight into cells (that’s why they were called macro-formulas). Excel 4.0 also used API level functions like downloading a file, creation of files, invocation of other processes like PowerShell, cmd, etc.  

With the evolution of technology, AV vendors started to detect these malicious Excel documents effectively and so to have more obfuscation and evasion routines attackers began to shift to the XLSM file format. In the first half of 2021, we have seen a surge of XLSM malware delivering different family payloads (as shown in below infection chart). In XLSM adversaries make use of Macrosheets to enter their malicious code directly into the cell formulas. XLSM structure is the same as XLSX, but XLSM files support VBA macros which are more advanced technology of Excel 4.0 macros. Using these macrosheets, attackers were able to access powerful windows functionalities and since this technique is new and highly obfuscated it can evade many AV detections. 

Excel 4.0 and XLSM are both known to download other malware payloads like ZLoader, Trickbot, Qakbot, Ursnif, IcedID, etc. 

Field hits for XLSM macrosheet malware detection
Field hits for XLSM macrosheet malware detection

The above figure shows the Number of samples weekly detected by the detected name “Downloader-FCEI” which specifically targets XLSM macrosheet based malware. 

Detailed Technical Analysis 

XLSM Structure 

XLSM files are spreadsheet files that support macros. A macro is a set of instructions that performs a record of steps repeatedly. XLSM files are based upon Open XLM formats that were introduced in Microsoft Office 2007. These file types are like XLSX but in addition, they support macros. 

Talking about the XLSM structure when we unzip the file, we see four basic contents of the file, these are shown below. 

Figure-1: Content inside XLSM file
Figure-1: Content inside XLSM file
  • _rels contains the starting package-level relationship. 
  • docProps contains the metadata of the excel file. 
  • xl folder contains the actual contents of the file. 
  • [Content_Types].xml has references to the XML files present within the above folders. 

We will focus more on the “xl” folder contents. This folder contains all the excel file main contents like all the worksheets, media files, styles.xml file, sharedStrings.xml file, workbook.xml file, etc. All these files and folders have data related to different aspects of the excel file. But for XLSM files we will focus on one unique folder called macrosheets. 

These XLSM files contain macrosheets as shown in figure-2 which are nothing but XML sheet files that can support macros. These sheets are not available in other Excel file formats. In the past few months, we have seen a huge surge in XLSM file-type malware in which attackers store malicious strings hidden within these macrosheets. We will see more details about such malware in this blog. 

Figure-2: Macrosheets folder inside xl folder
Figure-2: Macrosheets folder inside xl folder

To explain further how attackers uses XLSM files we have taken a Qakbot sample with SHA 91a1ba70132139c99efd73ca21c4721927a213bcd529c87e908a9fdd71570f1e. 

Infection Chain

Figure-3: Infection chain for Qakbot Malware
Figure-3: Infection chain for Qakbot Malware

The infection chain for both Excel 4.0 Qakbot and XLSM Qakbot is similar. They both downloads dll and execute it using rundll32.exe with DllResgisterServer as the export function. 

XLSM Threat Analysis 

On opening the XLSM file there is an image that prompts the user to enable the content. To look legitimate and clean malicious actors use a very official-looking template as shown below.

Figure-4: Image of Xlsm file face
Figure-4 Image of Xlsm file face

On digging deeper, we see its internal workbook.xml file. 

Figure-5: workbook.xml content
Figure-5: workbook.xml content

Now as we can see in the workbook.xml file (Figure-5), there is a total of 6 sheets and their state is hidden. Also, two cells have a predefined name and one of them is Sheet2323!$A$1 defined as “_xlnm.Auto_Open” which is similar to Sub Auto_Open() as we generally see in macro files. It automatically runs the macros when the user clicks on Enable Content.  

As we saw in Figure-3 on opening the file, we only see the enable content image. Since the state of sheets was hidden, we can right-click on the main sheet tab and we will see unhide option there, then we can select each sheet to unhide it. On hiding the sheet and change the font color to red we saw some random strings as seen in figure 6. 

Figure-6: Sheet face of xlsm file
Figure-6: Sheet face of xlsm file

These hidden sheets contain malicious strings in an obfuscated manner. So, on analyzing more we observed that sheets inside the macrosheets folder contain these malicious strings. 

Figure-7: Content of macrosheet XML file
Figure-7: Content of macrosheet XML file

Now as we can in figure-7 different tags are used in this XML sheet file. All the malicious strings are present in two tags <f> and <v> tags inside <sheetdata> tags. Now let’s look more in detail about these tags. 

<v> (Cell Value) tags are used to store values inside the cell. <f> (Cell Formula) tags are used to store formulas inside the cell. Now in the above sheet <v> tags contain the cached formula value based on the last time formula was calculated. Formula cells contain formulas like “GOTO(Sheet2!H13)”, now as we can see here attackers can store different formulas while referencing cells from different sheets. These operations are done to produce more and more obfuscated sheets and evade AV signatures. 

When the user clicks on the enable content button the execution starts from the Auto_Open cell, after which each sheet formula will start to execute one by one. The final deobfuscated string is shown below. 

Figure-8: Final De-Obfuscated strings from the file
Figure-8: Final De-Obfuscated strings from the file

Here the URLDownloadToFIleA API is used to download the payload and the string “JJCCBB” is used to specify data types to call the API. There are multiple URI’s and from one of them, the DLL payload gets downloaded and saved as ..\\lertio.cersw. This DLL payload is then executed using rundll32. All these malicious activities get carried out using various excel based formulas like REGISTER, EXEC, etc. 

Coverage and prevention guidance: 

McAfee’s Endpoint products detect this variant of malware as below: 

The main malicious document with SHA256 (91a1ba70132139c99efd73ca21c4721927a213bcd529c87e908a9fdd71570f1e) is detected as “Downloader-FCEI” with current DAT files. 

Additionally, with the help of McAfee’s Expert rule feature, customers can add a custom behavior rule, specific to this infection pattern. 

Rule { 

    Process { 

        Include OBJECT_NAME { -v “EXCEL.exe” } 

    } 

Target { 

        Match PROCESS { 

            Include OBJECT_NAME { -v “rundll32.exe” } 

                      Include PROCESS_CMD_LINE { -v “* ..\\*.*,DllRegisterServer” }  

                            Include -access “CREATE” 

         } 

  } 

} 

McAfee advises all users to avoid opening any email attachments or clicking any links present in the mail without verifying the identity of the sender. Always disable the Macro execution for Office files. We advise everyone to read our blog on these types of malicious XLSM files and their obfuscation techniques to understand more about the threat. 

Different techniques & tactics are used by the malware to propagate, and we mapped these with the MITRE ATT&CK platform. 

  • T1064(Scripting): Use of Excel 4.0 macros and different excel formulas to download the malicious payload. 
  • Defense Evasion (T1218.011): Execution of Signed binary to abuse Rundll32.exe and proxy executes the malicious code is observed in this Qakbot variant.  
  • Defense Evasion (T1562.001): Office file tries to convince a victim to disable security features by using a clean-looking image. 
  • Command and Control(T1071): Use of Application Layer Protocol HTTP to connect to the web and then downloads the malicious payload. 

Conclusion 

XLSM malware has been seen delivering many malware families. Many major families like Trickbot, Gozi, IcedID, Qakbot are using these XLSM macrosheets in high quantity to deliver their payloads. These attacks are still evolving and keep on using various obfuscated strings to exploit various windows utilities like rundll32, regsvr32, PowerShell, etc. 

Due to security concerns, macros are disabled by default in Microsoft Office applications. We suggest it is only safe to enable them when the document received is from a trusted source and macros serve an expected purpose. 

The post XLSM Malware with MacroSheets appeared first on McAfee Blog.

Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book

5 August 2021 at 12:26

By Eneko Cruz Elejalde

Overview

This post analyzes a heap-buffer overflow in Microsoft Windows Address Book. Microsoft released an advisory for this vulnerability for the 2021 February patch Tuesday. This post will go into detail about what Microsoft Windows Address Book is, the vulnerability itself, and the steps to craft a proof-of-concept exploit that crashes the vulnerable application.

Windows Address Book

Windows Address Book is a part of the Microsoft Windows operating system and is a service that provides users with a centralized list of contacts that can be accessed and modified by both Microsoft and third party applications. The Windows Address Book maintains a local database and interface for finding and editing information about contacts, and can query network directory servers using Lightweight Directory Access Protocol (LDAP). The Windows Address Book was introduced in 1996 and was later replaced by Windows Contacts in Windows Vista and subsequently by the People App in Windows 10.

The Windows Address Book provides an API that enables other applications to directly use its database and user interface services to enable services to access and modify contact information. While Microsoft has replaced the application providing the Address Book functionality, newer replacements make use of old functionality and ensure backwards compatibility. The Windows Address Book functionality is present in several Windows Libraries that are used by Windows 10 applications, including Outlook and Windows Mail. In this way, modern applications make use of the Windows Address Book and can even import address books from older versions of Windows.

CVE-2021-24083

A heap-buffer overflow vulnerability exists within the SecurityCheckPropArrayBuffer() function within wab32.dll when processing nested properties of a contact. The network-based attack vector involves enticing a user to open a crafted .wab file containing a malicious composite property in a WAB record.

Vulnerability

The vulnerability analysis that follows is based on Windows Address Book Contacts DLL (wab32.dll) version 10.0.19041.388 running on Windows 10 x64.

The Windows Address Book Contacts DLL (i.e. wab32.dll) provides access to the Address Book API and it is used by multiple applications to interact with the Windows Address Book. The Contacts DLL handles operations related to contact and identity management. Among others, the Contacts DLL is able to import an address book (i.e, a WAB file) exported from an earlier version of the Windows Address Book.

Earlier versions of the Windows Address Book maintained a database of identities and contacts in the form of a .wab file. While current versions of Windows do not use a .wab file by default anymore, they allow importing a WAB file from an earlier installation of the Windows Address Book.

There are multiple ways of importing a WAB file into the Windows Address Book, but it was observed that applications rely on the Windows Contacts Import Tool (i.e, C:\Program Files\Windows Mail\wabmig.exe) to import an address book. The Import Tool loads wab32.dll to handle loading a WAB file, extracting relevant contacts, and importing them into the Windows Address Book.

WAB File Format

The WAB file format (commonly known as Windows Address Book or Outlook Address Book) is an undocumented and proprietary file format that contains personal identities. Identities may in turn contain contacts, and each contact might contain one or more properties.

Although the format is undocumented, the file-format has been partially reverse-engineered by a third party. The following structures were obtained from a combination of a publicly available third-party application and the disassembly of wab32.dll. Consequently, there may be inaccuracies in structure definitions, field names, and field types.

The WAB file has the following structure:

Offset      Length (bytes)    Field                   Description
---------   --------------    --------------------    -------------------
0x0         16                Magic Number            Sixteen magic bytes
0x10        4                 Count 1                 Unknown Integer
0x14        4                 Count 2                 Unknown Integer
0x18        16                Table Descriptor 1      Table descriptor
0x28        16                Table Descriptor 2      Table descriptor
0x38        16                Table Descriptor 3      Table descriptor
0x48        16                Table Descriptor 4      Table descriptor
0x58        16                Table Descriptor 5      Table descriptor
0x68        16                Table Descriptor 6      Table descriptor

All multi-byte fields are represented in little-endian byte order unless otherwise specified. All string fields are in Unicode, encoded in the UTF16-LE format.

The Magic Number field contains the following sixteen bytes: 9c cb cb 8d 13 75 d2 11 91 58 00 c0 4f 79 56 a4. While some sources list the sequence of bytes 81 32 84 C1 85 05 D0 11 B2 90 00 AA 00 3C F6 76 as a valid magic number for a WAB file, it was found experimentally that replacing the sequence of bytes prevents the Windows Address Book from processing the file.

Each of the six Table Descriptor fields numbered 1 through 6 has the following structure:

Offset    Length    Field    Description
          (bytes)
-------   --------  -------  -------------------
0x0       4         Type     Type of table descriptor
0x4       4         Size     Size of the record described
0x8       4         Offset   Offset of the record described relative to the beginning of file
0xC       4         Count    Number of records present at offset

The following are examples of some known types of table descriptor:

  • Text Record (Type: 0x84d0): A record containing a Unicode string.
  • Index Record (Type: 0xFA0): A record that may contain several descriptors to WAB records.

Each text record has the following structure:

Offset   Length (bytes)    Field          Description
------   --------------    ------------   -------------------
0x0      N                 Content        Text content of the record; a null terminated UNICODE string
0x0+N    0x4               RecordId       A record identifier for the text record 

Similarly, each index record has the following structure

Offset      Length (bytes)    Field        Description
---------   --------------    ----------   -------------------
0x0         4                 RecordId     A record identifier for the index record
0x4         4                 Offset       Offset of the record relative to the beginning of the file

Each entry in the index record (i.e, each index record structure in succession) has an offset that points to a WAB record.

WAB Records

A WAB record is used to describe a contact. It contains fields such as email addresses and phone numbers stored in properties, which may be of various types such as string, integer, GUID, and timestamp. Each WAB record has the following structure:

Offset      Length   Field              Description
---------   ------   ---------------    -------------------
0x0         4        Unknown1           Unknown field
0x4         4        Unknown2           Unknown field
0x8         4        RecordId           A record identifier for the WAB record
0xC         4        PropertyCount      The number of properties contained in RecordProperties
0x10        4        Unknown3           Unknown field 
0x14        4        Unknown4           Unknown field 
0x18        4        Unknown5           Unknown field 
0x1C        4        DataLen            The length of the RecordProperties field (M)
0x20        M        RecordProperties   Succession of subproperties belonging to the WAB record

The following fields are relevant:

  • The RecordProperties field is a succession of record property structures.
  • The PropertyCount field indicates the number of properties within the RecordProperties field.

Record properties can be either simple or composite.

Simple Properties

Simple properties have the following structure:

Offset      Length (bytes)    Field       Description
---------   --------------    ---------   -------------------
0x0         0x2               Tag         A property tag describing the type of the contents
0x2         0x2               Unknown     Unknown field
0x4         0x4               Size        Size in bytes of Value member (X)
0x8         X                 Value       Property value or content

Tags of simple properties are smaller than 0x1000, and include the following:

Tag Name        Tag Value    Length      Description
                             (bytes)
---------       -----------  ---------   -------------------
PtypInteger16   0x00000002   2           A 16-bit integer
PtypInteger32   0x00000003   4           A 32-bit integer
PtypFloating32  0x00000004   4           A 32-bit floating point number
PtypFloating64  0x00000005   8           A 64-bit floating point number
PtypBoolean     0x0000000B   2           Boolean, restricted to 1 or 0
PtypString8     0x0000001E   Variable    A string of multibyte characters in externally specified
                                         encoding with terminating null character (single 0 byte)
PtypBinary      0x00000102   Variable    A COUNT field followed by that many bytes
PtypString      0x0000001F   Variable    A string of Unicode characters in UTF-16LE format encoding
                                         with terminating null character (0x0000).
PtypGuid        0x00000048   16          A GUID with Data1, Data2, and Data3 filds in little-endian
PtypTime        0x00000040   8           A 64-bit integer representing the number of 100-nanosecond
                                         intervals since January 1, 1601
PtypErrorCode   0x0000000A   4           A 32-bit integer encoding error information

Note the following:

  • The aforementioned list is not exhaustive. For more property tag definitions, see this.
  • The value of PtypBinary is prefixed by a COUNT field, which counts 16-bit words.
  • In addition to the above, the following properties also exist; their usage in WAB is unknown.
    • PtypEmbeddedTable (0x0000000D): The property value is a Component Object Model (COM) object.
    • PtypNull (0x00000001): None: This property is a placeholder.
    • PtypUnspecified (0x00000000): Any: this property type value matches any type;

Composite Properties

Composite properties have the following structure:

Offset  Length     Field             Description
        (bytes)
------  ---------  ----------------- -------------------
0x0     0x2        Tag               A property tag describing the type of the contents
0x2     0x2        Unknown           Unknown field
0x4     0x4        NestedPropCount   Number of nested properties contained in the current WAB property
0x8     0x4        Size              Size in bytes of Value member (X)
0xC     X          Value             Property value or content

Tags of composite properties are greater than or equal to 0x1000, and include the following:

Tag Name                Tag Value
---------               ----------
PtypMultipleInteger16   0x00001002
PtypMultipleInteger32   0x00001003
PtypMultipleString8     0x0000101E
PtypMultipleBinary      0x00001102
PtypMultipleString      0x0000101F
PtypMultipleGuid        0x00001048
PtypMultipleTime        0x00001040


The Value field of each composite property contains NestedPropCount number of Simple properties of the corresponding type.

In case of fixed-sized properties (PtypMultipleInteger16, PtypMultipleInteger32, PtypMultipleGuid, and PtypMultipleTime), the Value field of a composite property contains NestedPropCount number of the Value field of the corresponding Simple property.

For example, in a PtypMultipleInteger32 structure with NestedPropCount of 4:

  • The Size is always 16.
  • The Value contains four 32-bit integers.

In case of variable-sized properties (PtypMultipleString8, PtypMultipleBinary, and PtypMultipleString), the Value field of the composite property contains NestedPropCount number of Size and Value fields of the corresponding Simple property.

For example, in a PtypMultipleString structure with NestedPropCount of 2 containing the strings “foo” and “bar” in Unicode:

  • The Size is 14 00 00 00.
  • The Value field contains a concatenation of the following two byte-strings:
    • “foo” encoded with a four-byte length: 06 00 00 00 66 00 6f 00 6f 00.
    • “bar” encoded with a four-byte length: 06 00 00 00 62 00 61 00 72 00.

Technical Details

The vulnerability in question occurs when a malformed Windows Address Book in the form of a WAB file is imported. When a user attempts to import a WAB file into the Windows Address Book, the method WABObjectInternal::Import() is called, which in turn calls ImportWABFile(). For each contact inside the WAB file, ImportWABFile() performs the following nested calls: ImportContact(), CWABStorage::ReadRecord(), ReadRecordWithoutLocking(), and finally HrGetPropArrayFromFileRecord(). This latter function receives a pointer to a file as an argument and reads the contact header and extracts PropertyCount and DataLen. The function HrGetPropArrayFromFileRecord() in turn calls SecurityCheckPropArrayBuffer() to perform security checks upon the imported file and HrGetPropArrayFromBuffer() to read the contact properties into a property array.

The function HrGetPropArrayFromBuffer() relies heavily on the correctness of the checks performed by SecurityCheckPropArrayBuffer(). However, the function fails to implement security checks upon certain property types. Specifically, SecurityCheckPropArrayBuffer() may skip checking the contents of nested properties where the property tag is unknown, while the function HrGetPropArrayFromBuffer() continues to process all nested properties regardless of the security check. As a result, it is possible to trick the function HrGetPropArrayFromBuffer() into parsing an unchecked contact property. As a result of parsing such a property, the function HrGetPropArrayFromBuffer() can be tricked into overflowing a heap buffer.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference markers denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

The following is the pseudocode of the function HrGetPropArrayFromFileRecord:

[1]

if ( !(unsigned int)SecurityCheckPropArrayBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3]) )
  {

[2]
    result = 0x8004011b;        // Error
    goto LABEL_25;              // Return prematurely
  }

[3]
  result = HrGetPropArrayFromBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3], 0, a7);

At [1] the function SecurityCheckPropArrayBuffer() is called to perform a series of security checks upon the buffer received and the properties contained within. If the check is positive, then the input is trusted and processed by calling HrGetPropArrayFromBuffer() at [3]. Otherwise, the function returns with an error at [2].

The following is the pseudocode of the function SecurityCheckPropArrayBuffer():

    __int64 __fastcall SecurityCheckPropArrayBuffer(unsigned __int8 *buffer_ptr, unsigned int buffer_length, int header_dword_3)
    {
      unsigned int security_check_result; // ebx
      unsigned int remaining_buffer_bytes; // edi
      int l_header_dword_3; // er15
      unsigned __int8 *ptr_to_buffer; // r9
      int current_property_tag; // ecx
      __int64 c_dword_2; // r8
      unsigned int v9; // edi
      int VA; // ecx
      int VB; // ecx
      int VC; // ecx
      int VD; // ecx
      int VE; // ecx
      int VF; // ecx
      int VG; // ecx
      int VH; // ecx
      signed __int64 res; // rax
      _DWORD *ptr_to_dword_1; // rbp
      unsigned __int8 *ptr_to_dword_0; // r14
      unsigned int dword_2; // eax
      unsigned int v22; // edi
      int v23; // esi
      int v24; // ecx
      unsigned __int8 *c_ptr_to_property_value; // [rsp+60h] [rbp+8h]
      unsigned int v27; // [rsp+68h] [rbp+10h]
      unsigned int copy_dword_2; // [rsp+70h] [rbp+18h]

      security_check_result = 0;
      remaining_buffer_bytes = buffer_length;
      l_header_dword_3 = header_dword_3;
      ptr_to_buffer = buffer_ptr;
      if ( header_dword_3 )                      
      {
        while ( remaining_buffer_bytes > 4 )        
        {

[4]

          if ( *(_DWORD *)ptr_to_buffer & 0x1000 )  
          {

[5]

            current_property_tag = *(unsigned __int16 *)ptr_to_buffer;
            if ( current_property_tag == 0x1102 ||                    
                 (unsigned int)(current_property_tag - 0x101E) <= 1 ) 
            {                         

[6]
                                      
              ptr_to_dword_1 = ptr_to_buffer + 4;                     
              ptr_to_dword_0 = ptr_to_buffer;
              if ( remaining_buffer_bytes < 0xC )                     
                return security_check_result;                         
              dword_2 = *((_DWORD *)ptr_to_buffer + 2);
              v22 = remaining_buffer_bytes - 0xC;
              if ( dword_2 > v22 )                                     
                return security_check_result;                         
              ptr_to_buffer += 12;
              copy_dword_2 = dword_2;
              remaining_buffer_bytes = v22 - dword_2;
              c_ptr_to_property_value = ptr_to_buffer;                
              v23 = 0;                                                
              if ( *ptr_to_dword_1 > 0u )
              {
                while ( (unsigned int)SecurityCheckSingleValue(
                                        *(_DWORD *)ptr_to_dword_0,
                                        &c_ptr_to_property_value,
                                        ©_dword_2) )
                {
                  if ( (unsigned int)++v23 >= *ptr_to_dword_1 )       
                  {                                                   
                    ptr_to_buffer = c_ptr_to_property_value;
                    goto LABEL_33;
                  }
                }
                return security_check_result;
              }
            }
            else                                                         
            {
             
[7]

              if ( remaining_buffer_bytes < 0xC )
                return security_check_result;
              c_dword_2 = *((unsigned int *)ptr_to_buffer + 2);       
              v9 = remaining_buffer_bytes - 12;
              if ( (unsigned int)c_dword_2 > v9 )                     
                return security_check_result;                         
              remaining_buffer_bytes = v9 - c_dword_2;
              VA = current_property_tag - 0x1002;                     
              if ( VA )
              {
                VB = VA - 1;
                if ( VB && (VC = VB - 1) != 0 )
                {
                  VD = VC - 1;
                  if ( VD && (VE = VD - 1) != 0 && (VF = VE - 1) != 0 && (VG = VF - 13) != 0 && (VH = VG - 44) != 0 )
                    res = VH == 8 ? 16i64 : 0i64;
                  else
                    res = 8i64;
                }
                else
                {
                  res = 4i64;
                }
              }
              else
              {
                res = 2i64;
              }
              if ( (unsigned int)c_dword_2 / *((_DWORD *)ptr_to_buffer + 1) != res ) 
                return security_check_result;                                        
                                                                                     

              ptr_to_buffer += c_dword_2 + 12;
            }
          }
          else                                     
          {                                        

[8]

            if ( remaining_buffer_bytes < 4 )       
              return security_check_result;
            v24 = *(_DWORD *)ptr_to_buffer;         
            c_ptr_to_property_value = ptr_to_buffer + 4;// new exe: v13 = buffer_ptr + 4;
            v27 = remaining_buffer_bytes - 4;       
            if ( !(unsigned int)SecurityCheckSingleValue(v24, &c_ptr_to_property_value, &v27) )
              return security_check_result;
            remaining_buffer_bytes = v27;
            ptr_to_buffer = c_ptr_to_property_value;
          }
    LABEL_33:
          if ( !--l_header_dword_3 )
            break;
        }
      }
      if ( !l_header_dword_3 )
        security_check_result = 1;
      return security_check_result;
    }

At [4] the tag of the property being processed is checked. The checks performed depend on whether the property processed in each iteration is a simple or a composite property. For simple properties (i.e, properties with tag lower than 0x1000), execution continues at [8]. The following checks are done for simple properties:

  1. If the remaining number of bytes in the buffer is fewer than 4, the function returns with an error.
  2. A pointer to the property value is obtained and SecurityCheckSingleValue() is called to perform a security check upon the simple property and its value. SecurityCheckSingleValue() performs a security check and increments the pointer to point at the next property in the buffer, so that SecurityCheckPropArrayBuffer() can check the next property on the next iteration.
  3. The number of total properties is decremented and compared to zero. If equal to zero, then the function returns successfully. If different, the next iteration of the loop checks the next property.

Similarly, for composite properties (i.e, properties with tag equal or higher than 0x1000) execution continues at [5] and the following is done.

For variable length composite properties (if the property tag is equal to 0x1102 (PtypMultipleBinary) or equal or smaller than 0x101f (PtypMultipleString)), the code at [6] does the following:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size field of the property is compared to the remaining buffer length to avoid overrunning the buffer.
  3. For each nested property, the function SecurityCheckSingleValue() is called. It:
    1. Performs a security check on the nested property.
    2. Advances the pointer to the buffer held by the caller, in order to point to the next nested property.
  4. The loop runs until the number of total properties in the contact (decremented in each iteration) is zero.

For fixed-length composite properties (if the property tag in question is different from 0x1102 (PtypMultipleBinary) and larger than 0x101f (PtypMultipleString)), the following happens starting at [7]:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size is compared to the remaining buffer length to avoid overrunning the buffer.
  3. The size of each nested property, which depends only on the property tag, is calculated from the parent property tag.
  4. The Size is divided by NestedPropCount to obtain the size of each nested property.
  5. The function returns with an error if the calculated subproperty size is different from the property size deduced from parent property tag.
  6. The buffer pointer is incremented by the size of the parent property value to point to the next property.

Unknown or non-processable property types are assigned the nested property size 0x0.

It was observed that if the calculated property size is zero, the buffer pointer is advanced by the size of the property value, as described by the header. The buffer is advanced regardless of the property size and by advancing the buffer, the security check permits the value of the parent property (which may include subproperties) to stay unchecked. For the security check to pass the result of the division performed on Step 4 for fixed-length composite properties must be zero. Therefore for an unknown or non-processable property to pass the security check, NestedPropCount must be larger than Size. Note that since the size of any property in bytes is at least two, NestedPropCount must always be no larger than half of Size, and therefore, the aforementioned division must never be zero in benign cases.

After the checks have concluded, the function returns zero for a failed check and one for a passed check.

Subsequently, the function HrGetPropArrayFromFileRecord() calls HrGetPropArrayFromBuffer(), which aims to collect the properties into an array of _SPropValue structs and return it to the caller. The _SPropValue array has a length equal of the number of properties (as given by the contact header) and is allocated in the heap through a call to LocalAlloc(). The number of properties is multiplied by sizeof(_SPropValue) to yield the total buffer size. The following fragment shows the allocation taking place:

    if ( !property_array_r )
    {
        ret = -2147024809;
        goto LABEL_71;
    }
    *property_array_r = 0i64;
    header_dword_3_1 = set_to_zero + header_dword_3;

[9]

    if ( (unsigned int)header_dword_3_1 < header_dword_3       
      || (unsigned int)header_dword_3_1 > 0xAAAAAAA            
      || (v10 = (unsigned int)header_dword_3_1,                               
          property_array = (struct _SPropValue *)LocalAlloc(
                                                   0x40u,
                                                   0x18 * header_dword_3_1),
                                                   // sizeof(_SPropValue) * n_properties_in_binary
        (*property_array_r = property_array) == 0i64) )
    {
        ERROR_INSUFICIENT_MEMORY:
        ret = 0x8007000E;
        goto LABEL_71;
    }

An allocation of sizeof(_SPropValue) * n_properties_in_binary can be observed at [9]. Immediately after, each of the property structures are initialized and their property tag member is set to 1. After initialization, the buffer, on which security checks have already been performed, is processed property by property, advancing the property a pointer to the next property with the property header and value sizes provided by the property in question.

If the property processed by the specific loop iteration is a simple property, the following code is executed:

    if ( !_bittest((const signed int *)¤t_property_tag, 0xCu) )
    {
      if ( v16 < 4 )
        break;
      dword_1 = wab_ulong_buffer_full[1];
      ptr_to_dword_2 = (char *)(wab_ulong_buffer_full + 2);
      v38 = v16 - 4;
      if ( (unsigned int)dword_1 > v38 )
        break;
      current_property_tag = (unsigned __int16)current_property_tag;
      if ( (unsigned __int16)current_property_tag > 0xBu )
      {

[10]

        v39 = current_property_tag - 0x1E;
        if ( !v39 )
          goto LABEL_79;
        v40 = v39 - 1;
        if ( !v40 )
          goto LABEL_79;
        v41 = v40 - 0x21;
        if ( !v41 )
          goto LABEL_56;
        v42 = v41 - 8;
        if ( v42 )
        {
          if ( v42 != 0xBA )
            goto LABEL_56;
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_1);
          if ( !(*property_array_r)[p_idx].Value.bin.lpb )
            goto ERROR_INSUFICIENT_MEMORY;
          (*property_array_r)[p_idx].Value.l = dword_1;
          v44 = *(&(*property_array_r)[p_idx].Value.at + 1);
        }
        else
        {
    LABEL_79:

[11]
                                           
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.cur.int64 = (LONGLONG)LocalAlloc(0x40u, dword_1);
          v44 = (*property_array_r)[p_idx].Value.dbl;
          if ( v44 == 0.0 )
            goto ERROR_INSUFICIENT_MEMORY;
        }
        memcpy_0(*(void **)&v44, ptr_to_dword_2, v43);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[v43];
      }
      else
      {
    LABEL_56:               

[12]
           
        memcpy_0(&(*property_array_r)[v15].Value, ptr_to_dword_2, dword_1);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[dword_1];
      }
      remaining_bytes_to_process = v38 - dword_1;
      goto NEXT_PROPERTY;
    }

[Truncated]

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
          return 0;
      }

At [10] the property tag is extracted and compared with several constants. If the property tag is 0x1e (PtypString8), 0x1f (PtypString), or 0x48 (PtypGuid), then execution continues at [11]. If the property tag is 0x40 (PtypTime) or is not recognized, execution continues at [12]. The memcpy call at [12] is prone to a heap overflow.

Conversely, if the property being processed in the specific loop iteration is not a simple property, the following code is executed. Notably, when the following code is executed, the pointer DWORD* wab_ulong_buffer_full points to the property tag of the property being processed. Regardless of which composite property is being processed, before the property tag is identified the buffer is advanced to point to the beginning of the property value, which is at the 4th 32-bit integer.

[13]

    if ( v16 < 4 )
    break;
    c_dword_1 = wab_ulong_buffer_full[1];
    v19 = v16 - 4;
    if ( v19 < 4 )
    break;
    dword_2 = wab_ulong_buffer_full[2];
    wab_ulong_buffer_full += 3;                     

    remaining_bytes_to_process = v19 - 4;

[14]

    if ( (unsigned __int16)current_property_tag >= 0x1002u )
    {
    if ( (unsigned __int16)current_property_tag <= 0x1007u || (unsigned __int16)current_property_tag == 0x1014 )
        goto LABEL_80;
    if ( (unsigned __int16)current_property_tag == 0x101E )
    {
        [Truncated]
        
    }
    if ( (unsigned __int16)current_property_tag == 0x101F )
    {
        [Truncated]        
    }
    if ( ((unsigned __int16)current_property_tag - 0x1040) & 0xFFFFFFF7 )
    {
        if ( (unsigned __int16)current_property_tag == 0x1102 )
        {
        [Truncated]
        }
    }
    else
    {
    LABEL_80:

[15]

        (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_2);
        if ( !(*property_array_r)[p_idx].Value.bin.lpb )
        goto ERROR_INSUFICIENT_MEMORY;
        (*property_array_r)[p_idx].Value.l = c_dword_1;
        if ( (unsigned int)dword_2 > remaining_bytes_to_process )
        break;
        memcpy_0((*property_array_r)[p_idx].Value.bin.lpb, wab_ulong_buffer_full, dword_2);
        wab_ulong_buffer_full = (ULONG *)((char *)wab_ulong_buffer_full + dword_2);
        remaining_bytes_to_process -= dword_2;
    }
    }

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
        return 0;
    }

After the buffer has been advanced at [13], the property tag is compared with several constants starting at [14]. Finally, the code fragment at [15] attempts to process a composite property (i.e. >= 0x1000) with a tag not contemplated by the previous constants.

Although the processing logic of each type of property is irrelevant, an interesting fact is that if the property tag is not recognized, the buffer pointer has still been advanced to the end of the end of its header, and it’s never retracted. This happens if all of the following conditions are met:

  • The property tag is larger or equal than 0x1002.
  • The property tag is larger than 0x1007.
  • The property tag is different from 0x1014.
  • The property tag is different from 0x101e.
  • The property tag is different from 0x101f.
  • The property tag is different from 0x1102.
  • The result of subtracting 0x1040 from the property tag, and performing a bitwise AND of the result with 0xFFFFFFF7 is nonzero.

Interestingly, if all of the above conditions are met, the property header of the composite property is skipped, and the next loop iteration will interpret its property body as a different property.

Therefore, it is possible to overflow the _SPropValue array allocated in the heap by HrGetPropArrayFromBuffer() by using the following observations:

  • A specially crafted composite unknown or non-processable property can be made to bypass security checks if NestedPropCount is larger than Size.
  • HrGetPropArrayFromBuffer() can be made to interpret the Value of a specially crafted property as a separate property.

Proof-of-Concept

In order to create a malicious WAB file from a benign WAB file, export a valid WAB file from an instance of the Windows Address Book. It is noted that Outlook Express on Windows XP includes the ability to export contacts as a WAB file.

The benign WAB file can be modified to make it malicious by altering a contact inside it to have the following characteristics:

  • A nested property containing the following:
  • A tag of an unknown or unprocessable type, for example the tag 0x1058, with the following conditions:
    • Must be larger or equal than 0x1002.
    • Must be larger than 0x1007.
    • Must be different from 0x1014, 0x101e, 0x101f, and 0x1102.
    • The result of subtracting 0x1040 from the property tag, and performing a bitwise AND of the result with 0xFFFFFFF7 is non-zero.
    • Must be different from 0x1002, 0x1003, 0x1004, 0x1005, 0x1006, 0x1007, 0x1014, 0x1040, and 0x1048.
    • NestedPropCount is larger than Size.
    • The Value of the composite property is empty.
    • A malicious simple property containing the following:
    • A property tag different from 0x1e, 0x1f, 0x40 and 0x48. For example, the tag 0x0.
    • The Size value is larger than 0x18 x NestedPropCount in order to overflow the _SPropValue array buffer.
    • An unspecified number of trailing bytes, that will overflow the _SPropValue array buffer.

Finally, when an attacker tricks an unsuspecting user into importing the specially crafted WAB file, the vulnerability is triggered and code execution could be achieved. Failed exploitation attempts will most likely result in a crash of the Windows Address Book Import Tool.

Due to the presence of ASLR and a lack of a scripting engine, we were unable to obtain arbitrary code execution in Windows 10 from this vulnerability.

Conclusion

Hopefully you enjoyed this dive into CVE-2021-24083, and if you did, go ahead and check out our other blog post on a use-after-free vulnerability in Adobe Acrobat Reader DC. If you haven’t already, make sure to follow us on Twitter to keep up to date with our work. Happy hacking!

The post Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book appeared first on Exodus Intelligence.

New Class: COM Programming

1 August 2021 at 15:54

Today I’m announcing a new training – COM (Component Object Model) Programming, to be held in November.

COM is a well established technology, debuted back in 1993, and is still very much in use. In fact, the Windows Runtime is based on an enhanced version of COM. There is quite a bit of confusion and misconceptions about COM, which is one reason I decided to offer this class.

The syllabus for the 3 day class can be found here. This is the first time I will be offering this class, and also will try a new format: 6 half-days instead of 3 full days.

When: November: 8, 9, 10, 11, 15, 16. 2pm to 6pm, UT. (Technically it’s more than 3 days, as with a full day there is a lunch break not present in half days). The class will be conducted remotely using Microsoft Teams or a similar platform.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of the class. In case you have doubts, talk to me.

Obviously, participants in my Windows Internals and (especially) Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Registration will be different than previous classes:
Early bird (paid by September 30th): 500 USD (if paid by an individual), 1100 USD (if paid by a company).
Normal (paid after September 30th): 700 USD (if paid by an individual), 1300 USD (if paid by a company).

To register, send an email to [email protected] with the title “COM Training”, and write the name(s), email(s) and time zone(s) of the participants. Multiple participants from the same company get a discount. Previous participants (individuals) of my classes get 10% off. I will reply with further instructions.

I hope to see you in November!

COMReuse

zodiacon

Babuk: Biting off More than they Could Chew by Aiming to Encrypt VM and *nix Systems?

29 July 2021 at 04:01

Co-written with Northwave’s Noël Keijzer.

Executive Summary

For a long time, ransomware gangs were mostly focused on Microsoft Windows operating systems. Yes, we observed the occasional dedicated Unix or Linux based ransomware, but cross-platform ransomware was not happening yet. However, cybercriminals never sleep and in recent months we noticed that several ransomware gangs were experimenting with writing their binaries in the cross-platform language Golang (Go).

Our worst fears were confirmed when Babuk announced on an underground forum that it was developing a cross-platform binary aimed at Linux/UNIX and ESXi or VMware systems. Many core backend systems in companies are running on these *nix operating systems or, in the case of virtualization, think about the ESXi hosting several servers or the virtual desktop environment.

We touched upon this briefly in our previous blog, together with the many coding mistakes the Babuk team is making.

Even though Babuk is relatively new to the scene, its affiliates have been aggressively infecting high-profile victims, despite numerous problems with the binary which led to a situation in which files could not be retrieved, even if payment was made.

Ultimately, the difficulties faced by the Babuk developers in creating ESXi ransomware may have led to a change in business model, from encryption to data theft and extortion.

Indeed, the design and coding of the decryption tool are poorly developed, meaning if companies decide to pay the ransom, the decoding process for encrypted files can be really slow and there is no guarantee that all files will be recoverable.

Coverage and Protection Advice

McAfee’s EPP solution covers Babuk ransomware with an array of prevention and detection techniques.

McAfee ENS ATP provides behavioral content focusing on proactively detecting the threat while also delivering known IoCs for both online and offline detections. For DAT based detections, the family will be reported as Ransom-Babuk!. ENS ATP adds 2 additional layers of protection thanks to JTI rules that provide attack surface reduction for generic ransomware behaviors and RealProtect (static and dynamic) with ML models targeting ransomware threats.

Updates on indicators are pushed through GTI, and customers of Insights will find a threat-profile on this ransomware family that is updated when new and relevant information becomes available.

Initially, in our research the entry vector and the complete tactics, techniques and procedures (TTPs) used by the criminals behind Babuk remained unclear.

However, when its affiliate recruitment advertisement came online, and given the specific underground meeting place where Babuk posts, defenders can expect similar TTPs with Babuk as with other Ransomware-as-a-Service families.

In its recruitment posting Babuk specifically asks for individuals with pentest skills, so defenders should be on the lookout for traces and behaviors that correlate to open source penetration testing tools like winPEAS, Bloodhound and SharpHound, or hacking frameworks such as CobaltStrike, Metasploit, Empire or Covenant. Also be on the lookout for abnormal behavior of non-malicious tools that have a dual use, such as those that can be used for things like enumeration and execution, (e.g., ADfind, PSExec, PowerShell, etc.) We advise everyone to read our blogs on evidence indicators for a targeted ransomware attack (Part1Part2).

Looking at other similar Ransomware-as-a-Service families we have seen that certain entry vectors are quite common amongst ransomware criminals:

  • E-mail Spearphishing (T1566.001). Often used to directly engage and/or gain an initial foothold, the initial phishing email can also be linked to a different malware strain, which acts as a loader and entry point for the ransomware gangs to continue completely compromising a victim’s network. We have observed this in the past with Trickbot and Ryuk, Emotet and Prolock, etc.
  • Exploit Public-Facing Application (T1190) is another common entry vector; cyber criminals are avid consumers of security news and are always on the lookout for a good exploit. We therefore encourage organizations to be fast and diligent when it comes to applying patches. There are numerous examples in the past where vulnerabilities concerning remote access software, webservers, network edge equipment and firewalls have been used as an entry point.
  • Using valid accounts (T1078) is and has been a proven method for cybercriminals to gain a foothold. After all, why break the door if you have the keys? Weakly protected Remote Desktop Protocol (RDP) access is a prime example of this entry method. For the best tips on RDP security, we would like to highlight our blog explaining RDP security.
  • Valid accounts can also be obtained via commodity malware such as infostealers, that are designed to steal credentials from a victim’s computer. Infostealer logs containing thousands of credentials are purchased by ransomware criminals to search for VPN and corporate logins. As an organization, robust credential management and multi-factor authentication on user accounts is an absolute must have.

When it comes to the actual ransomware binary, we strongly advise updating and upgrading your endpoint protection, as well as enabling options like tamper protection and rollback. Please read our blog on how to best configure ENS 10.7 to protect against ransomware for more details.

Summary of the Threat

  • A recent forum announcement indicates that the Babuk operators are now expressly targeting Linux/UNIX systems, as well as ESXi and VMware systems
  • Babuk is riddled with coding mistakes, making recovery of data impossible for some victims, even if they pay the ransom
  • We believe these flaws in the ransomware have led the threat actor to move to data theft and extortion rather than encryption

Learn more about how Babuk is transitioning away from an encryption/ransom model to one focused on pure data theft and extortion in our detailed technical analysis.

The post Babuk: Biting off More than they Could Chew by Aiming to Encrypt VM and *nix Systems? appeared first on McAfee Blog.

Root Cause Analysis of a Printer’s Drivers Vulnerability CVE-2021-3438

By: voidsec
28 July 2021 at 12:00

Last week SentinelOne disclosed a “high severity” flaw in HP, Samsung, and Xerox printer’s drivers (CVE-2021-3438); the blog post highlighted a vulnerable strncpy operation with a user-controllable size parameter but it did not explain the reverse engineering nor the exploitation phase of the issue. With this blog post, I would like to analyse the vulnerability […]

The post Root Cause Analysis of a Printer’s Drivers Vulnerability CVE-2021-3438 appeared first on VoidSec.

Bitdefender: UPX Unpacking Featuring Ten Memory Corruptions

10 November 2020 at 12:00
This post breaks the two-year silence of this blog, showcasing a selection of memory corruption vulnerabilities in Bitdefender’s anti-virus engine. The goal of binary packing is to compress or obfuscate a binary, usually to save space/bandwidth or to evade malware analysis. A packed binary typically contains a compressed/obfuscated data payload. When the binary is executed, a loader decompresses this payload and then jumps to the actual entry point of the (inner) binary.

F-Secure Anti-Virus: Remote Code Execution via Solid RAR Unpacking

5 June 2018 at 12:00
As I briefly mentioned in my last two posts about the 7-Zip bugs CVE-2017-17969, CVE-2018-5996, and CVE-2018-10115, the products of at least one antivirus vendor were affected by those bugs. Now that all patches have been rolled out, I can finally make the vendor’s name public: It is F-Secure with all of its Windows-based endpoint protection products (including consumer products such as F-Secure Anti-Virus as well as corporate products such as F-Secure Server Security).

7-Zip: From Uninitialized Memory to Remote Code Execution

1 May 2018 at 12:00
After my previous post on the 7-Zip bugs CVE-2017-17969 and CVE-2018-5996, I continued to spend time on analyzing antivirus software. As it happens, I found a new bug that (as the last two bugs) turned out to affect 7-Zip as well. Since the antivirus vendor has not yet published a patch, I will add the name of the affected product in an update to this post as soon as this happens.

7-Zip: Multiple Memory Corruptions via RAR and ZIP

23 January 2018 at 13:00
In my previous posts about the two Bitdefender bugs related to 7z, I explicitly mentioned that Igor Pavlov’s 7-Zip reference implementation was not affected. Unfortunately, I cannot do the same for the bugs described in this blog post. I found these bugs in a prominent antivirus product and then realized that 7-Zip itself was affected. As the antivirus vendor has not yet published a patch, I will add the name of the affected product in an update to this post as soon as this happens.

Bitdefender: Heap Buffer Overflow via 7z LZMA

22 August 2017 at 12:00
A few days after having published the post about the Bitdefender stack buffer overflow via 7z PPMD, I discovered a new bug in Bitdefender’s product. While this is a 7z bug, too, it has nothing to do with the previous bug or with the PPMD codec. Instead, it concerns dynamic memory management. In contrast to the previous post, which described an arbitrary free vulnerability in F-Secure’s anti-virus product, this post presents the first heap buffer overflow of this blog series.

F-Secure Anti-Virus: Arbitrary Free Vulnerability via TNEF

8 August 2017 at 12:40
The previous posts of this blog series have been about stack based buffer overflows. With this post, I want to move on to bugs that involve dynamic memory management. Since there are not that many publicly documented arbitrary free vulnerabilities in prominent software products, I thought it would be worth sharing this one. The Transport Neutral Encapsulation Format (TNEF) is an e-mail attachment format developed by Microsoft. It can be used to represent complicated messages and attachments, consisting of many different files and file types, as a flattened stream.

Bitdefender: Remote Stack Buffer Overflow via 7z PPMD

18 July 2017 at 12:25
If you read my previous blog post and were bored by it, then this might be for you. With the second post of the series, I am delivering on the promise of discussing a bug that occurs in a more complex setting. A bug in a software module that extracts a prominent archive format (such as 7z) needs to be treated with great caution. It is often critical not only for the software itself, but also for many different software products that are sharing the same library or are based on the same reference implementation.

Avast Antivirus: Remote Stack Buffer Overflow with Magic Numbers

27 June 2017 at 12:05
If I told you I found a remotely triggerable stack-based buffer overflow in a conventional anti-virus product, in what part of the software would you expect it to be? A reasonable guess may be: “Probably in the parsing code of some complicated and likely obsolete file format”. In fact, the most recent anti-virus stack buffer overflows clearly show that the implementation of a parser for complex file formats is extremely challenging.

Announcing a New Blog Series on Anti-Virus Software

22 June 2017 at 17:40
In the past couple of years, we have seen quite a few critical security issues concerning anti-virus software. A significant number of bugs has been found by Google’s Project Zero, but there is also a lot of effort from other companies as well as from private researchers. Observing those issues, the following question arises naturally: Does anti-virus software decrease the security of our systems? This is a question that has been discussed for many years now, and for many experts the answer seems to be clear.

Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks

We customize and use Semgrep a lot during our security assessments at IncludeSec because it helps us quickly locate potential areas of concern within large codebases. Static analysis tools (SAST) such as Semgrep are great for aiding our vulnerability hunting efforts and usually can be tied into Continuous Integration (CI) pipelines to help developers catch potential vulnerabilities early in the development process.  In a previous post, we compared two static analysis tools: Brakeman vs. Semgrep. A key takeaway from that post is that when it comes to custom rules, we found that Semgrep was easy to use.

The lovely developers of Semgrep, as well as the general open source community provide pre-written rules for many frameworks that can be used with extreme ease–all it requires is a command line switch and it works. For example:

semgrep --config "p/flask"

Running this on its own can catch bad practices and mistakes. However, writing custom rules can expand Semgrep’s out-of-the-box functionality significantly and is done by advanced security assessors who understand code level security concerns. Whether you want to add rules that look for more specific problems or similar rules with a bigger scope, it’s up to the end-user rule writer to expand in whichever direction they want.

In this post, we walk through some scenarios to write custom Semgrep rules for two popular Python frameworks: Django and Flask.

Why Write Custom Rules for Frameworks?

We see a lot of applications built on top of frameworks like Django and Flask and wanted to prevent duplicative manual effort to identify similar patterns of security concerns on every assessment. While the default community rules are very good in Semgrep, at IncludeSec we needed more than that. Making use of Semgrep’s powerful rules system makes it possible to extend these to cover even more sources of bugs related to framework usage, such as:

  • vulnerabilities caused by use of specific deprecated APIs
  • vulnerabilities caused by lack of error checking in specific patterns
  • vulnerabilities introduced due to lack of locking/mutexes
  • specific combinations of API calls that can cause inefficiencies or loss of performance, or even introduce race conditions

If any of these issues occur frequently on specific APIs, Semgrep is ideal as a one time investment will pay off dividends in the development process.

Making Use of Frameworks 

For developers, using frameworks like Django and Flask make coding easier and more secure. But they aren’t foolproof. If you use them incorrectly, it is still possible to make mistakes. And for each framework, these mistakes tend to follow common patterns.

SAST tools like Semgrep offer the possibility of automating checks for some of these patterns of mistakes to find vulnerabilities that may be common within a framework. 

An analogy for SAST tooling is a compiler whose warnings/errors you can configure extremely easily. This makes it a perfect fit when programming specific frameworks, as you can catch potentially dangerous usages of APIs & unsafe operations before code is ever committed. For auditors it is extremely helpful when working with large codebases, which can be daunting at first due to the sheer amount of code. SAST tooling can locate security “codesmells”, and where there is codesmell, there are often leads to possible security issues.

Step 1. Find patterns of mistakes

In order to write custom rules for a framework, you first have to do some research to identify where in the framework mistakes might occur.

The first place to look when identifying bad habits is the official documentation — often one can find big blocks of formatting with the words WARNING, ERROR, MISTAKE. These blocks can often clue you into common problems with examples, avoiding time wasted searching forums/Stack Overflow posts for common bugs.

The next place to search where one can find real world practical examples would be bug bounty platforms, such as HackerOne, BugCrowd, etc. Searching these platforms can result in quite niche but severe mistakes that might not be in official documentation but can occur in live production applications.

Finally, intentionally vulnerable “hack me” applications such as django.nV, which explain common vulnerabilities that might occur. With concise, straightforward exercises that one can do to learn and also hammer in the impact of the bugs at hand.

For example, in the Flask documentation for logins https://flask-login.readthedocs.io/en/latest/#login-example , a warning block mentions that 

Warning: You MUST validate the value of the next parameter. If you do not, your application will be vulnerable to open redirects. For an example implementation of is_safe_url see this Flask Snippet.

This block warns us about open redirects in the specific login situation it presents, we’ll use something similar for our vulnerable code example: an open redirect where the redirect parameter comes from a url encoded GET request.

Step 2. Identify the pieces of information and the markers in your code

When writing rules, we have to identify the pieces of information that the specific code encodes. This way we can ensure that the patterns we write will be as accurate as possible. Let’s look at an example from Flask:

from flask import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

In this example code, we can see a piece of Flask code that contains an open redirect vulnerability. We can dissect it into its various properties and see how we can match this in Semgrep. First we’ll mention the specific semantics of this function and what exactly we want to match.

Properties:

1. @app.route("/redirect/") – Already on the first line we see that our target functions have a route decorator that tells us that this function is used to handle a request, or that it directly receives user input by virtue of being an endpoint handler. Matching route/endpoint handlers is effective because input to an endpoint handler is unsanitized and could be a potential area of concern: 

from flask import redirect 
 
def do_redirect(uri):
    if is_logging_enabled():
        log(uri)
    
    return redirect(uri)
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    
    if unsafe_uri(uri):
        return redirect_to_index()
    
    return do_redirect(uri)

In the listing above if we were to match every function that includes do_redirect instead of only route handlers that include do_redirect we could end up with false positives where an input to a function has already been sanitized. Already here we have some added complexity that does not bode well with other static analysis tools. In this case we would match do_redirect even though the URI it receives has already been sanitized in the function unsafe_uri(uri). This brings us to our first constraint: we need to match route handlers. 

2.    def handle_request(uri):here it’s important that we match a function right below the function decorator, and that this function takes in a parameter. We could match any function that has a route decorator which also contains a redirect, but then we could possibly match a function where the redirect input is constant or comes from sanitized storage. Matching a route handler with a parameter guarantees that it receives unsanitized user input. We can be sure of this because Flask does not do any URL sanitization. Specifying this results in more accurate matching and finer detection and brings us to our second constraint: that we need to match route handlers with 1 or more parameters

3.    return redirect(uri)here it may seem obvious, all we have to do is match redirect, right? Sadly, it is not that easy. Many APIs can have generic names that may collide with other modules using a generic text/regex search, this can be especially problematic in languages that support function overloading, where a specific overloaded instance of a function may have problems, but other overloaded instances are fine. Not accounting for these may result in many false positives. For example, consider the following snippet:

from robot import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

If we only matched redirect, we would match the redirect function from a module named robot which could be a false positive. An even more horrifying scenario to match would be an API or module that is imported under another name, e.g.:

from flask import redirect as rd

Thankfully, specifying the origin of the function allows Semgrep to handle all these cases and we’ll go more into detail on this when developing the patterns.

What does a good pattern account for?

A good pattern depends on your goals and how you use rules: finding performance bottlenecks, enforcing better programming practices, or finding security concerns as an auditor, everyone’s needs are different.

For a security assessment, it is important to find potential areas of concern, for example often areas that do not include sanitization are potentially dangerous. Ideally we want to eliminate as many false positives as possible and we can do this by excluding functions with sanitization. This brings us to our final constraint: we don’t want to match any functions containing sanitization keywords.

The Constraints

So far we have the following constraints:

  • match a route handler
  • match a function that takes in 1 or more parameters
  • match a redirect in the function that takes in a parameter from the function
  • IDEALLY: don’t match a function containing sanitization keywords

Step 3. Developing The Pattern

Now that we know all the constraints, and the semantics of the code we want to match we can finally start writing the pattern. I’ll put the end pattern for display, and we’ll dissect it together. Semgrep takes YAML files that describe multiple rules. Each rule contains a specific pattern to match.

 rules:
- id: my_pattern_id
  languages:
  - python
  message: found open redirect
  severity: ERROR
  patterns:
  - pattern-inside: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify) 

rules: – Every Semgrep rule file has to start with the rules tag, this is an array of rules as a Semgrep rule file may contain multiple rules.

- id: my_pattern_id Every Semgrep rule in the rules array has an id, this is essentially the name of the rule and must be unique.

languages: 
  - python

The language this rule works with. This determines how it parses the pattern & which files it checks.

message: found open redirect the message displayed when the Semgrep search matches a pattern, you can think of this like a compiler warning message.

severity: ERROR determines the color and other aspects of the messages upon a successful match. You can think of this as a compiler error, except it’s just a more severe warning, this is good for filtering through different levels of matches with Semgrep, or to cut down on time by searching only for erroneous patterns.

patterns:
  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify)

This is the final part of the rule and contains the actual logic of the pattern, a rule has to contain a top-level pattern element. In order for a match to be successful the final result of all the logic has to be true. In this case the top level element is a patterns, which only returns true if all the elements underneath it return true.

  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)

This pattern searches for code that satisfies the first 3 constraints, with the ellipsis representing anything. @app.route(...) will match any call to that function with any number of arguments (including none).

def $X(..., $URI_VAR, ...):

matches any function, and stores its name in the variable $X. It then matches any argument in this function, whether it be in the middle or at the end and stores it in $URI_VAR.

The Ellipsis following matches any code in this function until the next statement in the pattern which in this case is flask.redirect($URI_VAR) which matches redirect only if its arguments come from the function variable $URI_VAR. If these constraints are all satisfied, it then passes the text it matches onto the next pattern and it returns true.

One amazing feature of Semgrep is its ability to match fully qualified function names, even when they are imported with an alias. In this case, matching flask.redirect($URI_VAR) would match only redirects from flask, even if they are imported with another name (such as redir or rd).

- pattern-not-regex: (sanitize|validate|safe|check|verify)

This pattern is responsible for eliminating potential false positives. It’s very simple: it runs a regex against the matched text and if the regex comes back with any matches, it returns false otherwise it returns true. This element is responsible for checking if sanitization elements exist in the function code. The text that is used to check for these sanitization elements is obviously not perfect, but it can be tailored to the project you are working on and can always be extended to include more possible keywords.

Step 4. Testing & Debugging

Now that we’ve made our pattern, we can test it on the online Semgrep playground to see if it works. Here we can make small changes and get instant feedback in order to improve our patterns. Below is an example of the rules at work matching the unsanitized open redirect and ignoring the safe redirect.

https://semgrep.dev/s/65lY

Trade Offs, Quantity vs Quality

When designing these patterns, it’s possible to spend all your time trying to write the best pattern that catches every situation, filters out all the false-positives and what not, but this is an almost futile endeavor and can lead into rabbit holes. Also, overly precise rules may filter things that weren’t even meant to be filtered. The dilemma always comes down to how many false positives are you willing to handle–this tradeoff is up to Semgrep users to decide for themselves. When absolutely critical it may be better to have more false positives but to catch everything, whereas from an auditor’s perspective it may be better to have a more precise ruleset to start with a good lead and to be efficient, and then audit unmatched code later. Or perhaps a graduated approach where higher false positive rules are enabled for subsequent runs of SAST tooling.

Return on Investment

When it comes to analysis tools, it’s important to understand how much you need to set up & maintain to truly get value back. If they are complicated to update and maintain sometimes it’s just not worth it. The great upside to Semgrep is the ease of use–one can start developing patterns after doing the 20 minute tutorial and make a significant amount of rules in a day, and the benefits can be felt immediately. It requires no fiddling with versions or complicated compiler setup, and once a ruleset has been developed it’ll work on any supported languages. 

Showcase – Django.nV

Django.nV is a very well-made intentionally vulnerable application that uses the Django framework to introduce a variety of bugs for learning framework-specific penetration testing, from XSS to more framework specific bugs. Thanks to nVisium for making a great training application open source!

We used Django.nV to test IncludeSec’s inhouse rules and came up with 4 new instances of vulnerabilities that the community rulesets missed:

django.nV/taskManager/settings.py
severity:warning rule:MD5Hasher for password: use a more secure hashing algorithm for password
124:PASSWORD_HASHERS = ['django.contrib.auth.hashers.MD5PasswordHasher']
 
django.nV/taskManager/templates/taskManager/base_backend.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
58:                        <span class="username"><i class="fa fa-user fa-fw"></i> {{ user.username|safe }}</span>
 
django.nV/taskManager/templates/taskManager/tutorials/base.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
54:                        <span class="username">{{ user.username|safe }}</span>
 
django.nV/taskManager/views.py
severity:warning rule:django open redirect: unvalidated open redirect
394:    return redirect(request.GET.get('redirect', '/taskManager/'))

MD5Hashing – detects that the MD5Hasher has been used for passwords, which is cryptographically insecure.

Unsafe template usage in HTML – detects the use of user parameters with the safe keyword in html, which could introduce XSS.

Open redirect – very similar to the example patterns we already discussed. It detects an open redirect in the logout view.

We’ve collaborated with the kind developers of Semgrep and the people over at returntocorp (ret2c) to get certain rules in the default Django Semgrep rule repository.

Conclusion

In conclusion, Semgrep makes it relatively painless to write custom static analysis rules to audit applications. Improper usage of framework APIs can be a common source of bugs, and we at IncludeSec found that a small amount of up front investment learning the syntax paid dividends when auditing applications using these frameworks.

The post Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks appeared first on Include Security Research Blog.

❌
❌