McAfee Blogs

Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade

12 August 2020 at 15:59

Executive Summary

Open source has become the foundation for modern software development. Vendors use open source software to stay competitive and improve the speed, quality, and cost of the development process. At the same time, it is critical to maintain and audit open source libraries used in products as they can expose a significant volume of risk.

The responsibility of auditing code for potential security risks lies with the organization using it. We have seen one of the highest impact vulnerabilities originate with open source software in the past. The famous Equifax data breach was due to a vulnerability in open source component Apache Struts, widely used in mainstream web frameworks. Furthermore, the 2020 Open Source Security Risk and Analysis report states that out of the applications audited in 2019, 99% of codebases contained open source components and 75% of codebases contained vulnerabilities, with 49% of codebases containing high risk vulnerabilities.

Graphics libraries have a rich history of vulnerabilities, and the volume of exploitable issues is magnified when the code base is older and has not been recompiled recently. It turns out that graphics libraries on Linux are widely used in many applications but are not sufficiently audited and tested for security issues. This became a driving force for us to test multiple vector graphics and GDI libraries on Linux, one of which was libEMF, a Linux C++ library written for a similar purpose and used in multiple graphics tools that support graphics conversion into other vector formats. We tested this library for several days and found multiple vulnerabilities, including denial-of-service issues, integer overflow, out-of-bounds memory access, use-after-free conditions, and uninitialized memory use.

All the vulnerabilities were locally exploitable. We reported them to the code’s maintainer, leading to two new versions of the library being released in a matter of weeks. This reflects McAfee’s commitment to protecting its customers from upcoming security threats, including defending them against those found in open source software. Through collaboration with McAfee researchers, all issues in this library were fixed in a timely manner.

In this blog we will emphasize why it is critical to audit the third-party code we often use in our products and outline general practices for security researchers to test it for security issues.

Introduction

Fuzzing is an extremely popular technique used by security researchers to discover potential zero-day vulnerabilities in both open and closed source software. Accepted as a fundamental process in product testing, it is employed by many organizations to discover vulnerabilities earlier in the product development lifecycle. At the same time, it is substantially overlooked. A well-designed fuzzer typically comprises a set of tools that make the fuzzing process efficient and fast enough to discover exploitable bugs in a short period, helping developers patch them early.

Several of the fuzzers available today help researchers guide the fuzzing process by measuring code coverage, using static or dynamic code instrumentation techniques. This results in more efficient and relevant inputs to the target software, exercising more code paths and leading to more vulnerabilities discovered in the target. Modern fuzzing frameworks also come with feedback-driven channels for maximizing the code coverage of the target software, by learning the input format along the way and comparing the code coverage of each input via feedback mechanisms, resulting in more efficient mutated inputs. Some of the state-of-the-art fuzzing frameworks available are American Fuzzy Lop (AFL), libFuzzer and Honggfuzz.

Fuzzers like AFL on Linux come with compiler wrappers (afl-gcc, afl-clang, etc.). With the assembly parsing module afl-as, AFL parses the generated assembly code to add compile-time instrumentation, which helps in visualizing the code coverage. Additionally, modern compilers come with sanitizer modules like Address Sanitizer (ASAN), Memory Sanitizer (MSAN), Undefined Behavior Sanitizer (UBSAN), Leak Sanitizer (LSAN), Thread Sanitizer (TSAN), etc., which can further increase the fuzzer’s bug finding abilities. The table below highlights the variety of memory corruption bugs that can be discovered by sanitizers when used with fuzzers.

| ASAN | MSAN | UBSAN | TSAN | LSAN |
| Use After Free Vulnerabilities | Uninitialized Memory Reads | Null Pointer Dereferences | Race Conditions | Run Time Memory Leak Detection |
| Heap Buffer Overflow | | Signed Integer Overflows | | |
| Stack Buffer Overflow | | Typecast Overflows | | |
| Initialization Order Bugs | | Divide by Zero Errors | | |
| Memory Leaks | | | | |
| Out of Bounds Access | | | | |

One of the McAfee Vulnerability Research Team goals is to fuzz multiple open and closed source libraries and report vulnerabilities to the vendors before they are exploited. Over the next few sections of this blog, we aim to highlight the vulnerabilities we discovered and reported while researching one open source library, LibEMF (ECMA-234 Metafile Library).

Using American Fuzzy Lop (AFL)

Much of the technical detail on the workings of this state-of-the-art feedback-driven fuzzer is available in its documentation. While AFL has many use cases, its most common is to fuzz programs written in C/C++, since they are susceptible to widely exploited memory corruption bugs, and that is where AFL and its mutation strategies are extremely useful. AFL gave rise to several forks like AFLSmart, AFLFast and Python AFL, differing in their mutation strategies and extensions to increase performance. Eventually, AFL was also ported to the Windows platform as WinAFL, using a dynamic instrumentation approach predominantly for closed source binary fuzzing.

The fuzzing process primarily comprises the following tasks:

Fuzzing libEMF (ECMA-234 Metafile Library) with AFL

LibEMF (Enhanced Metafile Library) is an EMF parsing library written in C/C++ and provides a drawing toolkit based on ECMA-234. The purpose of this library is to create vector graphic files. Documentation of this library is available here and is maintained by the developer.

We chose to fuzz LibEMF with AFL because of its compile-time instrumentation capabilities and good mutation strategies, as mentioned earlier. We compiled the source code in hardened mode, which adds code hardening options when invoking the downstream compiler and helps with discovering memory corruption bugs.

Compiling the Source

To use the code instrumentation capabilities of AFL, we must compile the source code with the AFL compiler wrapper afl-gcc/afl-g++. With the additional address sanitizer flag enabled, the configure command is:

./configure CXX=afl-g++ CFLAGS="-fsanitize=address -ggdb" CXXFLAGS="-fsanitize=address -ggdb" LDFLAGS="-fsanitize=address"

Below is a snapshot of the compilation process showing how the instrumentation is added to the code:

The Pwntools Python package comes with a useful utility script, checksec, that examines a binary’s security properties. Executing checksec over the library confirms the code is now ASAN instrumented. This will allow us to discover non-crashing memory access bugs as well:

A test harness is a program that uses the APIs from the library to parse the file given to it as a command line argument. AFL uses this harness to pass its mutated files as an argument to the program, resulting in several executions per second. While writing the harness, it is extremely important to release resources before returning, to avoid excessive usage which can eventually crash the system. Our harness for parsing EMF files using APIs from the libEMF library is shown here:

AFL will also track the code coverage with every input that it passes to the program and, if the mutations result in new program states, it will add the test case to the queue. We compiled our test harness using the following command:

afl-g++ -o playemffile playemffile.c -g -O2 -D_FORTIFY_SOURCE=0 -fsanitize=address -I /usr/local/include/libEMF/ -L/usr/local/lib/libEMF -lEMF

Collecting Test Cases

While a fuzzer can learn and generate the input format even from an empty seed file, gathering the initial corpus of input files is a significant step in an effective fuzzing process and can save huge amounts of CPU cycles. Depending upon the popularity of the file format, crawling the web and downloading the initial set of input files is one of the most intuitive approaches. In this case, it is not a bad idea to manually construct the input files with a variety of EMF record structures, using vector graphic file generation libraries or Windows GDI APIs. Pyemf is one such available library with Python bindings which can be used to generate EMF files. Below is example code that generates an EMF file with an EMR_EXTEXTOUTW record using Windows APIs. Constructing these files with different EMR records ensures functionally different input files, exercising different record handlers in the code.

Running the Fuzzer

Running the fuzzer is just running the afl-fuzz command with the parameters as shown below. We need to provide the input corpus of EMF files (-i EMFs/), the output directory (-o output/) and the path to the harness binary with @@, meaning the fuzzer will pass the file as an argument to the binary. We also need to use -m none since the ASAN instrumented binary needs a huge amount of memory.

afl-fuzz -m none -i EMFs/ -o output/ -- /home/targets/libemf-1.0.11/tests/playemffile @@

However, we can make multiple tweaks to the running AFL instance to increase the number of executions per second. AFL provides a persistent mode which is in-memory fuzzing. This avoids forking a new process on every run, resulting in increased speed. We can also run multiple AFL instances, one on every core, to increase the speed. Beyond this, AFL also provides a file size minimization tool that can be used to minimize the test case size. We applied some of these optimization tricks and, as we can see below, there is a dramatic increase in the execution speed reaching ~500 executions per second.
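As a rough sketch of the one-instance-per-core setup mentioned above, the helper below only builds afl-fuzz command lines and does not launch anything. The -M and -S options are AFL's flags for a main and for secondary instances sharing one output directory; the function name and the instance names (fuzzer00, fuzzer01, ...) are hypothetical choices made for this illustration.

```python
import os


def build_afl_commands(target, in_dir="EMFs/", out_dir="output/", cores=None):
    """Return one afl-fuzz argument list per core (nothing is executed)."""
    cores = cores or os.cpu_count() or 1
    commands = []
    for i in range(cores):
        # The first instance is the main one (-M); the rest are secondaries (-S).
        role = "-M" if i == 0 else "-S"
        commands.append([
            "afl-fuzz", "-m", "none",
            "-i", in_dir, "-o", out_dir,
            role, f"fuzzer{i:02d}",  # hypothetical instance names
            "--", target, "@@",
        ])
    return commands


if __name__ == "__main__":
    harness = "/home/targets/libemf-1.0.11/tests/playemffile"
    for cmd in build_afl_commands(harness, cores=4):
        print(" ".join(cmd))
```

Each printed line could then be started in its own terminal or job; all instances write to the same output directory so they can share interesting test cases.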

After about 3 days of fuzzing this library, AFL had recorded more than 200 crashes it considered unique, and when we triaged them we were left with 5 distinct crashes. We reported these crashes to the developer of the library along with MITRE, and after being acknowledged, CVE-2020-11863, CVE-2020-11864, CVE-2020-11865, CVE-2020-11866 and CVE-2020-13999 were assigned to these vulnerabilities. Below we discuss our findings for some of these vulnerabilities.

CVE-2020-11865 – Out of Bounds Memory Access While Enumerating the EMF Stock Objects

While triaging one of the crashes produced by the fuzzer, we saw SIGSEGV (memory access violation) for one of the EMF files given as an input. When the binary is compiled with the debugging symbols enabled, ASAN uses LLVM Symbolizer to produce the symbolized stack traces. As shown below, ASAN outputs the stack trace which helps in digging into this crash further.

Looking at the crash point in the disassembly clearly indicates the out of bounds memory access in GLOBALOBJECTS::find function.

Further analysis of this crash showed that the vulnerability was in accessing the global object vector, which holds pointers to stock objects. Stock objects are primarily logical graphics objects that can be used in graphics operations. Each stock object handle used in graphical operations has its high-order bit set, as shown below from the MS documentation. During metafile processing, the index of the relevant stock object is determined by masking the high-order bit, and that index is then used to access the pointer to the stock object. The metafile processing code retrieves the pointer from the global object vector at the masked index, as seen just above the crash point instruction, but does not check the size of the global object vector before accessing the index, leading to out of bounds vector access while processing a crafted EMF file.
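The masking step and the missing size check can be illustrated with a small Python model. This is only a sketch of the logic described above, not libEMF's code or its actual patch; the function name is hypothetical, and the constant name is borrowed from the Windows headers for readability.

```python
ENHMETA_STOCK_OBJECT = 0x80000000  # high-order bit that marks a stock object handle


def find_stock_object(stock_objects, handle):
    """Return the stock object for `handle`, or None if the handle is invalid."""
    if not handle & ENHMETA_STOCK_OBJECT:
        return None  # not a stock object handle; out of scope for this sketch
    index = handle & 0x7FFFFFFF  # mask off the high-order bit to get the index
    # The size check that was missing: refuse indexes past the end of the vector.
    if index >= len(stock_objects):
        return None
    return stock_objects[index]


if __name__ == "__main__":
    objects = [f"stock{i}" for i in range(22)]
    print(find_stock_object(objects, 0x80000005))
    print(find_stock_object(objects, 0x80000018))
```

With a 22-element vector, a handle whose masked index is 0x18 is rejected instead of reading past the end of the vector.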

Shown below is the vulnerable and fixed code where the vector size check was added:

CVE-2020-13999 – Signed Integer Overflow While Processing EMR_SCALEVIEWPORTEXTEX Record

Another crash that we triaged turned out to be a signed integer overflow condition while processing an EMR_SCALEVIEWPORTEXTEX record in the metafile. This record specifies the viewport in the current device context, which is calculated by computing ratios. An EMR_SCALEVIEWPORTEXTEX record looks like this, as per the record specification. A new viewport is calculated as shown below:

As part of AFL’s binary mutation strategy, it applies a deterministic approach where certain hardcoded sets of integers replace the existing data. Some of these are MAX_INT, MAX_INT-1, MIN_INT, etc., which increases the likelihood of triggering edge conditions while the application processes binary data. One such mutation done by AFL in the EMF record structure is shown below:
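To make the idea concrete, here is a minimal sketch of a deterministic "interesting value" pass over 32-bit fields. The value list is an illustrative subset chosen for this example and is not AFL's actual table, and the function name is hypothetical.

```python
import struct

# Illustrative subset of 32-bit boundary values; NOT AFL's actual table.
INTERESTING_32 = [-2**31, -1, 0, 1, 2**31 - 2, 2**31 - 1]


def mutate_int32(data, step=4):
    """Yield copies of `data` with each aligned 4-byte field replaced in turn."""
    for offset in range(0, len(data) - 3, step):
        for value in INTERESTING_32:
            patched = struct.pack("<i", value)  # little-endian signed 32-bit
            if data[offset:offset + 4] == patched:
                continue  # skip mutations that would not change the input
            yield data[:offset] + patched + data[offset + 4:]


if __name__ == "__main__":
    seed = bytes(8)  # stand-in for part of a record, two 32-bit fields
    for mutated in mutate_int32(seed):
        print(mutated.hex())
```

Because EMF records are built from 32-bit fields, a pass like this quickly places values such as the minimum signed integer into numerators and denominators.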

This resulted in the following crash while performing the division operation.

Below we see how this condition, eventually leading to a denial-of-service, was fixed by adding division overflow checks in the code:
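For readers without the patch at hand, the following Python model sketches what a division overflow check of this kind can look like. It is not the library's fix. It assumes the new extent is computed as (extent * numerator) / denominator on 32-bit signed integers, and the function name is hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def scale_extent(extent, num, denom):
    """Compute (extent * num) / denom as a checked 32-bit signed operation."""
    if denom == 0:
        raise ValueError("division by zero")
    product = extent * num
    if not INT32_MIN <= product <= INT32_MAX:
        raise ValueError("multiplication overflows a 32-bit signed integer")
    # INT32_MIN / -1 is the one quotient that does not fit in 32 bits;
    # on x86 the idiv instruction traps on it.
    if product == INT32_MIN and denom == -1:
        raise ValueError("division overflows a 32-bit signed integer")
    # C truncates toward zero while Python's // floors, so emulate C here.
    quotient = abs(product) // abs(denom)
    return quotient if (product < 0) == (denom < 0) else -quotient


if __name__ == "__main__":
    print(scale_extent(100, 3, 2))
    try:
        scale_extent(INT32_MIN, 1, -1)
    except ValueError as err:
        print("rejected:", err)
```

The important case is the second one: a record carrying the minimum signed integer and a denominator of -1 is rejected before the division is attempted.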

CVE-2020-11864 – Memory Leaks While Processing Multiple Metafile Records

Leak Sanitizer (LSAN) is yet another important tool which is integrated with the ASAN and can be used to detect runtime memory leaks. LSAN can also be used in standalone mode without ASAN. While triaging generated crashes, we noticed several memory leaks while processing multiple EMF record structures. One of them is as shown below while processing the EXTTEXTOUTA metafile record, which was later fixed in the code by releasing the memory buffer when there are exceptions reading the corrupted metafiles.

Memory leaks can lead to excessive resource usage in the system when memory is not freed after it is no longer needed, which can eventually lead to a denial of service. We found memory leak issues while libEMF processed several such metafile records. The same kind of fix, releasing the memory buffer, was applied to all the vulnerable processing code:

Additionally, we also reported multiple use-after-free conditions and denial-of-service issues which were eventually fixed in the newer version of the library released here.

Conclusion

Fuzzing is an important process and fundamental to testing the quality of a software product. The process becomes critical, especially when using third-party libraries in a product which may come with exploitable vulnerabilities. Auditing them for security issues is crucial. We believe the vulnerabilities that we reported are just the tip of the iceberg. There are several legacy libraries which likely require a thorough audit. Our research continues with several other similar Windows and Linux libraries and we will continue to report vulnerabilities through our coordinated disclosure process. We believe this also highlights that it is critical to maintain a good level of collaboration between vulnerability researchers and the open source community to have these issues reported and fixed in a timely fashion. Additionally, modern compilers come with multiple code instrumentation tools which can help detect a variety of memory corruption bugs when used early in the development cycle. Using these tools is recommended when auditing code for security vulnerabilities.

The post Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade appeared first on McAfee Blog.

Vulnerability Discovery in Open Source Libraries: Analyzing CVE-2020-11863

1 September 2020 at 16:09

Open Source projects are the building blocks of any software development process. As we indicated in our previous blog, as more and more products use open source code, the increase in the overall attack surface is inevitable, especially when open source code is not audited before use. Hence it is recommended to thoroughly test it for potential vulnerabilities and collaborate with developers to fix them, eventually mitigating the attacks. We also indicated that we were researching graphics libraries in Windows and Linux, reporting multiple vulnerabilities in Windows GDI as well as Linux vector graphics library libEMF. We are still auditing many other Linux graphics libraries since these are legacy code and have not been strictly tested before.

In part 1 of this blog series, we described in detail the significance of open source research, by outlining the vulnerabilities we reported in the libEMF library. We also highlighted the importance of compiling the code with memory sanitizers and how it can help detect a variety of memory corruption bugs. In summary, the Address Sanitizer (ASAN) intercepts the memory allocation / deallocation functions like malloc() / free() and fills out the memory with the respective fill bytes (malloc_fill_byte / free_fill_byte). It also monitors the read and write to these memory locations, helping detect erroneous access during run time.

In this blog, we provide a more detailed analysis for one of the reported vulnerabilities, CVE-2020-11863, which was due to the use of uninitialized memory. This vulnerability is related to CVE-2020-11865, a global object vector out of bounds memory access in the GlobalObject::Find() function in libEMF. However, the crash call stack turned out to be different, which is why we decided to examine this further and produce this deep dive blog.

The information provided by the ASAN was sufficient to reproduce the vulnerability crash outside of the fuzzer. From the ASAN information, the vulnerability appeared to be a null pointer dereference, but this was not the actual root cause, as we will discuss below.

Looking at the call stack, it appears that the application crashed while dynamically casting the object, for which there could be multiple reasons. Out of those possible reasons that seem likely, either the application attempted to access the non-existent virtual table pointer, or the object address returned from the function was a wild address accessed when the application crashed. Getting more context about this crash, we came across an interesting register value while debugging. Below shows the crash point in the disassembly indicating the non-existent memory access.

If we look at the state of the registers at the crash point, it is particularly interesting to note that the register rdi has an unusual value of 0xbebebebebebebebe. We wanted to dig a little deeper to check out how this value got into the register, resulting in the wild memory access. Since we had the source of the library, we could check right away what this register meant in terms of accessing the objects in memory.

Referring to the Address Sanitizer documentation, it turns out that ASAN writes 0xbe to newly allocated memory by default, meaning this 64-bit value was read from memory that had been allocated but never initialized. ASAN calls this fill value the malloc_fill_byte. It similarly fills memory with the free_fill_byte when it is freed. This helps identify memory access errors.
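A tiny model shows why the register held exactly this value: reading eight fill bytes as one 64-bit little-endian integer produces it. The buffer below merely stands in for a fresh allocation, and the helper name is hypothetical.

```python
import struct

MALLOC_FILL_BYTE = 0xBE  # ASAN's default fill for new allocations, as described above


def read_pointer(buffer, offset=0):
    """Interpret 8 bytes of `buffer` as a little-endian 64-bit pointer."""
    return struct.unpack_from("<Q", buffer, offset)[0]


# Stand-in for memory that was allocated but never initialized.
fresh_allocation = bytes([MALLOC_FILL_BYTE]) * 32
print(hex(read_pointer(fresh_allocation)))  # prints 0xbebebebebebebebe
```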

This nature of the ASAN can also be verified in the libsanitizer source here. Below is an excerpt from the source file.

Looking at the stack trace at the crash point as shown below, the crash occurred in the SelectObject() function. This part of the code is responsible for processing the EMR_SELECTOBJECT record structure of the Enhanced Meta File (EMF) file and the graphics object handle passed to the function is 0x80000018. We want to investigate the flow of the code to check if this is something which comes directly from the input EMF file and can be controlled by an attacker.

In the SelectObject() function, while processing the EMR_SELECTOBJECT record structure, the handle to the GDI object is passed to GlobalObjects.find() as shown in the above code snippet, which in turn accesses the global stock object vector by masking the higher order bit from the GDI object handle and converting it into the index, eventually returning the stock object reference from the object vector using the converted index number. Stock object enumeration specifies the indexes of predefined logical graphics objects that can be used in graphics operations documented in the MS documentation. For instance, if the object handle is 0x80000018, this will be ANDed with 0x7FFFFFFF, resulting in 0x18, which will be used as the index to the global stock object vector. This stock object reference is then dynamically cast into the graphics object, following which the EMF::GRAPHICSOBJECT member function getType() is called to determine the type of the graphics object and then, later in this function, it is again cast into an appropriate graphics object (BRUSH, PEN, FONT, PALETTE, EXTPEN), as shown in the below code snippet.

EMF::GRAPHICSOBJECT is the class derived from EMF::OBJECT and the inheritance diagram of the EMF::OBJECT class is as shown below.

However, as mentioned earlier, we were interested in knowing if the object handle, passed as an argument to the SelectObject function, can be controlled by an attacker. To get context on this, let us look at the format of the EMR_SELECTOBJECT record as shown below.

As we notice here, ihObject is the 4-byte unsigned integer specifying the index to the stock object enumeration. In this case the stock object references are maintained in the global objects vector. Here, the object handle of 0x80000018 implies that index 0x18 will be used to access the global stock object vector. If, at this point, the length of the object vector is not greater than 0x18 and the length check is not done prior to accessing the object vector, it will result in out of bounds memory access.

Below is the visual representation of processing the EMR_SELECTOBJECT metafile record.

While debugging this issue, we set a breakpoint at GlobalObjects.find() and continue until we have object handle 0x80000018; essentially, we reach the point where the above highlighted EMR_SELECTOBJECT record is being processed. As shown below, the object handle is converted into the index (0x18 = 24) to access the object vector of size (0x16 = 22), resulting in out of bounds access, which we reported as CVE-2020-11865.
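The handle-to-index conversion for this record can be reproduced with a few lines of Python. The sketch assumes the record is three little-endian 32-bit fields (type, size, ihObject) and that the EMR_SELECTOBJECT type value is 0x25; both are assumptions taken from the EMF record format rather than from libEMF's source, and the function name is hypothetical.

```python
import struct

EMR_SELECTOBJECT = 0x25  # record type value, assumed from the EMF record enumeration


def parse_select_object(record):
    """Return (is_stock, index) for a 12-byte EMR_SELECTOBJECT record."""
    rec_type, size, ih_object = struct.unpack("<III", record[:12])
    if rec_type != EMR_SELECTOBJECT or size != 12:
        raise ValueError("not an EMR_SELECTOBJECT record")
    is_stock = bool(ih_object & 0x80000000)  # high-order bit marks a stock object
    return is_stock, ih_object & 0x7FFFFFFF


record = struct.pack("<III", EMR_SELECTOBJECT, 12, 0x80000018)
is_stock, index = parse_select_object(record)
vector_size = 22  # size of the object vector observed while debugging
print(is_stock, index, index < vector_size)  # prints: True 24 False
```

Since ihObject comes straight from the file, an attacker fully controls the index that reaches the vector lookup.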

Further stepping into the code, it enters the STL vector library stl_vector.h which implements the dynamic expansion of the std::vectors. Since the objects vector at this point of time has only 22 elements, the STL vector will expand the vector to the size indicated by the parameter highlighted, accessing the vector by passed index, and will return the value at that object reference, as shown in the below code snippet, which comes out to be 0xbebebebebebebebe as filled by the ASAN.

The code uses std::allocator to manage the vector memory, primarily for memory allocation and deallocation. On further analysis, it turns out that the value returned, 0xbebebebebebebebe in this case, is the virtual pointer of the non-existent stock object, which is dereferenced during dynamic casting, resulting in a crash.

As mentioned in our earlier blog, the fixes to the library have been released in a subsequent version, available here.

Conclusion

While using third party code in products certainly saves time and increases development speed, it potentially comes with an increase in the volume of vulnerabilities, especially when the code remains unaudited and integrated into products without any testing. It is extremely critical to perform fuzz testing of the open source libraries used, which can help in discovering vulnerabilities earlier in the development cycle and provides an opportunity to fix them before the product is shipped, consequently mitigating attacks. However, as we emphasized in our previous blog, it is critical to strengthen the collaboration between vulnerability researchers and the open source community to continue responsible disclosures, allowing the maintainers of the code to address them in a timely fashion.

The post Vulnerability Discovery in Open Source Libraries: Analyzing CVE-2020-11863 appeared first on McAfee Blog.

An Inside Look into Microsoft Rich Text Format and OLE Exploits

24 January 2020 at 18:09

There has been a dramatic shift in the platforms targeted by attackers over the past few years. Up until 2016, browsers tended to be the most common attack vector to exploit and infect machines but now Microsoft Office applications are preferred, according to a report published here during March 2019. Increasing use of Microsoft Office as a popular exploitation target poses an interesting security challenge. Apparently, weaponized documents in email attachments are a top infection vector.

Object Linking and Embedding (OLE), a technology based on Component Object Model (COM), is one of the features in Microsoft Office documents which allows the objects created in other Windows applications to be linked or embedded into documents, thereby creating a compound document structure and providing a richer user experience. OLE has been massively abused by attackers over the past few years in a variety of ways. OLE exploits in the recent past have been observed loading COM objects to orchestrate and control the process memory, taking advantage of parsing vulnerabilities in COM objects, hiding malicious code, or connecting to external resources to download additional malware.

Microsoft Rich Text Format is heavily used in the email attachments in phishing attacks. It has been gaining massive popularity and its wide adoption in phishing attacks is primarily attributed to the fact that it has an ability to contain a wide variety of exploits and can be used efficiently as a delivery mechanism to target victims. Microsoft RTF files can embed various forms of object types either to exploit the parsing vulnerabilities or to aid further exploitation. The Object Linking and Embedding feature in Rich Text Format files is largely abused to either link the RTF document to external malicious code or to embed other file format exploits within itself and use it as the exploit container. Apparently, the RTF file format is very versatile.

In the sections below, we outline some of the exploitation and infection strategies used in Microsoft Rich Text Format files over the recent past and then, towards the end, we reflect on the key takeaways that can help automate the analysis of RTF exploits and set the direction for a generic analysis approach.

RTF Control Words

Rich Text Format files are heavily formatted using control words. Control words in the RTF files primarily define the way the document is presented to the user. Since these RTF control words have the associated parameters and data, parsing errors for them can become a target for exploitation. Exploits in the past have been found using control words to embed malicious resources as well. Consequently, it becomes significant to examine a destination control word that consumes data and extract the stream. RTF specifications describe several hundred control words consuming data.

RTF parsers must also be able to handle the control word obfuscation mechanisms commonly used by attackers, to further aid the analysis process. Below is an earlier exploit that used control word parameters to introduce executable payloads inside the datastore control word.

Overlay Data in RTF Files

Overlay data is the additional data which is appended to the end of RTF documents and is predominantly used by exploit authors to embed decoy files or additional resources, either in the clear, or encrypted form which is usually decrypted when the attacker-controlled code is executed. Overlay data of the volume beyond a certain size should be deemed suspicious and must be extracted and analysed further. However, Microsoft Word RTF parser will ignore the overlay data while processing RTF documents. Below are some instances of RTF exploits with a higher volume of overlay data appended at the end of the file, with CVE-2015-1641 embedding both the decoy document and multi-staged shellcodes with markers.
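One simple way to separate overlay data for inspection is to track brace depth and cut at the point where the root group closes. The function below is a simplified sketch with a hypothetical name: it honours backslash escapes but does not handle binary blocks introduced by the \bin control word, so it should be treated as a starting point rather than a complete RTF parser.

```python
def split_overlay(rtf_bytes):
    """Return (document, overlay); overlay is whatever follows the root group."""
    depth = 0
    i = 0
    while i < len(rtf_bytes):
        ch = rtf_bytes[i:i + 1]
        if ch == b"\\":
            i += 2  # skip an escaped character such as \{ or \}
            continue
        if ch == b"{":
            depth += 1
        elif ch == b"}":
            depth -= 1
            if depth == 0:
                # Root group closed: everything after this byte is overlay data.
                return rtf_bytes[:i + 1], rtf_bytes[i + 1:]
        i += 1
    return rtf_bytes, b""  # unbalanced braces: treat all input as the document


if __name__ == "__main__":
    sample = b"{\\rtf1 {\\b hi \\} there}}MZ\x90payload"
    document, overlay = split_overlay(sample)
    print(len(document), len(overlay))
```

The length of the returned overlay can then be compared against whatever size threshold an analyst considers suspicious.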

Object Linking and Embedding in RTF Files

Linked or embedded objects in RTF documents are represented as RTF objects, specifically by the RTF destination control word “object”. The data for the embedded or linked object is stored as the parameter to the RTF sub-destination control word “objdata” in the hex-encoded OLESaveToStream format. The modifier control word “objclass” determines the type of the object embedded in the RTF file and helps the client application render the object. However, the hex-encoded object data passed as the argument to the “objdata” control word can also be heavily obfuscated, either to make reverse engineering and analysis more time consuming or to break immature RTF parsers. OLE has been one of the dominant attack vectors in the recent past, with many instances of OLE-based exploits used in targeted attacks, which means that robust RTF document parsers for object extraction, along with deeper inspection of object data, are extremely critical.
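As a minimal sketch of object data extraction, the function below locates each “objdata” destination, removes whitespace inserted between hex digits and decodes the leading run of hex. It deliberately handles only whitespace obfuscation; nested groups or control words placed inside the hex stream, which real samples also use, would need a proper RTF tokenizer. The function name is hypothetical.

```python
import re

# Matches \objdata (not followed by another letter) up to the next brace.
OBJDATA_RE = re.compile(rb"\\objdata(?![a-zA-Z])([^{}]*)")


def extract_objdata(rtf_bytes):
    """Return the decoded bytes of every \\objdata payload in the document."""
    payloads = []
    for match in OBJDATA_RE.finditer(rtf_bytes):
        compact = re.sub(rb"\s+", b"", match.group(1))  # drop spaces and newlines
        hex_run = re.match(rb"[0-9A-Fa-f]*", compact).group(0)
        if len(hex_run) % 2:
            hex_run = hex_run[:-1]  # ignore a trailing half byte
        payloads.append(bytes.fromhex(hex_run.decode("ascii")))
    return payloads


if __name__ == "__main__":
    sample = b"{\\rtf1{\\object\\objemb{\\*\\objdata 0105\r\n0000 02000000}}}"
    for payload in extract_objdata(sample):
        print(payload.hex())
```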

Object Linking – Linking RTF to External Resource

Using object linking, it is possible to link the RTF file to a remote object, which could be a link to a malicious resource hosted on a remote server. This makes the resulting RTF file behave as a downloader and subsequently execute the downloaded resource by invoking the registered application-specific resource handlers. Inspecting the modifier RTF control words to “object”, linked objects are indicated by another nested control word “objautlink”, as represented below in the RTF document.

As indicated in the above representation, object data as the argument to the RTF control word “objdata” is OLE1.0NativeStream in the OLESaveToStream format which is followed by the NativeDataSize indicating the size of the OLE2.0 Compound document that is wrapped in the NativeStream. As per the Rich Text Format specifications, if the object is linked to the container application, which in this case is the RTF document, the Root Storage directory entry of the compound document will have the CLSID of the StdOleLink indicating the linked object. Also, when the object is in the OLE2.0 format, the linked source data is specified in the MonikerStream of the OLEStream structure. As highlighted below, while parsing the object data, the ole32.OleConvertOLESTREAMToIStorage function is responsible for converting the OLE1.0 NativeStream data to OLE2.0 structured storage format. Following the pointer to the OLE stream lpolestream will allow us to visualize the parsed extracted native data. Below is a memory snapshot from when an RTF document with a linked object was parsed by the winword.exe process.
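The layout described above can be walked with a short parser over the decoded object data. This sketch assumes the OLE1.0 header consists of a version, a format identifier and three length-prefixed ANSI strings (class, topic and item names) followed by NativeDataSize, and it checks whether the native data begins with the OLE2.0 compound file signature. The field layout is an assumption based on the OLE1.0 data structure documentation, and the function names are hypothetical.

```python
import struct

CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2.0 compound file signature


def _read_lp_string(data, pos):
    """Read a 4-byte length-prefixed ANSI string; return (text, new_pos)."""
    (length,) = struct.unpack_from("<I", data, pos)
    pos += 4
    raw = data[pos:pos + length]
    return raw.rstrip(b"\x00").decode("latin-1"), pos + length


def parse_ole1_object(data):
    """Parse the OLE1.0 header of decoded objdata (assumed field layout)."""
    _ole_version, format_id = struct.unpack_from("<II", data, 0)
    pos = 8
    class_name, pos = _read_lp_string(data, pos)
    _topic, pos = _read_lp_string(data, pos)
    _item, pos = _read_lp_string(data, pos)
    (native_size,) = struct.unpack_from("<I", data, pos)
    native = data[pos + 4:pos + 4 + native_size]
    return {
        "format_id": format_id,
        "class_name": class_name,
        "native_data_size": native_size,
        "wraps_ole2": native.startswith(CFB_MAGIC),
    }
```

When wraps_ole2 is true, the native data can be handed to a compound file parser to reach the root storage CLSID and the individual streams.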

Launching the RTF document with the link to external object will throw up a dialogue box asking to update the data from the linked object, as shown below.

However, this is not the ideal exploitation strategy to target victims. This dialogue box can be eliminated by inserting another modifier control word “objupdate”, which internally calls the link object’s IOleObject::Update method to update the link’s source.
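A triage script can flag this combination cheaply by collecting the control words present in a document. The sketch below works at whole-document granularity rather than per object and does not attempt to undo control word obfuscation; the function name and the keys of the returned dictionary are hypothetical.

```python
import re

# A control word is a backslash, lowercase letters and an optional numeric parameter.
CONTROL_WORD_RE = re.compile(rb"\\([a-z]+)(-?\d+)?")


def linked_object_flags(rtf_bytes):
    """Report whether the document links an object and forces a link update."""
    words = {m.group(1) for m in CONTROL_WORD_RE.finditer(rtf_bytes)}
    return {
        "has_object": b"object" in words,
        "is_linked": b"objautlink" in words,
        "auto_update": b"objupdate" in words,
    }


if __name__ == "__main__":
    sample = (b"{\\rtf1{\\object\\objautlink\\objupdate"
              b"{\\*\\objclass Word.Document.8}{\\*\\objdata 0105}}}")
    print(linked_object_flags(sample))
```

A document for which both is_linked and auto_update are true will try to refresh its linked object without prompting, which makes it worth a closer look.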

Subsequently the urlmon.dll, which is the registered server for the URL Moniker, is instantiated.

Once the COM object is instantiated, the connection is initiated to the external resource and, based on the content-type header returned by the server in the response, URL Moniker consults the Mime database in the registry and invokes registered application handlers.

Details on how URL Moniker is executed and an algorithm to determine which appropriate handlers to invoke is described by Microsoft here. We have had multiple such RTF exploits in the past including CVE-2017-0199, CVE-2017-8756 and others using Monikers to download and execute remote code.

The COM objects used in these exploits have since been blacklisted by Microsoft in newer versions, but similar techniques could be used in the future, which makes the analysis of OLE structured storage streams a necessity.

Object Embedding – RTF Containing OLE Controls

As indicated earlier, embedded objects are represented in the container documents in the OLE2 format. When the object is stored in the OLE2 format, the container application (here Rich Text Format files) creates the OLE Compound File Storage for each of the objects embedded and the respective object data is stored in the OLE Compound File Stream Objects. Layout of the container documents storing embedded objects is as represented below and described in the Microsoft documentation here.

RTF exploits have historically embedded and loaded multiple OLE controls in order to bypass exploit mitigations and to take advantage of memory corruption vulnerabilities by loading vulnerable OLE controls. Embedded OLE controls in an RTF document are usually indicated by the nested control word “objocx” or “objemb”, followed by “objclass” with the name of the OLE control that renders the object as its argument. Below is an example of a previous exploit used in targeted attacks, which exploited a vulnerability in a COM object and loaded another OLE control, containing the staged malicious code, to aid the exploitation process. It is therefore critical to extract this object data, extract the OLE2 compound file storage and extract each of the stream objects for further inspection for hidden malicious shellcode.
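The extraction step described above can be sketched in Python. This is a minimal illustration, not a robust parser: the function names `object_classes` and `extract_objdata` are hypothetical, and the sketch assumes the hex payload directly follows the “objdata” control word with no obfuscation (real-world samples often interleave extra control words, comments or whitespace tricks that this would miss).

```python
import re

# Matches the argument of \objclass, e.g. {\*\objclass Word.Document.12}
OBJCLASS_RE = re.compile(rb"\\objclass\s+([^\\{}]+)")
# Matches the run of hex digits (possibly split by whitespace) after \objdata
OBJDATA_RE = re.compile(rb"\\objdata\s*([0-9A-Fa-f\s]+)")


def object_classes(rtf: bytes) -> list[str]:
    """Return the class names given as arguments to \\objclass."""
    return [m.group(1).strip().decode("ascii", "replace")
            for m in OBJCLASS_RE.finditer(rtf)]


def extract_objdata(rtf: bytes) -> list[bytes]:
    """Return the decoded binary blob of every \\objdata destination."""
    blobs = []
    for m in OBJDATA_RE.finditer(rtf):
        hex_text = re.sub(rb"\s+", b"", m.group(1))
        if len(hex_text) % 2:
            # Drop a dangling nibble rather than fail on malformed input.
            hex_text = hex_text[:-1]
        blobs.append(bytes.fromhex(hex_text.decode("ascii")))
    return blobs
```

Each returned blob is the raw object data, which can then be handed to an OLE parser to walk the compound file storage and its streams.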

Object Embedding – RTF Containing Other Documents

Malicious RTF documents can use the OLE functionality to embed other file formats, like Flash files and Word documents, either to exploit vulnerabilities in those file formats or to set the stage for a successful exploitation process. Multiple RTF exploits have been observed in the past embedding OOXML documents using OLE functionality to manipulate the process heap memory and bypass Windows exploit mitigations. In RTF files, embedded objects are usually indicated by the nested control word “objemb”, with a version-dependent “ProgID” string as the argument to the nested control word “objclass”. One such RTF exploit, used in targeted attacks in the recent past, is shown below.

Below is another instance where the PDF file was physically embedded within the compound document. As mentioned, the embedded object is stored physically along with all the information required to render it.

In the embedded object, the creating application’s identifier is stored in the CLSID field of the compound file directory entry of the CFB storage object. If we take a look at the previous instance, when the object data is extracted and inspected manually, the following CLSID is observed in the CFB storage object, which corresponds to the CLSID_Microsoft_Word_Document.
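As a rough illustration of where that CLSID lives, the sketch below reads it from the root storage directory entry of a compound file. The function name `root_storage_clsid` is hypothetical, and the sketch assumes the input is the already extracted OLE2 compound file (starting at its signature) and that the root entry is the first entry of the first directory sector; it does not follow FAT chains or validate the rest of the header.

```python
import struct
import uuid

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
DIR_ENTRY_SIZE = 128
CLSID_OFFSET = 80  # offset of the CLSID field inside a directory entry


def root_storage_clsid(cfb: bytes) -> str:
    """Return the CLSID of the root storage entry as a GUID string."""
    if cfb[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE2 compound file")
    # Header fields: sector shift at offset 30, first directory sector at 48.
    (sector_shift,) = struct.unpack_from("<H", cfb, 30)
    (first_dir_sector,) = struct.unpack_from("<I", cfb, 48)
    sector_size = 1 << sector_shift
    # Sector 0 starts right after the header, which occupies one sector.
    dir_offset = (first_dir_sector + 1) * sector_size
    entry = cfb[dir_offset:dir_offset + DIR_ENTRY_SIZE]
    if len(entry) < DIR_ENTRY_SIZE:
        raise ValueError("truncated directory sector")
    raw = entry[CLSID_OFFSET:CLSID_OFFSET + 16]
    # CLSIDs are stored in the mixed-endian GUID layout.
    return str(uuid.UUID(bytes_le=raw)).upper()
```

The returned string can be compared against known class identifiers, for example a list of CLSIDs that an analysis framework treats as suspicious.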

When the OLE2 stream objects are parsed and the embedded OOXML is extracted and analysed after deflating the contents, we see suspicious ActiveX object loading activity and embedded malicious code in one of the binary files. It is therefore important to extract the files embedded in RTF documents and analyse them further.

OLE Packages in RTF Files

RTF documents can also embed other file types like scripts (VBScript, JavaScript, etc.), XML files and executables via OLE packages. An OLE package in an RTF file is indicated by the ProgID string “package” as the argument to the nested control word “objclass”. The Packager format is a legacy format that does not have an associated OLE server. Looking at the associated CLSID in the registry, there is no specific data format mapped to Packages.

This essentially implies that OLE packages can store multiple file types and that, if a user clicks the object, the embedded file will be executed, infecting the machine if it is a malicious script. RTF documents have been known to deliver malware by embedding scripts via OLE packages and then using Monikers, as described in the previous sections, to drop files in the desired directory and then execute them. One such instance of a malicious RTF document exploiting CVE-2018-0802, embedding an executable file, is shown below.

Since many RTF documents have been found delivering malware via OLE packages, it is critical to look for these embedded objects and analyse them for such additional payloads. Embedded executables / scripts within RTF could be malicious. Looking for OLE packages and extracting embedded files should be a trivial task.

The above exploit delivery strategies can allow us to take a step towards building analysis frameworks for RTF documents. Primarily, inspecting the linked or embedded objects turns out to be the critical aspect of automated analysis tasks along with the RTF control words inspection. The following are the key takeaways:

  • Using the RTF file as the container, many other file format exploits can be embedded inside using the Object Linking and Embedding feature, essentially weaponizing the RTF documents.
  • Extracting and analysing embedded or linked objects for malicious code, payloads or resource handler invocations is essential.
  • If an RTF document has a large volume of appended data, it must be examined further.
  • Non-OLE control words and OLE packages must also be analysed for any malicious content.
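The appended-data check from the takeaways above could be approximated as follows. This is a simplified sketch under stated assumptions: the function name `appended_data_size` is hypothetical, the document is assumed to start with its outermost group, and raw binary runs introduced by the “bin” control word are not handled, so braces inside such runs would confuse the depth count.

```python
from typing import Optional


def appended_data_size(rtf: bytes) -> Optional[int]:
    """Count non-whitespace bytes after the outermost RTF group closes.

    Returns None if the outermost group never closes.
    """
    depth = 0
    i = 0
    while i < len(rtf):
        ch = rtf[i:i + 1]
        if ch == b"\\":
            # Skip the backslash and the next byte so that the escaped
            # braces \{ and \} are not counted as group delimiters.
            i += 2
            continue
        if ch == b"{":
            depth += 1
        elif ch == b"}":
            depth -= 1
            if depth == 0:
                return len(rtf[i + 1:].strip())
        i += 1
    return None
```

What counts as a “large volume” is a policy decision left to the caller; the sketch only reports the size of the trailing data.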

McAfee Response

As Microsoft Office vulnerabilities continue to surface, generic inspection methods will have to be improved and enhanced, consequently leading to better detection results. As a reminder, the McAfee Anti-Malware engine used on all our endpoints and most of our appliances has the potential to unpack Office, RTF and OLE documents, expose the streams of content and unpack these streams if necessary.

The post An Inside Look into Microsoft Rich Text Format and OLE Exploits appeared first on McAfee Blog.

Access Token Theft and Manipulation Attacks – A Door to Local Privilege Escalation

20 April 2021 at 15:27

Executive Summary

Many malware attacks designed to inflict damage on a network are armed with lateral movement capabilities. After the initial infection, such malware usually needs to perform a higher privileged task or execute a privileged command on the compromised system to be able to further enumerate infection targets and compromise more systems on the network. Consequently, at some point during its lateral movement activities, it needs to escalate its privileges using one of several privilege escalation techniques. Once malware or an attacker successfully escalates its privileges on the compromised system, it acquires the ability to perform stealthier lateral movement, usually executing its tasks under the context of a privileged user, as well as bypassing mitigations like User Account Control.

Process access token manipulation is one such privilege escalation technique, and it is widely adopted by malware authors. This set of techniques includes process access token theft and impersonation, which eventually allow malware to advance its lateral movement activities across the network in the context of another logged-in user or a higher privileged user.

When a user authenticates to Windows via console (interactive logon), a logon session is created, and an access token is granted to the user. Windows manages the identity, security, or access rights of the user on the system with this access token, essentially determining what system resources they can access and what tasks can be performed. An access token for a user is primarily a kernel object and an identification of that user in the system, which also contains many other details like groups, access rights, integrity level of the process, privileges, etc. Fundamentally, a user’s logon session has an access token which also references their credentials to be used for Windows single sign on (SSO) authentication to access the local or remote network resources.

Once the attacker gains an initial foothold on the target by compromising the initial system, they will want to move laterally around the network to access more resources or critical assets. One way for an attacker to achieve this is to use the identity or credentials of users already logged on to the compromised machine to pivot to other systems, or to escalate their privileges and perform the lateral movement in the context of another logged-on, higher privileged user. Process access token manipulation helps attackers accomplish precisely this goal.

For our YARA rule, MITRE ATT&CK techniques and to learn more about the technical details of token manipulation attacks and how malware executes these attacks successfully at the code level, read our complete technical analysis here.

Coverage

McAfee On-Access-Scan has a generic detection for malware of this nature, as shown in the screenshot below:

Additionally, the YARA rule mentioned at the end of the technical analysis document can be used to detect token manipulation attacks by importing the rule into threat detection solutions like McAfee Advanced Threat Defense.

Summary of the Threat

Several types of malware and advanced persistent threats abuse process tokens to gain elevated privileges on the system. Malware can take multiple routes to achieve this goal. However, in all these routes, it would abuse the Windows APIs to execute the token stealing or token impersonation to gain elevated privileges and advance its lateral movement activities.

  • If the current logged on user on the compromised or infected machine is a part of the administrator group of users OR running a process with higher privileges (e.g., by using “runas” command), malware can abuse the privileges of the process’s access token to elevate its privileges on the system, thereby enabling itself to perform privileged tasks.
  • Malware can use multiple Windows APIs to enumerate the Windows processes running with higher privileges (usually SYSTEM level privileges), acquire the access tokens of those processes and start new processes with the acquired token. This results in the new process being started in the context of the user represented by the token, which is SYSTEM.
  • Malware can also execute a token impersonation attack where it can duplicate the access tokens of the higher privileged SYSTEM level process, convert it into the impersonation token by using appropriate Windows functionality and then impersonate the SYSTEM user on the infected machine, thereby elevating its privileges.
  • These token manipulation attacks will allow malware to use the credentials of the current logged on user or the credentials of another privileged user to authenticate to the remote network resource, leading to advancement of its lateral movement activities.
  • These attack techniques allow malware to bypass multiple mitigations like UAC, access control lists and heuristic detection techniques, and to remain stealthier while moving laterally inside the network.
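From a defender's point of view, the routes listed above suggest a simple triage heuristic: check whether a sample imports Windows APIs that can acquire a token together with APIs that can use one. The sketch below is an illustration only. The article does not name specific APIs, so the selection of advapi32 functions, the stage names and the function names are all assumptions made here, and many legitimate programs import the same functions, so a match is a reason to look closer rather than a verdict.

```python
# Assumed grouping of token-related Windows APIs by the stage they serve.
TOKEN_APIS = {
    "acquire": {"OpenProcessToken", "OpenThreadToken"},
    "duplicate": {"DuplicateToken", "DuplicateTokenEx"},
    "use": {
        "ImpersonateLoggedOnUser",
        "SetThreadToken",
        "CreateProcessWithTokenW",
        "CreateProcessAsUserW",
    },
}


def token_abuse_stages(imported_apis):
    """Map each stage to the sorted list of matching imported API names."""
    names = set(imported_apis)
    return {
        stage: sorted(names & apis)
        for stage, apis in TOKEN_APIS.items()
        if names & apis
    }


def looks_like_token_manipulation(imported_apis):
    """Flag samples that can both acquire a token and put it to use."""
    stages = token_abuse_stages(imported_apis)
    return "acquire" in stages and "use" in stages
```

The list of imported names would come from whatever PE parser or sandbox trace the analysis pipeline already produces.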


The post Access Token Theft and Manipulation Attacks – A Door to Local Privilege Escalation appeared first on McAfee Blog.
