McAfee Blogs

Vulnerability Discovery in Open Source Libraries: Analyzing CVE-2020-11863

1 September 2020 at 16:09

Open Source projects are the building blocks of many software development processes. As we indicated in our previous blog, as more and more products use open source code, the overall attack surface inevitably grows, especially when that code is not audited before use. Hence it is recommended to test it thoroughly for potential vulnerabilities and to collaborate with its developers to fix them, ultimately mitigating attacks. We also noted that we were researching graphics libraries in Windows and Linux, and had reported multiple vulnerabilities in Windows GDI as well as in the Linux vector graphics library libEMF. We are still auditing many other Linux graphics libraries, since they are legacy code and have not been rigorously tested before.

In part 1 of this blog series, we described in detail the significance of open source research by outlining the vulnerabilities we reported in the libEMF library. We also highlighted the importance of compiling the code with memory sanitizers and how they can help detect a variety of memory corruption bugs. In summary, the Address Sanitizer (ASAN) intercepts memory allocation / deallocation functions such as malloc() / free() and fills the memory with the respective fill bytes (malloc_fill_byte / free_fill_byte). It also monitors reads from and writes to these memory locations, helping detect erroneous accesses at run time.

In this blog, we provide a more detailed analysis for one of the reported vulnerabilities, CVE-2020-11863, which was due to the use of uninitialized memory. This vulnerability is related to CVE-2020-11865, a global object vector out of bounds memory access in the GlobalObject::Find() function in libEMF. However, the crash call stack turned out to be different, which is why we decided to examine this further and produce this deep dive blog.

The information provided by the ASAN was sufficient to reproduce the vulnerability crash outside of the fuzzer. From the ASAN information, the vulnerability appeared to be a null pointer dereference, but this was not the actual root cause, as we will discuss below.

Looking at the call stack, it appears that the application crashed while dynamically casting the object, which could happen for several reasons. The most likely were that the application attempted to access a non-existent virtual table pointer, or that the object address returned from the function was a wild address. While debugging to get more context about this crash, we came across an interesting register value. The disassembly below shows the crash point, indicating the access to non-existent memory.

If we look at the state of the registers at the crash point, it is particularly interesting to note that the register rdi has an unusual value of 0xbebebebebebebebe. We wanted to dig a little deeper to check out how this value got into the register, resulting in the wild memory access. Since we had the source of the library, we could check right away what this register meant in terms of accessing the objects in memory.

Referring to the Address Sanitizer documentation, it turns out that ASAN writes 0xbe to newly allocated memory by default, so this 64-bit value means the memory was allocated but never initialized. ASAN calls this value the malloc_fill_byte. It can likewise fill memory with the free_fill_byte (0x55 by default) when it is freed, although by default max_free_fill_size is 0, so freed memory is only filled when that flag is raised. Both behaviors help identify memory access errors.
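As a minimal illustration (a toy model, not ASAN itself), the following C++ sketch imitates the fill behavior with a hypothetical helper, asan_like_alloc, that fills every new block with 0xbe. Reading eight such bytes back as a 64-bit value yields 0xbebebebebebebebe, the same pattern found in rdi.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Mirrors ASAN's default malloc_fill_byte (0xbe). Toy model only: the real
// sanitizer allocator applies this fill itself, up to max_malloc_fill_size.
constexpr unsigned char kMallocFillByte = 0xbe;

// Hypothetical helper: allocate `size` bytes and fill them as ASAN would.
void* asan_like_alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    std::memset(p, kMallocFillByte, size);
    return p;
}

// Loads the first 8 bytes of a block as a 64-bit value, the way a stale
// object pointer would be loaded into a register such as rdi.
std::uint64_t read_u64(const void* block) {
    std::uint64_t value;
    std::memcpy(&value, block, sizeof value);
    return value;
}
```

Under a real ASAN build no helper is needed: the sanitizer fills the first max_malloc_fill_size bytes (4096 by default) of each allocation.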

This behavior can also be verified in the libsanitizer source here. Below is an excerpt from the source file.

As the stack trace at the crash point below shows, the crash occurred in the SelectObject() function. This part of the code processes the EMR_SELECTOBJECT record of the Enhanced Metafile (EMF) file, and the graphics object handle passed to the function is 0x80000018. We wanted to trace the flow of the code to check whether this value comes directly from the input EMF file and can therefore be controlled by an attacker.

In the SelectObject() function, while processing the EMR_SELECTOBJECT record, the handle to the GDI object is passed to GlobalObjects.find(), as shown in the above code snippet. This function accesses the global stock object vector: it masks the high-order bit of the GDI object handle to convert it into an index and returns the stock object reference stored at that index. The stock object enumeration, documented by Microsoft, specifies the indexes of predefined logical graphics objects that can be used in graphics operations. For instance, the object handle 0x80000018 is ANDed with 0x7FFFFFFF, resulting in 0x18, which is then used as the index into the global stock object vector. The returned stock object reference is dynamically cast to a graphics object, the EMF::GRAPHICSOBJECT member function getType() is called to determine its type, and later in the function the object is cast again to the appropriate graphics object (BRUSH, PEN, FONT, PALETTE, EXTPEN), as shown in the code snippet below.
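The handle-to-index conversion is a single mask operation. The C++ sketch below models it; the helper names stock_index and is_stock_handle are hypothetical, and the two constants come from the StockObject enumeration (WHITE_BRUSH = 0x80000000, BLACK_PEN = 0x80000007).

```cpp
#include <cstdint>

// Two entries from the StockObject enumeration, for reference.
constexpr std::uint32_t WHITE_BRUSH = 0x80000000u;
constexpr std::uint32_t BLACK_PEN   = 0x80000007u;

// True when the handle refers to a stock object (high-order bit set).
constexpr bool is_stock_handle(std::uint32_t handle) {
    return (handle & 0x80000000u) != 0;
}

// Masks off the high-order bit to obtain the index into the stock object table.
constexpr std::uint32_t stock_index(std::uint32_t handle) {
    return handle & 0x7FFFFFFFu;
}

// The handle from the crashing EMR_SELECTOBJECT record maps to index 0x18 (24).
static_assert(stock_index(0x80000018u) == 0x18, "0x80000018 -> index 24");
static_assert(stock_index(WHITE_BRUSH) == 0, "WHITE_BRUSH -> index 0");
```

Note that index 0x18 is larger than any index the StockObject enumeration defines; its last entry, DC_PEN, is 0x80000013.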

EMF::GRAPHICSOBJECT is the class derived from EMF::OBJECT and the inheritance diagram of the EMF::OBJECT class is as shown below.
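A simplified, hypothetical model of this hierarchy (not libEMF's actual class definitions) clarifies why a bad object pointer crashes inside dynamic_cast. Because OBJECT is polymorphic, dynamic_cast finds the object's virtual table pointer through the object address at run time.

```cpp
// Simplified, hypothetical model of the hierarchy described above.
namespace model {

enum class ObjectType { Brush, Pen };

struct OBJECT {
    virtual ~OBJECT() = default;  // polymorphic: every instance carries a vtable pointer
};

struct GRAPHICSOBJECT : OBJECT {
    virtual ObjectType getType() const = 0;
};

struct BRUSH : GRAPHICSOBJECT {
    ObjectType getType() const override { return ObjectType::Brush; }
};

struct PEN : GRAPHICSOBJECT {
    ObjectType getType() const override { return ObjectType::Pen; }
};

// Mirrors the flow in SelectObject(): cast OBJECT* to GRAPHICSOBJECT*, query
// the type, then cast again to the concrete class. dynamic_cast reads the
// vtable pointer through `obj`, so `obj` must point at a live object or be null.
bool is_brush(OBJECT* obj) {
    auto* gobj = dynamic_cast<GRAPHICSOBJECT*>(obj);
    if (gobj == nullptr) return false;  // null, or not a GRAPHICSOBJECT
    if (gobj->getType() != ObjectType::Brush) return false;
    return dynamic_cast<BRUSH*>(gobj) != nullptr;
}

}  // namespace model
```

A null pointer is handled safely, because dynamic_cast of a null pointer yields a null pointer. A wild address such as 0xbebebebebebebebe is not: the cast dereferences it to read the virtual table pointer.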

However, as mentioned earlier, we wanted to know whether the object handle passed as an argument to the SelectObject() function can be controlled by an attacker. To answer this, let us look at the format of the EMR_SELECTOBJECT record shown below.

As we can see, ihObject is a 4-byte unsigned integer specifying the index into the stock object enumeration. In this case, the stock object references are maintained in the global objects vector, so the object handle 0x80000018 means that index 0x18 will be used to access it. If the object vector holds 0x18 or fewer elements at that point and its length is not checked before it is accessed, the result is an out-of-bounds memory access.
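To show that nothing in the record format constrains ihObject, here is a minimal C++ parser for the 12-byte EMR_SELECTOBJECT record (Type 0x00000025, Size 12, ihObject). It assumes little-endian fields as in MS-EMF; the function and struct names are hypothetical and are not libEMF's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// EMR_SELECTOBJECT: Type (0x00000025), Size (12), ihObject, all little-endian u32.
constexpr std::uint32_t kEmrSelectObject = 0x00000025;

struct SelectObjectRecord {
    std::uint32_t type;
    std::uint32_t size;
    std::uint32_t ihObject;  // taken verbatim from the file, i.e. attacker-controlled
};

// Decodes a little-endian 32-bit value at `offset`.
std::uint32_t read_le32(const std::uint8_t* p, std::size_t offset) {
    return static_cast<std::uint32_t>(p[offset]) |
           (static_cast<std::uint32_t>(p[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(p[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(p[offset + 3]) << 24);
}

// Returns the record if `buf` starts with a well-formed EMR_SELECTOBJECT.
// Note that no check here restricts the value of ihObject.
std::optional<SelectObjectRecord> parse_select_object(const std::uint8_t* buf, std::size_t len) {
    if (len < 12) return std::nullopt;
    SelectObjectRecord rec{read_le32(buf, 0), read_le32(buf, 4), read_le32(buf, 8)};
    if (rec.type != kEmrSelectObject || rec.size != 12) return std::nullopt;
    return rec;
}
```

Any 32-bit value in the last four bytes, including 0x80000018, passes through unchanged, which is what makes the handle attacker-controlled.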

Below is the visual representation of processing the EMR_SELECTOBJECT metafile record.

While debugging this issue, we set a breakpoint on GlobalObjects.find() and continued until the object handle was 0x80000018, i.e., until the EMR_SELECTOBJECT record highlighted above was being processed. As shown below, the object handle is converted into the index 0x18 (24) and used to access an object vector of size 0x16 (22), resulting in the out-of-bounds access that we reported as CVE-2020-11865.

Stepping further into the code, execution enters stl_vector.h, the libstdc++ header that implements std::vector. Its operator[] neither grows the vector nor checks bounds; it simply returns the element located the passed index past the start of the vector's storage. Since the objects vector holds only 22 elements at this point, index 24 lies beyond the last constructed element. The value read there, as shown in the code snippet below, is 0xbebebebebebebebe, the fill pattern written by ASAN, which suggests the read still fell inside the heap block allocated for the vector (a vector's capacity can exceed its size) but outside its initialized elements.
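The difference between the unchecked operator[] and the checked std::vector::at() can be shown with a short C++ sketch. It uses a std::vector<void*> with 22 elements as a stand-in for the global object vector (an assumption; libEMF's element type may differ) and never performs the out-of-bounds read itself, since that is undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if the checked accessor at() rejects `index` for a vector of
// `count` elements. operator[] is deliberately never called out of range.
bool at_rejects(std::size_t count, std::size_t index) {
    std::vector<void*> objects(count, nullptr);
    try {
        (void)objects.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// size() counts constructed elements while capacity() counts allocated
// slots, so memory just past size() can still belong to the vector's block.
bool capacity_can_exceed_size() {
    std::vector<void*> objects;
    objects.reserve(32);  // guarantees capacity() >= 32
    objects.resize(22);   // 22 constructed elements; capacity is not reduced
    return objects.capacity() >= 32 && objects.size() == 22;
}
```

A lookup written with at(), or with an explicit size check, turns the silent read of uninitialized memory into a detectable error.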


The code uses std::allocator, the default allocator, to manage the vector's memory allocation and deallocation. On further analysis, it turns out that the returned value, 0xbebebebebebebebe in this case, is treated as the address of a non-existent stock object; dynamic_cast dereferences it to read the object's virtual table pointer, resulting in the crash.

As mentioned in our earlier blog, the fixes to the library have been released in a subsequent version, available here.

Conclusion

While using third party code in products certainly saves time and increases development speed, it potentially comes with an increase in the volume of vulnerabilities, especially when the code remains unaudited and integrated into products without any testing. It is extremely critical to perform fuzz testing of the open source libraries used, which can help in discovering vulnerabilities earlier in the development cycle and provides an opportunity to fix them before the product is shipped, consequently mitigating attacks. However, as we emphasized in our previous blog, it is critical to strengthen the collaboration between vulnerability researchers and the open source community to continue responsible disclosures, allowing the maintainers of the code to address them in a timely fashion.

The post Vulnerability Discovery in Open Source Libraries: Analyzing CVE-2020-11863 appeared first on McAfee Blog.

Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade

12 August 2020 at 15:59

Executive Summary

Open source has become the foundation for modern software development. Vendors use open source software to stay competitive and improve the speed, quality, and cost of the development process. At the same time, it is critical to maintain and audit open source libraries used in products as they can expose a significant volume of risk.

The responsibility for auditing code for potential security risks lies with the organization using it. Some of the highest-impact vulnerabilities in the past have originated in open source software. The well-known Equifax data breach was due to a vulnerability in the open source component Apache Struts, a widely used web application framework. Furthermore, the 2020 Open Source Security and Risk Analysis report states that, of the applications audited in 2019, 99% of codebases contained open source components and 75% contained vulnerabilities, with 49% containing high-risk vulnerabilities.

Graphics libraries have a rich history of vulnerabilities, and the volume of exploitable issues is especially magnified when the code base is relatively old and has not been recompiled recently. Graphics libraries on Linux are widely used in many applications but are not sufficiently audited and tested for security issues. This became a driving force for us to test multiple vector graphics and GDI libraries on Linux, one of which was libEMF, a Linux C++ library written for a similar purpose and used in multiple graphics tools that support conversion into other vector formats. We tested this library for several days and found multiple vulnerabilities, including denial-of-service issues, integer overflows, out-of-bounds memory access, use-after-free conditions, and use of uninitialized memory.

All the vulnerabilities were locally exploitable. We reported them to the code’s maintainer, leading to two new versions of the library being released in a matter of weeks. This reflects McAfee’s commitment to protecting its customers from upcoming security threats, including defending them against those found in open source software. Through collaboration with McAfee researchers, all issues in this library were fixed in a timely manner.

In this blog we will emphasize why it is critical to audit the third-party code we often use in our products and outline general practices for security researchers to test it for security issues.

Introduction

Fuzzing is an extremely popular technique used by security researchers to discover potential zero-day vulnerabilities in open as well as closed source software. Accepted as a fundamental process in product testing, it is employed by many organizations to discover vulnerabilities earlier in the product development lifecycle. At the same time, it is still substantially overlooked. A well-designed fuzzer typically comprises a set of tools that make the fuzzing process efficient and fast enough to discover exploitable bugs in a short period, helping developers patch them early.

Several of the fuzzers available today guide the fuzzing process by measuring code coverage through static or dynamic code instrumentation. This results in more efficient and relevant inputs to the target software, exercising more code paths and leading to more vulnerabilities being discovered. Modern fuzzing frameworks also use feedback-driven mechanisms to maximize code coverage of the target software: they learn the input format along the way and compare the code coverage achieved by each input, producing more effective mutated inputs. Some of the state-of-the-art fuzzing frameworks available are American Fuzzy Lop (AFL), libFuzzer and Honggfuzz.

Fuzzers like AFL on Linux come with compiler wrappers (afl-gcc, afl-clang, etc.). Through its assembly parsing module, afl-as, AFL parses the generated assembly code and adds compile-time instrumentation that lets it measure code coverage. Additionally, modern compilers come with sanitizer modules such as Address Sanitizer (ASAN), Memory Sanitizer (MSAN), Undefined Behavior Sanitizer (UBSAN), Leak Sanitizer (LSAN) and Thread Sanitizer (TSAN), which can further increase a fuzzer's bug-finding abilities. The overview below highlights the variety of memory corruption bugs that sanitizers can discover when used with fuzzers.

ASAN: Use After Free Vulnerabilities, Heap Buffer Overflow, Stack Buffer Overflow, Initialization Order Bugs, Memory Leaks, Out of Bounds Access
MSAN: Uninitialized Memory Reads
UBSAN: Null Pointer Dereferences, Signed Integer Overflows, Typecast Overflows, Divide by Zero Errors
TSAN: Race Conditions
LSAN: Run Time Memory Leak Detection

One of the McAfee Vulnerability Research Team's goals is to fuzz multiple open and closed source libraries and report vulnerabilities to the vendors before they are exploited. Over the next few sections of this blog, we highlight the vulnerabilities we discovered and reported while researching one open source library, LibEMF (ECMA-234 Metafile Library).

Using American Fuzzy Lop (AFL)

Much of the technical detail and inner workings of this state-of-the-art feedback-driven fuzzer are described in its documentation. While AFL has many use cases, it is most commonly used to fuzz programs written in C / C++, since they are susceptible to widely exploited memory corruption bugs, and that is where AFL and its mutation strategies are extremely useful. AFL gave rise to several forks, such as AFLSmart, AFLFast and Python AFL, which differ in their mutation strategies and in extensions that increase performance. AFL was also ported to Windows as WinAFL, which uses a dynamic instrumentation approach, predominantly for fuzzing closed source binaries.

The fuzzing process primarily comprises the following tasks:

Fuzzing libEMF (ECMA-234 Metafile Library) with AFL

LibEMF (Enhanced Metafile Library) is an EMF parsing library written in C/C++ and provides a drawing toolkit based on ECMA-234. The purpose of this library is to create vector graphic files. Documentation of this library is available here and is maintained by the developer.

We chose to fuzz LibEMF with AFL because of its compile-time instrumentation capabilities and good mutation strategies, as mentioned earlier. We compiled the source code in hardened mode, which adds code hardening options when invoking the downstream compiler and helps with discovering memory corruption bugs.

Compiling the Source

To use AFL's code instrumentation capabilities, we must compile the source code with the AFL compiler wrapper afl-gcc/afl-g++. With the address sanitizer flag additionally enabled, we use the following command:

./configure CXX=afl-g++ CFLAGS="-fsanitize=address -ggdb" CXXFLAGS="-fsanitize=address -ggdb" LDFLAGS="-fsanitize=address"

Below is a snapshot of the compilation process showing how the instrumentation is added to the code:

The pwntools Python package comes with a useful utility, checksec, that examines a binary's security properties. Running checksec on the library confirms that the code is now ASAN-instrumented, which will allow us to discover non-crashing memory access bugs as well:

A test harness is a program that uses the library's APIs to parse the file given to it as a command-line argument. AFL uses this harness to pass its mutated files as arguments, resulting in several executions per second. When writing the harness, it is extremely important to release all resources before returning, to avoid excessive resource usage that can eventually crash the system. Our harness for parsing EMF files using APIs from the libEMF library is shown here:
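The general structure of such a harness can be sketched in C++. The parse_emf function below is a hypothetical stand-in for the libEMF calls used in the actual harness, which we do not reproduce here; it only checks the EMR_HEADER record type (1) and the " EMF" signature (0x464D4520) at offset 40. The point is the shape: read the file AFL passes via @@, parse it, and let RAII release every resource before returning.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the library call under test.
bool parse_emf(const std::vector<std::uint8_t>& data) {
    if (data.size() < 88) return false;  // smaller than a minimal EMR_HEADER
    auto le32 = [&data](std::size_t off) {
        return static_cast<std::uint32_t>(data[off]) |
               (static_cast<std::uint32_t>(data[off + 1]) << 8) |
               (static_cast<std::uint32_t>(data[off + 2]) << 16) |
               (static_cast<std::uint32_t>(data[off + 3]) << 24);
    };
    return le32(0) == 0x00000001u && le32(40) == 0x464D4520u;
}

// One harness execution: the stream and buffer are released automatically
// on return, so repeated executions do not accumulate leaked resources.
int run_one_input(const char* path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return 1;  // unreadable input
    std::vector<std::uint8_t> data((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());
    return parse_emf(data) ? 0 : 2;
}
```

A main(argc, argv) wrapper would simply return run_one_input(argv[1]).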

AFL will also track the code coverage with every input that it passes to the program and, if the mutations result in new program states, it will add the test case to the queue. We compiled our test harness using the following command:

afl-g++ -o playemffile playemffile.c -g -O2 -D_FORTIFY_SOURCE=0 -fsanitize=address -I /usr/local/include/libEMF/ -L/usr/local/lib/libEMF -lEMF

Collecting Test Cases

While a fuzzer can learn and generate the input format even from an empty seed file, gathering an initial corpus of input files is a significant step in an effective fuzzing process and can save huge amounts of CPU cycles. Depending on the popularity of the file format, crawling the web and downloading an initial set of input files is one of the most intuitive approaches. For EMF, it is also worthwhile to construct input files manually with a variety of EMF record structures, using vector graphics file generation libraries or Windows GDI APIs. pyemf is one such Python library that can be used to generate EMF files. Below is example code that generates an EMF file with an EMR_EXTTEXTOUTW record using Windows APIs. Constructing these files with different EMR records ensures functionally different input files that exercise different record handlers in the code.
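As an alternative to the Windows API approach, seed files can also be assembled byte by byte. The C++ sketch below, with hypothetical helper names, writes a minimal EMF consisting of an 88-byte EMR_HEADER, one EMR_SELECTOBJECT record and a 20-byte EMR_EOF, following the MS-EMF record layouts. Bounds, Frame and device sizes are left at zero, so a strict parser may reject the file; it is only meant as a starting point for mutation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a little-endian 32-bit value.
void put32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Overwrites a little-endian 32-bit value at `offset`.
void patch32(std::vector<std::uint8_t>& out, std::size_t offset, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out[offset + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Builds EMR_HEADER + EMR_SELECTOBJECT(ihObject) + EMR_EOF.
std::vector<std::uint8_t> make_seed(std::uint32_t ihObject) {
    std::vector<std::uint8_t> f;
    put32(f, 0x00000001);                     // Type = EMR_HEADER
    put32(f, 88);                             // Size of the header record
    for (int i = 0; i < 8; ++i) put32(f, 0);  // Bounds (16 bytes) + Frame (16 bytes)
    put32(f, 0x464D4520);                     // Signature " EMF"
    put32(f, 0x00010000);                     // Version
    put32(f, 0);                              // Bytes: total file size, patched below
    put32(f, 3);                              // Records
    put32(f, 1);                              // Handles (u16) = 1, Reserved (u16) = 0
    put32(f, 0);                              // nDescription
    put32(f, 0);                              // offDescription
    put32(f, 0);                              // nPalEntries
    for (int i = 0; i < 4; ++i) put32(f, 0);  // Device + Millimeters (8 bytes each)

    put32(f, 0x00000025);                     // Type = EMR_SELECTOBJECT
    put32(f, 12);                             // Size
    put32(f, ihObject);                       // object handle under test

    put32(f, 0x0000000E);                     // Type = EMR_EOF
    put32(f, 20);                             // Size
    put32(f, 0);                              // nPalEntries
    put32(f, 16);                             // offPalEntries (no palette follows)
    put32(f, 20);                             // nSizeLast

    patch32(f, 48, static_cast<std::uint32_t>(f.size()));  // header Bytes field
    return f;
}
```

Varying ihObject, or adding other record types, yields functionally different seeds.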

Running the Fuzzer

Running the fuzzer simply means running the afl-fuzz command with the parameters shown below. We need to provide the input corpus of EMF files (-i EMFs/), the output directory (-o output/) and the path to the harness binary followed by @@, which tells the fuzzer to pass the file as an argument to the binary. We also need to use -m none, since an ASAN-instrumented binary needs a huge amount of virtual memory.

afl-fuzz -m none -i EMFs/ -o output/ -- /home/targets/libemf-1.0.11/tests/playemffile @@

However, we can make multiple tweaks to the running AFL instance to increase the number of executions per second. AFL provides a persistent mode which is in-memory fuzzing. This avoids forking a new process on every run, resulting in increased speed. We can also run multiple AFL instances, one on every core, to increase the speed. Beyond this, AFL also provides a file size minimization tool that can be used to minimize the test case size. We applied some of these optimization tricks and, as we can see below, there is a dramatic increase in the execution speed reaching ~500 executions per second.

After about 3 days of fuzzing this library, AFL reported more than 200 unique crashes; triaging them narrowed these down to 5 distinct issues. We reported them to the library's developer and to MITRE and, after they were acknowledged, CVE-2020-11863, CVE-2020-11864, CVE-2020-11865, CVE-2020-11866 and CVE-2020-13999 were assigned to these vulnerabilities. Below we discuss our findings for some of these vulnerabilities.

CVE-2020-11865 – Out of Bounds Memory Access While Enumerating the EMF Stock Objects

While triaging one of the crashes produced by the fuzzer, we saw SIGSEGV (memory access violation) for one of the EMF files given as an input. When the binary is compiled with the debugging symbols enabled, ASAN uses LLVM Symbolizer to produce the symbolized stack traces. As shown below, ASAN outputs the stack trace which helps in digging into this crash further.

The disassembly at the crash point clearly indicates the out-of-bounds memory access in the GLOBALOBJECTS::find function.

Analyzing this crash further, it turned out that the vulnerability was in the access to the global object vector, which held pointers to stock objects. Stock objects are predefined logical graphics objects that can be used in graphics operations. Each stock object handle has its high-order bit set, as shown below from the Microsoft documentation. During metafile processing, the index of the relevant stock object is determined by masking the high-order bit, and that index is then used to access the pointer to the stock object. The metafile processing code retrieves the pointer from the global object vector using this masked index, as seen just above the crash point instruction, but does not check the size of the global object vector before accessing it, leading to an out-of-bounds vector access when processing a crafted EMF file.

Shown below are the vulnerable code and the fixed code, in which a vector size check was added:
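The shape of such a fix can be sketched with a hypothetical GlobalObjects class in C++. This is not the actual libEMF patch, and the class stores objects by value for simplicity rather than holding pointers; find() masks the high-order bit and rejects any index that is not smaller than size().

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Object {
    int id;
};

// Hypothetical model of a global object table, not libEMF's actual class.
class GlobalObjects {
public:
    explicit GlobalObjects(std::size_t count) : objects_(count) {
        for (std::size_t i = 0; i < count; ++i) objects_[i] = Object{static_cast<int>(i)};
    }

    // Masks the stock bit, then bounds-checks the index before indexing.
    const Object* find(std::uint32_t handle) const {
        const std::size_t index = handle & 0x7FFFFFFFu;
        if (index >= objects_.size()) return nullptr;  // reject out-of-range handles
        return &objects_[index];
    }

private:
    std::vector<Object> objects_;
};
```

With a 22-entry table, handle 0x80000018 (index 24) now yields a null pointer, which the caller must check before calling any member function on the result.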

CVE-2020-13999 – Signed Integer Overflow While Processing EMR_SCALEVIEWPORTEXTEX Record

Another crash that we triaged turned out to be a signed integer overflow while processing an EMR_SCALEVIEWPORTEXTEX record in the metafile. This record rescales the viewport extent of the current device context using ratios formed from the specified multiplicands and divisors. An EMR_SCALEVIEWPORTEXTEX record looks like this, as per the record specification. A new viewport is calculated as shown below:

As part of AFL’s binary mutation strategy, it applies a deterministic approach where certain hardcoded sets of integers replace the existing data. Some of these are MAX_INT, MAX_INT-1, MIN_INT, etc., which increases the likelihood of triggering edge conditions while the application processes binary data. One such mutation done by AFL in the EMF record structure is shown below:
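This deterministic stage can be approximated in C++ as follows. The sketch overwrites each aligned 4-byte field of a record with a few boundary values in the spirit of AFL's "interesting" integers; the value list is an illustrative subset, and real AFL also walks every byte offset and tries both endiannesses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Illustrative subset of boundary values, not AFL's actual table.
constexpr std::array<std::int32_t, 5> kInteresting32 = {
    std::numeric_limits<std::int32_t>::min(), -1, 0, 1,
    std::numeric_limits<std::int32_t>::max()};

// For every 4-byte aligned position, emit one variant per interesting value,
// written little-endian over the original bytes.
std::vector<std::vector<std::uint8_t>> interesting_variants(const std::vector<std::uint8_t>& input) {
    std::vector<std::vector<std::uint8_t>> out;
    for (std::size_t pos = 0; pos + 4 <= input.size(); pos += 4) {
        for (std::int32_t v : kInteresting32) {
            std::vector<std::uint8_t> variant = input;
            const auto u = static_cast<std::uint32_t>(v);  // two's-complement bit pattern
            for (int i = 0; i < 4; ++i) variant[pos + i] = static_cast<std::uint8_t>(u >> (8 * i));
            out.push_back(std::move(variant));
        }
    }
    return out;
}
```

For a 24-byte EMR_SCALEVIEWPORTEXTEX record (Type, Size, xNum, xDenom, yNum, yDenom), this produces 6 fields × 5 values = 30 variants, some of which place INT32_MIN or INT32_MAX into the numerator and denominator fields.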

This resulted in the following crash while performing the division operation.

Below we see how this condition, eventually leading to a denial-of-service, was fixed by adding division overflow checks in the code:
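One way to add such checks is sketched below in C++. The function name is hypothetical and this is not the libEMF patch: it computes the ratio old extent × numerator / denominator in 64-bit arithmetic, rejects a zero denominator, and rejects results that do not fit back into a 32-bit signed integer.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns old_extent * num / denom, or nullopt on division by zero or when
// the result would not fit into int32.
std::optional<std::int32_t> scale_extent(std::int32_t old_extent, std::int32_t num, std::int32_t denom) {
    if (denom == 0) return std::nullopt;  // division by zero
    // |product| <= 2^62, so the 64-bit multiplication cannot overflow and the
    // 64-bit division cannot hit the INT64_MIN / -1 case.
    const std::int64_t product = static_cast<std::int64_t>(old_extent) * num;
    const std::int64_t result = product / denom;
    if (result < std::numeric_limits<std::int32_t>::min() ||
        result > std::numeric_limits<std::int32_t>::max()) {
        return std::nullopt;  // would overflow a 32-bit signed extent
    }
    return static_cast<std::int32_t>(result);
}
```

Widening works here because the product of two 32-bit values always fits in 64 bits, leaving only the division-by-zero and narrowing checks.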

CVE-2020-11864 – Memory Leaks While Processing Multiple Metafile Records

Leak Sanitizer (LSAN) is another important tool; it is integrated with ASAN and can be used to detect memory leaks at run time. LSAN can also be used in standalone mode without ASAN. While triaging the generated crashes, we noticed several memory leaks in the processing of various EMF record structures. One of them, shown below, occurs while processing the EMR_EXTTEXTOUTA metafile record; it was later fixed by releasing the memory buffer when an exception occurs while reading a corrupted metafile.

Memory leaks can lead to excessive resource usage when memory is not freed after it is no longer needed, eventually resulting in a denial of service. We found memory leaks in libEMF's processing of several such metafile records, and the same fix, releasing the memory buffer, was applied to all of the vulnerable processing code:
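The fix described above releases the buffer explicitly on the exception path. An alternative C++ idiom, sketched below with a hypothetical record reader, is to hold the buffer in a std::unique_ptr so that it is freed automatically on every exit path; a small live-object counter makes the absence of leaks checkable.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Counts live buffers so the sketch can show that nothing leaks.
struct TrackedBuffer {
    static inline int live = 0;
    std::vector<char> bytes;
    explicit TrackedBuffer(std::size_t n) : bytes(n) { ++live; }
    ~TrackedBuffer() { --live; }
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;
};

// Hypothetical record reader: allocates a buffer for the record's text and
// throws if the declared length exceeds the available bytes, as a corrupted
// metafile might. The unique_ptr frees the buffer on both paths.
std::size_t read_text_record(std::size_t declared_len, std::size_t available) {
    auto buffer = std::make_unique<TrackedBuffer>(declared_len);
    if (declared_len > available) {
        throw std::runtime_error("corrupted record: length exceeds file size");
    }
    return buffer->bytes.size();
}
```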

Additionally, we also reported multiple use-after-free conditions and denial-of-service issues which were eventually fixed in the newer version of the library released here.

Conclusion

Fuzzing is an important process and fundamental to testing the quality of a software product. The process becomes critical, especially when using third-party libraries in a product which may come with exploitable vulnerabilities. Auditing them for security issues is crucial. We believe the vulnerabilities that we reported are just the tip of the iceberg. There are several legacy libraries which likely require a thorough audit. Our research continues with several other similar Windows and Linux libraries and we will continue to report vulnerabilities through our coordinated disclosure process. We believe this also highlights that it is critical to maintain a good level of collaboration between vulnerability researchers and the open source community to have these issues reported and fixed in a timely fashion. Additionally, modern compilers come with multiple code instrumentation tools which can help detect a variety of memory corruption bugs when used early in the development cycle. Using these tools is recommended when auditing code for security vulnerabilities.

The post Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade appeared first on McAfee Blog.
