
CVE-2020-24427: Adobe Reader CJK Codecs Memory Disclosure Vulnerability

15 March 2022 at 08:36

Overview

Over the past year, the team spent some time looking into Adobe Acrobat. We found multiple vulnerabilities of varying criticality, and many of them are worth talking about. One of them is particularly interesting and worth detailing in public.

Back in 2020, the team found an interesting vulnerability that affected Adobe Reader (2020.009.20074). The bug existed in the handling of the CJK codecs that are used to decode Japanese, Chinese and Korean scripts, namely: Shift JIS, Big5, GBK and UHC. The bug was caused by an unexpected program state during the CJK to UCS-2/UTF-16 decoding process. In this short blog post, we discuss the bug and study one execution path where it was leveraged to disclose memory, leak JavaScript object addresses and bypass ASLR.

1. BACKGROUND

Before diving into details, let us see a typical use of the functions streamFromString() and stringFromStream() to encode and decode strings:

The function stringFromStream() expects a ReadStream object obtained by a call to streamFromString(). This object is implemented natively in C/C++. It is quite common for clients of native objects to expect certain behavior and overlook unexpected cases. We tried to see what would happen when stringFromStream() receives an object that satisfies the ReadStream interface but behaves unexpectedly, for example by returning invalid data that cannot be decoded back using Shift JIS. This is how the bug was initially discovered.

2. PROOF OF CONCEPT

The following JavaScript proof of concept demonstrates the bug:

It passes an object with a read() method to stringFromStream(). This method returns an invalid Shift JIS byte sequence that begins with the bytes 0xfc and 0x23. After running the code, some random memory data is dumped to the debug console, which may include some recognizable strings (the output will differ between machines):

Surprisingly, this bug does not trigger an access violation or crash the process; we will see why. One useful heuristic for detecting such bugs automatically is to measure the entropy of the function's output. Typically, the output entropy will be high if we pass input with high entropy. An output with low entropy could be an indication of a memory disclosure.
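The entropy heuristic can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling we used: the function names, the 2.0-bit threshold and the synthetic "stale memory" sample are all assumptions made for the example.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_leak(fuzz_input: bytes, output: bytes, drop: float = 2.0) -> bool:
    """Flag outputs whose entropy is far below the entropy of the input.

    The default threshold is an arbitrary starting point, not a tuned value.
    """
    return shannon_entropy(fuzz_input) - shannon_entropy(output) > drop


# Synthetic samples: random fuzz input versus something that resembles
# leftover heap data (runs of zeros and a repeated path string).
random_input = os.urandom(4096)
stale_memory = b"\x00" * 3000 + b"C:\\Windows\\System32\\" * 50

print(f"input entropy:  {shannon_entropy(random_input):.2f} bits/byte")
print(f"output entropy: {shannon_entropy(stale_memory):.2f} bits/byte")
print("possible leak: ", looks_like_leak(random_input, stale_memory))
```

A fuzzer harness could apply such a check to every value returned by the API under test and queue the low-entropy cases for manual review.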


3. ROOT CAUSE ANALYSIS

In order to find the root of the bug, we will trace the call of stringFromStream() which is implemented natively in the EScript.api plugin. This is a decompiled pseudocode of the function:

This function decodes the hex string returned by the ReadStream's read() and checks whether the encoding is one of the CJK encodings or one of several single-byte encodings such as Windows-1256 (Arabic). It then creates an ASText object from the encoded string using ASTextFromSizedScriptText(). The exact layout of the ASText object is undocumented and we had to reverse engineer it:

The u_str field is a pointer to a Unicode UCS-2/UTF-16 encoded string, and mb_str stores the non-Unicode encoded string. ASTextFromSizedScriptText() initializes mb_str. The string mb_str points to is lazily converted to u_str only if needed.

It is worth noting that ASTextFromSizedScriptText() does not validate the encoded data apart from looking for the end of the string by locating the null byte. This works fine because 0x00 maps to the same codepoint in all the supported encodings, as they are all supersets of ASCII and no multibyte codepoint uses 0x00.

Once the ASText object is created, it is passed to create_JSValue_from_ASText(), which converts the ASText object to a SpiderMonkey string JSValue in order to pass it to JavaScript:

The function ASTextGetUnicode(), implemented in AcroRd32.dll, lazily converts mb_str to u_str if u_str is NULL and then returns the value of u_str:

The function we named convert_mb_to_unicode() is where the conversion happens. It is referenced by many functions to perform the lazy conversion:

The initial call to Host2UCS() computes the size of the buffer required to perform the decoding. Then, it allocates memory, calls Host2UCS() again for the actual decoding and terminates the decoded string. The function change_u_endianness() swaps the byte order of the decoded data. We need to keep this in mind for exploitation.

The initial call to Host2UCS() computes the size of the buffer needed for decoding:

First, Host2UCS() calls MultiByteToWideChar() with the flag MB_ERR_INVALID_CHARS set to get the size of the buffer required for decoding. This flag makes MultiByteToWideChar() fail if it encounters an invalid byte sequence, so this call fails with our invalid input data. Next, it calls MultiByteToWideChar() again, this time without the flag, which means the function successfully returns to convert_mb_to_unicode().

When the first call to Host2UCS() returns, convert_mb_to_unicode() allocates the buffer and calls Host2UCS() again for the actual decoding. In this call, Host2UCS() will try to decode the data with MultiByteToWideChar() again with the flag MB_ERR_INVALID_CHARS set, and this will fail as we have seen earlier.

This time it will not call MultiByteToWideChar() again, because u_str_size is not zero and the if condition is not met. This makes Adobe Reader fall back to its own decoder:

Initially, it calls PDEncConvAcquire() to allocate a buffer for holding the context data required for decoding. Then it calls PDEncConvSetEncToUCS(), which looks up the character map for the codec. However, this call always fails and returns zero, which means that the call to PDEncConvXLateString() is never reached and the function returns with u_str uninitialized.

The failing function, PDEncConvSetEncToUCS(), initially maps the codepage number to the name of the Adobe Reader character map in the global array CJK_to_UC2_charmaps. For example, Shift JIS maps to 90ms-RKSJ-UCS2:
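The sizing pass and the decoding pass can be modelled in a few lines of Python. This is a simplified sketch and not Adobe's code: host2ucs and convert_mb_to_unicode below are stand-ins named after the functions in this post, Python's shift_jis codec stands in for MultiByteToWideChar(), and the 0xAA fill byte stands in for whatever the freshly allocated heap block happened to contain.

```python
from typing import Optional

STALE = 0xAA  # assumption: placeholder for leftover heap contents


def host2ucs(data: bytes, out: Optional[bytearray]) -> int:
    """out=None models the sizing pass, otherwise decode into out."""
    sizing_pass = out is None
    try:
        # Strict decoding plays the role of MB_ERR_INVALID_CHARS.
        text = data.decode("shift_jis")
    except UnicodeDecodeError:
        if not sizing_pass:
            # Decoding pass: no lenient retry, and the fallback decoder
            # fails to initialise, so nothing is written to out.
            return 0
        # Sizing pass only: retry without the strict flag.
        text = data.decode("shift_jis", errors="replace")
    encoded = text.encode("utf-16-le")
    if out is not None:
        out[: len(encoded)] = encoded
    return len(encoded)


def convert_mb_to_unicode(data: bytes) -> bytes:
    size = host2ucs(data, None)              # first call: compute the size
    buf = bytearray([STALE]) * (size + 2)    # allocation is not cleared
    host2ucs(data, buf)                      # second call: actual decoding
    buf[size:size + 2] = b"\x00\x00"         # terminate the string
    return bytes(buf[:size])


print(convert_mb_to_unicode(b"abc"))         # valid input is decoded
print(convert_mb_to_unicode(b"\xfc\x23"))    # invalid input: stale bytes
```

In the model, valid input is decoded normally, while the invalid sequence gets a sized buffer that is never written, so the caller receives the stale fill bytes. That mirrors the disclosure: the two passes disagree about whether the input can be decoded.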

Once the character map name is resolved, it passes the character map name to sub_6079CCB6():

The function sub_6079CCB6() calls PDReadCMapResource() with the character map name as an argument inside an exception handler.

The function PDReadCMapResource() is where the exception is triggered. This function fetches a relatively large data structure stored in the current thread's local storage area:

It checks for a dictionary within this structure and creates one if it does not exist. Then, it checks for an STL-like vector and creates it too if it does not exist. The dictionary stores the decoder data, and its entries are looked up by the ASAtom of the character map name (90ms-RKSJ-UCS2 in our case). The vector stores the names of the character maps as ASAtoms.

The code that follows is where the exception is triggered:

It looks up the dictionary using the character map name. If the character map is not in the dictionary, it is not expected to be in the vector either; if it is, an exception is triggered. In our case, the character map 90ms-RKSJ-UCS2 (atom 0x1366) is not in the dictionary, so ASDictionaryFind() returns NULL. However, if we dump the vector, we find it there, and this is what causes the exception:
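The inconsistent state can be illustrated with a small Python model. It is only a sketch of the logic described above: the function names, the exception class and the "load on first use" branch are assumptions made for the example, and a plain dict and list stand in for the ASDictionary and the vector.

```python
class CMapCacheError(Exception):
    """Stands in for the exception raised inside PDReadCMapResource()."""


def read_cmap_resource(name: str, loaded: dict, seen: list) -> str:
    entry = loaded.get(name)
    if entry is not None:
        return entry
    if name in seen:
        # Name is known but has no decoder data: the unexpected state.
        raise CMapCacheError(name)
    # Assumed normal path: remember the name and load the decoder data.
    seen.append(name)
    loaded[name] = f"decoder data for {name}"
    return loaded[name]


def set_enc_to_ucs(name: str, loaded: dict, seen: list) -> int:
    """Mimics a wrapper that swallows the exception and returns 0."""
    try:
        read_cmap_resource(name, loaded, seen)
        return 1
    except CMapCacheError:
        return 0


cmap = "90ms-RKSJ-UCS2"
print(set_enc_to_ucs(cmap, {}, []))        # consistent state
print(set_enc_to_ucs(cmap, {}, [cmap]))    # name in vector, not in dict
```

The second call models our case: the name is present in the vector but missing from the dictionary, so the lookup throws and the caller only sees a zero return value.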

Conclusion

In conclusion, we've demonstrated how we analyzed and root-caused the vulnerability in detail by reversing the code.
Encodings are generally hard to implement for developers. The constant need for encoders and encodings makes them a ripe area for vulnerability research as every format has its own encoders.

That's it for today, hope you enjoyed the analysis. As always, happy hunting!


Disclosure Timeline

10-8-2020: Vulnerability reported to vendor.
31-10-2020: Vendor confirms the vulnerability.
3-11-2020: Vendor issues CVE-2020-24427 for the vulnerability.


Sanding the 64-bit-Acrobat's Sandbox

1 September 2022 at 11:56

Introduction

Throughout the years, Adobe has invested significantly in Acrobat's security. One of its main security improvements was introducing sandboxing to Acrobat (Reader / Acrobat Pro).

No one can deny the significance of the sandbox. It definitely made things more challenging from an attacker's perspective. The sandbox itself is a big hurdle to bypass, which pushes attackers to jump directly to the kernel instead of looking for vulnerabilities in the sandbox.

The sandbox itself is a nice challenge to tackle.

In a previous post, we covered how to enumerate the broker functions in the 32-bit version of Acrobat/Reader. Since the 64-bit version is out and about, we decided to migrate the scripts we wrote so that they enumerate the broker functions on the 64-bit version of Acrobat. Throughout this blog post, we'll discuss how the migration went, the hurdles we faced and the final outcome. We'll also cover how we ported the 32-bit version of Sander, a tool used to communicate with the broker, to 64-bit.

If you'd like to review the previous post, please refer to our blog: Hunting adobe broker functions

Finding the Differences Between Adobe Reader and Acrobat

To make our IDAPython script operate on a 64-bit Acrobat version, we needed to verify the changes between the 64-bit and 32-bit versions in IDA. Since we know that there is a broker function that calls "eula.exe", we can start by looking through the strings for that name.

We can cross-reference that string to get to the broker function that is responsible for calling eula.exe, which we can then cross-reference to get to the functions database.

Here we see that the database is very different from what we're used to. When we first saw this, we had more questions than answers!

Where are the arguments?

Where is the function tag?

We knew the tags and arguments were in the rdata section, so we decided to skim through it for a similar structure of (tag, function call, args); there has to be a better way. While skimming through the rdata section, we kept noticing the same bytes that were bundled and defined as 'xmmword' in the 32-bit version, so we decided to use our "cleaning()" function to undefine them.

Things began to make more sense after the packaged instructions were undefined.

Since the _text,### line appears to be pointing to a function, let's try to convert it to a QWORD since it's a 64-bit executable.

Voilà! This appears to be exactly what we're looking for: a function pointer, a tag, and some arguments! As a refresher, in the 32-bit version the structure was made up of a 1-byte tag, 52 bytes of arguments, and a function offset. Let's examine whether it has the same structure here.

We can see that the difference with the arguments is 4-bytes using simple math, and the structure in the 64-bit Acrobat is as follows: The tag is 1-byte long, the parameters are 56-bytes long, and the function offset is a QWORD rather than a DWORD.
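The two layouts can be written down with Python's struct module as a quick sanity check. This is a sketch under assumptions: the field order (tag, arguments, function pointer) is inferred from the description above, the entries are treated as packed with no padding, and the sample function address is made up for the example.

```python
import struct

# "<" means little-endian with no padding between fields.
ENTRY_32 = struct.Struct("<B52sI")   # 1-byte tag, 52 bytes of args, DWORD
ENTRY_64 = struct.Struct("<B56sQ")   # 1-byte tag, 56 bytes of args, QWORD


def parse_entry_64(blob: bytes, offset: int = 0) -> dict:
    """Split one 64-bit database entry into its three fields."""
    tag, args, func = ENTRY_64.unpack_from(blob, offset)
    return {"tag": tag, "args": args, "func": func}


# Synthetic entry: tag 0xbf with an invented function address.
sample = ENTRY_64.pack(0xBF, bytes(56), 0x140012340)
entry = parse_entry_64(sample)

print("32-bit entry size:", ENTRY_32.size)
print("64-bit entry size:", ENTRY_64.size)
print(hex(entry["tag"]), hex(entry["func"]))
```

The 64-bit entry grows by 8 bytes in total: 4 extra bytes of arguments and 4 extra bytes for the wider function pointer.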




Migrating our 32-bit IDAPython Script to 64-bit

We can now return to our IDAPython script from the previous blog and begin updating it using our new discoveries.

The first difference we notice is that the database for the functions appears to be different. We'll need to convert the bytes that contain _text,### to a QWORD, which should be simple to do using the IDAPython create_qword(addr) function.



We're basically walking through the entire rdata section here. If we see '_text,140' on a line in the rdata section, we convert it to a QWORD (140 because the base address of the executable in IDA starts with 140).
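The same walk can be sketched outside IDA on a raw dump of the section. This is an illustration rather than our IDAPython script: instead of matching the '_text,140' disassembly text, it looks for 8-byte little-endian values that fall inside the text section, and the function name, addresses and sample bytes are all assumptions. Inside IDA, each hit would be the address handed to create_qword().

```python
import struct


def find_text_pointers(rdata: bytes, rdata_base: int,
                       text_start: int, text_end: int) -> list:
    """Return (address, target) pairs for QWORDs pointing into .text."""
    hits = []
    off = 0
    while off + 8 <= len(rdata):
        (value,) = struct.unpack_from("<Q", rdata, off)
        if text_start <= value < text_end:
            hits.append((rdata_base + off, value))
            off += 8      # skip past the pointer we just recognised
        else:
            off += 1      # entries are not 8-byte aligned, so go bytewise
    return hits


# Synthetic section: one 64-bit entry (tag, 56 argument bytes, pointer).
TEXT_START, TEXT_END = 0x140001000, 0x140100000
RDATA_BASE = 0x140200000
rdata = b"\xbf" + bytes(56) + struct.pack("<Q", 0x140001000)

for addr, target in find_text_pointers(rdata, RDATA_BASE, TEXT_START, TEXT_END):
    print(f"{addr:#x} -> {target:#x}")
```

Scanning bytewise is slow on a large section, but it avoids assuming any alignment, which matters here because a 1-byte tag plus 56 argument bytes puts the pointer at an odd offset.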

The next difference is that the arguments are 56-bytes rather than 52-bytes. Since we already have the logic, all we need to do is modify the loop check condition from 52 to 56-bytes and the if condition to 57, which simply checks if the 57th byte instruction is a function pointer.


Sander 32-bit

To fuzz Adobe Reader or Acrobat, we need a tool that communicates with the broker to call a specified function. Bryan Alexander created a tool called sander for this purpose, which he describes in his blog post digging the adobe sandbox internals. The problem is that the utility only works on the 32-bit version of Acrobat. We wanted to use the tool to fuzz the 64-bit version, so we had to upgrade sander to allow it to call the 64-bit Acrobat functions.

The tool has 5 options, including monitoring IPC calls, dumping all channels, triggering a test call, and capturing IPC traffic.

The tool calls functions from the broker directly. We also built another method to initiate IPC calls from the renderer, which we won't go into in this blog.

We'll try to go over all the steps of how we went from a 32-bit to a 64-bit version.

Upgrading Sander to 64-bit

Sander was written in C++, and the first function is used to start the process and locate the shared memory map containing the function call channels.

The find_memory_map method simply scans the process memory for shared memory. Because the dwMap variable was a DWORD, we had to convert it to DWORD64 to store 64-bit addresses.

To be able to hold 64-bit addresses in find_memory_map, we had to change the current address from UINT to SIZE_T, the return type from UINT to DWORD64, and the memory block information cast from UINT to SIZE_T.

Following the execution of this function, the correct shared memory containing the channels will be returned.

The next step is to build an object that holds all of the channels' data: how many channels there are, whether each channel is busy, what kind of information is on it, and so on.

Those methods that read the structures have a lot of offsets, which are likely to change a lot with the 64-bit version, so we'll need to run Acrobat and look at the structures to see what offsets have changed.

Setting a breakpoint after the find_memory_map function call in Visual Studio will tell us the shared memory address we need to investigate.

Here we can see the shared memory as well as all of the data we'll need to finish our job.

This code just sets certain variables in the object. Because dwSharedMem was a DWORD, we also had to change it to DWORD64.

Digging inside the Unpack() function, we can see some offsets:

The first is channel_count, which does not require any changes because it is the initial four bytes of shared memory.

The offset of the first channel in memory is stored in dwAddr, which we altered from 0x8 to 0x10 because in the 32-bit version each piece of information was stored as 4 bytes, whereas in the 64-bit version each is stored as 8 bytes.

Let's have a look at the channel control now. The Unpack function retrieves information about each channel and stores it in its own object.



The state has been changed to 0x8, the ping event has been changed to 0x10, the pong event has been changed to 0x18, and the ipc tag has been changed to 0x18. This was simple because all 32-bit values were converted to 64-bit.

Let's now check the crosscallparams Unpack function, which retrieves argument information:

This is the channel buffer memory layout; there are 5 arguments, type 1 for the first argument, offset b0 for the first argument, and size 42.

Inside the loop, we go over all parameters in this channel and extract their information. The first parameter's type is at offset 0x68 from the beginning of channel_buffer; relative to that entry, the offset field is at +4 and the size field is at +8. Each parameter's information is 12 bytes, so on every iteration we add i multiplied by 0xc (12 bytes) to reach the next parameter.

Finally, using ReadProcessMemory and the offset of that specific buffer, we read the parameter buffer.
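The loop can be sketched in Python against a local copy of the channel buffer. This is an illustration rather than the sander source: the function name is hypothetical, the parameter count is passed in by the caller, the three fields are assumed to be 32-bit little-endian values, the sample buffer is synthetic, and a slice of the local copy stands in for the ReadProcessMemory call.

```python
import struct

PARAM_TABLE = 0x68    # offset of the first parameter's type field
PARAM_STRIDE = 0xC    # each parameter record is 12 bytes


def read_params(channel_buffer: bytes, count: int) -> list:
    """Return (type, offset, size, data) for each parameter record."""
    params = []
    for i in range(count):
        base = PARAM_TABLE + i * PARAM_STRIDE
        # Record layout: type at +0, offset at +4, size at +8.
        ptype, offset, size = struct.unpack_from("<III", channel_buffer, base)
        data = channel_buffer[offset:offset + size]
        params.append((ptype, offset, size, data))
    return params


# Synthetic buffer: one parameter of type 1 at offset 0xb0, size 0x42.
buf = bytearray(0x200)
struct.pack_into("<III", buf, PARAM_TABLE, 1, 0xB0, 0x42)
buf[0xB0:0xB0 + 0x42] = b"A" * 0x42

for ptype, offset, size, data in read_params(bytes(buf), 1):
    print(ptype, hex(offset), hex(size), data[:8])
```

Working on a snapshot like this makes it easy to compare the 32-bit and 64-bit layouts side by side before changing any offsets in the tool itself.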

We won't go over every change we made for the 64-bit version, but basically, we compared the memory layout of the 64-bit version to the 32-bit version and made the necessary changes. We did the same thing with the pack functions, which are the inverse of unpack: instead of reading information from memory, we write our own tag and function information to memory and then signal the broker to trigger a specific function.

As a test, we triggered the Update Acrobat function with the tag "0xbf". Thanks to our IDAPython script, we know how many parameters it requires and what type of parameter it accepts.

Conclusion

We can now proceed with a fuzzing strategy to find bugs in the Acrobat sandbox.

This is only the first step. Stay tuned for more posts about how we ended up fuzzing the Acrobat Sandbox.

Until then, happy hunting!
