Exploring Acrobat’s DDE attack surface



Adobe Acrobat has been our favorite target to poke at for bugs lately, since it's one of the most popular and most versatile PDF readers available. In our previous research, we hammered Adobe Acrobat's JavaScript APIs by writing Dharma grammars and testing them against Acrobat. As we continued investigating those APIs, we decided, as a change of scenery, to look into other features Adobe Acrobat provides. Even though it has a rich attack surface, we still had to work out which parts would be a good place to start looking for bugs.

While looking at the broker functions, we noticed that there’s a function that’s accessible through the renderer that triggers DDE calls. That by itself was a reason for us to start looking into the DDE component of Acrobat.

In this blog we'll dive into part of Adobe Acrobat's attack surface, starting with DDE in Acrobat through Acrobat IAC.

DDE in Acrobat

To understand how DDE works let's first introduce the concept of inter-process communication (IPC).

So, what is IPC? It's a mechanism, provided by the operating system, that lets processes communicate with each other. One process might inform another that an event has occurred, or several processes might manage shared data. For these processes to understand each other, they have to agree on a communication protocol. Windows supports several IPC mechanisms, such as mailslots, pipes, and DDE.

In Adobe Acrobat, DDE is supported through Acrobat IAC, which we will discuss later in this blog.

What is DDE?

In short, DDE stands for Dynamic Data Exchange, a message-based protocol used for sending messages and transferring data from one process to another using shared memory.

In each inter-process communication with DDE, a client and a server engage in a conversation.

A DDE conversation is established using uniquely defined strings as follows:

  • Service name: a unique string defined by the application that implements the DDE server. Both the DDE client and the DDE server use it to initialize the communication.

  • Topic name: a string that identifies a logical data context.

  • Item name: a string that identifies a unit of data a server can pass to a client during a transaction.


DDE shares these strings through the global atom table (see the Windows documentation on atoms for more details). The DDE protocol also defines how applications should use the wParam and lParam parameters to pass larger pieces of data through shared memory handles and global atoms.

When is DDE used?


It is most appropriate for data exchanges that do not require ongoing user interaction. An application using DDE provides a way for the user to exchange data between the two applications. However, once the transfer is established, the applications continue to exchange data without further user intervention as in socket communication.

The ability to use DDE in an application running on Windows can be added through DDEML.

Introducing DDEML

The Dynamic Data Exchange Management Library (DDEML) provided by Windows makes it easier to add DDE support to an application by providing an interface that simplifies managing DDE conversations. Instead of sending, posting, and processing DDE messages directly, an application can use the DDEML functions to manage DDE conversations.

Usually, the following steps take place when a DDE client wants to start a conversation with the server:


1. Initialization

Before calling any DDE function, we need to register our application with DDEML and specify the transaction filter flags for the callback function. The following functions are used for the initialization step:

  •  DdeInitializeW()

  • DdeInitializeA()


    Note: the "A" suffix indicates the ANSI version of the function, while the "W" suffix indicates the wide (Unicode) version.


2. Establishing a Connection

In order to connect our client to a DDE Server we must use the Service and Topic names associated with the application. The following function will return a handle to our connection which will be used later for data transactions and connection termination:

  • DdeConnect()


3. Data Transaction

In order to send data from DDE client to DDE server we need to call the following function:

  • DdeClientTransaction()

4. Connection Termination

DDEML provides a function for terminating any DDE conversations and freeing any related DDEML resources:

  • DdeUninitialize()

Acrobat IAC

As mentioned earlier, Acrobat Inter Application Communication (IAC) allows an external application to control and manipulate a PDF file inside Adobe Acrobat using several methods, such as OLE and DDE.

For example, let's say you want to merge two PDF documents into one and save that document under a different name. What do we need to achieve that?

  1. Obviously, we need Adobe Acrobat Pro DC.

  2. The service and topic names for Acrobat.

    • Topic name is "Control"

    • Service Name:

      • "AcroViewA21": here "A" means Acrobat and "21" refers to the version.

      • "AcroViewR21": here "R" stands for Reader.

    So, to retrieve the service name for your installation based on the product and the version you can check the registry key:

What is the item we are going to use?

When we send a DDE command to the server implemented in Acrobat, the item will be NULL.

Adobe Acrobat Reader DC supports several DDE messages, but some of these messages require Adobe Acrobat Pro DC in order to work.

Each message must be enclosed in square brackets, and message names are case sensitive. For example:

  • Displaying a document: "[FileOpen()]" and "[DocOpen()]".

  • Saving and printing documents: "[DocSave()]" and "[DocPrint()]".

  • Searching a document: "[DocFind()]".

  • Manipulating a document: "[DocInsertPages()]" and "[DocDeletePages()]".

    Note: in order to use the Acrobat DDE messages that start with Doc, the file must first be opened using the [DocOpen()] message.

We started by defining the service and topic names for Adobe Acrobat and the DDE messages we want to send. In our case, we want to merge two documents into one, so we need three DDE messages: "[DocOpen()]", "[DocInsertPages()]" and "[DocSaveAs()]":
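
As a rough sketch of the three messages involved, the snippet below only builds the command strings; it does not talk to Acrobat. The argument order follows our reading of the Acrobat IAC DDE reference and should be treated as an assumption, and the file paths, page numbers and the helper name ddeMessage are hypothetical.

```javascript
// Build a bracketed DDE command such as [DocOpen("C:\a.pdf")].
// String arguments are quoted, numeric arguments are emitted as-is.
function ddeMessage(name, ...args) {
  const rendered = args.map((arg) =>
    typeof arg === "number" ? String(arg) : `"${arg}"`
  );
  return `[${name}(${rendered.join(",")})]`;
}

// Hypothetical paths used only for illustration.
const first = "C:\\XD\\first.pdf";
const second = "C:\\XD\\second.pdf";
const merged = "C:\\XD\\merged.pdf";

// Assumed argument order for DocInsertPages: target document, page to insert
// after, source document, first and last source page. The page numbers here
// are placeholders.
const commands = [
  ddeMessage("DocOpen", first),
  ddeMessage("DocInsertPages", first, 0, second, 0, 0),
  ddeMessage("DocSaveAs", first, merged),
].join("");

console.log(commands);
```

The resulting string is what would be placed in the data handle passed to an execute transaction, with a NULL item.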

 Next, as we discussed before, we first need to register our application to DDEML using DdeInitialize():

After the initialization step we have to connect to the DDE server using Service and Topic that we defined earlier:

Now we need to send our message using DdeClientTransaction(). As we can see, we used XTYP_EXECUTE with a NULL item, and our command is stored in an HDDEDATA handle created by calling DdeCreateDataHandle(). After executing this part of the code, Adobe Acrobat will open the PDF document, append the other document to it, save the result as a new file, and then exit:

The last part is closing the connection and cleaning the opened handles:

So we decided to take a look at the Adobe plug-ins to see which others implement a DDE server by searching for calls to DdeInitialize():

Great 😈 it seems we have five plug-ins that implement a DDE service. Before analyzing these plug-ins, we searched for more information about them and found that the Search and Catalog plug-ins are documented by Adobe... good, what next!


Search Plug-in

We started to read about the search plug-in and we summarized it in the following:

Acrobat has a feature that allows the user to search for text inside a PDF document. But we already mentioned a DDE method called DocFind(), right? Well, DocFind() searches the PDF document page by page, while the search plug-in performs an indexed search that accepts a word in the form of a query. In other words, we can search a cataloged PDF 🙂.

So basically the search plug-in allows the client to send search queries and manipulate indexes.

When implementing a client that communicates with the search plug-in, the service name and the topic name will both be "Acrobat Search" instead of "AcroView".


Remember that when we sent a DDE request to Adobe Acrobat, the item was NULL. The search plug-in, however, has two items the client can use to submit query data and one item for manipulating indexes:


  • SimpleQuery item: allows the user to send a query that supports Boolean operations. For example, to search for any occurrence of the word "bye" or "hello", we can send "bye OR hello".

  • Query item: allows different kinds of search queries and lets us specify the parser that handles the query.


The item name used to manipulate indexes is "Index", and the DDE transaction type is XTYP_POKE, which is a single poke transaction.

So, we started by manipulating indexes. When we attempt to do an operation on indexes the data must be in the following form:

Where eAction represents the action to be made on the index:

  • Adding index

  • Deleting index

  • Enabling or Disabling index on the shelf.


cbData[0] stores the path of the index file we want to act on, for example "C:\\XD\\test.pdx". A PDX file is an index file that is created from one or more IDX files.


So, we started analyzing the function responsible for handling the structure sent by the client, and it turned out that there are no checks on the data being sent.

As we can see, after calling DdeAccessData(), the EAX register holds a pointer to our data, and the code then accesses whatever data sits at offset 4. So, to trigger an access violation at "movsx eax,word ptr [ecx+4]", we simply send a two-byte string, which results in an out-of-bounds read 🙂 as demonstrated in the following crash:
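
The missing length check can also be modelled outside of Acrobat. The sketch below is not the plug-in's code: the layout (a 16-bit field at offset 4 of the client buffer) is an assumption inferred from the faulting instruction, and the function names are hypothetical. JavaScript's DataView refuses the out-of-bounds access that native code performs silently.

```javascript
// Model of a handler that reads a 16-bit field at offset 4 of the
// client-supplied buffer, mirroring "movsx eax, word ptr [ecx+4]".
function readFieldAtOffset4(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // No length check here, just like the native handler. DataView throws a
  // RangeError where native code would read past the end of the buffer.
  return view.getInt16(4, true); // little-endian, sign-extended
}

// A handler that validates the length before touching the field.
function readFieldChecked(bytes) {
  if (bytes.byteLength < 6) {
    return null; // too short to contain a 16-bit field at offset 4
  }
  return readFieldAtOffset4(bytes);
}

const wellFormed = new Uint8Array([1, 0, 0, 0, 0x08, 0x00, 0x41, 0x00]);
const twoBytes = new Uint8Array([0x41, 0x00]); // the two-byte payload

console.log(readFieldAtOffset4(wellFormed)); // field value read from offset 4
console.log(readFieldChecked(twoBytes)); // rejected by the length check

try {
  readFieldAtOffset4(twoBytes);
} catch (err) {
  console.log(err instanceof RangeError); // the unchecked read was out of bounds
}
```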


Catalog Plug-in

Acrobat DC has a feature that allows the user to create a full-text index file for one or multiple PDF documents that will be searchable using the search command. The file extension is PDX. It will store the text of all specified PDF documents.

The Catalog plug-in supports several DDE methods, such as:

  • [FileOpen(full path)] : Used to open an index file and display the edit index dialog box, the file name must end with PDX extension.

  • [FilePurge(full path)]:  Used to purge index definition file. The file name also must end with PDX extension.


The topic name for Catalog is "Control", and the service name according to the Adobe documentation is "Acrobat". However, if we check the registry key belonging to Adobe Catalog, we can see that it is "Acrocat" (meoww) instead of "Acrobat".

Using IDA Pro, we can see the DDE methods that the Catalog plug-in supports, along with the service and topic names:



Since there are several DDE methods that we can send to the catalog plugin and these DDE methods accept one argument (except for "App related methods") which is a path to a file,  we started analyzing the function responsible for handling this argument and turned out 🙂:


The function checks the start of the supplied string for \xFE\xFF. If it's there, it calls the Bug() function, which reads the string as a Unicode string; otherwise, it calls sub_22007210(), which reads the string as an ANSI string.

So, if we send "\xFE\xFF" (the byte order mark) at the start of an ASCII string, we will probably end up with an out-of-bounds read, since the function will look for the Unicode NULL terminator "\x00\x00" instead of the ASCII NULL terminator.
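
To make the terminator mismatch concrete, here is a small model of the two scanning strategies. It is only a sketch under our own assumptions (the function names are hypothetical, and the real plug-in code is native), but it shows why a BOM-prefixed ASCII string never yields a terminator inside its own buffer.

```javascript
// Find the terminator the way the two code paths would: a 2-byte NUL for
// BOM-prefixed input, a 1-byte NUL otherwise. Returns the offset of the
// terminator, or -1 if the scan would run off the end of the buffer.
function findTerminator(bytes) {
  const hasBom = bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff;
  if (hasBom) {
    for (let i = 2; i + 1 < bytes.length; i += 2) {
      if (bytes[i] === 0 && bytes[i + 1] === 0) return i;
    }
    return -1;
  }
  return bytes.indexOf(0);
}

// Encode an ASCII string followed by a single NUL byte.
function asciiWithNul(text) {
  return Uint8Array.from([...text].map((c) => c.charCodeAt(0)).concat(0));
}

const plain = asciiWithNul("C:\\XD\\test.pdx");
const withBom = Uint8Array.from([0xfe, 0xff, ...plain]);

console.log(findTerminator(plain)); // terminator found inside the buffer
console.log(findTerminator(withBom)); // no 2-byte NUL inside the buffer
```

In native code there is no bounds-aware loop, so the second case keeps reading adjacent memory until it happens to hit two zero bytes.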

We can see here the function handling Unicode string :


And 😎:

Here we can see a snippet of the POC:


That’s it for today. Stay tuned for more blogs on new attack surfaces!

Happy Hunting!


Introduction to Dharma - Part 2 - Making Dharma More User-Friendly using WebAssembly as a Case-Study

In the first part of our Dharma blog post, we used Dharma to write grammar files to fuzz Adobe Acrobat's JavaScript APIs. Learning how to generate JavaScript code using Dharma opened a whole new area of research for us. In theory, we can target anything that uses JavaScript. According to the 2020 Stack Overflow Developer Survey, JavaScript sits comfortably in the #1 spot as the most commonly used language in the world:

In this blog post, we'll focus on fuzzing WebAssembly APIs in Chrome. To get started with WebAssembly, we read the documentation provided by MDN.

We'll start by walking through the basics and getting familiarized with the idea of WebAssembly and how it works with browsers. WebAssembly helps to resolve many issues by using pre-compiled code that gets executed directly, running at near native speed.

After we had the basic idea of WebAssembly and its uses, we started building some simple applications (Hello World!, Calculator, ...). By doing that, we got more comfortable with WebAssembly's APIs, syntax and semantics.

Now we can start thinking about fuzzing WebAssembly.

If we break a WebAssembly application down, we'll notice that it's made of three components:

  1. Pure JavaScript Code.

  2. WebAssembly APIs.

  3. WebAssembly Module.

Since we're trying to fuzz everything under the sun, we'll start with the first two components and then tackle the third one later.

JavaScript & WebAssembly API

This part contains a lot of JavaScript code. We need to pay attention to the syntactical part of the language or we'll end up getting logical and syntax errors that are just a headache to deal with. The best way to minimize errors, and easily generate syntactically (and hopefully logically) correct JavaScript code is using a grammar-based text generation tool, such as Domato or Dharma.

To start, we went to MDN and pulled all the WebAssembly APIs. Then we built a Dharma logic for each API. While doing so, we faced a lot of issues that could slow down or ruin our fuzzer. That said, we'll go over these issues later on in this blog.

To instantiate a WebAssembly module, we use the WebAssembly.instantiate function, which takes either the raw bytes of a .wasm binary or a pre-compiled WebAssembly module and, optionally, an import object. Here's how it looks in JavaScript:
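
A minimal, self-contained sketch of both call forms is shown below. The tiny module is hand-assembled by us for illustration (it is not one of the modules from our corpus), and the variable names are arbitrary.

```javascript
// A hand-assembled module equivalent to:
//   (module (func (export "main") (result i32) i32.const 42))
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function 0 uses type 0
  0x07, 0x08, 0x01, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x00, 0x00, // export "main"
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b, // code: i32.const 42, end
]);

const importObject = {}; // this module imports nothing

// Form 1: pass raw bytes; the promise resolves to { module, instance }.
WebAssembly.instantiate(wasmBytes, importObject).then(({ instance }) => {
  console.log(instance.exports.main());
});

// Form 2: pass a pre-compiled module; the promise resolves to the instance.
const wasmModule = new WebAssembly.Module(wasmBytes);
WebAssembly.instantiate(wasmModule, importObject).then((instance) => {
  console.log(instance.exports.main());
});
```

Covering both the bytes form and the module form, with and without an import object, is exactly the kind of variation a grammar needs to express.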

The process is simple: we try the code out, understand how it works, and then build Dharma logic for it. The same process applies to all the APIs. As a result, the function above can be translated to the following in Dharma:

The output should be similar to the following:

What we're trying to achieve is covering all possible arguments for that given function.

On a side note: The complexity and length of the Dharma file dramatically increased ever since we started working on this project. Thus, we decided to give code snippets rather than the whole code for brevity.

Coding Style

We had to follow a certain coding style during our journey in writing Dharma files for WebAssembly for different reasons.

First, in order to differentiate our logic from Dharma logic - Dharma provides a common.dg file which you can find in the following path: dharma/grammars/common.dg . This file contains helpful logic, such as digit which will give you a number between 0-9, and short_int which will give you a number between 0-65535. This file is useful but generic and sometimes we need something more specific to our logic. That said, we ended up creating our own logic:

We also decided to go with a different naming convention, so we can utilize the auto-complete feature of our text editor. Dharma uses snake_case for naming; we decided to go with camelCase naming instead.

Also, for our coding style, we decided to use some sort of prefix and postfix to annotate the logic. Let's take variables for example, we start any variable with var followed by class or function name:

This makes the logic easier to use later and easier to understand in general.

We applied the same concept for parameters as well. We start with the function's name followed by Param as a naming convention:

Since we're mentioning parameters, let's go over an example of an idea we mentioned earlier. If a function has one or more optional parameters, we create a section for it to cover all the possibilities:
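
Sketched in plain JavaScript rather than Dharma, the idea of covering every valid combination of trailing optional parameters looks roughly like this. The helper name argumentForms and the function name target are hypothetical, as are the parameter names.

```javascript
// Given required and optional positional parameters, list every valid
// argument list. Positional optional parameters can only be dropped from
// the end, so each form is the required list plus a prefix of the optionals.
function argumentForms(required, optional) {
  const forms = [];
  for (let count = 0; count <= optional.length; count++) {
    forms.push([...required, ...optional.slice(0, count)]);
  }
  return forms;
}

// Hypothetical function with one required and two optional parameters.
const forms = argumentForms(["requiredArg"], ["optionalA", "optionalB"]);
for (const form of forms) {
  console.log(`target(${form.join(", ")});`);
}
```

In a grammar file, each of these forms becomes one alternative of the value that describes the function's parameters.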

As part of our coding style, we also used comments to divide the file into sections so we can group and reach a certain function easily:

That said, you can easily find certain functions or parameters under its related section. This is a fairly good solution to make the file more manageable. At a certain point you have to make a file for each section, and group shared logic on an abstract shared file so you eliminate the duplication - maybe we'll talk about this on another blog (maybe not xD).

Testing and validation

After we finished the first version of our Dharma logic file, we ran it and noticed a lot of JavaScript errors: the small mistakes we all normally make, like forgetting a bracket or a comma. To solve these errors, we created a builder section where we build our logic:

We had to go through each line one by one to eliminate all the possible logical errors. We also created a wrapper function that wraps the code with try-catch blocks:
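
The effect of such a wrapper can be sketched in plain JavaScript. This is an illustration of the idea rather than our Dharma logic; the helper names wrapStatement and buildTestCase are hypothetical.

```javascript
// Wrap each generated statement in its own try/catch so that one failing
// statement does not stop the rest of the test case from running.
function wrapStatement(statement) {
  return `try { ${statement} } catch (e) { }`;
}

function buildTestCase(statements) {
  return statements.map(wrapStatement).join("\n");
}

const testCase = buildTestCase([
  "new WebAssembly.Memory({ initial: 1 });",
  "new WebAssembly.Memory({ initial: -1 });", // throws, but is caught
  "globalThis.reached = true;",
]);

console.log(testCase);
new Function(testCase)(); // runs to the end despite the failing statement
```

Wrapping per statement, instead of wrapping the whole test case once, keeps the statements after a failure reachable.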

By doing so, we made it much easier to isolate and test the possible output.

While we were working on the Dharma logic file, we faced another issue. When you want your JavaScript to import something from the .wasm (e.g. a table or a memory buffer), the .wasm module has to provide it. For that, we ended up making many modules that export whatever the generated JS logic imports and import whatever the generated JS logic provides. In brief, we built a lot of .wasm modules, each one exporting or importing what JavaScript needs to test an API. An example of this logic:

For that to work, you need the following .wasm file:

So, if JavaScript is looking for the main function, you should have a main function inside your .wasm module. As we mentioned, there are many things to check, like imported/exported tables, imported/exported buffers, functions, and global variables. We combined many of them together, but some of them, like tables, we couldn't: you can only have one table in your program, either exported or imported. That said, we had to separate them into different modules and avoid some of them to reduce complexity.
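
The matching requirement between the two sides can be seen with a tiny module that imports a memory from JavaScript. The module below is hand-assembled by us for illustration; the import names "js" and "mem" are arbitrary choices, not names from our corpus.

```javascript
// A hand-assembled module equivalent to:
//   (module (import "js" "mem" (memory 1)))
const importerBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x02, 0x0b, 0x01,                               // import section, 1 entry
  0x02, 0x6a, 0x73,                               // module name "js"
  0x03, 0x6d, 0x65, 0x6d,                         // field name "mem"
  0x02, 0x00, 0x01,                               // kind: memory, min 1 page
]);

const importer = new WebAssembly.Module(importerBytes);

// The module declares what it expects from JavaScript.
const declaredImports = WebAssembly.Module.imports(importer);
console.log(declaredImports);

// Instantiation succeeds only when the import object provides js.mem.
const memory = new WebAssembly.Memory({ initial: 1 });
const instance = new WebAssembly.Instance(importer, { js: { mem: memory } });

let failed = false;
try {
  new WebAssembly.Instance(importer, { js: {} }); // the import is missing
} catch (err) {
  failed = true;
}
console.log(failed);
```

This is why each generated JavaScript test case has to be paired with a module whose imports and exports line up with it.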

After finishing our first version, we went to the Chromium bug tracker, which turned out to be a great place to expand our logic with smarter, more complex tips and tricks. We used some of the snippets there as they are, and some of them with slight modifications. It's also worth mentioning that when you search, you should apply the filter related to your area of interest. In our case, we looked into all bugs with the type 'Bug-Security' and the component Blink>JavaScript>WebAssembly; you can use this line in the search bar.

While we were reading these issues on the bug tracker, we found this bug that could be produced by our Dharma logic (if we were a bit faster xD)

WebAssembly Module

Now that we're done fuzzing the first two components, we can move on to the last component of WebAssembly, which is the module.

Everything we did earlier was related to fuzzing the APIs and JavaScript's grammar, but we found two interesting functions used to compile a module and to check its validity: compile and validate. Both functions receive the bytes of a .wasm module. The first compiles WebAssembly binary code into a WebAssembly module; the second returns whether the bytes of a .wasm module are valid (true) or not (false).

For both compile and validate, we made a .wasm corpus (by building or collecting), then we used Radamsa to mutate the binary of these files before we imported them from our two functions.

We improved the mutation by skipping the first part of the .wasm module, which contains the file header (magic number and version), and mutating only the actual module contents that follow.
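
The header-skipping idea can be sketched as follows. Note that this swaps Radamsa for a trivial bit-flip mutator so the example stays self-contained; the mutator, the seeded random number generator and all names here are our own stand-ins, not part of the actual fuzzing setup.

```javascript
const HEADER_SIZE = 8; // 4-byte magic "\0asm" + 4-byte version

// Small deterministic PRNG so that runs are reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Flip one random bit in `flips` randomly chosen bytes, never touching
// the header.
function mutateBody(bytes, rng, flips) {
  const mutated = Uint8Array.from(bytes);
  const bodyLength = mutated.length - HEADER_SIZE;
  if (bodyLength <= 0) return mutated;
  for (let i = 0; i < flips; i++) {
    const offset = HEADER_SIZE + Math.floor(rng() * bodyLength);
    mutated[offset] ^= 1 << Math.floor(rng() * 8);
  }
  return mutated;
}

// Seed: (module (func (export "main") (result i32) i32.const 42))
const seedModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x07, 0x08, 0x01, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x00, 0x00,
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x2a, 0x0b,
]);

const rng = makeRng(1337);
for (let round = 0; round < 5; round++) {
  const candidate = mutateBody(seedModule, rng, 2);
  // validate() reports true/false instead of throwing on malformed bytes.
  console.log(round, WebAssembly.validate(candidate));
}
```

Keeping the header intact means the engine gets past the magic number check and actually exercises the section parsers.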

Stay tuned for the final part of our Dharma blog series, where we implement more advanced grammar files. Happy Hunting!!

Introduction to Dharma - Part 1

While targeting Adobe Acrobat JavaScript APIs, we were not only focusing on performance and the number of cases generated per second, but also on effective generation of valid inputs that cover different functionalities and uncover new vulnerabilities.

Obtaining inputs from mutation-based input generators helped us quickly generate random inputs; however, due to the randomness of the mutations, the great majority of those inputs were invalid.

So, we utilized a well-known grammar-based input generator called Dharma to produce inputs that are semantically reasonable and follow the syntactic rules of JavaScript.

In this blog post, we will explain what Dharma is, how to set it up and finally demonstrate how to use it to generate valid Adobe Acrobat JavaScript API calls which can be wrapped in PDF file format.

So, what is Dharma?

Dharma was created by Mozilla in 2015. It's a tool used to create test cases for fuzzing of structured text inputs, such as markup and script. Dharma takes a custom high-level grammar format as input and produces random well-formed test cases as output.

Dharma can be installed from the following GitHub repo.

Why use Dharma?

By using Dharma, a fuzzer can generate inputs that are valid according to the grammar's requirements. To generate an input using Dharma, the input model must be stated. It will be difficult to write grammar files for a model that is proprietary, unknown, or very complex.

However, we do have knowledge of APIs and objects that we're targeting, by using the publicly available JavaScript API documentation provided by Adobe.

How to use Dharma?

Using Dharma is straightforward: it takes a grammar file with the .dg extension and starts generating random inputs based on the grammar file provided.

A grammar file generally needs to contain 3 sections, and they are:

  1. Value

  2. Variable

  3. Variance

Note that the Variable section is not mandatory. Each section has its own purpose and specifications.

The syntax to declare a section: %section% := section

  • The "value" section is where we define values that are interchangeable - think of it as an OR/SWITCH.

A value can be referenced in the grammar file using +value+, for example +cName+.

  • The "variable" section is where we define variables that serve as context when generating different code.

A variable can be referenced from the value section by wrapping its name in two exclamation marks.

  • The "variance" section is where we put everything together.

If we run the previous example of the three sections, one of the generated files will be similar to the following JS code:

Building Grammar Files

In this section, we'll walk through an example of how to build a grammar file based on a real-life scenario. We will try to build a grammar file for the Thermometer object from the Adobe JavaScript API documentation.

%section% := variable

The Thermometer object can be referenced through "app.thermometer", which is the first thing we need to implement:

The easiest way to get a reference to the Thermometer object is from the app object (app.thermometer):

%section% := value

Looking at the documentation of the Thermometer object, we can see that it has four properties:

We need to assign values to the properties based on their types.

In this case, the cancelled property is a boolean, duration is a number, text is a string and value is a number. That said, we'll have to implement getters and setters for these properties. The setter implementation should look similar to the following:

Now that we have implemented setters for the properties, Dharma will pick a random setter definition from the defined therometer_setters.

For the value property, it will set a random number using +common:number+, a random character for the text property using +common:character+, a random number from 0 to 10000 for the duration property and a Boolean value for the cancelled property using +common:bool+.
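
To give a feel for what such setters amount to at run time, here is a plain JavaScript sketch. It is not Dharma output: the app object below is a stand-in we define ourselves so the snippet runs outside Acrobat, and the concrete values are arbitrary examples of each type.

```javascript
// Stand-in for Acrobat's app.thermometer so the snippet runs in plain
// JavaScript. Inside Acrobat, the real object would be used instead.
const app = {
  thermometer: {
    cancelled: false,
    duration: 0,
    text: "",
    value: 0,
    begin() {},
    end() {},
  },
};

// One statement per property, typed as described above.
const setters = [
  () => { app.thermometer.value = 4711; },     // a number
  () => { app.thermometer.text = "A"; },       // a single character
  () => { app.thermometer.duration = 9000; },  // a number from 0 to 10000
  () => { app.thermometer.cancelled = true; }, // a boolean
];

// Pick one setter at random, the way a generator picks one alternative.
const pick = setters[Math.floor(Math.random() * setters.length)];
try {
  pick();
} catch (e) {}
console.log(JSON.stringify(app.thermometer));
```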

Those values were referenced from a grammar file shipped with dharma called common.dg.

We're now done with the setters, next up is implementing the getters which is fairly easy. We can create a value with all the properties, and then another value to pick a random property from thermometer_properties:

In the above grammar, we used x+common:digit+ to generate random JavaScript variables to store the property values in, for example x1, x2, x3, etc.

We're officially done with properties. Next we'll have to implement the methods. The Thermometer object has two methods - begin and end. Luckily, those two functions do not require any arguments passed:

We have everything implemented. One last thing we need to implement in the value section is the wrapper. The wrapper simply try/catch's the code generated:

Finally the variance section - which invokes the wrapper from the main:

%section% := variance

Putting it all together:

Running our grammar file generates the following output:

The generated JS code can then be embedded into PDF files for testing. Alternatively, we can dump the generated code to a JS file by using ">>" from the command line.

Now let's move on to a more complex example - building a grammar file for the spell object.

We will use the same methodology we used above, starting with implementing getters/setters for the properties followed by implementing the methods. Looking at the documentation of the spell object properties:

%section% := value

Note that we will constantly use +fuzzlogics+ keyword, which is a reference from another grammar file that our fuzzer will use to place some random values.

In this case, we'll make the getter/setter implementation simpler. We'll have the setter set random values to any properties regardless of the type. The getter is almost the same as the example above:

Now we're going to implement the methods. To avoid spoiling the fun for you, we'll not implement all the methods in the spell object, just a few for demonstration purposes :)

These are all the methods of the spell object. Each method takes a certain number of arguments of different types, so we need a value for each method. Let's start with the spell.addDictionary() arguments:

Looking at addDictionary method, it takes three arguments, cFile, cName and bShow. The last argument (bShow) is optional, so we implemented two logics for addDictionary arguments to cover as many scenarios as we can. One with all three arguments and another with only two arguments since the last one is optional.

For the cFile argument, we're referencing an ASCII Unicode value from fuzzlogics.dg (the custom dictionary we implemented for this purpose).

Now let's implement the spell.check() arguments.

spell.check() function takes two optional arguments, aDomain and aDictionary. So we can either pass aDomain only, aDictionary only, both or no arguments at all.

The first logic "{}" is no argument, the second one is both aDictionary and aDomain, the third one is aDomain, the last one is aDictionary only.
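
The four forms can be enumerated mechanically. The sketch below does this in plain JavaScript rather than Dharma; the helper name namedArgumentForms and the placeholder values are hypothetical, and a real grammar would draw the values from the fuzzing dictionary instead.

```javascript
// Every subset of the optional named arguments, rendered as the object
// literal passed to the call.
function namedArgumentForms(optionalArgs) {
  const names = Object.keys(optionalArgs);
  const forms = [];
  for (let mask = 0; mask < 1 << names.length; mask++) {
    const chosen = names.filter((_, bit) => mask & (1 << bit));
    const body = chosen.map((n) => `${n}: ${optionalArgs[n]}`).join(", ");
    forms.push(body ? `{ ${body} }` : "{}");
  }
  return forms;
}

// Placeholder values used only for illustration.
const forms = namedArgumentForms({
  aDomain: "domainValue",
  aDictionary: "dictionaryValue",
});
for (const form of forms) {
  console.log(`spell.check(${form});`);
}
```

With two optional named arguments this yields four forms; each additional optional argument doubles the count, which is why we wrote them out by hand only for small cases.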

The same methodology is used for the rest of the methods, so we're not going to cover all available methods. The last thing we need to implement is the wrapper:

As we mentioned earlier, the wrapper is used to wrap everything between a try/catch so that any error would be suppressed. Finally, the variance section:

In the next part we will expand further into Dharma, focusing on a specific case study where Dharma was essential to the process of vulnerability discovery. Hopefully this introduction catches you up to speed with grammar fuzzing and its inner workings.

As always, happy hunting :)

Chrome Exploitation: An old but good case-study

Since the announcement of the @hack event, one of the world’s largest infosec conferences which will start during Riyadh Season, Haboob’s R&D team submitted 3 talks. All of them got accepted.

One topic in particular is of interest to a lot of vulnerability researchers: browser exploitation in general, and Chrome exploitation in particular. That said, we decided to present a Chrome exploitation talk that focuses on case studies we’ve been working on: a generation-to-generation comparison of the different eras Chrome exploitation has gone through. Throughout our research, we go through multiple components and analyse whether the techniques and concepts used to develop exploits for Chrome have changed.

One of the vulnerabilities that we looked into dates back to 2017. This vulnerability, CVE-2017-15399, was demonstrated at Pwn2Own. The bug existed in Chrome version 62.0.3202.62.

That said, let’s start digging into the bug.

But before we actually start, let's have a sneak peek at the bug! The bug occurred in V8's WebAssembly implementation. The submitted POC:

Root Cause Analysis:

Running the POC on the vulnerable V8 engine triggers the crash, and we can observe the crash context:

To accurately deduce which function triggered the crash, we print the stack trace at the time of the crash:

We noticed that the last four function calls inside the stack were not part of Chrome or any of its loaded modules.

So far, we can notice two interesting things. First, the instruction that triggered the crash sits at an address that does not belong to any loaded module, which could mean that it's part of the JavaScript Ignition engine. Second, the address whose access triggered the crash is hardcoded inside the opcode itself:

These function calls were made from two RWX pages that were allocated during execution.

Since the POC uses asm.js, V8 compiles the asm.js module into machine code using AOT (ahead-of-time) compilation, which is used to enhance performance. We noticed that there are hardcoded memory addresses that could potentially be what's causing the bug.

A Quick Look Into asmJS:

For now, let's focus entirely on asm.js, and on the following snippet from the POC. We changed the variable and function names in a way that helps us understand the snippet better:

The code above is asm.js, a strict subset of JavaScript with a published specification that tells browsers how to validate and parse it. V8 compiles it into machine code.

When V8 parses a module that begins with "use asm", it knows that the rest of the code should be treated differently and compiles it into a WebAssembly (WASM) module. The interface for the asm.js function is:

So asm.js code accepts three arguments:

  • stdlib: an object that should contain references to a number of built-in functions to be used as the runtime.

  • foreign: used for user-defined functions.

  • heap: an ArrayBuffer that can be viewed through a number of different lenses, such as Int32Array and Float32Array.
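
For readers who have not seen this interface before, here is a minimal module with the same three-argument shape. This is an illustrative sketch written by us, not the POC; the names AsmModule, store and load are hypothetical, and a plain ArrayBuffer is used as the heap to keep it simple.

```javascript
// A minimal asm.js-style module: it views the heap as a Uint32Array and
// exposes two functions that take byte offsets into the heap.
function AsmModule(stdlib, foreign, heap) {
  "use asm";
  var u32 = new stdlib.Uint32Array(heap);
  function store(offset, value) {
    offset = offset | 0;
    value = value | 0;
    u32[offset >> 2] = value;
  }
  function load(offset) {
    offset = offset | 0;
    return u32[offset >> 2] | 0;
  }
  return { store: store, load: load };
}

// stdlib supplies the typed array constructor, foreign is empty, and the
// heap is a 64 KiB buffer (a size asm.js accepts).
const heap = new ArrayBuffer(0x10000);
const asm = AsmModule({ Uint32Array: Uint32Array }, {}, heap);

asm.store(8, 1234);
console.log(asm.load(8));
```

Every heap access in such a module goes through a view created once at link time, which is what makes it attractive for the engine to bake the buffer's address into the generated code.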

In our POC, stdlib contained the typed array constructor Uint32Array, and we created the heap using a WASM memory with the following call:

memory = new WebAssembly.Memory({initial:1});

So, the complete function call should be as the following:

evil_f = module({Uint32Array:Uint32Array},{},memory.buffer);

Now, V8 will compile the asm.js module using the hardcoded address of the backing store of the JSArrayBuffer for memory.

The backing store of a JSArrayBuffer is held through a std::shared_ptr<>, which counts references, but the address itself has already been compiled into the generated machine code as a raw pointer. A raw pointer access is not counted as a reference.

Based on the wasm spec, when a memory needs to grow, it must detach the previous memory and its backing store and then free the previously allocated memory. memory.grow(1); // std::shared_ptr<BackingStore> and we can see this behaviour in the file src/wasm/

Now the HEAP pointer inside the asm.js module is invalid and points to a freed allocation. To trigger the access, we just need to call the asm.js function.

If we look inside DetachWebAssemblyMemoryBuffer, we can see how it frees the backing store:

After that, calling the asm.js module triggers the use-after-free bug.

The following comments should summarize how the use after free occurred:
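
The sequence can be walked through from the JavaScript side. The sketch below is ours, not the original POC, and it is safe to run on a current engine: it only demonstrates the detach that memory.grow() performs, with comments marking where the vulnerable build went wrong.

```javascript
// 1. Allocate one 64 KiB page; memory.buffer is the initial ArrayBuffer.
const memory = new WebAssembly.Memory({ initial: 1 });
const oldBuffer = memory.buffer;
const oldView = new Uint32Array(oldBuffer);

// 2. In the vulnerable build, an asm.js module linked against oldBuffer at
//    this point had the raw address of its backing store baked into the
//    generated code.

// 3. Growing the memory detaches the old buffer, and its backing store
//    may be freed.
memory.grow(1);

// 4. JavaScript-visible objects are neutered safely: the old buffer and
//    any views on it now report a length of zero.
console.log(oldBuffer.byteLength, oldView.length);
console.log(memory.buffer.byteLength); // two pages after the grow

// 5. The compiled asm.js code had no such protection: its hardcoded pointer
//    still referenced the old allocation, hence the use after free.
```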


To investigate our crash point further and figure out where the hardcoded offset comes from, we tracked down the creation of the WasmMemoryObject JSObject that gets created in WebAssemblyMemory, a native function that is called from the following JavaScript line.

evil_f = module({Uint32Array:Uint32Array},{},memory.buffer); // we save a hardcode address to it

We set a breakpoint at NewArrayBuffer, which calls ShellArrayBufferAllocator::Allocate. This tracing step was necessary to catch the initially created memory buffer (0x11CF0000h). Afterwards, we set a break-on-access breakpoint on it (ba r1 11CF0000h) to catch any access attempt, which lets us observe the crashing point before the use-after-free bug occurs.

After our break-on-access breakpoint was triggered, we inspected the assembly instructions around it. They turned out to be the generated assembly instructions for the asmjs f1 function in our original PoC. We can see that it was compiled with range checks to eliminate out-of-bounds accesses. We also noticed that the address of the initial memory allocation was hardcoded in the opcode.

Executing memory.grow() frees the memory buffer, but since its address was hardcoded inside the compiled asmjs function (a dangling pointer), a use-after-free bug occurs. The Chrome developers did not implement a proper check in the grow process for WasmMemoryObject. They only implemented a check for the WasmInstance object, and since our case is asmjs, our object was not treated as a WasmInstance object and therefore did not go through the grow checks.

Now we have a clear UAF bug and we'll try to utilize it.


Since the memory allocated for the UAF bug falls under old space, we needed a way to control that memory region. This was our first time exploiting such a bug, and we found a lot of similarities between Adobe and Chrome in terms of exploitation concepts. Still, this was not an easy task, since the memory management is totally different and we had to dig deeper into the V8 engine and understand many things, like the JSObject anatomy for example. The plan was laid out on the assumption that creating another Wasm instance and hijacking it later for code execution would work, so our plan was as follows:

  • Triggering UAF Bug.

  • Heap Spray and reclaim the freed address.

  • Smash & Find Controlled Array.

  • Achieve Read & Write Primitives.

  • Achieve Code Execution.

Triggering UAF Bug:

We trigger the bug by calling the memory function grow() so that the buffer gets freed. The freed memory region falls under old space, which is important for reclaiming the memory and controlling the bug. We allocated a decent size for WasmMemory to make sure that the V8 heap would not be fragmented.

Heap Spray:

Thanks to our long journey of controlling Adobe bugs, this step was easy to accomplish. The only difference is that we no longer need to poke holes into our spray, since the goal is reclaiming memory. We used JSArray and JSArrayBuffer objects to spray the heap, which later gives us both the addrof and RW primitives.

Smash & Find:

In order to read forward from the initially controlled UAF memory, we first need a way to corrupt the size of our JSArrayBuffer to something huge. With the help of asmjs, we can corrupt the sprayed objects and write a lookup function that finds the index of the corrupted JSArrayBuffer. Since we filled our spray with the number ‘4’, it acts as the magic value to search for in the asmjs code. Writing asmjs code is really hectic because of pointer addressing, but it gets easier once you get used to it.

We implemented a lookup function to search for magic values in the heap:

A simple lookup implementation in JS could look like this, where we are looking for the corrupted array with value 0x56565656 in our spray arrays:
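The same scan can be modelled outside the browser. The Python sketch below is only an illustration of the idea, not the PoC code: the `find_magic` helper and the toy buffer are hypothetical, and the real lookup runs inside the asmjs module against the reclaimed heap. It walks a buffer one DWORD at a time and returns the offset of the first word that matches the marker value.

```python
import struct

MAGIC = 0x56565656  # marker value mentioned in the text above


def find_magic(heap: bytes, magic: int = MAGIC, start: int = 0) -> int:
    """Return the byte offset of the first aligned 32-bit little-endian
    word equal to `magic`, or -1 if the buffer does not contain it."""
    # Round the start offset up to the next DWORD boundary, because a
    # Uint32Array-style view can only address aligned words.
    first = (start + 3) & ~3
    for offset in range(first, len(heap) - 3, 4):
        (word,) = struct.unpack_from("<I", heap, offset)
        if word == magic:
            return offset
    return -1


# Toy "heap": three filler words, then the marker at byte offset 12.
toy_heap = struct.pack("<4I", 4, 4, 4, MAGIC) + b"\x00" * 8
```

Calling `find_magic(toy_heap)` on the toy buffer locates the marker word, and the returned offset is what a follow-up write would be based on.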

Now that we have an offset to an array that can be used to store JSObjects, we can achieve addrof primitive using the asmjs AddrOf function and use it to leak the address of JSObjects to help us achieve code execution. Please consider that you may need to dig a bit deeper into an object's anatomy to understand what you really need to leak.

We implemented our addrof primitive using the following wrappers:

Achieving Read & Write Primitives:

We are missing one more thing to complete our rocket: the RW primitives. What we really want is to corrupt the JSArrayBuffer’s length to give us forward access to the memory. Since the second DWORD of the JSArrayBuffer header contains the length, we searched for our size (0x40) and overwrote it with a bigger size.

Achieving Code Execution:

At last, the final stage of the launch requires two more components. The first component is an asmjs function that overwrites any provided offset. This gives us a write primitive by changing the JSArrayBuffer backing store pointer to an executable memory page:

The second is a wasm instance, which makes V8 allocate a PAGE_EXECUTE_READWRITE region that we can hijack. A simple definition could look like this:

Putting things together with a simple calc.exe shellcode:

That’s everything, we started with a simple PoC and ended up with achieving code execution :D

Hope you enjoyed reading this post :) See you in @Hack!

Applying Fuzzing Techniques Against PDFTron: Part 2


In our first blog we covered the basics of how we fuzzed PDFTron using Python. The results were quite interesting and yielded multiple vulnerabilities. Even with the number of vulnerabilities we found, we were not fully satisfied. We eventually decided to take it a step further by utilizing LibFuzzer against PDFTron.

Throughout this blog post, we will attempt to document our short journey with LibFuzzer, the successes and failures. Buckle up, here we go..


LibFuzzer is part of the LLVM package. It allows you to integrate the coverage-guided fuzzer logic into your harness. A crucial feature of LibFuzzer is its close integration with Sanitizer Coverage and bug detecting sanitizers, namely: Address Sanitizer (ASAN), Leak Sanitizer, Memory Sanitizer (MSAN), Thread Sanitizer (TSAN) and Undefined Behaviour Sanitizer (UBSAN).

The first step in integrating LibFuzzer into your project is to implement a fuzz target function, LLVMFuzzerTestOneInput, which accepts an array of bytes that LibFuzzer mutates between calls:

When we integrate a harness with LLVMFuzzerTestOneInput(), which is LibFuzzer's entry point into the harness, we can observe how LibFuzzer works internally.

Recent versions of Clang (starting from 6.0) include LibFuzzer without having to install any dependencies. To build your harness with the integrated LibFuzzer function, use the -fsanitize=fuzzer flag during the compilation and linking. In some cases, you might want to combine LibFuzzer with AddressSanitizer (ASAN), UndefinedBehaviorSanitizer (UBSAN), or both. You can also build it with MemorySanitizer (MSAN):

In our short research, we used more options to build our harness since we targeted PDFTron, specifically to satisfy dependencies (header files etc..)

To properly benchmark our results, we decided to build the harness on both Linux and Windows.

Libfuzzer on Windows

To compile the harness, we first need to download the LLVM package, which contains the Clang compiler. You can download an LLVM package from the LLVM Snapshot Builds page (Windows).

Building the Harness - Windows:

To get accurate results and make the comparison fair, we targeted the same feature(s) we fuzzed during part 1 (ImageExtract), which can be downloaded from here. PDFTron provides implementations of their features in various programming languages. We went with the C++ implementation since our harness was developed in the same language.

When reviewing the source code sample for ImageExtract, we found the PDFDoc constructor, which by default takes the path for the PDF file we want to extract the images from. This constructor works perfectly in our custom fuzzer since our custom fuzzer was a file-based fuzzer. However, LibFuzzer is completely different since it’s an in-memory based fuzzer and it provides mutated test cases in-memory through LLVMFuzzerTestOneInput.

If PDFTron’s implementation of ImageExtract only had the option to extract an image from a PDF file on disk, we could easily work around this constraint with a simple trick:

dumping the test cases that LibFuzzer generates to disk and then passing them to the PDFDoc constructor.

Using this technique will reduce the overall performance of the fuzzer. You will always want to avoid using files and I/O operations as they’re the slowest. So, using such workarounds should always be a last resort.

In our search for an alternative solution (since I/O operations are lava!) we inspected the source code of the ImageExtract feature and in one of its headers we found multiple implementations for the PDFDoc constructor. One of the implementations was so perfect for us, we thought it was custom-made for our project.

The constructor accepts a buffer and its size (which will be provided by LibFuzzer). So, now we can use the new constructor in our harness without any performance penalties and minimal changes to our code.

Now all we have to do is change ImageExtract sample source code main function from accepting one argument (file path) to two arguments (buffer and size) then add the entry point function for LibFuzzer.

At this point our harness is primed and ready to be built.

Compiling and Running the Harness - Windows

Before compiling our harness, we need to provide the static library that PDFTron uses. We also need to provide PDFTron’s headers path to Clang so we can compile our harness without any issues. The options are:

  • -L : Add directory to library search path

  • -l : Name of the library

  • -I : Add directory to include search path.

The last option that we need to add is -fsanitize=fuzzer, which enables fuzzing in our harness.

To run the harness, we need to provide the corpus folder that contains the initial test-cases that we want LibFuzzer to start mutating.

We tested the -fsanitize=fuzzer,address (Address Sanitizer) option to see if our fuzzer would yield more crashes, but we realized that address sanitization was not behaving as it should under Windows. We ended up running our harness without the address sanitizer. We managed to trigger the same crashes we previously found using our custom fuzzer (part 1).

LibFuzzer on Linux

Since PDFTron also supports Linux, we decided to test run LibFuzzer on Linux so we can run our harness with the Address Sanitizer option enabled. We also targeted the same feature (ImageExtract) to avoid making any major changes. The only significant changes were the options provided during the build time.

Compiling and Running the Harness - Linux

The options that we used to compile the harness on Linux are pretty much the same as on Windows. We need to provide the headers path and the library PDFTron used:

  • -L : Add directory to library search path

  • -l : Name of the library (without the lib prefix and the .so suffix)

  • -I : Add directory to the end of the list of include search paths

Now we need to pass both the fuzzer option and the address option as the value of -fsanitize to enable fuzzing and the Address Sanitizer:
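As a rough illustration of how these options fit together, the Python sketch below assembles a Clang command line as an argument list. The helper name, the file names, the directories and the library name are all placeholders chosen for this example, not the values from our build. Only the -fsanitize, -I, -L and -l options come from the text above.

```python
def build_compile_argv(source, output, include_dir, lib_dir, lib_name,
                       sanitizers=("fuzzer", "address")):
    """Assemble a clang++ command line for a LibFuzzer harness.

    All paths and the library name are supplied by the caller; nothing
    here is specific to a particular SDK layout.
    """
    return [
        "clang++",
        "-fsanitize=" + ",".join(sanitizers),  # e.g. -fsanitize=fuzzer,address
        "-I" + include_dir,                    # header search path
        source,
        "-L" + lib_dir,                        # library search path
        "-l" + lib_name,                       # no "lib" prefix, no ".so" suffix
        "-o", output,
    ]


# Hypothetical example values, for illustration only.
example_argv = build_compile_argv(
    source="harness.cpp",
    output="harness",
    include_dir="./sdk/Headers",
    lib_dir="./sdk/Lib",
    lib_name="ExampleSdk",
)
```

The resulting list can be passed to `subprocess.run` as is. Keeping the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.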

Our harness is now ready to roll. To keep our harness running, we had to add these two arguments on Linux:

  • -fork=1

  • -ignore_crashes=1

The -fork option runs the fuzzing in concurrent child processes (one, in our case) and provides each child with a small random subset of the corpus.

The -ignore_crashes option allows LibFuzzer to continue running without exiting when a crash occurs.
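A minimal sketch of launching the harness with these two flags from Python is shown below. The harness path and corpus directory are placeholders, and the helper itself is hypothetical. The only parts taken from the text are the -fork and -ignore_crashes flags.

```python
def build_fuzz_argv(harness, corpus_dir, fork=1, ignore_crashes=True):
    """Assemble the command line used to run a LibFuzzer harness."""
    argv = [harness]
    if fork:
        argv.append(f"-fork={fork}")       # fuzz in child processes
    if ignore_crashes:
        argv.append("-ignore_crashes=1")   # keep going after a crash
    argv.append(corpus_dir)                # initial test cases to mutate
    return argv


# Hypothetical example values, for illustration only.
example_run = build_fuzz_argv("./harness", "./corpus")
```

With the default arguments this corresponds to running `./harness -fork=1 -ignore_crashes=1 ./corpus` from a shell.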

After running our harness over a short period of time, we discovered 10 unique crashes in PDFTron.




Throughout our small research, we were able to uncover new vulnerabilities along with triggering the old ones we discovered previously.

Sadly, LibFuzzer under Windows does not seem to be fully mature yet to be used against targets like PDFTron. Nevertheless, using LibFuzzer on Linux was easy and stable.


Hope you enjoyed the short journey, until next time!

Happy hunting!


Applying Fuzzing Techniques Against PDFTron: Part 1


PDFTron SDK brings a wide variety of PDF parsing functionalities. It varies from reading and viewing PDF files to converting PDF files to different file formats. The provided SDK is widely used and supports multiple platforms, it also exposes a rich set of APIs that helps in automating PDFTron functionalities.

PDFTron was one of the targets we started looking into when we decided to investigate PDF readers and PDF converters. Throughout this blog post, we will discuss the brief research that we did.

The post breaks down how we harnessed and fuzzed different PDFTron functionalities.

How to Tackle the Beast: CLI vs Harnessing:

Since PDFTron provides well-documented CLIs, it was the obvious route for us to go; we considered this low-hanging fruit. Our initial thinking was to pick a command, try to craft random PDF files and feed them to the specific CLI, such as pdf2image. We were able to get some crashes this way, and we thought it couldn’t get any better, right? Right???

But after a while, we wanted to take our project a step further by developing a custom harness using their publicly available SDK.

Lucky enough, we found a great deal of projects on their website which includes small programs that were developed in C++, just ripe and ready to be compiled and executed. Each program does a very specific function, such as adding an image to a PDF file, extracting an image from a PDF file, etc.

We could easily integrate those code snippets into our project, feed them mutated inputs and monitor their execution.

For example, we harnessed the extract image functionality, with minor modifications to the code so that it takes two arguments:

1. The mutated file path.

2. Destination to where we want the image to be extracted.


 Following are the edited parts of PDFTron’s code:

How Does our Fuzzer Work?

We developed our own custom fuzzer that uses Radamsa as a mutator and then feeds the harness the mutated files while monitoring the execution flow of the program. If and when a crash occurs, the harness will log all relevant information, such as the call stack and the register state.

What makes our fuzzer generic is a config file in JSON format, in which we specify the following:

1- Mutation file format.

2- Harness path.

3- Test-cases folder path.

4- Output folder path.

5- Mutation tool path.

6- Hashes file path.

We fill these fields in based on our targeted application, so we don’t have to change our fuzzer source code for each different target.
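To make the idea concrete, here is a small sketch of loading and validating such a config file. The key names and the example values are assumptions made for this illustration; the post only lists the six fields, not how they are spelled in the file.

```python
import json

# Hypothetical key names, one per field listed above.
REQUIRED_KEYS = (
    "file_format",    # 1- mutation file format
    "harness_path",   # 2- harness path
    "testcases_dir",  # 3- test-cases folder path
    "output_dir",     # 4- output folder path
    "mutator_path",   # 5- mutation tool path
    "hashes_path",    # 6- hashes file path
)


def load_config(text):
    """Parse the JSON config and fail early if a field is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    return config


# Example content with placeholder paths.
EXAMPLE_CONFIG = """
{
  "file_format": "pdf",
  "harness_path": "C:/fuzz/harness.exe",
  "testcases_dir": "C:/fuzz/testcases",
  "output_dir": "C:/fuzz/crashes",
  "mutator_path": "C:/fuzz/radamsa.exe",
  "hashes_path": "C:/fuzz/hashes.txt"
}
"""
```

Switching to a different target then only means writing a new JSON file and passing its content to `load_config`.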

The Components:

We divided the fuzzer into components, each with a specific function. The components of our fuzzer were:

A. Test Case Generator: Handled by Radamsa.

B. Execution Monitor: Handled by PyKd.

C. Logger: Custom-built logger.

D. Duplicate Handler: Handled by !exploitable Toolkit.

We will go over each component and how we implemented it in detail in the next section.

A. Test Case Generator:

As mentioned before, we used Radamsa as our (mutation-based) test case generator. We integrated it with our fuzzer mainly because it supports a lot of mutation types, which saves plenty of time on reimplementing and optimizing them.

We also stored some of the mutation types that Radamsa supports in a list, so that we can pick a random mutation type each time.

After generating the list, we need to build Radamsa’s full command, with all the arguments it needs, to start generating the test cases:

Now we have the test cases in the desired destination folder. Each time we execute this function, Radamsa generates 20 mutated files, which will later be fed to the targeted application.
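A sketch of how such a command could be assembled is shown below. The helper name, the short mutation list and the output file naming are assumptions for this example and not our exact code. It relies on Radamsa's -m (mutations), -n (count) and -o (output pattern) options, where %n in the pattern is replaced by the test case number.

```python
import os
import random

# A few of Radamsa's mutation identifiers (see `radamsa --list`).
MUTATIONS = ["bf", "bd", "bi", "br", "num"]


def build_radamsa_argv(radamsa_path, seed_file, out_dir, ext,
                       count=20, rng=random):
    """Assemble a Radamsa command that writes `count` mutated files."""
    mutation = rng.choice(MUTATIONS)                     # random mutation type
    pattern = os.path.join(out_dir, f"fuzz-%n.{ext}")    # %n = test case index
    return [
        radamsa_path,
        "-m", mutation,
        "-n", str(count),
        "-o", pattern,
        seed_file,
    ]
```

The returned list can be handed to `subprocess.run`. Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when a crash needs to be regenerated.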

B. Execution Monitor:

This part is considered as the main component in our fuzzer, it contains three main stages:

1. Test case running stage.

2. Program execution stage.

3. Logging stage. 

After preparing the mutated files, we can test them on the selected target. In our fuzzer, we used the PyKd library to execute the harness and check its execution progress. If the harness terminates normally, our fuzzer tests the next mutated file. If the harness terminates due to an access violation, our fuzzer deals with it (more details on this later).

PyKd runs the harness and uses the expHandler variable to check the status of the harness execution, so the fuzzer can decide whether the harness crashed or not. We created a class called ExceptionHandler which monitors the execution flow of our harness. It checks the exception code, and if the value is 0xC0000005 (an access violation), it’s usually a promising bug.

If accessViolationOccured was set to true, our fuzzer saves the mutated file for us to analyze later. If it was set to false, the mutated file did not affect the harness execution and our fuzzer tests another file.
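The decision logic can be sketched without a debugger attached. The model below deliberately does not use PyKd: the `run` callable is a hypothetical stand-in for "execute the harness under the debugger and report the exception code, if any", and the class only mirrors the flag described above.

```python
ACCESS_VIOLATION = 0xC0000005  # Windows STATUS_ACCESS_VIOLATION


class ExceptionHandler:
    """Tracks whether the monitored run hit an access violation."""

    def __init__(self):
        self.access_violation_occurred = False

    def on_exception(self, exception_code):
        if exception_code == ACCESS_VIOLATION:
            self.access_violation_occurred = True


def triage(testcases, run):
    """Return the test cases whose run ended in an access violation.

    `run(testcase)` must return the exception code raised by the
    harness, or None when the harness exited normally.
    """
    crashing = []
    for testcase in testcases:
        handler = ExceptionHandler()          # fresh flag for every run
        code = run(testcase)
        if code is not None:
            handler.on_exception(code)
        if handler.access_violation_occurred:
            crashing.append(testcase)         # keep it for later analysis
    return crashing
```

In a real setup, `run` would wrap the debugger session, and the exception callback would be driven by debugger events rather than by a return value.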

C. Logging:

This component is crucial in any fuzzing framework. The role of the logger is to write a log file that briefly details the crash and to save the mutated file that triggered it. Some important details you might want to include in a log:

- Assembly instruction before the crash. 

- Assembly instruction where the crash occurred.

- Register states.

- Call stack.

After fetching all the information we need from the crash, we can write it into a log file. To avoid file name collisions, we saved both the test case that triggered the crash and the log file with the epoch time as their file names.

This code snippet saves the PoC that triggered the crash and creates a log file for the crash on disk for later analysis.
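A minimal sketch of that naming scheme follows. The function name and its parameters are hypothetical, and the log text is whatever the monitor collected (instructions, registers, call stack). The `now` parameter exists only so the behaviour can be checked with a fixed timestamp.

```python
import os
import shutil
import time


def save_crash(testcase_path, log_text, output_dir, now=None):
    """Copy the crashing test case and write its log, both named after
    the current epoch time."""
    stamp = str(int(time.time() if now is None else now))
    extension = os.path.splitext(testcase_path)[1]     # keep e.g. ".pdf"
    os.makedirs(output_dir, exist_ok=True)

    poc_path = os.path.join(output_dir, stamp + extension)
    log_path = os.path.join(output_dir, stamp + ".log")

    shutil.copyfile(testcase_path, poc_path)           # the PoC itself
    with open(log_path, "w", encoding="utf-8") as handle:
        handle.write(log_text)                         # crash details
    return poc_path, log_path
```

One-second resolution is enough as long as a single harness instance runs at a time. Several parallel instances writing to the same folder would need an extra suffix to stay unique.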


D. Duplicate Handler:

After running the fuzzer over long periods of time, we found that the same crash may occur multiple times, and it gets logged each time it happens, making it harder for us to analyse unique crashes. To control duplicate crashes, we used “MSEC.dll”, which was created by the Microsoft Security Engineering Center (MSEC).

We first need to load the DLL into WinDbg.

Then we used the “!exploitable” command, which generates a unique hash for each crash along with a crash analysis and risk assessment. Each time the program crashes, we run this command to get the hash of the crash and compare it to the hashes we already have. If it matches one of them, we do not save a log for this crash. If it’s a unique hash, we store it with the crash hashes we discovered before and save a log file for the new crash along with its test case.
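The bookkeeping around the hashes file can be sketched as follows. This is an assumed implementation for illustration: the helper name and the one-hash-per-line file layout are ours, and the hash string is treated as an opaque value produced by the crash analysis step.

```python
import os


def record_if_new(crash_hash, hashes_path):
    """Return True and remember the hash if it has not been seen before,
    otherwise return False and leave the file untouched."""
    known = set()
    if os.path.exists(hashes_path):
        with open(hashes_path, encoding="utf-8") as handle:
            known = {line.strip() for line in handle if line.strip()}

    if crash_hash in known:
        return False                      # duplicate crash, skip logging

    with open(hashes_path, "a", encoding="utf-8") as handle:
        handle.write(crash_hash + "\n")   # one hash per line
    return True
```

The fuzzer would call this once per crash and only invoke the logger when it returns True.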

In the second part of this blogpost, we will discuss integrating the harness with a publicly available fuzzer and comparing the results between these two different approaches.

Stay tuned, and as always, happy hunting!

Modern Harnessing Meets In-Memory Fuzzing - PART 2


In the first part of the blog post we covered ways to harness certain SDKs along with in-memory fuzzing and how to harness applications using technologies such as TTD (Time Travel Debugging).

In this part of the blog post, we will cover some techniques that we used to uncover vulnerabilities in various products. It involves customizing WinAFL’s code for that purpose.

Buckle up, here we go..



WinAFL is a well-known fuzzer used to fuzz Windows applications. It's originally a fork of AFL, which was initially developed to fuzz Linux applications. Because of how instrumentation works in the Linux version, there was a need to rewrite it to work on Windows with a different instrumentation engine. WinAFL mainly uses DynamoRIO for instrumentation, but it can also use Intel PT, decoding traces to gather the code coverage that WinAFL needs.

We care about execution speed and performance. Since we don't have a generative mutation engine specialized for PDF structures, we decided to go with no instrumentation, because the WinAFL mutation engine works best with binary data and not with text like PDF data.

Flipping a bit a million times will probably make no difference :)

WinAFL Architecture

WinAFL’s Intel PT (Processor Tracing) source relies on the Windows debugging APIs to debug and monitor the target process for crashes. The Win32 debugging APIs work with debug events that are sent from the target to the debugger. An example of such an event is LOAD_DLL_DEBUG_EVENT, which translates to a load-dynamic-link-library (DLL) debugging event.

For a complete list of debugging events that could be triggered by the debuggee (target), please check the MSDN documentation about Debug Events.

To describe the process of WinAFL fuzzing architecture we created a simple diagram that shows the important steps that we used from WinAFL:

1. Create a new process while also specifying that we want to debug it. This is achieved by calling the CreateProcess API and setting dwCreationFlags to `DEBUG_PROCESS`. The application is then monitored by using WaitForDebugEvent to receive debug events.

2. While listening for debug events in our debug loop, we encounter a LOAD_DLL_DEBUG_EVENT, which we can parse to determine from the image name whether it’s our target DLL. If so, we place a software breakpoint at the start of the target function.

3. If our software breakpoint gets triggered, we are notified through a debug event, but this time it’s about an exception of type EXCEPTION_BREAKPOINT. From there, WinAFL saves the arguments based on the calling convention. In our case it’s __stdcall, so all of our arguments are on the stack; we save the arguments passed and the context to replay them later. WinAFL's way of in-memory fuzzing is to overwrite the return address on the stack with an address that can't be allocated normally (0x0AF1).

4. When the function returns normally, it triggers an access violation exception at address 0x0AF1. WinAFL knows that this exception means we returned from our target function and that it’s time to restore the stack frame we saved before, which contains the arguments to the target function, and also to restore the register context to the state saved during step 3.
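Steps 3 and 4 can be modelled without a debugger. The toy simulation below is not WinAFL code. It assumes a made-up "process state" (a dict holding a register map and a stack whose first slot is the return address and second slot the argument) purely to show the save, patch, run and restore cycle around the 0x0AF1 sentinel.

```python
import copy

SENTINEL_RETURN = 0x0AF1  # fake return address that cannot be mapped


def fuzz_in_memory(state, target, inputs):
    """Run `target` once per input, restoring the saved snapshot first."""
    snapshot = copy.deepcopy(state)            # step 3: save args + context
    results = []
    for data in inputs:
        state.clear()
        state.update(copy.deepcopy(snapshot))  # step 4: restore stack + regs
        state["stack"][0] = SENTINEL_RETURN    # patched return address
        state["stack"][1] = data               # new test case as argument
        returned_to = target(state)
        # In the real fuzzer, returning to 0x0AF1 raises an access
        # violation, which tells the debugger that the call finished.
        if returned_to != SENTINEL_RETURN:
            raise RuntimeError("target did not return through the sentinel")
        results.append(state["regs"]["eax"])
    return results


def toy_target(state):
    """Stand-in for the fuzzed function: uses its argument and clobbers
    registers, like a real function would."""
    argument = state["stack"][1]
    state["regs"]["eax"] = len(argument)       # return value
    state["regs"]["ecx"] += 1                  # clobbered scratch register
    return state["stack"][0]                   # "ret" to the stored address
```

Because the snapshot is restored before every iteration, side effects of one test case (such as the clobbered register in `toy_target`) never leak into the next one, which is the property that makes this technique attractive.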

Customizing Winafl to target ConverterCoreLight

During our Frida section in part 1, we showcased our attack vector approach. To automate it, we modified WinAFL-PT to fit our needs:

Hardcoded configuration options used to control fuzzing.

Redirecting execution to PdfConverterConvert, saving the address of PdfConverterConvert in the configuration options to modify EIP at the restoration phase.

on_target_method gets called by the debugger engine of WinAFL when the execution reaches PdfConverterConvertEx. Snapshotting the context depends on the calling convention. PdfConverterConvert is __stdcall, which means we only care about the arguments that are on the stack. Therefore, we only store the original values on the stack using the read_stack wrapper, then allocate memory in the Acrobat process to hold the file path to our mutated input and save it in the backup we just took. We perform the redirection when the function ends.

When the target method ends, we restore the stack pointer and modify EIP to point to our target function PdfConverterConvert. We also have to fix the argument order to match PdfConverterConvert, like we did in our Frida PoC.

Since we only used some features of WinAFL, we decided to eliminate unnecessary features related to crash handling and instrumentation (Intel PT) in order to increase the overall performance of our fuzzer. We also implemented our own crash analysis that triages crashes and provides a summary of each unique crash.



Modern Harnessing Meets In-Memory Fuzzing - PART 1

Fuzzing or Fuzz Testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program then observe how the program processes it.

In one of our recent projects, we were interested in fuzzing closed-source applications (mostly black-box). Most standard fuzz testing mutates the program inputs, so targeting these programs normally takes a lot of reverse engineering to rebuild the target features that process that input. We came across an interesting fuzzing technique where you don't need to know much about the program's initialization and termination code around the target functions, which is tedious and time-consuming to reverse and understand in some binaries. The technique also has the benefit of being able to start a fuzz cycle at any subroutine within the program.

So we decided to enhance our fuzzing process with another fuzzing technique. Introducing: In-Memory Fuzzing.

A nice explanation of how in-memory fuzzing works is by Emanuele Acri : "If we consider an application as “chain of function” that receives an input, parses and processes it then produces an output, we can describe in-memory fuzzing as a process that “tests only a few specific rings” of the chain (those dealing with parsing and processing)".

And based on many fuzzing articles there are two types of in-memory fuzzing:

- Mutation loop insertion where it changes the program code by creating a loop that directs execution to a function used previously in the program.

- Snapshot Restoration Mutation where the context and the arguments are saved at the beginning of the target routine then context is restored at the end of the routine to execute a new test case.

We used the second type because we wanted to execute the target function at the same program context with each fuzzing cycle.

In one of our fuzzing projects, we targeted Solid Framework. We were able to harness it fully through their SDK, but we wanted to go the extra mile and fuzz Solid using Adobe Acrobat’s integration code. Acrobat uses Solid with a custom OCR and a different configuration than the normal SDK provides, which caught our interest and led us to fuzz it through Acrobat DC directly.

This blog post will introduce techniques and tools that aid in finding a fuzzing vector for a feature inside a huge application. Finding a fuzzing vector varies between applications, as there is no simple way of doing it. No worries, though. We got you covered. In this blog post we’ll introduce various tools that’ll make binary analysis more enjoyable.

Roll up your sleeves and we promise you by the end of this blog post you will understand how to Harness Solid Framework as Acrobat DC uses it :)

Finding a Fuzzing Vector

Relevant code that we need to analyze

The first step is identifying the function that handles our application input. To find it, we need to analyze the arguments passed to each function and locate the controllable argument: one that, when fuzzed, will not corrupt other structures inside the application. We implemented our fuzz logic around a file path that we can control. The path points to a file on disk and is provided to a function that parses the content of the file.

Adobe Acrobat DC divides its code base into DLLs, which are shared libraries that Adobe Acrobat DC loads at run-time to call their exported functions. There are many DLLs inside Adobe Acrobat, and finding a specific DLL could be troublesome. But from the previous post, we know that Solid provides its DLLs as part of their SDK deliverable. Luckily, Acrobat has a separate folder that contains the Solid Framework SDK files.

Solid comprises quite a number of DLLs. This is no surprise, since it parses PDF files, which have a complex format structure, and supports more than seven output formats (docx, pptx, ...). We needed to isolate the relevant DLL that handles the conversion process so we could concentrate our analysis on it and find a fuzzing vector that we can abuse to perform in-memory fuzzing.

By analyzing Acrobat DC with WinDBG, we can speed up the process of analyzing Solid DLLs by knowing how Acrobat DC loads them. Converting a PDF To DOCX will make Acrobat DC load the necessary DLLs  from Solid.

Using WinDBG we can monitor certain events. The one event that we are interested in is ModLoad. This event gets logged in the command output window when the process being debugged loads a DLL. It’s worth noting that we can keep a copy of WinDBG’s debugger command window in a file by using the .logopen command with the path to the log file as an argument. We then convert a PDF to a Word document to exercise the relevant DLLs, and finally close the log file using .logclose after the export finishes, which flushes the buffer into the log file.

Before we view the log file, we filter it using the string `ModLoad` to find the DLLs that got loaded inside the Acrobat process, sorted by their loading order.
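The filtering step is easy to script. The sketch below assumes the usual WinDBG line layout, `ModLoad: <start> <end>   <image path>`, and the helper name is ours. It returns the module file names in the order they were loaded.

```python
import ntpath
import re

# "ModLoad:", start address, end address, then the image path (which may
# contain spaces) up to the end of the line.
MODLOAD_RE = re.compile(r"^ModLoad:\s+(\S+)\s+(\S+)\s+(.+?)\s*$")


def loaded_modules(log_text):
    """Return the file names of all modules from ModLoad lines, in order."""
    modules = []
    for line in log_text.splitlines():
        match = MODLOAD_RE.match(line.strip())
        if match:
            # ntpath handles Windows separators on any host platform.
            modules.append(ntpath.basename(match.group(3)))
    return modules
```

Feeding it the text of the file written by .logopen gives a short list that is much easier to scan than the raw debugger output.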

SaveAsRTF.api, SCPdfBridge.dll and ConverterCoreLight.dll appear to be the first DLLs to be loaded, and from their names we conclude that the conversion process starts with these DLLs.

Through quick static analysis we found out that their role in the conversion is as follows:

SaveAsRTF.api is an Adobe plugin. Acrobat DC plugins are DLLs that extend the functionality of Adobe Acrobat. They follow a clear interface developed by Adobe that allows plugin developers to register callbacks and menu items for Adobe Acrobat. Harnessing it means understanding Adobe’s complex structures and plug-in system.

Adobe uses SCPdfBridge.dll to interact with ConverterCoreLight.dll. Adobe needed to develop an adapter to prepare the arguments in a way that ConverterCoreLight.dll accepts. Harnessing `SCPdfBridge.dll` is possible, but we were interested in ConverterCoreLight because it handles the conversion directly.

ConverterCoreLight.dll is the DLL responsible for converting PDF files into other formats. It does so by exporting a number of functions to SCPdfBridge.dll. Functions exported by ConverterCoreLight.dll mostly follow C-style function exporting, for example PdfConverterCreate, PdfConverterSetOptionInt and PdfConverterSetConfiguration, and finally the function we need to target, PdfConverterConvertEx.

Recording TTD trace

Debugging a process is a practice used to understand the functionality of complex programs. Setting breakpoints and inspecting the arguments of function calls is needed to find a fuzzing vector. Yet it's time-consuming and prone to human error.

Modern debuggers like WinDBG Preview provide the ability to record execution and memory state at the instruction level. WinDBG Preview is shipped with an engine called TTD (Time Travel Debugging). TTD is an engine that allows recording the execution of a running process, then replay it later using both forward and backward (rewind) execution.

Recording a TTD Trace can be done using WinDBG Preview by attaching and enabling TTD mode. It can also be done through a command line tool:

Recording a trace consumes a large amount of disk space. To overcome this problem, instead of recording the whole process from the beginning, we open a PDF document in Acrobat DC and then, before triggering the conversion process, attach the TTD engine from the command line to capture the execution. After the conversion is done, we can kill the Acrobat DC process and load the output trace into WinDBG Preview to start debugging and querying the execution that happened during the conversion. This way, the trace contains only the relevant code we want to debug.

Since we have a TTD trace that recorded the integration of Adobe and Solid Framework, replaying it in WinDBG allows us to execute forward or backward to understand the conversion process.

Instead of placing a breakpoint at every exported function from ConverterCoreLight.dll we can utilize TTD query engine to retrieve information about every call directed to ConverterCoreLight.dll by using the dx command with the appropriate Link object.

- Querying Calls information to ConverterCoreLight module.

TTD stores an object that describes every call. As you can see from the above output, there are a couple of notable information we can use to understand the execution.

ThreadId: Thread Identifier

  • All function calls were executed by the same thread. 

TimeStart, TimeEnd: Function start and end positions inside the trace file.

FunctionAddress: the address of the function. Since we don't have symbols, the Function member of the object points to UnknownOrMissingSymbols.

ReturnValue: the return value of the function, which usually ends up in the EAX register.

Before analyzing every function call, we can eliminate redundant calls made to the same FunctionAddress by utilizing the LINQ query engine.


- Grouping function calls by FunctionAddress

NOTE: the output above was enriched manually by adding the symbol of every function address by utilizing the disassembly command `u` on each address.
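To illustrate what the grouping step achieves, here is a minimal offline sketch in Python. It assumes the call records were exported from the debugger into a list of dictionaries. The field names mirror the TTD call object described above, but the helper name, the sample addresses and the trace positions are made up for illustration and do not come from our actual trace.

```python
from collections import OrderedDict


def group_calls_by_address(calls):
    """Collapse repeated calls to the same FunctionAddress, keeping first-seen order."""
    groups = OrderedDict()
    for call in calls:
        entry = groups.setdefault(
            call["FunctionAddress"],
            {"count": 0, "first_time_start": call["TimeStart"]},
        )
        entry["count"] += 1
    return groups


# Hypothetical records: addresses and trace positions are illustrative only.
sample_calls = [
    {"FunctionAddress": 0x10001000, "TimeStart": "A1:10", "ReturnValue": 0},
    {"FunctionAddress": 0x10002000, "TimeStart": "A2:05", "ReturnValue": 0},
    {"FunctionAddress": 0x10002000, "TimeStart": "A3:77", "ReturnValue": 0},
]

for address, info in group_calls_by_address(sample_calls).items():
    print(hex(address), info["count"], info["first_time_start"])
```

Keeping the first TimeStart for each address is a convenience: it gives a position in the trace to jump to when inspecting the first call to that function.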

Now we have a list of the functions that handle the conversion process we want to fuzz. Next, we need to inspect the arguments supplied to every function to find an argument we can target during fuzzing. Our goal is to find an argument that we can control and modify without affecting or corrupting the conversion process.

In this context, the user input is the PDF file to be converted. One thing we need to figure out is how Adobe passes the PDF content to Solid for conversion. We also need to inspect the arguments passed and figure out which ones are mutation candidates.

The function calls are sorted. We won't dig deep into every call, but will briefly cover the important ones to keep things minimal.

Function calls that are skipped:

ConverterCoreLight::ConverterCoreLight, PdfConverterSetTempRootName, ConverterCoreServerSessionUnlock,  GetConverterCoreWrapper, PdfConverterAttachProgressCallback, PdfConverterSetOptionData, PdfConverterSetConfiguration, PdfConverterGetOptionInt

Analyzing Function Calls to ConverterCoreLight

  • ConverterCoreLight!PdfConverterCreate

PdfConverterCreate takes one argument and returns an integer. After reversing sub_1000BAB0 we found out that a1 is a pointer to the SolidConverterPDF object. This object holds conversion configuration and is used as a context for future calls.

  • ConverterCoreLight!PdfConverterSetOptionInt

PdfConverterSetOptionInt is used to configure the conversion process. By editing the settings of the conversion object, Solid allows customizing the conversion, which affects the output. One example is whether or not to use OCR to recognize text in a picture.

 From the arguments supplied we noticed that the first argument is always a `SolidConverterPDF` object created from `PdfConverterCreate` and passed as context to hold the configuration needed to perform the conversion. Since we want to mimic the normal conversion options we will not be changing the default settings of the conversion.

 We traced the function calls to `PdfConverterSetOptionInt` to show the default settings of the conversion.

Note: The above are default settings of Acrobat DC

  • ConverterCoreLight!PdfConverterConvertEx

PdfConverterConvertEx accepts source and destination file paths. From the debug log above we notice that `a3` points to the source PDF file. Bingo, that can be the fuzzing vector we abuse to perform in-memory fuzzing.

Testing with Frida

We have now found a potential attack vector in PdfConverterConvertEx. The function accepts six arguments, and the third one is the one of interest: it represents the path of the source PDF file to be converted.

The next step should be easy, right? Just intercept PdfConverterConvertEx and modify the third argument to point to another file :)

Being Haboob researchers, we always like to make things fancier. We went ahead and used a DBI (Dynamic Binary Instrumentation) engine to demo a POC. Our DBI tool of choice is always Frida, a great DBI toolkit that allows us to inject JavaScript code or our own library into native apps on different platforms such as Windows, iOS, etc.

The following Frida script intercepts PdfConverterConvertEx:

Running the script above intercepts PdfConverterConvertEx, and when Acrobat calls it we change the source file path (the currently opened document) to our own path, which is “C:\\HaboobSa\Modified.pdf”. What we expect is that the exported document contains whatever is inside Modified.pdf and not the currently opened PDF.

Sadly, that didn't work :( Solid converted the currently opened document and not the document we substituted through Frida. So what now?

Well, during our analysis of ConverterCoreLight.dll we noticed another exported function named PdfConverterConvert that has a similar interface and differs only in the number of arguments (5 instead of 6). We added a breakpoint on that function, but the problem is that it never gets called when exporting a PDF to a Word document.

So we went back to inspect it even further in IDA:

As we can observe from the image above, both PdfConverterConvertEx and PdfConverterConvert are thin wrappers that differ slightly but call the same function, which performs the actual conversion. We named that function pdf_core_convert.

The arguments passed to the Ex version are also passed to PdfConverterConvert, except that the sixth argument of PdfConverterConvertEx is passed as the fifth argument of PdfConverterConvert. This is because the fifth argument of PdfConverterConvertEx is constructed inside PdfConverterConvert.
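The argument mapping described above can be modelled in a few lines of Python. This is only a sketch of the reordering logic, not the Frida script we used: the function name and the placeholder argument values are hypothetical, and the only facts taken from our analysis are that the third argument is the source path, that the Ex version's fifth argument is dropped, and that its sixth argument moves into the fifth slot.

```python
def remap_ex_call(ex_args, new_source=None):
    """Turn the six PdfConverterConvertEx arguments into the five-argument form.

    Arguments 1-4 pass through unchanged, Ex argument 5 is dropped (the
    non-Ex wrapper builds that object itself) and Ex argument 6 becomes
    argument 5. If new_source is given, it replaces the third argument,
    which is the source PDF path.
    """
    if len(ex_args) != 6:
        raise ValueError("expected exactly six arguments")
    a1, a2, a3, a4, _ex_a5, a6 = ex_args
    if new_source is not None:
        a3 = new_source
    return (a1, a2, a3, a4, a6)


# Placeholder values standing in for the real pointers and paths.
ex_args = ("ctx", "arg2", "C:\\opened.pdf", "arg4", "ex_arg5_object", "arg6")
print(remap_ex_call(ex_args, new_source="C:\\mutated.pdf"))
```

In an instrumentation script the same reordering would be applied to native pointers inside the replacement callback, but the bookkeeping is identical.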

In order to hijack execution to PdfConverterConvert, we used Frida's `Interceptor.replace()` to fix up the arguments: five instead of six, in the order PdfConverterConvert expects.

The diagram below explains how we achieved that:

It worked :)

So, whatever object is passed in EX_arg5 was probably created from the source file, which is the currently opened document. This is why changing the source file in the Ex version had no effect, while PdfConverterConvert internally takes care of creating that object based on the source file it is given.

Now we can create a fuzzer that hijacks execution to PDFConverterConvert with the mutated file path as source file at each restoration point during our in-memory fuzzing cycles.

In the next part of the blog post, we will implement a fuzzer based on the popular framework WinAFL. The results we achieved from in-memory fuzzing were staggering. This is how we owned Adobe’s security bulletins two times in a row, back-to-back.
Until then!


ClipBOREDication: Adobe Acrobat’s Hidden Gem


I’ve always enjoyed looking for bugs in Adobe Acrobat Pro DC. I’ve spent a decent amount of time looking for memory corruption bugs. Definitely exciting – but what’s even more exciting about Acrobat is looking for undocumented features that can end up being security issues.

There has been a decent amount of research about undocumented APIs and features in Adobe Acrobat. Some of those APIs allowed IO access while others exposed memory corruption or logic issues. That said, I decided to have a look myself in the hopes of finding something interesting.

There are many ways to find undocumented features in Acrobat, ranging from static and dynamic analysis using IDA along with the debugger of your choice, to analyzing JavaScript APIs from the console. Eventually, I decided to manually analyze JavaScript features from the console.


Menu Items:

Adobe Acrobat exposes decent capabilities that allow users and administrators to automate certain tasks through JavaScript. One specific feature is Menu Items. For example, if an admin wants to automate something like saving a document and then exiting the application, this can easily be achieved using Menu Items.


For that purpose, Adobe Acrobat exposes the following API’s:

app.listMenuItems() : Dump all Menu Items

app.execMenuItem() : Execute a Menu Item

app.addMenuItem() : Add a new Menu Item with custom JS code


It’s always documented somewhere in code…

In their official API reference, Adobe only documented the menu items that can be executed from doc-level. Here’s a snippet of the “documented” menu items from their documentation:

Of course, this is not the complete list. Most of the juicy ones require a restrictions bypass chained with them. So, let’s dig into the list from console:

There’s quite a lot.

One specific menu item that caught my eye was “ImageConversion:Clipboard”. This one does not run from the doc-level and requires a restrictions bypass chained with it. This menu item is not documented and, during testing, it turned out that it gives access to the clipboard through JavaScript. Sounds insane, right? Well, here’s how it works:

First, the menu item uses the ImageConversion plugin. The ImageConversion plugin is responsible for converting various image formats to PDF documents. When the menu item “ImageConversion:Clipboard” is executed, the plugin is loaded, clipboard contents are accessed and a new PDF file is created using the clipboard content. Yes, all this can be done with a single JavaScript function call. We were only able to use this menu item with text content in the clipboard.


Sounds great, how can we exploit this?

Easy, create a PDF that does the following:

1.      Grabs the clipboard content and creates a new PDF file

2.      Accesses the newly created PDF file with the clipboard content

3.      Grabs the content from the PDF document

4.      Sends the content to a remote server

5.      Closes the newly created document


How does that look in JavaScript?

Of course, this POC snippet is for demo purposes and was presented as such to Adobe. No API restrictions bypass was chained with it.

No Security Implications...move on. 

We submitted this “issue” to Adobe hoping that they’ll get it fixed.

To our disappointment, their argument was that this works as designed and there are no security implications since this only works from restricted context. They also added that they would consider again if there’s a JavaScript API restrictions bypass.

What that technically means is that they overly trust the application’s security architecture. Also, it’s unclear whether, if a full chain were submitted, they would address this issue or just the API restrictions bypass.

To counter Adobe’s argument, we referenced a similar issue that was reported by ZDI and fixed in 2020. Adobe stated:

Of course, we went back and manually verified if it did indeed trigger from doc-level. Our testing showed otherwise – the menu item did not work (at least from our testing) from doc-level and required a restrictions bypass. It’s unclear whether or not there’s a specific way to force that menu item to run from doc-level.


Do JavaScript API restrictions bypasses exist?

They did, they do and will probably always be around. Here’s a demo of this clipboard issue chained with one. Note that this is only a demo and can definitely be refined to be more stealthy. We cannot confirm nor deny that this chain uses a bypass that works on the latest version:

Disclosure Timeline:


It’s unfortunate that Adobe decided not to fix this issue, although in the past they have fixed issues in restricted APIs that likewise required a chained JS restrictions bypass. There’s a reason why “chains” exist.

This makes me wonder whether or not they will fix other issues that require a JS restrictions bypass like memory corruptions in restricted JS API’s? Or should we expect bugs that require an ASLR bypass not to be fixed unless an ASLR bypass is provided?

Adobe closed this case as “Informative” which means dropping similar 0days for educational and informational purposes :)


Until next time…




IDAPython Scripting: Hunting Adobe's Broker Functions


Recently, many vulnerabilities were fixed by Adobe. Almost all of those fixes address issues in the renderer. It’s quite rare to find a bug fixed in Reader’s broker.

Our R&D Director decided to embark on an adventure to understand what’s really going on. What’s behind this beast? Is it that tough to escape Adobe’s sandbox?

He spent a couple of weeks reading, reversing and writing various tools. He spent a good chunk of his time in IDA Pro finding broker functions, marking them, renaming them and analyzing their parameters.

Back then I finished working on another project and innocently asked if he needs any help. Until this day, I’m still questioning myself whether or not I should have even asked ;). He turned and said: “Sure I think it would be nice to have an IDAPython script that automatically finds all those damn broker functions”. IDAPython, what’s that? Coffee?

First, IDA Pro is one of the most commonly used reverse engineering tools. It exposes an API that allows automating reversing tasks inside the application. As the name implies, IDAPython is used to automate reverse engineering tasks in python.

I did eventually agree to take on this challenge - of course without knowing what I was getting myself into.

Throughout this blog post, I will talk about my IDAPython journey, especially the task that I signed up for: writing an IDAPython script that automatically finds and flags broker functions in AcroRd32.exe.

Adobe Acrobat Sandbox 101

When Acrobat Reader is launched, two processes are usually created: a broker process and a sandboxed child process. The child process is spawned with low integrity. The broker process and the sandboxed process communicate over IPC (Inter-Process Communication). The whole sandbox is based on Chromium’s legacy IPC sandbox.

The broker exposes certain functions that the sandboxed process can execute. Each function is called a tag and has a bunch of parameters. The whole architecture is well documented and can be found in the references below.

Now the question is, how can we find those broker functions? How can we enumerate the parameters? Here comes the role of IDAPython.

Now let's get our hands dirty...


Scripting in IDAPython

After some research and reversing, I deduced that all the information we need is contained within the '.rdata' section. Each function, with its tag and parameters, follows a fixed pattern: 52 bytes followed by a function offset. It looks as follows:

Some bytes were bundled and defined as 'xmmword' instructions due to IDA’s analysis.

In order to fix this, we undefine those instructions by right-clicking each one and selecting the undefine option in IDA. Ummm... but what if there are hundreds of them? Wouldn't that take hours? Yup, that’s definitely not efficient. Solution? You guessed it, IDAPython!

The next thing we need to do is convert all those bytes (db) to dwords (dd) and then create an array to group them together so we can get something that looks like the following:

At 0x001DE880 we have the function tag which is 41h. At 0x001DE884 we have the three parameters 2 dup(1) (two parameters of type 1) and a third parameter of type 2. Finally, at 0x001DE8D4 we have the offset of the function.
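To make the record layout more concrete, here is a small standalone Python sketch that parses one such record from raw bytes, outside of IDA. The layout it assumes is our reading of the pattern above: one dword tag, a run of dword parameter types where 0 marks an unused slot, then one dword function offset, all little-endian. The helper name, the synthetic record and the function address are made up for illustration. The slot count of 12 is chosen only so that tag plus parameters span the 52 bytes mentioned earlier.

```python
import struct


def parse_broker_record(blob, param_slots):
    """Split a raw record into (tag, parameter types, function offset).

    Assumed layout: one dword tag, `param_slots` dword parameter types
    (0 meaning "unused slot"), then one dword function offset.
    """
    count = 1 + param_slots + 1
    dwords = struct.unpack("<%dI" % count, blob[: 4 * count])
    tag = dwords[0]
    params = [p for p in dwords[1:-1] if p != 0]  # drop unused slots
    return tag, params, dwords[-1]


# Synthetic record: tag 0x41, parameter types 1, 1, 2, remaining slots unused.
slots = 12
raw = struct.pack(
    "<%dI" % (slots + 2), 0x41, 1, 1, 2, *([0] * (slots - 3)), 0x00401000
)
tag, params, func = parse_broker_record(raw, slots)
print(hex(tag), params, hex(func))
```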

Since now we know what to look for and how to do it, let’s write a pseudo-process to accomplish this task for all the broker functions:

1. Scan the '.rdata' section and undefine all unnecessary instructions (xmmword)

2. Start scanning the pattern of the tag, parameters, and offset

3. Convert the bytes to dwords

4. Convert the dwords to an array

5. Find all the functions going forward

6. Display the results


The Implementation

First, we start off by writing a function that undefines xmmword instructions:

As all our work will be in the '.rdata' section, we utilize the 'get_segm_by_name' function from the idaapi module, which returns the segment object for the segment name you pass as a parameter. Using the startEA and endEA properties of that segment, we determine the start and end addresses of the '.rdata' section.

We scan the '.rdata' section using GetDisasm() function to check for any xmmword we stumble across.  Once we do encounter an xmmword then we apply the do_unknown() function which undefines them.

The itemSize() function is used to move and proceed with one instruction at a time.

Next, we check if there are 52 bytes followed by a function offset containing the string 'sub', then pass the starting address of that pattern to the next function, convertDword().

This convertDword function takes the start address of the pattern and converts each 4 bytes to dwords then creates an array out of those dwords.

Having executed the previous function on the entire '.rdata' section, we end up with something similar to the following:

Next, we grab the functions, write them into a file and display them in a new window in IDA Pro.

As for the argument types? Sure, here’s what each one maps to:

The next step is to scan the data section and convert all argument type numbers to the actual type names to be displayed later.

As I mentioned before, there’s a tag of type dword, followed by the parameters, which always include dup(), followed by a function offset that always contains the 'sub' string. We split the parameters and pass the resulting list to the remove_chars() function, which removes unnecessary characters and spaces. Lastly, we pass the list to the remove_dups() function to remove the dup() keyword and replace it with the right number of parameters (this will be explained in a bit).

Before explaining this function, let's explain what dup(#) means. If we have, for example, “2 dup(3)”, this means we have 2 parameters of type 3. If we have a number with dup(0), we can remove that parameter because it’s an invalid type, as we saw earlier in the table.

That said, this function is straightforward: we iterate over the list containing all the parameters, then remove all spaces and characters like 'dd' from the instruction. If there is a dup(0) in the list we just pop that item from the list, and return an array with only valid parameters. The next step is to replace each dup() with as many copies of its value as the number in front of it. For example, 5 dup(2) results in 2, 2, 2, 2, 2 in the array.

We iterate over the list, using a regex to extract the number between the dup() parentheses, and append that number as many times as the count before dup(), just like the example we discussed earlier. After this, we have a list of numbers only, which we can iterate over to replace each parameter type number with its corresponding type.
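The dup() handling can be tried outside of IDA with plain Python. The sketch below is not the original remove_chars()/remove_dups() code. It is a compact re-implementation of the same idea under the assumption that operands arrive as strings such as "2 dup(1)" or "2", possibly with a trailing "h" when IDA shows them in hex. The function names are ours.

```python
import re

# "<count> dup(<value>)" as IDA renders repeated array elements.
DUP_RE = re.compile(
    r"^\s*([0-9A-Fa-f]+h?)\s*dup\s*\(\s*([0-9A-Fa-f]+h?)\s*\)\s*$"
)


def parse_number(text):
    """Parse a decimal number, or a hex number written with a trailing 'h'."""
    text = text.strip()
    return int(text[:-1], 16) if text.lower().endswith("h") else int(text)


def expand_params(operands):
    """Expand operands like '2 dup(1)' into a flat list of type numbers."""
    expanded = []
    for operand in operands:
        match = DUP_RE.match(operand)
        if match:
            count = parse_number(match.group(1))
            value = parse_number(match.group(2))
        else:
            count, value = 1, parse_number(operand)
        if value != 0:  # type 0 marks an unused slot, so it is dropped
            expanded.extend([value] * count)
    return expanded


print(expand_params(["2 dup(1)", "2", "9 dup(0)"]))
print(expand_params(["5 dup (2)"]))
```

From here, mapping each number to a type name is a simple dictionary lookup against the type table.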

Finally, the results are written to a file. The results are also written to a new subview in IDA.


It was quite a ride. Maybe I should have known what I was getting myself into. Regardless, the end result was great. It’s worth noting that I ended up sending the director many output iterations with wrong results, but hey, I was able to get it right in the end!

Finally, you’d never understand the power of IDAPython until you actually write IDAPython scripts. It definitely makes life much easier when it comes to automating RE tasks in IDA Pro.


Until next time..


Cooking Solid Vanilla-tasting Stack-Overflows


Recently at Haboob, we decided to look into PDF converters. Anything that converts various file formats to PDF files and vice versa is game. We stumbled across different frameworks and tools. One of the frameworks that we decided to look into is Solid Framework.

In our first blog post, we covered the basics of Solid Framework, harnessing and fuzzing. We also covered possible attack surfaces in both Acrobat Pro DC and Foxit Editor that can end up triggering Solid Framework vulnerabilities since both applications use the framework.

One of the interesting vulnerabilities that recently got fixed is a Stack Overflow vulnerability. It’s interesting enough that we were able to fully control the crash.

Buckle up, here we go..

The Vulnerability:

A ‘W’ array entry is responsible for defining the widths for individual CIDs in a PDF XRef object. It’s possible to trigger a Stack-based Buffer Overflow with an invalid CID width, but there is more to the story.

The crash initially looked interesting enough for us to pursue:

The root-cause of the vulnerability was unclear and at first glance the vulnerability can be misleading. That address was not mapped to anything so, things like WinDBG’s “!heap” or “!address” won’t get you anywhere. To make things more intriguing, we kept getting the same crash each time we ran the testcase. We did not know where the value that kept getting dereferenced came from.

We had to do a lot of back-tracing in order to understand the story behind the value that kept being dereferenced. During the back-tracing process, an interesting function call caught our attention. A function call in the SecurePdfSDK Library reads the object stream by calling the read function to extract the data and then copies it to a stack buffer.

The read function calls xsgetn which seemed to be getting the data from a controlled buffer with a controlled size:

Luckily, in that specific testcase the size that caused the crash was 0xffffffff, which made the crash visible. The following screenshot shows the call to the xsgetn function:

Later, a memcpy call is made to copy the data into a buffer on the stack. Looking at the destination buffer after the copy we noticed that the value (0x82828080) that kept being dereferenced was in the data.

So where did this value come from? Can it be controlled?

The Mangle:

After a bit (too much) of investigation, we finally figured out that the value came from a stream. The stream was zlib compressed. That said, the stream was decompressed then the decompressed data was copied.

Armed with that piece of information, we moved ahead and crafted our own stream, compressed it, embedded it, and ran the test case.
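For readers who want to reproduce the compression step, zlib is available in Python's standard library. The sketch below only shows how a byte string can be zlib-compressed and how to verify that it decompresses back to the same data. The payload is a harmless placeholder, and the helper name is ours rather than part of any tool we used.

```python
import zlib


def build_compressed_stream(payload):
    """Return the zlib-compressed bytes and their length for a raw payload."""
    compressed = zlib.compress(payload)
    return compressed, len(compressed)


# Placeholder content: a recognizable filler pattern, nothing more.
payload = b"A" * 16 + b"BBBB"
compressed, length = build_compressed_stream(payload)

# Round-trip check: decompressing must give back exactly what we put in.
assert zlib.decompress(compressed) == payload
print(length, compressed.hex())
```

The length matters because a stream embedded in a PDF has to declare a size that matches the compressed data.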

By setting a breakpoint on xsgetn, we were able to examine the arguments passed. Continuing the execution and examining the data copied after the memcpy call showed that our crafted stream data was copied to the stack buffer. Note that the size can also be controlled through the stream length and its data:

Moving ahead with execution, the result at last looked a lot more promising:

The Control:

At this point we’re not done yet. We needed to figure out how to get this from its current state to controlling EIP. After going back and forth with minimizing the stream to be able to achieve something even better, it seemed that the easiest method was to overwrite 40 bytes which will eventually overwrite the return address on the stack. To do so, we used CyberChef to cook a recipe to compress our stream. The result looked like the following:

Now, all we needed is to edit the object stream of the corrupted XRef object. Doing so, the stream ended up close to this:

Note that shockingly stack cookies were not enabled, thus making our day way better.

And finally, the great taste of EIP control:


This bug was originally found in Solid Framework’s SDK but it did also hit in Foxit PhantomPDF since it uses the framework for conversion. Others also use the same framework (We’re looking at you Acrobat ;) ).

Foxit does not allow conversion from script (for example trigger the conversion from JavaScript) but that functionality exists in Acrobat. This vulnerability was fixed in Foxit’s May patch release.

Until then, thank you for following along.




A new Solid attack surface against Acrobat and Foxit Editor


Picking a target to fuzz can sometimes be demotivating, especially if you want the target to be aligned with certain projects that you are working on. Sometimes your approach can be fuzzing the whole application. Other times you decide to target a specific component of the application. Sometimes those components are 3rd party, 3rd party components with an SDK. An SDK? Jackpot! 

This blog post will shed some light on a new attack surface. Solid Framework is used in popular PDF applications like Adobe Acrobat and Foxit Editor. Throughout our small research we were able to find many vulnerabilities that we reported to the respective vendors. 

What is the Solid Framework software development kit?

Solid Framework consists of a set of Dynamic Link Libraries (DLLs) that parse PDF files and convert them to other formats, like Microsoft Word documents, Microsoft Excel workbooks, Microsoft PowerPoint presentations, etc. It parses PDF objects and reconstructs them as their corresponding objects in other formats.

Instead of reinventing the wheel, PDF applications such as Adobe Acrobat and Foxit Editor use Solid Framework SDK to ease the process of converting PDF files to other Microsoft file formats. 

Since there’s an SDK that we can use, isolating Solid Framework’s components and analyzing how it converts various formats is a pretty straightforward process. That said, developing harnesses for fuzzing purposes should be easy from there.

Harnessing the Solid Framework software development kit:

The idea of harnessing is to reduce a specific feature that the Solid Framework SDK offers to its simplest form while preserving the same functionality. It’s also mainly used to speed up the fuzzing process. Such functionality includes, but is not limited to, converting a PDF file to DOC, DOCX, XLS, and PPTX.

Here’s sample code that converts a PDF file to a DOC:

The same idea applies to produce a harness for the rest of Microsoft file formats docx, xlsx and pptx.

Integrating harness to fuzzing framework

Since we have a harness to work with, we can use it for fuzzing purposes by integrating it in your fuzzing framework. If you’re new to frameworks/framework implementation here’s a sample workflow that we ended up putting together when we first started working on this project:

The fuzzing framework is composed of three main parts: Mutator, Monitor, and Logger. Once you have those properly implemented then pushing different harnesses should not be an issue.
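If you have never put such a framework together, the three parts can be sketched in a few dozen lines of Python. This is a deliberately naive skeleton and not our actual framework: the byte-flipping mutator, the function names and the stand-in target command are all assumptions made for illustration. A real setup would write each test case to disk and pass its path to the harness.

```python
import random
import subprocess
import sys


def mutate(data, rng, flips=4):
    """Mutator: XOR a few randomly chosen bytes in a copy of the seed."""
    buf = bytearray(data)
    for _ in range(min(flips, len(buf))):
        pos = rng.randrange(len(buf))
        buf[pos] ^= rng.randrange(1, 256)
    return bytes(buf)


def run_target(command, timeout=10):
    """Monitor: run the harness and classify how it exited."""
    try:
        proc = subprocess.run(
            command,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "crash(%d)" % proc.returncode


def log_result(log, iteration, verdict):
    """Logger: keep one line for every test case that did not exit cleanly."""
    if verdict != "ok":
        log.append("iteration %d: %s" % (iteration, verdict))


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so runs are reproducible
    seed = b"%PDF-1.4 placeholder seed"
    log = []
    for i in range(3):
        testcase = mutate(seed, rng)
        # Stand-in target: a Python process that exits cleanly.
        verdict = run_target([sys.executable, "-c", "pass"])
        log_result(log, i, verdict)
    print(log)
```

Keeping the three parts as separate functions is what makes it cheap to swap in a different harness later: only the command passed to the monitor changes.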

How can this be triggered in Adobe Acrobat / Foxit Editor?

Two ways.

First through user-interaction, specifically by manually exporting the PDF file to another file format (DOC, PPT etc..):

Second way is to trigger the conversion through JavaScript. Can this be done? In Acrobat, you can do it through the saveAs JavaScript API.

Let’s take a closer look at the arguments accepted by the saveAs API:

If used, cConvID should be one of the following:

That said, we can use com.adobe.acrobat.doc to trigger the conversion code (Solid code), thus trigger vulnerabilities through JavaScript. Only caveat here is that saveAs needs to be chained with an API restrictions bypass to work.


Finding new un-touched components in an application is great. Being able to harness those components is even better, especially for fuzzing purposes.

This research yielded many bugs that were common between Solid Framework, Adobe Acrobat and Foxit Editor. It’s great to pop all of them with the same bug, right? ☺

Until next time...


