This is was one of the most painstaking ones (which is reflected in the length of this write up). While finding the vulnerability was trivial, building a working exploit was quite of a challenge here.
The target app
The target app https://github.com/RPISEC/MBE/blob/master/src/lab07/lab7A.c is, on the face of it, quite similar to the previous one LAB7C.c (https://hackingiscool.pl/exploiting-the-same-user-after-free-twice-to-leak-the-mem-layout-and-execute-code-mbe-lab7c-walkthrough/). But instead of Use after Free, it's vulnerable to a multi-stage heap-based overflow.
There is a structure with buffers, length field and a pointer to overwrite:
And there are standard CRUD (Create/Read/Update/Delete) options available from the user interface, which - when chosen - call relevant functions:
The initial (and very short) heap overflow resides in the create_message() function and was quite trivial to find:
It gets interesting when the program asks the user to provide the length for the new message by calling
get_unum() (which is defined in https://github.com/RPISEC/MBE/blob/master/include/utils.h, for the record):
So, the vulnerability resides on lines 109-110:
msg->message buffer size is
Results of arithmetic division operations on integers give integer results. So,
32, but also
129,130 and 131 are possible length values we can sneak in without hitting the
(new_msg->msg_len / BLOCK_SIZE) > MAX_BLOCKS condition and having our input length overwritten with the safe value of
And then, right away after smuggling the slightly bigger
new_msg->msg_len, in the same
create_message() call, we have this:
read() (to which there are no bad characters, by the way! :D) writes
new_msg->msglen bytes from standard input to the
new_msg->message buffer. So we can overflow the
new_msg->message buffer by up to three bytes. And what will we overwrite this way? Yup, the
msg_len field! This way we can achieve having created a message with nearly any size in
msg_len, as we can control 3 out of its 4 bytes.
This can be taken advantage of in the
Here is the second heap overflow, directly resulting from the first one, making it possible to overwrite the heap contents far beyond currently edited message body and its length field (so we will overwrite the
print_msg pointer of the next message we create on the heap, getting a foothold into execution control).
Also, note the
numbuf buffer used to store user input before converting it to an integer used for message index (with
10 being maximum expected number of messages, which are indexed from
0, hence one digit is actually enough to store the index). We are going to use it later.
More insight on the target app
First of all, LAB7A.c comes with the following
This is the way it is being run (output from
ps aux from the gameadmin/root user):
lab7end 12237 0.0 0.2 5076 2116 ? S 18:39 0:00 socat TCP-LISTEN:7741,reuseaddr,fork,su=lab7end EXEC:timeout 60 /levels/lab07/lab7A
So, it is running as the
lab7end user (so we would have to become root to debug it) and has ugly timeout of 60 seconds, making debugging additional hassle.
So, we should work locally, on
/levels/lab07/lab7A, or on its copy in
This itself will not let us get rid of the timeout... because it is also implemented with the following macro call on line 10:
The macro boils down to calling
alarm(60) and setting the alarm handler to a function doing
I initially tried to work on a version I manually recompiled (with the only difference being the timeout parameter changed from
0 to effectively disable it), using the same flags... and I thought it was OK... but then later when searching for ROP gadgets I noticed slight differences in offsets, so I decided to work on the original copy instead and just add an automated
call alarm(0) to my gdb script (-x commands.txt).
Second, we will have to write a custom shellcode here this time, as the binary is compiled statically (libc will no longer be dynamically linked, instead only the required functions are statically linked, which means they are built in the executable):
/* compiled with: gcc -static -z relro -z now -fstack-protector-all -o lab7A lab7A.c */
Simply returning to libc's
system() won't be an option as we won't have the entire libc dynamically linked. Instead, only the libc functions actually used by the program would be linked in, by putting their code into the same code segment as program's own code. This will definitely make exploitation more difficult.
Another specific property is the lack of
-fPIE -pie flags, the result of which was the code segment not being ASLR-ed, so all the functions and ROP gadgets are at fixed addresses (which will, in turn, make the exploitation easier). Still, other segments get ASLR-ed (except for the heap, which turned out kind of tricky - although its range in the target app was the same every time I checked, the first address returned by
malloc() varied between instances, making the need for leaking).
How data is aligned in memory - getting our first crash at an EIP we control
Ok, let's run this.
A look at the
An interesting thing to notice, the heap is already mapped as well, even though we have not issued a single
malloc() yet (as opposed to https://hackingiscool.pl/exploiting-the-same-user-after-free-twice-to-leak-the-mem-layout-and-execute-code-mbe-lab7c-walkthrough/), this is most likely due to the lack of
-fPIE -pie flags.
Creating and viewing the first message:
OK, let's see it in memory:
Oh, this was unexpected. Where is it then? Let's find it by searching for any part (first four bytes) of the XOR pad/encrypted output we just displayed above. Search for the literal did not work due to endianness, search for the bytes in reversed order did the trick:
OK, let's see it (we need to aim a bit wider, as
0x80f19d4 is just the beginning of the XOR pad, while we want to see the entire structure and the preceding malloc metadata):
OK, now we create another message and have a look again:
Now, this is the layout with the messages sitting on the heap:
This should give us clear picture of how to start make the program call an arbitrary address. After we create message
#0 with an arbitrary length value bigger than 140 (128 for the message body + 4 for the overwritten once again length value + 8 bytes for the malloc meta fields = 140), we will start overwriting message
print_msg() pointer, then the message
#1's XOR pad, then the message
#1's body itself.
Afterwards we ask the program to print message
#1, making it call our overwritten
So we create a new message, with arbitrary length of 131 (max we can sneak through the faulty boundary check) and we use those additional three bytes to smuggle three 'C's:
Now, those three 'C's overwrote the three least significant bytes of the original
msg.len field, turning it from
OK, cool. Let's proceed to a careful overwrite of a pointer. Luckily, we don't have to be very careful when it comes to the arbitrary value of the message length field we put here.
0x0043434) is good enough, because we don't have to fill the full length of
read() will stop when no more data is available from the standard input, at first we'll just write 144 bytes, with the last 4 being the new pointer for the next message's
We create a message with declared length 131 and following content (this will be index
We create a second message with any length and content (irrelevant now).
We edit the first message, filling it with this payload (
ZZZZwill be the hijacked EIP):
python -c 'print "A"*140+"Z"*4'
We ask the program to print the second message (index
#1) and watch it crash on
We control EIP, now what
Controlling only the EIP is not sufficient to make the program do exactly what we want. As long as we cannot simply inject a shellcode to an executable memory range and jump to it (and we can't - at least not yet :) - as we are dealing with DEP/NX), we need to go with ROP approach. Even if we were not doing an actual ROP chain, but a simple overwrite of the controlled pointer to
system()'s address (which we can't here, as mentioned before), we still need to control the argument that function call is expecting to have on the stack when called.
Let's be entirely clear, this is what we hijack:
So, at the time of execution of our arbitrary EIP, the argument on the stack will be a pointer to
messages[i] structure on the heap (with its first field being the
print_msg pointer, by the way). Even if we could make EIP point at
system(), we would still want the stack to contain a pointer pointing to a buffer we control - as opposed to the XOR pad, which is filled with random data.
This is what the stack looks like at the time function
ZZZZ) is entered:
The saved RET points to the next instruction in
messages[i] points to the beginning of the
message structure on the heap.
Then I noticed that
mprotect() is linked into the program:
So I instantly recalled XPN's ROP Primer writeup (https://blog.xpnsec.com/rop-primer-level-0/) and thought 'fuck yeah, I am gonna set
mprotect() and make the heap executable, the buffer address is the first argument... but what about the next two?'
So the next two variables on the stack being
0x0000000a were ALMOST what I thought would suffice.
0x00000000 is where the buffer length should be (0 is not great for length), while
0x0000000a would serve as the flags (0x7 is RWX, so
0xb, we would be fine as long as this flags value is not invalid). Obviously, this did not work (at least because of the
0 as length, but probably there's more than there is to it - we'll be back to this later).
Anyway, I started wondering whether I can control that
0x0000000a on the stack, only to figure out where they come from: they are a survival from the
strtoul(numbuf, NULL, 10) call in
print_index() right before our message is printed (it's the
10, last two arguments to
So I looked at the registers state and the stack at the time of the call/crash, looking for anything to hook on, any candidate for the next step in execution control that would allow us make the memory and registers alignment more favorable - and put my attention to EDX, as it was pointing to the buffer we control (remember, we can write past the
ZZZZ any number of bytes we want!):
So I started wondering, what if I could find a ROP gadget that would somehow
mv EDX to
ESP, tricking the program to start using the heap as the stack? Then we could place the rest of the ROP chain on the heap, as we are having a hard time trying to control the stack right now.
By the way,
ropeme is a nice tool for this (the
lab7A.ggt was previously created by the same script, by calling
generate on the target binary):
Well, let's just say I could not find the proper gadgets (looking for stuff like
mov edx, esp; ret; or
push edx; pop esp; ret). The two
pop esps were not preceded with what I needed once I checked in gdb with
x/5i address-offset. And yup, I tried
x/5i address-offset for different
offsets, as the instructions will vary depending on what offset of the opcodes we hit - which sometimes can work in our favor as we can find a gadget that does not exist under any offset normally operated by the program when it executes as intended - I accidently came across this being mentioned here https://securingtomorrow.mcafee.com/other-blogs/mcafee-labs/emerging-stack-pivoting-exploits-bypass-common-security. Plus, sometimes this phenomenon is abused as an anti-disassembly technique - which I learned from this workshop whitepaper which I recommend: https://github.com/theevilbit/workshops/blob/master/Anti Disassembly Workshop/Hacktivity 2015 - Fitzl Csaba - Hello Anti Disassembly.pdf.
But back on the subject!
As after about two hours I felt stuck, I decided to peek into Corb3nik's solution https://github.com/Corb3nik/MBE-Solutions/tree/master/lab7a for clues.
Corb3nik's solution - tricky format-string-based leaking of the heap pointer after sneaky stack-grooming
Reading Corb3nik's exploit made me realize that leaking a heap-stored message pointer would in fact be needed for a reliable exploit to work - if we want to use the heap for our payload (as we could alternatively use the stack - but then we would need to leak the stack address, so it does not really make much difference at this point). So leaking has to be done before proceeding to attaining arbitrary code execution. And it turned out to be tricky (no comfy gadget this time, like with https://hackingiscool.pl/mbe-is-fun-lab6a-walkthrough/).
So, if we want to leak something, we can overwrite the EIP with
printf()'s address, which as we know is fixed in this case. But what about the argument? When doing heap overflow, we overwrite the next message
print_msg pointer and if we keep writing, we overwrite it's XOR pad. Then if we ask the program to print the message, it will call
printf() with just one argument, being the pointer to the
0x080f09d0 at the moment) looks like this:
Now here's the trick. If
printf() treats the entire buffer pointed by
messages[i] as a string, the XOR pad we overwrite past the
print_msg pointer (which itself we overwrite with
printf()'s address) can contain a FORMAT STRING expression.
This means that if we overwrite the XOR pad (pointed by
messages[i]) with a string like
%1$p, it will print the value of the next dword on the stack, right to the heap pointer itself - exactly where the next argument to
printf() would be, if it was called with that format string properly (like
This way, we can pick an arbitrary value from the stack we want printed plain and clean in proper pointer format (thus the
number$ index selects the number of the value from the stack). So, even the nullbytes on the stack are not an issue.
As we can see on the screenshot above, we could already leak the stack (the last value to the right at the bottom,
0xbffff728 in that case) if we wanted to use it for storing the final payload. Something I realized only while writing this up and have not tried pursuing.
In Corb3nik's solution - which I followed - heap was used to store the payload (a ROP chain execve(/bin/sh) shellcode), by using a
pop esp; ret +
payload_addr_on_the_heap first-stage chain to trick the program to start using the heap as the stack - something I originally wanted to do when I got stuck.
So, the problem with this approach is that we do not actually have a pointer pointing to the heap anywhere on the stack while
print_msg()/printf() is called - except for
messages[i] - but
messages[i] is literally the first argument from
printf() call's perspective and we cannot reach it with
%0$p format string (I tried)... we can reach everything past it, but not it itself. And it's not held anywhere on the heap itself either, so just leaking the heap without a format string at all wouldn't help.
This is where Corb3nik used a recurrent call of
print_index() from within
print_index(). In order to achieve this, four messages had to be created, because two overwritten pointers were needed:
- By overflowing message
#1's pointer was overwritten to
print_index()(so this is why we already had to create two messages).
- By overflowing message
#3's pointer was overwritten to
printf(), with its XOR pad being overwritten with the format string (thus, two more mesages).
- Then, by asking the program to print message
#1, we have it call
print_index()from the menu and asks for the number of the message to print. Once the number is provided,
print_msg()- or whatever pointer we put there. Since for message
#1we overwrote this pointer with
print_index(), by asking the program to print message
#1we manage to have a recurrent
print_index()->print_index()call. This way another set
print_index()'s local variables and arguments is put on the stack, along with
messages[i]. When the second
print_index()call asks for the message number to print, we chose
#3, because its
print_msg()pointer was overwritten with
printf()'s address and its XOR pad with our format string.
- This way we achieve the
print_index()->print_index()->printf("%20$p"), creating and exploiting a format string condition.
Below is the stack of
Below is the stack of
So the stack's properly groomed for leaking the address of
messages[i]. From this point we can calculate the offset to the XOR pad we decide to put our payload in. We'll get back to this, now let's find out how to take control of the stack and start our ROP.
Corb3nik's solution - ROP-based stack pivoting
By analyzing Corb3nik's solution, after comprehending the convoluted heap address leaking, I realized that he started his ROP chain with a pointer pointing at a
mov ecx, esp; ret; gadget (changing the stack pointer to the value of
ECX). We saw a bunch of those earlier in
ropeme output, when we were searching for stack-pivoting gadgets. I felt puzzled, as back then when looking at the state of the registers the only register that appeared to have a useful value was
I understood the use of
ECX after I analyzed the way he provided arguments to the final
print_index() call in his exploit, made me understand the trick:
So let's not focus on the ROP chain itself now (it makes ESP point to the buffer on the heap and then ret to it, as mentioned before, turning the heap into program's stack, because why not :D - I decided to go a different route, later on that).
Let's focus on the way the chain is delivered - along with the message index!
print_index() source code, focusing on a part we did not pay much attention to before:
So again, this is the function that calls our overwritten pointer. It will trigger the exploitation by calling
print_index()->mov ecx, esp; ret;.
Its local variables are held on the stack. Then it makes the call to
print_msg() - or whatever we overwrite it with, putting more variables (call arguments, stack frame, saved RET and any local variables if needed) to the stack. This means that the
numbuf is there on the stack - and that's where
ECX happens to point when
fgets() allows us to stuff 31 bytes into
numbuf, only the first one needs to be a digit corresponding to the chosen message index, the rest can be anything non-null we want to place on the stack, as the later
strtoul() conversion will simply ignore it - and
ECX points there:
So, after we have any message structure on the heap with its
print_msg pointer overwritten with one of the
mov ecx, esp; ret; gadgets (e.g.
0x80bd536), we can simply ask the program to print a message and provide it's index along with up to 30 bytes of our ROP chain.
So now we control EIP, we control the stack and we can overwrite the heap pretty much anyway we want. Now we can talk!
My last-stage ugly alternative - shellcode to heap, mprotect() heap RWX and ret there
Corb3nik's ROP chain delivered to the
print_index()->numbuff buffer via
print_index()->fgets()'s input made the program start using the heap as the stack (
pop esp, ret;).With the rest of his ROP chain stored on the heap via the initial heap overflow:
As you might remember from the beginning of this way too long write up, I wanted to use
mprotect() really bad, to make the heap executable and just fucking jump to it (thus ugly), without using any more ROP.
I did as well start off with the
messages[i]->print_msg() pointer overwritten with the address of a
mov ecx, esp; ret; gadget.
But my ROP chain delivered along with the message number to the
print_index() stack looked like this:
So obviously it started with the address of
Then there was an address of a
pop3ret instruction - the next instruction the
mprotect() would return to - this is where it would expect to have its parent's saved RET stored. Before we return to the next address, we have to jump over/clean up the
mprotect() arguments still lying on the stack, hence we have to use
popNret as the next addr to return from our function whenever that function takes N arguments. This is the basic principle of building ROP chains.
OK, then the arguments.
First, the start address of the memory area we want to change memory protection flags of. I initially used the address of my shellcode (warning, this did NOT work and required a fix, but read on!). The shellcode was already delivered to message
#1's buffer by overwriting its XOR pad with the format string for
printf() AND the shellcode itself:
At the time of writing the exploit it was still a NOP-holder, I left shellcode writing till the end. The point is that the address of the shellcode on the heap was already known thanks to its fixed offset from the leaked pointer.
Then the length (I wanted to use
0x64 just to be on the safe side and have enough space executable). Then the flags (
0x7 = READ + WRITE + EXEC).
And then, lastly, again the address of the shellcode on the heap. This is where the last
ret from this short ROP chain will return. Then it will be just normal (a sequence of opcodes) shellcode executed on the heap.
For some reason
mprotect() kept returning an error (
EAX), so the range must have been incorrect. I peeked into XPN's ROP Primer https://blog.xpnsec.com/rop-primer-level-0/ write up again and did as he did there with the stack - used the entire fixed heap start address + length as arguments (as I mentioned, they stayed the same between instances, hence could be fixed). Not the prettiest solution, but it was late and I just wanted to write the shellcode, run it from the heap and call it a day.
OK, the shellcode now. As far as I remember it should look like:
I built this using
shellnoob (a tool I recommend for asm->opcode and opcode->asm conversions):
So, the opcodes are:
So yeah, it worked: