Normal view

There are new articles available, click to refresh the page.
Before yesterdayNytro Security

Network scanning with nmap

21 January 2019 at 06:45

Introduction

First step in the process of penetration testing is “Information gathering”, the phase where it is useful to get as much information about the target(s) as possible. While it might be different for the different type of penetration tests, such as web application or mobile application pentest, network scanning is a crucial step in the infrastructure or network pentest.

Let’s take a simple scenario: you are a penetration tester and a company want to test one of its servers. They send you the IP address of the server. How to proceed? Although nmap allows to easily specify multiple IP targets or IP classes, to keep things simple, I will use a single target IP address which I have the permission to scan (my server): 137.74.202.89.

Why?

To find vulnerabilities in a remote system, you should first find the network services running on the target server by doing a network scan and finding the open ports. A service, such as Apache or MySQL can open one or multiple ports on a server to provide its functionality, such as serving web pages or providing access to a database.

How?

A well-known tool that helps penetration testers to perform network scan is nmap (Network Mapper). Nmap is not just a port-scanner, it is a powerful tool, highly customizable that can also find the services running on a system or even use scrips (modules) to find vulnerabilities.

The easiest way to use nmap is to use the Pentest-Tools web interface which allows anyone to easily perform a network scan.

Let’s see some examples. We want to scan an IP address using nmap. How can we do it? What parameters should we use? We can start with the easiest version:

root@kali:~# nmap 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-16 02:11 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
Not shown: 993 closed ports
PORT    STATE    SERVICE
22/tcp  open     ssh
25/tcp  filtered smtp
80/tcp  open     http
135/tcp filtered msrpc
139/tcp filtered netbios-ssn
443/tcp open     https
445/tcp filtered microsoft-ds
Nmap done: 1 IP address (1 host up) scanned in 2.07 seconds

We can find some useful information:

  • We see the nmap version and start time of the scan
  • We can see the domain name of the IP address: rstforums.com
  • We can see that host is up, so nmap checked this
  • We can see that 993 ports are closed
  • We can see that 7 ports are open or filtered

However, even if the default scan can be very useful, it might not provide all the information we need to perform the penetration test on the remote server.

Nmap options

Checking the options of nmap is the best place to start. The “nmap -h” command will show us the command line parameters grouped in multiple categories: target specification, host discovery, scan techniques, port specification, version/service detection, OS scan, script scan, performance, firewall evasion and output. It is possible to easily find detailed information about all options by using the “man nmap” command.

Let’s see what common options might be useful, from each category.

  1. Target specification – Since we have a single IP address as a target, there is no need to load it from a file (-iL), we will specify it in the command line.
  2. Host discovery – These options are useful when there are a lot of target IP addresses and can help to reduce the scan time by checking if the target IP addresses are online. It does this by sending multiple different packets, but it can miss some of them. Since in our case there is a single target IP address, we can disable the host discovery by using the “-Pn” argument.
  3. Scan techniques – It is possible to scan using multiple techniques. First, it is important to know what to scan for: TCP, UDP or both. The most common services are running on TCP, but in a penetration test UDP ports must not be forgotten. It is possible to scan for UDP ports using “-sU” command line option and for TCP, there are two common scan techniques: SYN scan (“-sS” option) and Connect scan (“-sT” option).
  4. Port specification – After we decide what scan technique to use, we have to mention the ports we want to scan. This can be easily achieved with “-p” option. By default, nmap scans the most common 1000 ports. However, to be sure, we can scan all ports (1-15535) using “-p-“ option.
  5. Service/version detection – Even if finding open ports is a good start, finding which service and which service version are running on the target system would help more. This can be easily achieved by using the “-sV” option.
  6. OS detection – It might be useful to also know which Operating System is running on the target system and specifying the “-O” option will instruct nmap to try to find it out.
  7. Script scan – With the previous options we can find which services are running on the target system. However, why not to get more information about them? Nmap has a large amount of scripts that can get additional information about them. Please note that some of them might be “intrusive”, so we need the permission before scanning a target.
  8. Performance – This category allows us to customize the speed of the scan. There are a few timing templates that can be used with “-T” parameter, from “-T0” (paranoid mode) to “-T5” (insane mode). A recommended value would be “-T3” (normal mode) and if network connectivity is good, “-T4” (aggressive mode) can be used as well.
  9. Firewall evasion – There are multiple options which specify different techniques that can be used to avoid firewalls, however, for the simplicity we will not use them here.
  10. Output – What happens if you scan for a long time and your system crashes? What if you close the Terminal by mistake and not check the scan result? You should always save the output of the scan result! The “-oN” saves the normal output, “-oX” saves the output as XML or “-oG” saves it in “greppable” format.
  11. Other options – It is also very useful to know what’s happening if a long-time scan is running an “-v” can improve verbosity and keep you up to date. If there are a lot of targets, by using “–open” you will only get the open ports as output and it can improve your scan read time. It is possible to also resume a scanning session (if output was saved) using “–resume” option and “-A” (aggressive) can turn on multiple scan options by default: “-O -sC -sV” but not “-T4”.

During a penetration test all ports must be scanned. A possible nmap command to do it would be the following:

nmap -sS -sU -p- -sC -sV -O -Pn -v -oN output 137.74.202.89

However, it will take some time, so a good suggestion is to run a shorter scan first, scan for example only most common 100 or 1000 TCP ports and after this scan is finished, start the full scan while working with the result of this scan. Below is an example, where “–top-ports” option choses the most common 1000 ports.

nmap -sS --top-ports 1000 -sC -sV -v -Pn -oN output 137.74.202.89

TCP vs UDP scan

While doing a network scan, it is useful to understand the differences between TCP and UDP protocols.

UDP protocol is very simple, but it does not offer the functionalities that TCP offers. The most useful features of TCP are the following:

  • It requires an initial connection, in 3 steps, also called “3-way handshake”:
  1. Client sends a packet with the SYN flag (a bit in the TCP header) set
  2. Server replies with the SYN and ACK flags set (as mentioned in the TCP standard, this can also be done in two packets, but it’s easier to combine them in a single packet)
  3. Client confirms using an ACK flag set packet.
  • Each packet sent to a target is confirmed by another packet, so it is possible to know if the packed reached the destination or not
  • Each packet has a number, so it is sure that the packets are processed at the destination in the same order as they were sent

The initial connection is important to be understood in order to understand the difference between the two common TCP scans: SYN scan (-sS) vs Connect (-sT) scan. The difference is that the SYN scan is faster, as nmap will not send the last ACK packet. Also, it is important to note that nmap requires root privileges to use SYN scan. This is because nmap need to use RAW sockets, a functionality of the Operating system, to be able to manually create the TCP packets and this needs root privileges. If we run nmap with root privileges, by default it will use SYN scan, if not, It will use Connect scan.

How does it work?

Enough with the theory, let’s see what happens during a SYN and UDP scan. We will use a simple command, to scan for port 80 on both TCP (using SYN scan) and UDP.

# nmap -sS -sU -Pn -p 80 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 13:12 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
PORT   STATE  SERVICE
80/tcp open   http
80/udp closed http
Nmap done: 1 IP address (1 host up) scanned in 0.26 seconds

During the scan, we open Wireshark and check for the packets sent using a filter that will show us only the packets sent to our target IP address: “ip.addr == 137.74.202.89”. Below is the result:

syn

We can see the following:

  1. First three packets are TCP: one with the SYN flag sent by nmap, one with the SYN and ACK flags sent by the target server and one with RST (Reset) flag sent by nmap. As you can see, being a SYN scan, the last packet of the three-way handshake does not exist. This is helpful, because some services, on connection, might log the IP address that connected to them and this type of scan helps us to avoid this issue.
  2. Last two packets are UDP and ICMP: first packet is the one sent by nmap to the remote port 80, and it received an ICMP message “Destination unreachable (Port unreachable)” which informs us that the port is not open and nmap can show it as closed. However, please note that those packets might not be sent.

Let’s also check for a Connect scan is performed. We can use the following command:

nmap -sT -Pn -p 80 137.74.202.89

Below is the result:

open

We can see that there are four packets:

  1. First three packets represent the three-way handshake used to initiate the connection.
  2. Last packet is sent to close the connection

What happens if the port is closed? We will change the port to a random one: 1337

closed

There are two packets:

  1. First packet is the SYN packet sent by nmap to initiate the connection
  2. Second packet is the RST packet received, meaning that the port is not opened

However, if a firewall is used, it might be possible to not receive the RST packet.

Service version

Service version option (-sV) allows us to find out what is running on the target port. This depends on the service running there. However, let’s see some examples of requests that nmap will use to find what is running on port 80, which is an Apache web server.

# nmap -sS -Pn -p 80 -sV 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 14:05 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.043s latency).
PORT   STATE SERVICE VERSION
80/tcp open  http    Apache httpd
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 6.98 seconds

Below is a list of HTTP requests sent by nmap:

GET / HTTP/1.0
GET /nmaplowercheck1539799522 HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
POST /sdk HTTP/1.1
Host: rstforums.com
Content-Length: 441
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Header><operationID>00000001-00000001</operationID></soap:Header><soap:Body><RetrieveServiceContent xmlns="urn:internalvim25"><_this xsi:type="ManagedObjectReference" type="ServiceInstance">ServiceInstance</_this></RetrieveServiceContent></soap:Body></soap:Envelope>
GET /HNAP1 HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
GET / HTTP/1.1
Host: rstforums.com
GET /evox/about HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

Script scan

If we enable the script scan (-sC), the number of requests increases as it will use multiple “scripts” to find more information about the target. Let’s take the following example:

# nmap -sS -Pn -p 80 -sC 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 14:14 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
PORT   STATE SERVICE
80/tcp open  http
|_http-title: Did not follow redirect to https://rstforums.com/
Nmap done: 1 IP address (1 host up) scanned in 1.50 seconds

Below is the Wireshark output, using a filter that matches only the HTTP requests sent:

http wireshark

As you can see, nmap scripts will send several HTTP requests useful to find more information about the application running on the web server. For example, it will send a request to find if “.git” directory is present, which can contain source code, it sends a request to get “robots.txt” file which might lead to additional paths and one script even sends a POST request to find if there is a RPC (Remote Procedure Call) aware service running:

POST / HTTP/1.1
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
Content-Type: application/x-www-form-urlencoded
Content-Length: 88
Host: rstforums.com

<methodCall> <methodName>system.listMethods</methodName> <params></params> </methodCall>

Conclusion

Nmap is most often seen as a “port scanner”. However, in the right hands, in the hands of someone that properly understands how it works, it turns into a powerful penetration testing tool.

This article highlights some of the most common and useful features of nmap, but for a comprehensive understanding of the tool it is required to read the manual and actually use it.

Writing shellcodes for Windows x64

30 June 2019 at 16:01

Long time ago I wrote three detailed blog posts about how to write shellcodes for Windows (x86 – 32 bits). The articles are beginner friendly and contain a lot of details. First part explains what is a shellcode and which are its limitations, second part explains PEB (Process Environment Block), PE (Portable Executable) file format and the basics of ASM (Assembler) and the third part shows how a Windows shellcode can be actually implemented.

This blog post is the port of the previous articles on Windows 64 bits (x64) and it will not cover all the details explained in the previous blog posts, so who is not familiar with all the concepts of shellcode development on Windows must see them before going further.

Of course, the differences between x86 and x64 shellcode development on Windows, including ASM, will be covered here. However, since I already write some details about Windows 64 bits on the Stack Based Buffer Overflows on x64 (Windows) blog post, I will just copy and paste them here.

As in the previous blog posts, we will create a simple shellcode that swaps the mouse buttons using SwapMouseButton function exported by user32.dll and grecefully close the proccess using ExitProcess function exported by kernel32.dll.

ASM for x64

There are multiple differences in Assembly that need to be understood in order to proceed. Here we will talk about the most important changes between x86 and x64 related to what we are going to do.

Please note that this article is for educational purposes only. It has to be simple, meaning that, of course, there are a lot of optimizations that can be done for the resulted shellcode to be smaller and faster.

First of all, the registers are now the following:

  • The general purpose registers are the following: RAX, RBX, RCX, RDX, RSI, RDI, RBP and RSP. They are now 64 bit (8 bytes) instead of 32 bits (4 bytes).
  • The EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP represent the last 4 bytes of the previously mentioned registers. They hold 32 bits of data.
  • There are a few new registers: R8, R9, R10, R11, R12, R13, R14, R15, also holding 64 bits.
  • It is possible to use R8d, R9d etc. in order to access the last 4 bytes, as you can do it with EAX, EBX etc.
  • Pushing and poping data on the stack will use 64 bits instead of 32 bits

Calling convention

Another important difference is the way functions are called, the calling convention.

Here are the most important things we need to know:

  • First 4 parameters are not placed on the stack. First 4 parameters are specified in the RCX, RDX, R8 and R9 registers.
  • If there are more than 4 parameters, the other parameters are placed on the stack, from left to right.
  • Similar to x86, the return value will be available in the RAX register.
  • The function caller will allocate stack space for the arguments used in registers (called “shadow space” or “home space”). Even if when a function is called the parameters are placed in registers, if the called function needs to modify the registers, it will need some space to store them, and this space will be the stack. The function caller will have to allocate this space before the function call and to deallocate it after the function call. The function caller should allocate at least 32 bytes (for the 4 registers), even if they are not all used.
  • The stack has to be 16 bytes aligned before any call instruction. Some functions might allocate 40 (0x28) bytes on the stack (32 bytes for the 4 registers and 8 bytes to align the stack from previous usage – the return RIP address pushed on the stack) for this purpose. You can find more details here.
  • Some registers are volatile and other are nonvolatile. This means that if we set some values into a register and call some function (e.g. Windows API) the volatile register will probably change while nonvolatile register will preserve their values.

More details about calling convention on Windows can be found here.

Function calling example

Let’s take a simple example in order to understand those things. Below is a function that does a simple addition, and it is called from main.

#include "stdafx.h"

int Add(long x, int y)
{
    int z = x + y;
    return z;
}

int main()
{
    Add(3, 4);
    return 0;
}

Here is a possible output, after removing all optimizations and security features.

Main function:

sub rsp,28
mov edx,4
mov ecx,3
call <consolex64.Add>
xor eax,eax
add rsp,28
ret

We can see the following:

  1. sub rsp,28 – This will allocate 0x28 (40) bytes on the stack, as we discussed: 32 bytes for the register arguments and 8 bytes for alignment.
  2. mov edx,4 – This will place in EDX register the second parameter. Since the number is small, there is no need to use RDX, the result is the same.
  3. mov ecx,3 – The value of the first argument is place in ECX register.
  4. call <consolex64.Add> – Call the “Add” function.
  5. xor eax,eax – Set EAX (or RAX) to 0, as it will be the return value of main.
  6. add rsp,28 – Clears the allocated stack space.
  7. ret – Return from main.

Add function:

mov dword ptr ss:[rsp+10],edx
mov dword ptr ss:[rsp+8],ecx
sub rsp,18
mov eax,dword ptr ss:[rsp+28]
mov ecx,dword ptr ss:[rsp+20]
add ecx,eax
mov eax,ecx
mov dword ptr ss:[rsp],eax
mov eax,dword ptr ss:[rsp]
add rsp,18
ret

Let’s see how this function works:

  1. mov dword ptr ss:[rsp+10],edx – As we know, the arguments are passed in ECX and EDX registers. But what if the function needs to use those registers (however, please note that some registers must be preserved by a function call, these registers are the following: RBX, RBP, RDI, RSI, R12, R13, R14 and R15)? In this case, the function will use the “shadow space” (“home space”) allocated by the function caller. With this instruction, the function saves on the shadow space the second argument (the value 4), from EDX register.
  2. mov dword ptr ss:[rsp+8],ecx – Similar to the previous instruction, this one will save on the stack the first argument (value 3) from the ECX register
  3. sub rsp,18 – Allocate 0x18 (or 24) bytes on the stack. This function does not call other function, so it is not needed to allocate at least 32 bytes. Also, since it does not call other functions, it is not required to align the stack to 16 bytes. I am not sure why it allocates 24 bytes, it looks like the “local variables area” on the stack has to be aligned to 16 bytes and the other 8 bytes might be used for the stack alignment (as previously mentioned).
  4. mov eax,dword ptr ss:[rsp+28] – Will place in EAX register the value of the second parameter (value 4).
  5. mov ecx,dword ptr ss:[rsp+20] – Will place in ECX register the value of the first parameter (value 3).
  6. add ecx,eax – Will add to ECX the value of the EAX register, so ECX will become 7.
  7. mov eax,ecx – Will save the same value (the sum) into EAX register.
  8. mov dword ptr ss:[rsp],eax and mov eax,dword ptr ss:[rsp] look like they are some effects of the removed optimizations, they don’t do anything useful.
  9. add rsp,18 – Cleanup the allocated stack space.
  10. ret – Return from the function

Writing ASM on Windows x64

There are multiple ways to write assembler on Windows x64. I will use NASM and the linker provided by Microsoft Visual Studio Community.

I will use the x64.asm file to write the assembler code, the NASM will output x64.obj and the linker will create x64.exe. To keep this process simple, I created a simple Windows Batch script:

del x64.obj
del x64.exe
nasm -f win64 x64.asm -o x64.obj
link /ENTRY:main /MACHINE:X64 /NODEFAULTLIB /SUBSYSTEM:CONSOLE x64.obj

You can run it using “x64 Native Tools Command Prompt for VS 2019” where “link” is available directly. Just not forget to add NASM binaries directory to the PATH environment variable.

To test the shellcode I open the resulted binary in x64bdg and go through the code step by step. This way, we can be sure everything is OK.

Before starting with the actual shellcode, we can start with the following:

BITS 64
SECTION .text
global main
main:

sub   RSP, 0x28                 ; 40 bytes of shadow space
and   RSP, 0FFFFFFFFFFFFFFF0h   ; Align the stack to a multiple of 16 bytes

This will specify a 64 bit code, with a “main” function in the “.text” (code) section. The code will also allocate some stack space and align the stack to a multiple of 16 bytes.

Find kernel32.dll base address

As we know, the first step in the shellcode development process for Windows is to find the base address of kernel32.dll, the memory address where it is loaded. This will help us to find its useful exported functions: GetProcAddress and LoadLibraryA which we can use to achive our goals.

We will start finding the TEB (Thread Environment Block), the structure that contains thread information in usermode and we can find it using GS register, ar gs:[0x00]. This structure also contains a pointer to the PEB (Process Envrionment Block) at offset 0x60.

The PEB contains the “Loader” (Ldr) at offset 0x18 which contains the “InMemoryOrder” list of modules at offset 0x20. As we did for x86, first module will be the executable, the second one ntdll.dll and the third one kernel32.dll which we want to find. This means we will go through a linked list (LIST_ENTRY structure which contains to LIST_ENTRY* pointers, Flink and Blink, 8 bytes each on x64).

After we find the third module, kernel32.dll, we just need to go to offset 0x20 to get its base address and we can start doing our stuff.

Below is how we can get the base address of kernel32.dll using PEB and store it in the RBX register:

; Parse PEB and find kernel32

xor rcx, rcx             ; RCX = 0
mov rax, [gs:rcx + 0x60] ; RAX = PEB
mov rax, [rax + 0x18]    ; RAX = PEB->Ldr
mov rsi, [rax + 0x20]    ; RSI = PEB->Ldr.InMemOrder
lodsq                    ; RAX = Second module
xchg rax, rsi            ; RAX = RSI, RSI = RAX
lodsq                    ; RAX = Third(kernel32)
mov rbx, [rax + 0x20]    ; RBX = Base address

Find the address of GetProcAddress function

It is really similar to find the address of GetProcAddress function, the only difference would be the offset of export table which is 0x88 instead of 0x78.

The steps are the same:

  1. Go to the PE header (offset 0x3c)
  2. Go to Export table (offset 0x88)
  3. Go to the names table (offset 0x20)
  4. Get the function name
  5. Check if it starts with “GetProcA”
  6. Go to the ordinals table (offset 0x24)
  7. Get function number
  8. Go to the address table (offset 0x1c)
  9. Get the function address

Below is the code that can help us find the address of GetProcAddress:

; Parse kernel32 PE

xor r8, r8                 ; Clear r8
mov r8d, [rbx + 0x3c]      ; R8D = DOS->e_lfanew offset
mov rdx, r8                ; RDX = DOS->e_lfanew
add rdx, rbx               ; RDX = PE Header
mov r8d, [rdx + 0x88]      ; R8D = Offset export table
add r8, rbx                ; R8 = Export table
xor rsi, rsi               ; Clear RSI
mov esi, [r8 + 0x20]       ; RSI = Offset namestable
add rsi, rbx               ; RSI = Names table
xor rcx, rcx               ; RCX = 0
mov r9, 0x41636f7250746547 ; GetProcA

; Loop through exported functions and find GetProcAddress

Get_Function:

inc rcx                    ; Increment the ordinal
xor rax, rax               ; RAX = 0
mov eax, [rsi + rcx * 4]   ; Get name offset
add rax, rbx               ; Get function name
cmp QWORD [rax], r9        ; GetProcA ?
jnz Get_Function
xor rsi, rsi               ; RSI = 0
mov esi, [r8 + 0x24]       ; ESI = Offset ordinals
add rsi, rbx               ; RSI = Ordinals table
mov cx, [rsi + rcx * 2]    ; Number of function
xor rsi, rsi               ; RSI = 0
mov esi, [r8 + 0x1c]       ; Offset address table
add rsi, rbx               ; ESI = Address table
xor rdx, rdx               ; RDX = 0
mov edx, [rsi + rcx * 4]   ; EDX = Pointer(offset)
add rdx, rbx               ; RDX = GetProcAddress
mov rdi, rdx               ; Save GetProcAddress in RDI

Please note that this has to be done carefully. Some structures from the PE file are not 8 bytes, while we need in the end 8 bytes pointers. This is why in the code above there are registers such as ESI or CX used.

Find the address of LoadLibraryA

Since we have the address of GetProcAddress and the base address of kernel32.dll, we can use them to call GetProcAddress(kernel32.dll, “LoadLibraryA”) and find the address of LoadLibraryA function.

However, it is something important we need to be careful about: we will use the stack to place our strings (e.g. “LoadLibraryA”) and this might break the stack alignment, so we need to make sure it is 16 bytes alligned. Also, we must not forget about the stack space that we need to allocate for a function call, because the function we call might use it. So, we need to place our string on the stack and just after this to allocate space for the function we call (e.g. GetProcAddress).

Finding the address of LoadLibraryA is pretty straightforward:

; Use GetProcAddress to find the address of LoadLibrary

mov rcx, 0x41797261          ; aryA
push rcx                     ; Push on the stack
mov rcx, 0x7262694c64616f4c  ; LoadLibr
push rcx                     ; Push on stack
mov rdx, rsp                 ; LoadLibraryA
mov rcx, rbx                 ; kernel32.dll base address
sub rsp, 0x30                ; Allocate stack space for function call
call rdi                     ; Call GetProcAddress
add rsp, 0x30                ; Cleanup allocated stack space
add rsp, 0x10                ; Clean space for LoadLibrary string
mov rsi, rax                 ; LoadLibrary saved in RSI

We put the “LoadLibraryA” string on the stack, setup RCX and RDX registers, allocate space on the stack for the function call, call GetProcAddress and cleanup the stack. As a result, we will store the LoadLibraryA address in the RSI register.

Load user32.dll using LoadLibraryA

Since we have the address of LoadLibraryA function, it is pretty simple to call LoadLibraryA(“user32.dll”) to load user32.dll and find out its base address which will be returned by LoadLibraryA.

mov rcx, 0x6c6c               ; ll
push rcx                      ; Push on the stack
mov rcx, 0x642e323372657375   ; user32.d
push rcx                      ; Push on stack
mov rcx, rsp                  ; user32.dll
sub rsp, 0x30                 ; Allocate stack space for function call
call rsi                      ; Call LoadLibraryA
add rsp, 0x30                 ; Cleanup allocated stack space
add rsp, 0x10                 ; Clean space for user32.dll string
mov r15, rax                  ; Base address of user32.dll in R15

The function will return the base address of the user32.dll module into RAX and we will save it in the R15 register.

Find the address of SwapMouseButton function

We have the address of GetProcAddress, the base address of user32.dll and we know the function is called “SwapMouseButton”. So we just need to call GetProcAddress(user32.dll, “SwapMouseButton”);

Please note that when we allocate space on stack for the function call, we do not allocate anymore 0x30 (48) bytes, we allocate only 0x28 (40) bytes. This is because to place our string (“SwapMouseButton”) on the stack we use 3 PUSH instructions, so we get 0x18 (24) bytes of data, which is not a multiple of 16. So we use 0x28 instead of 0x30 to align the stack to 16 bytes.

; Call GetProcAddress(user32.dll, "SwapMouseButton")

xor rcx, rcx                  ; RCX = 0
push rcx                      ; Push 0 on stack
mov rcx, 0x6e6f7474754265     ; eButton
push rcx                      ; Push on the stack
mov rcx, 0x73756f4d70617753   ; SwapMous
push rcx                      ; Push on stack
mov rdx, rsp                  ; SwapMouseButton
mov rcx, r15                  ; User32.dll base address
sub rsp, 0x28                 ; Allocate stack space for function call
call rdi                      ; Call GetProcAddress
add rsp, 0x28                 ; Cleanup allocated stack space
add rsp, 0x18                 ; Clean space for SwapMouseButton string
mov r15, rax                  ; SwapMouseButton in R15

GetProcAddress will return in RAX the address of SwapMouseButton function and we will save it into R15 register.

Call SwapMouseButton

Well, we have its address, it should be pretty easy to call it. We do not have any issue as we previously cleaned up and we do not need to alter the stack in this function call. So we just set the RCX register to 1 (meaning true) and call it.

; Call SwapMouseButton(true)

mov rcx, 1    ; true
call r15      ; SwapMouseButton(true)

Find the address of ExitProcess function

As we already did before, we use GetProcAddress to find the address of ExitProcess function exported by the kernel32.dll. We still have the kernel32.dll base address in RBX (which is a nonvolatile register and this is why it is used) so it is simple:

; Call GetProcAddress(kernel32.dll, "ExitProcess")

xor rcx, rcx                 ; RCX = 0
mov rcx, 0x737365            ; ess
push rcx                     ; Push on the stack
mov rcx, 0x636f725074697845  ; ExitProc
push rcx                     ; Push on stack
mov rdx, rsp                 ; ExitProcess
mov rcx, rbx                 ; Kernel32.dll base address
sub rsp, 0x30                ; Allocate stack space for function call
call rdi                     ; Call GetProcAddress
add rsp, 0x30                ; Cleanup allocated stack space
add rsp, 0x10                ; Clean space for ExitProcess string
mov r15, rax                 ; ExitProcess in R15

We save the address of ExitProcess function in R15 register.

ExitProcess

Since we do not want to let the process to crash, we can “gracefully” exit by calling the ExitProcess function. We have the address, the stack is aligned, we have just to call it.

; Call ExitProcess(0)

mov rcx, 0     ; Exit code 0
call r15       ; ExitProcess(0)

Conclusion

There are many articles about Windows shellcode development on x64, such as this one or this one, but I just wanted to tell the story my way, following the previously written articles.

The shellcode is far away from being optimized and it also contains NULL bytes. However, both of these limitations can be improved.

Shellcode development is fun and swithing from x86 to x64 is needed, because x86 will not be used too much in the future.

Or course, I will add support for Windows x64 in Shellcode Compiler.

If you have any question, please add a comment or contact me.

❌
❌