Normal view

There are new articles available, click to refresh the page.
Before yesterdayReverse Engineering

Native x86 User-mode System Calls Hooking

By: walied
27 July 2012 at 20:41
In this post i am going to explain how to implement system call hooking from user-mode for native x86 processes (i here refer to 32-bit processes running in 32-bit versions of Windows XP SP2 and SP3).

Let's have a look at the "ZwOpenProcess" function of Windows XP SP2 and of Windows XP SP3.

1) XP SP2


2) XP SP3

As you can see in the images above, EAX is set to 0x7A, the system call ordinal and EDX is made to point at 0x7FFE0300 in the _KUSER_SHARED_DATA page. Then comes a CALL instruction which jumps to the "KiFastSystemCall" function whose address is stored in 0x7FFE0300 (_KUSER_SHARED_DATA::SystemCall).

One difference we can see is that SYSENTER of XP SP2 is followed by 5 NOPs while in XP SP3 SYSENTER is directly followed by the RET of the "KiFastSystemCallRet" function.
 
The first thing one may think of to implement the user-mode system call hook in Windows XP SP3/SP2 is to overwrite the "_KUSER_SHARED_DATA::SystemCall" and "_KUSER_SHARED_DATA::SystemCallRet" fields. Unfortunately, this is not possible since the page is not writable and any attempt to change its memory protection constant always fails.

So, we should now turn to the "KiFastSystemCall" function and try to overwrite its very first instruction with a JMP instruction. Is this all? Let's see.

For XP SP2, it is okay to write a near jmp instruction (5-byte long) since we have enough space (filled with 5 NOPs) and this does not hurt the RET instruction of the "KiFastSystemCallRet" function. But for XP SP3, any attempt to write the near jmp instruction will hurt the "KiFastSystemCallRet" function. Any common method for both XP SP2 and SP3?

I thought about that and came up with something that worked for both service packs. If we allocate a memory page at an address which when converted from absolute to relative gives 0xC3 as the fifth byte of the new JMP instruction. For example, if we allocate a memory page at 0x3F910000, given that the "KiFastSystemCall" function is at 0x7C90E510, we get the new JMP instruction as a sequence of
 "\xE9\xEB\x1A\x00\xC3". You can check the source code of InjectHookLib for more information.

N.B. We can still use a short JMP by searching for any vacant 5 bytes in the range of -128 to +127 from the address of the "KiFastSystemCall" function. LEA ESP,[ESP] seems to be okay for both service packs.

N.B. With certain processors or under certain conditions e.g. disabled VT-x/AMD-V if using VirtualBox, the "KiFastSystemCall" function is not used at all and the "KiIntSystemCall" is used instead. In these cases, you can safely overwrite the first instructions of "KiIntSystemCall" function with a near JMP instruction as long as the code you hook to takes care of that.


Any ideas or suggestions are always very welcome.

You can follow me @waleedassar

Major / MinorSubsystemVersion

By: walied
5 August 2012 at 21:00
If you are still using Windows 2000, you must have noticed that certain executables refuse to run. Actually, this is due to the executables being built with Microsoft Visual Studio 2010 which sets the MajorSubSystemVersion and MinorSubsystemVersion in the PE header to 5 and 1. In other words, it creates executables to run on Windows XP (5.1) and above. This causes Windows 2000 (5.0) to refuse to load these executables.

Now, let's see where the check occurs and how to bypass it. The first place to check must be the kernel32 "CreateProcess" function.


If we start at address 0x7C4F1ECE, we can see a call to the ntdll "ZwQuerySection" function with the "InformationClass" parameter set to 1 (SectionImageInformation). After the "ZwQuerySection" function has returned successfully, the "SECTION_IMAGE_INFORMATION" structure should be filled with some useful data. Among the data returned are the executable's subsystem type and minor and major versions.

Then comes a check for the subsystem type. The subsystem type must be either GUI (IMAGE_SUBSYSTEM_WINDOWS_GUI) or console (IMAGE_SUBSYSTEM_WINDOWS_CUI). If it is not any of these two types, the "CreateProcess" function fails.


As you can see in the image above, at address 0x7C4F1F91, the major and minor subsystem versions extracted from the PE header via the "ZwQuerySection" function are passed to the "CheckSubSystem" function. If the "CheckSubSystem" function returns TRUE, the "CreateProcess" function proceeds and if it returns FALSE, the "CreateProcess" function fails as such. Now, let's check this function.


As you can see in the disassembly and C-code in the three images above, if the subsystem versions extracted from the PE header are less than 3.10, the "CheckSubsystem" function returns FALSE. Then comes the important part, if the "MajorSubsystemVersion" extracted from the PE header is greater than the value of the "NtMajorVersion" field (The field is at offset 0x26C from the _KUSER_SHARED_DATA page), the function fails. The same applies for "MinorSubSystemVersion" if "MajorSubsystemVersion" and "NtMajorVersion" are equal.

N.B. NtMajorVersion and NtMinorVersion are usually the same as the OS version info. returned by the kernel32 "GetVersion" or "GetVersionEx" functions.

As a developer, bypassing the check can easily be done by using Platform Toolset v9 in microsoft visual studio (thanks @skier_t) or by directly editing the PE header of the executable using any PE Editor. 

Imagine the scenario where the executable in question has CRC check upon its PE header as part of the implemented protection scheme. In this case, as a user, you won't be able to run the executable since any attempt to edit the PE header will cause the CRC check to fail. This leads us to find a system-wide solution. Yes, patching.

Speaking of patching, we have two options:

1) The first is to patch a couple of addresses inside the "CheckSubSystem" function (Actually, i don't recommend patching the return value check).

To implement the check bypass, i created  a dynamic link library, hooksubsystem.dll that once injected into a process bypasses the subsystem version check.

You can find the source code of hooksubsystem.dll here.
You can find hooksubsystem.dll here.

One drawback of this method is that it is Service pack-specific since the "CheckSubSystem" function is not exported by kernel32.dll.

2) The second is to patch the "ZwQuerySection" function such that we can manipulate the data returned in the "SECTION_IMAGE_INFORMATION" structure before being used by "CheckSubSystem" function.

To implement this method, i created another version of hooksubsystem.dll. You can find it here and its source code from here.

I also created a small application, BypassSubSystem.exe, which installs a system-wide hook of the type provided in the command line arguments. It can be used in the way you see in the image below.
BypassSubSystem.exe can be downloaded from here and its source code from here.

In a future post i will go deeper into this topic. 


You can follow me on Twitter @waleedassar

Anti-Dumping - Part 3

By: walied
8 September 2012 at 13:46
In this post i will share with you a couple of small tricks that can be deployed to harden or defeat memory dumping attempts. As i have just mentioned they are small tricks, so don't flame at me.

The first trick briefly involves appending a special section header to the section table of your executable. The new section header is to be set with a huge virtual size. Don't worry, this is not going to affect the file size (on disk) since we can set the raw size of the new section to zero (completely virtual section).

This results in the the "SizeOfImage" field of the IMAGE_OPTIONAL_HEADER structure being huge as well.

Unlike old anti-dumping tricks, we don't have to forge the "SizeOfImage" field of PEB.LoaderData or that in the PE Header memory page. Here, we give the dumping tools a huge value that they are very likely to fail to allocate using e.g. the "VirtualAlloc" function or its likes. Of course, this trick does not defeat dumping tools that read the memory of processes page by page.

Since the raw size of this huge section is zero, then the new section will be zero-initialized and the OS memory manager will throw it away making the memory usage of such process as smooth as possible.

It is now obvious that the new section should be left as it is. Your code should never read, write, or execute it. As any attempt to e.g. write to it results in the OS memory manager restoring the whole section into memory.

Here you can find a demo.

The second trick was first mentioned by Kris Kaspersky. The trick is very nice and simple. If we set the memory protection of one section as PAGE_GUARD, then the "ReadProcessMemory" function will fail usually with the system error code ERROR_PARTIAL_COPY, 0x12B. To defeat this trick, dumping tools are now using the "VirtualProtectEx" function to remove the PAGE_GUARD attribute, then read the section, and finally restore the PAGE_GUARD attribute.

To enhance this trick, i have created a watching thread that infinitely calls the "VirtualQuery" function and once it detects that PAGE_GUARD is removed from the section's memory protection attributes, it just terminates the process. Here is the code and here is a demo.

N.B. For the second trick to be effective, you should place the sections you want to protect after the PAGE_GUARD section so that the process terminates before them being dumped.

N.B. The second trick theoretically has better chances to work on multi-processor systems than on single-processor ones.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

PAGE_EXECUTE_WRITECOPY As Anti-Debug Trick

By: walied
28 September 2012 at 14:48
In this post, i will share with you a poorly discussed anti-debug trick that i may be the first one to discover or disclose.

Now let's start with a quick introduction. If a memory page with the "PAGE_EXECUTE_READWRITE" access protection attributes is requested from the OS, then a page with the "PAGE_EXECUTE_WRITECOPY" attributes, not the "PAGE_EXECUTE_READWRITE" attributes is given.   

The reason for that behavior is so simple, that is, the OS memory manager wants to physically share the page between all the process instances (since it is guaranteed to be the same in all the process instances before any write).

Once you make the first write to the new page, the OS assigns a private copy of the page to the process in which the write occurrs and the page attributes change to PAGE_EXECUTE_READWRITE.

N.B. The same applies to pages requested with the PAGE_READWRITE attributes. They are initially given the "PAGE_WRITECOPY" attributes and after the first write, they turn into PAGE_READWRITE.

N.B. PAGE_EXECUTE_WRITECOPY and PAGE_WRITECOPY are not valid parameters to the "VirtualAlloc" or "VirtualAllocEx" function.

Now if you have a section in your executable with the read, write, and execute access attributes (See section xyz in the image below), then the abovementioned applies to it.
The access protection attributes given to section xyz causes its memory page to be mapped with the "PAGE_EXECUTE_WRITECOPY" attributes. See image below.
If we design section xyz in a way that it is never written to (e.g. does not contain self-modifying code) throughout the whole lifetime of the process, then the page will always be PAGE_EXECUTE_WRITECOPY even at process exit.

If the attributes change to PAGE_EXECUTE_READWRITE, that means the page must have been written to e.g. when another process, mostly a debugger, had called the "WriteProcessMemory" function while stepping-over, tracing-over, or placing software breakpoints. That definitely means the process is being debugged. See images below.

Now our executable of question can call the "VirtualQuery" function to check the page protection attributes of section xyz. If it is something other than PAGE_EXECUTE_WRITECOPY, then a debugger is present and the process should quit.

The good thing about this trick is that, unlike the 0xCC-scanning trick, it can detect software breakpoints even if there are no longer active (removed by the debugger).

Also, most debuggers in their default settings are used to place software breakpoints on modules' entry points, which means the page protection attributes change even before the reverse engineer starts to debug the module.

A common way to bypass this trick for stepping-over and tracing-over is to use hardware breakpoints which is an available option in OllyDbg v1.10 and OllyDbg v2.01 (alpha 4).

A simple demo can be found here and its source code from here.

Any ideas or comments are very welcome.

You can follow me on Twitter @waleedassar

Virtual PC Machine Reset

By: walied
26 October 2012 at 00:58
While playing with Virtual PC 2007, i came up with an interesting trick not only to detect Virtual PC 2007 but also to reset (restart) the Virtual Machine.

The trick is so simple that all you need to do in your code is execute "\x0F\xC7\xC8\x05\x00"

Executing that x86 instruction sequence causes the following message to pop up.
A POC can be found here and its source from here.

N.B. Other x86 instruction sequences can cause the same result.

Any comments or ideas are welcome.
You can follow me on Twitter @waleedassar

Virtual PC vs. Resume Flag

By: walied
27 October 2012 at 19:17
In this post i will show you another weird behavior of Virtual PC 2007. I encountered this weird behavior while playing with Virtual PC 2007 with Windows XP SP3 installed inside. The behavior is all about how a Windows XP Virtual PC virtual machine handles the Resume Flag.

For those who don't know, the Resume Flag (Flag no. 16 in the EFLAGS register) is used to temporarily disable Hardware Breakpoints exceptions for one instruction. Without it, a Hardware-Breakpoint-On-Execution would infinitely trigger an EXCEPTION_SINGLE_STEP exception.

According to @osxreverser, Windows XP does not support the Resume Flag (RF). I was also amazed to see that also WinDbg and OllyDbg v1.10 don't use the resume flag. They use the Trap Flag (TF) instead.

Running a simple executable that on purpose makes use of the Resume Flag inside an XP Virtual PC Virtual Machine, i found out that execution flows normally as if XP supports the resume flag.

Given the finding above, i created a small executable that tries to detect if it is running inside Virtual PC 2007.
You can find it here and its source code from here.

I guess the finding above only applies if the host operating system itself supports the resume flag e.g. Windows 7 or later.

N.B. This topic is still under research.

Please don't hesitate to leave a comment.
You can also follow me on Twitter @waleedassar

Virtual PC vs. DR7

By: walied
29 October 2012 at 17:07
In this post i will show you another weird behavior of Virtual PC 2007. This time the trick is about how Virtual PC handles the debug register DR7 known as Debug Control register.

For those who don't know, DR7 is used to specify the conditions under which the EXCEPTION_SINGLE_STEP exception is triggered for addresses held in DR0-DR3.
If we want to dissect DR7, it would be as follows:
Bit 0     ---> DR0 is locally enabled.
Bit 1     ---> DR0 is globally enabled.
Bit 2     ---> DR1 is locally enabled.
Bit 3     ---> DR1 is globally enabled.
Bit 4     ---> DR2 is locally enabled.
Bit 5     ---> DR2 is globally enabled.
Bit 6     ---> DR3 is locally enabled.
Bit 7     ---> DR3 is globally enabled.

Bit 8     ---> The "Local Enable Bit". Also for "Last Branch" tracing.
Bit 9     ---> The "Global Enable Bit". Also for "Last Branch" tracing.
Bit 10   ---> Reserved.
Bit 11  ----> Reserved.
Bit 12 -----> IR
Bit 13 -----> GD
Bit 14 -----> TB
Bit 15 -----> TT

Bit 16 -----
                  | ----> When DR0 is triggered.
Bit 17 -----
Bit 18 -----
                  | ----> Size of DR0's trigger condition.
Bit 19 -----
Bit 20 -----
                  | ----> When DR1 is triggered.
Bit 21 -----
Bit 22 -----
                  | ----> Size of DR1's trigger condition.
Bit 23 -----

Bit 24 -----
                  | ----> When DR2 is triggered.
Bit 25 -----
Bit 26 -----
                  | ----> Size of DR2's trigger condition.
Bit 27 -----
Bit 28 -----
                  | ----> When DR3 is triggered.
Bit 29 -----
Bit 30 -----
                  | ----> Size of DR3's trigger condition.
Bit 31 -----

For example:
Imagine we want to place a Hardware-Breakpoint-On-Execution for an instruction at 0x401000. See image below.

What the debugger does in this case is:
1) Sets DR0 to 0x401000.
2) Sets bit 0 of DR7 to 1.
3) Sets bit 8 of DR7 to 1 (for backward compatibility).
4) Sets bits 16 and 17 of DR7 to 00 (00 means On-Execution).

And if we then want to place a Hardware-Breakpoint-On-Write-Four for memory at 0x10000. See image below.
What the debugger does in this case is:
1) Sets DR1 to 0x10000.
2) Sets bit 2 of DR7 to 1.
3) Sets bit 8 of DR7 to 1 (for backward compatibility).
4) Sets bits  20 and 21 of DR7 to 01 (01 means On-Write).
5) Sets bits  22 and 23 of DR7 to 11 (11 for the size of trigger condition means to watch four bytes).

Now let's try to get back to the main topic of this post.

Hereafter, i will call the second byte of DR7 (byte 0xBB of 0xDDCCBBAA) the flags byte, just for brevity.

On Windows XP, if we set the flags byte to any value ranging from 0x00 to 0xFF, the breakpoint is always active and the exception is always raised whenever the trigger condition is met e.g. if we set DR7 to 0x0000FF01 (a hardware breakpoint On-Execution with Local enable, global enable, reserved, reserved, IR, GD, TB, and TT bits set), the exception is raised whenever the address in DR0 executes.
The same applies for Windows 7.

What about Virtual PC 2007? 

In Virtual PC 2007 with Windows XP installed inside, with certain flags set in DR7 e.g. 0x00003F01, the breakpoint is sometimes not activated.

So, i created simple executable that brute-forces the DR7's flag byte and based on the number of times the exception is raised it determines whether it is running inside Virtual PC 2007.

You can download the demo from here and its source code from here.
 
N.B. It has been tested with Windows XP SP2 and SP3.
N.B. VirtualBox is also affected, but i will leave this for a future post.


Any comments or ideas are very welcome. You can also follow me on Twitter @waleedassar

Virtual PC vs. CPUID

By: walied
30 October 2012 at 18:14
In this post i will show another weird behavior of Virtual PC 2007. This time it is about the CPUID instruction. As most of you already know well what the CPUID is for and how it works, i will directly jump into the main topic.

In Virtual PC, executing CPUID disables interrupts for one instruction. Oh, wait, how is that?

Imagine we want to trace a sequence of x86 instruction. What the debugger does in that situation is as follows:
1) Calls the "GetThreadContext" function to extract the current context of the thread executing this sequence of instructions.
2) Modifies the "EFLAGS" register of the "CONTEXT" structure such that the Trap flag (TF) is set. EFLAGS is situated at offset 0xC0 from the start of the structure for the x86 version. TF is bit number 8 (0x100).
3) Calls the "SetThreadContext" and "ContinueDebugEvent" functions to continue execution.

When the trap flag is set, after executing an x86 instruction, an exception EXCEPTION_SINGLE_STEP is raised and trap flag is cleared.

The debugger receives the exception and resets the trap flag as shown above and so on.

Disable interrupts, what does that mean?
Executing certain instructions when the trap flag is set, no EXCEPTION_SINGLE_STEP exception is raised. The exception is raised after executing the instruction following them. One example instruction that disables interrupts is POP SS. POP SS has been used for a long time as an anti-tracing trick. Since it disables interrupts for one instruction, dumping the EFLAGS register to stack via . PUSHFD reveals the Trap Flag.


Executing CPUID in Virtual PC 2007, i found out that it has the same effect as POP SS. CPUID disables interrupts for one instruction.

I created a simple demo that exploits this bug to detect whether it is running inside Virtual PC 2007. It has been tested on Windows XP SP2 running inside Virtual PC 2007.

Reason for that is still under research but it seems to be due to the Virtualized CPUID (Intel FlexMigration) hardware support since the trick only works if Hardware Virtualization is enabled.

You can download the demo from here and its source code from here.

N.B. VirtualBox v4.1.22 r80657 is also affected by this bug.

N.B. Parallels Desktop is reportedly affected by this bug.

You can follow me on Twitter @waleedassar.

SizeOfStackReserve As Anti-Attaching Trick

By: walied
6 November 2012 at 01:08
In this post i will show you a new anti-attaching trick that has been tested on Windows 7. It does not work on Windows XP due to the changes Microsoft introduced in the way threads are created.

Let's first see how thread creation in Windows 7 is different from that of Windows XP.

In Windows XP, whenever you call the kernel32 "CreateRemoteThread" or the ntdll "RtlCreateUserThread" function to create a new thread, the following occurs underneath:

The kernel32 "BaseCreateStack" or ntdll "RtlpCreateStack" function is called in case of  "CreateRemoteThread" or "RtlCreateUserThread" successively to allocate space for the new thread's stack in the address space of the target process.

N.B. The kernel32 "CreateThread" function is only a call to the kernel32 "CreateRemoteThread" function with the "hProcess" parameter set to -1.

Since there is no big difference between the "BaseCreateStack" and "RtlpCreateStack" functions, it is enough for us to take the "BaseCreateStack" function in disassembly in this post.

The "BaseCreateStack" function takes four parameters, only three of them are of interest. The first parameter is the handle to the process in which we are about to allocate user stack memory. The second parameter is the size in bytes of user stack memory to COMMIT into the target process's address space. The third parameter is the size in bytes of user stack memory to RESERVE into the target process's address space. Hereafter, i will refer to them as hProcess, CommitSize, and ReserveSize.

N.B. If you call the "CreateRemoteThread" function with the "dwStackSize" parameter set to e.g. 0x10000, then BaseCreateStack commits 0x10000 bytes. On the other side, if the "CreateRemoteThread" function is called with the "dwCreationFlags" parameter having the "STACK_SIZE_PARAM_IS_A_RESERVATION" flag set, then BaseCreateStack Reserves 0x10000.

Now, let's dive into the "BaseCreateStack" function and see what is going on inside.

1) It extracts the value of ImageBase from the PEB of the process in which it is called, the value is then passed to the "RtlImageNtHeader" function. If the "RtlImageNtHeader" function fails an error ERROR_BAD_EXE_FORMAT is returned.


2)
If the "ReserveSize" parameter passed to it is zero, it uses the value of the "SizeOfStackReserve" field of the IMAGE_OPTIONAL_HEADER structure.



3) Similarly, If the "CommitSize" parameter passed to it is zero, it uses the value of the "SizeOfStackCommit" field of the IMAGE_OPTIONAL_HEADER structure. Please remember that the values are extracted from the PE header of the main executable of the process that is calling the "CreateRemoteThread" function, not the target process.



4) It then makes some sanitization checks on the ReserveSize and CommitSize, for example to ensure that the commit size is never greater than the reserve size. It also checks to ensure that the commit size is never lower than the value of the "MinimumStackCommit" field of PEB.




5) It calls the "ZwAllocateVirtualMemory" function to reserve memory of size ReserveSize into the address space of the target process with the PAGE_READWRITE protection attribute.


6) It calls the "ZwAllocateVirtualMemory" function to commit CommitSize+0x1000 of the memory reserved in the previous step.



7) The extra page committed in the previous step is then given the PAGE_GUARD protection attribute.


Here is a similar reversed code of the "BaseCreateStack" function. From here.


The reason why a PAGE_GUARD page always exists at the end of committed stack is for the kernel to be notified each time the stack needs to be expanded. For example, if a thread tries to touch its stack's PAGE_GUARD page, an STATUS_GUARD_PAGE_VIOLATION exception is raised and swallowed by the kernel and it automatically commits one more page.

N.B. If a thread tries to touch the PAGE_GUARD page of another thread's stack, the exception is passed to the application or the debugger.

After the stack has been allocated in the target process's address space, the "CreateRemoteThread" function formulates a CONTEXT structure for the new thread. After the previous steps have completed successfully, the "ZwCreateThread" function is called to initiate the new remote thread.

Now let's see how threads are created in Windows 7.

In Windows 7, if we take the "CreateRemoteThread" or "RtlCreateUserThread" function into disassembly, we will see that the "dwStackSize" is directly passed to the "ZwCreateThreadEx" function.
So, our first assumption here is that stack allocation is now forwarded to the kernel. Also, we can note that now in later versions of Windows than XP, the "ZwCreateThreadEx" function is by default used for thread creation instead of the "ZwCreateThread" function.

Now let's check the "NtCreateThreadEx" function in ntoskrnl.exe.

We can easily see in "NtCreateThreadEx" a call to the "PspCreateThread" function.
The "PspCreateThread" function calls the "PspAllocateThread" function which calls "RtlCreateUserStack" function.


The "RtlCreateUserStack" function is called after attaching to the target process's address space. Now let's look at the "RtlCreateUserStack" function in disassembly.

Now it is easy to see that it reads the PE header from the main executable of the process in which the remote thread is being created unlike XP where information was extracted from the main executable of the process that creates the thread. Yeah, it seems Microsoft fixed a very minor issue.


From the image above, it is also easy to conclude that if we forced the "RtlImageNtHeader" function to fail, we can prevent any foreign process including the debugger from attaching to our process. The easiest way to accomplish that is by erasing the PE header at runtime.  Any call to ZwCreateThreadEx as part of calling the "DebugActiveprocess" function (Used for attaching to a running process) would fail. For more information and examples, please refer to my previous post.

N.B. DebugActiveProcess calls DbgUiIssueRemoteBreakin which calls ~RtlCreateUserThread which calls "ZwCreateThreadEx".

One may say, "Erasing the whole PE header may render many APIs which read from the PE header useless e.g. FindResource or GetProcAddress". My answer will be "Yes, you are right".

So, we should find a smarter way to do it.

Okay, let's continue disassembling the "RtlCreateUserStack" function.


As you can see in the image above if the size of stack commit argument passed to it is zero, it takes the value of the "SizeOfStackCommit" field from the PE header. The same measure is taken if the size of stack reserve passed is zero. It is also noteworthy that if both the size of stack commit argument passed and "SizeOfStackCommit" of the PE header are zero, the commit size becomes 0x4000 (The default commit size is 0x4000).

The function then checks the size of stack commit against the size of stack reserve. If the size of stack commit happens to be greater, then the size of stack reserve is adjusted to be greater.

The function then ensures that the size to be committed is not less than the "MinimumStackCommit" field of  the process's PEB. If it is less, the size to be committed is adjusted.


The function then calls the "ZwSetInformationProcess" function with the "ProcessInformationClass" parameter set to 0x29 (ProcessThreadStackAllocation). The size to be reserved is passed in the 4th member of the structure passed in the "ProcessInformation" parameter.

Now let's quickly have a look at the "NtSetInformationProcess" function.

As you can see in the two images above, the value of the 4th member of the structure passed to the "ZwSetInformationProcess" function is used as the "RegionSize" parameter passed to the "ZwAllocateVirtualMemory" function.

Given this knowledge, if we at runtime change the value of the "SizeOfStackReserve" field of the PE header to a huge value, then we can cause the "ZwAllocateVirtualMemory", "ZwSetInformationProcess", "RtlCreateUserThread", "PspAllocateThread", "PspCreateThread", and "NtCreateThreadEx" functions to successively fail preventing any foreign processes including debuggers from creating any thread in our process.

A demo can be found here and its source code from here.

Any comments or ideas are more than welcome.

You can follow me on Twitter @waleedassar

Defeating Memory Breakpoints

By: walied
12 November 2012 at 21:20
In this post i will show you a couple of tricks that can be used to defeat memory breakpoints. First i should explain what memory breakpoints are and how they work.

Anyone who has spent some time in the field of software protection and debuggers must have heard of Memory breakpoints. Actually, memory breakpoints were not extensively used in the past but since more and more protection schemes implement anti-INT3 and anti-Hardware breakpoints tricks, reverse engineers started to use memory breakpoints to avoid detection.

The idea of memory breakpoints is so simple. Imagine that we want to place a memory breakpoint at address 0x402005 (On-Execution), what the debugger theoretically does is as follows:

1) Marks the memory page which the address 0x402005 belongs to (page 0x402000) as guarded via calling the "VirtualProtectEx" or "ZwProtectVirtualMemory" function with the "flNewProtect" parameter having the "PAGE_GUARD" protection attribute set. In this case page 0x402000 is originally PAGE_EXECUTE_READ 0x20 and after placing the memory breakpoint it becomes PAGE_EXECUTE_READ|PAGE_GUARD 0x120.

2) Each time the guarded page is touched whether read from, written to, or executes, then an exception STATUS_GUARD_PAGE_VIOLATION 0x80000001 is raised and the debugger receives a debug event of type  EXCEPTION_DEBUG_EVENT.

3) The debugger then inspects various fields in the "EXCEPTION_RECORD" structure of the "DEBUG_EVENT" structure to determine the reason why the exception was raised.
If the following conditions are met, then the debugger figures out that instruction at 0x402005 is about to execute i.e. breakpoint reached and that it should break accordingly.
a) The "ExceptionCode" field is set to STATUS_GUARD_PAGE_VIOLATION 0x80000001. b) The "NumberParameters" field is greater than or equal to 2. c) The "ExceptionInformation[0]" field is set to 8. d) The "ExceptionInformation[1]" field is set to 0x402005. The image below represents something very similar.


If any of the above mentioned conditions is not met, then the debugger figures out it is not the breakpoint. Whether the breakpoint is hit or not, the debugger resets the "PAGE_GUARD" protection attribute.

Surprisingly, even though this is the typical way debuggers should implement memory breakpoints, OllyDbg and many other user-mode debuggers implement memory breakpoints in a slightly different way.

Let's first take OllyDbg v1.10 and see how it implements memory breakpoints.

If you already use OllyDbg v1.10, you should already know that it has only two kinds of memory breakpoints, On-Access and On-Write. On-Access memory breakpoints trigger anytime the page is touched and On-Write memory breakpoints trigger anytime the page is written to.

Trying to reverse OllyDbg v1.10 to see how it implements each type, i found out that:

1) For On-Access memory breakpoints, they are implemented by marking the page that the breakpoint address belongs to as PAGE_NOACESS. PAGE_NOACCESS means that anytime the page is touched, an exception STATUS_ACCESS_VIOLATION is raised. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above.

2) For On-Write memory breakpoints, they are implemented by depriving the page which the breakpoint address belongs to of the write access right via setting the "flNewProtect" parameter passed to the "VirtualProtectEx" function to PAGE_EXECUTE_READ. Every time the page is written to, an exception STATUS_ACCESS_VIOLATION is received. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above. Here lies a bug in OllyDbg v1.10 since it assumes that the memory protection of any single page in the process address space can be turned into PAGE_EXECUTE_READ while this is not true for example memory page at 0x10000 can never be executable (Windows 7).

After we have seen how memory breakpoints are implemented, i will show you two tricks that can be used as anti-memory-breakpoints.

Trick 1)

Given the knowledge above, we can conclude that in order to defeat memory breakpoints esp. those of type On-Execution, we should cause the "VirtualProtectEx" function to fail. How is that possible?
By copying our code to a dynamically-allocated memory page whose page protection attributes can be executable and in the same time can not be guarded or no-access. This type of memory pages does really exist. For every thread you create, the kernel allocates one page (three pages in case of Wow64 processes) for the TEB. The TEB page(s) can't be non-writable and can't be assigned the "PAGE_GUARD" protection attribute. How can this be implemented?
All you have to do to implement this trick is call the "CreateThread" function with the "dwCreationFlags" parameter set to CREATE_SUSPENDED. At this point, we have the new thread's TEB with the page protection attributes set to PAGE_READWRITE. The next thing we should do is make the TEB page executable by calling the "VirtualProtect" function with the "flNewProtect" parameter set to PAGE_EXECUTE_READWRITE.

You can use this demo to test this trick.

N.B. For more stealthy way to conceal the point at which the page protection is changed to executable, use the "VirtualAlloc" function instead of "VirtualProtect". The allocation type in this case must be MEM_COMMIT only.

Trick 2)

This trick can easily detect memory breakpoints. It relies on the fact that the "ReadProcessMemory" function returns false if you try to read guarded or no-access memory. To use this trick, all you have to do is call the "ReadProcessMemory" function with the "Handle" parameter set to 0xFFFFFFFF, the "lpBaseAddress" parameter set to the image base, and the "nSize" parameter set to the size of image. If it returns false, then at least one memory breakpoint is present.

You can use this demo to test this trick.

N.B. Certain executables have gap inaccessible pages e.g. those pages intended for anti-dumping described in a previous post. So you have to take care of that if implementing this trick.

N.B. ReadProcessMemory has also been used as a stealthy way to read memory without triggering Hardware Breakpoints.


Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar





OllyDbg RaiseException Bug

By: walied
12 November 2012 at 22:48
In this post i will show you a bug in OllyDbg that can be used to detect its presence. The trick is so easy that all you have to do is call the "RaiseException" function with the "dwExceptionCode" parameter set to EXCEPTION_BREAKPOINT 0x80000003. The response depends on the OllyDbg version used. If it is v1.10, then the exception is going to be silently swallowed by the debugger and the registered exception handler is not called. In v2.01 (alpha 4), several message boxes pop up and the exception handler is not called either. Only v2.01 (beta 2) is immune to this bug.



The reason behind this bug is OllyDbg trying to read the x86 instruction pointed to by the "ExceptionAddress" field of the "EXCEPTION_RECORD" structure to ensure it is 0xCC or 0x03. In case of EXCEPTION_BREAKPOINT exceptions raised by explicitly calling the "RaiseException" function, the instructions at ExceptionAddress is definitely not 0xCC or 0x03.


You can find a demo here and its source code from here.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

VirtualBox CPUID Discrepancy

By: walied
12 November 2012 at 23:53
In this post i will show you a weird issue i have lately found in VirtualBox. This issue is seen only if VirtualBox is running without hardware virtualization support (VT-x/AMD-V).

For example, when Windows XP is running in VirtualBox with no hardware virtualization support, it is forced to use INT 2E to make system calls instead of SYSENTER. This is because SYSENTER is apparently not supported by VirtualBox. The problem here is that in this case the CPUID instruction still detects supported SYSENTER/SYSEXIT instructions.

We can use this discrepancy to detect VirtualBox (only if running with no hardware virtualization). All we have to do is execute CPUID (Leaf 1) and if we have bit 0x800 of EDX set, then execute SYSENTER in the form of any system call e.g. ZwDelayExecution. If an EXCEPTION_ILLEGAL_INSTRUCTION 0xC000001D is raised, then VirtualBox is present.


You can find a demo here and source code from here.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Hidding Threads From Debuggers

By: walied
23 November 2012 at 05:05
In this post i will take into discussion an old anti-debug trick that many of us know well. The trick is the ability of our code to hide specific threads from debuggers. This is usually achieved by calling the ntdll "ZwSetInformationThread" function with the "ThreadInformationClass" parameter set to ThreadHideFromDebugger 0x11. Sample code for this trick can be found here.

If we take the "ZwSetInformationThread" function into disassembly, we can easily see that the "ThreadInformationLength" parameter must be zero for the function call to succeed, otherwise ERROR_BAD_LENGTH is returned. See image below.

 And here is the 64-bit version

As you can see from the two images above, the whole function call ends up setting the "HideFromDebugger" bit of the "_ETHREAD" structure. Once this flag has been set, the kernel guarantees that the debugger will never receive any debug events from the corresponding thread.

For example, let's take the LOAD_DLL_DEBUG_EVENT events. As you know, any time a module is loaded into the address space of specific process, the debugger is notified of this action through the LOAD_DLL_DEBUG_EVENT events.The debugger then inspects various interesting fields in the "LOAD_DLL_DEBUG_INFO" structure e.g. ImageBase. Depending on the debugger configuration, the debugger notifies you of that or not. You can see this if you instruct OllyDbg to break on new module.

The two images above show how OllyDbg acts if a normal (not hidden) thread loads a new DLL. It is as follows:
1) Thread Loads a new DLL via calling e.g. the "LoadLibrary" function.

2) The "LoadLibrary" function wraps up a call to the ntdll "ZwMapViewOfSection" function.

3) The kernel mode part of ZwMapViewOfSection calls the "DbgkMapViewOfSection" function.

4) The "DbgkMapViewOfSection" function queries both the "HideFromDebugger" bit of the "_ETHREAD" structure and the value of the "DebugPort" field of the "_EPROCESS" structure. If the "HideFromDebugger" bit is not set and the "DebugPort" field is set, then the function builds the "LOAD_DLL_DEBUG_INFO" structure and calls the "DbgkpSendApiMessage" function which is responsible for delivering the debug event to the attached debugger.
On the other side, if the "HideFromDebugger" bit is set, DbgkMapViewOfSection returns immediately without delivering the debug event. See images below.


N.B. Regarding the UN/LOAD_DLL_DEBUG_EVENT's, there are other factors that determine whether or not the debug event is going to be delivered to debugger e.g. the "SuppressDebugMsg" bit of the Thread Environment Block (TEB).

5)  In the debugger, the "WaitForDebugEvent" function returns with the "dwDebugEventCode" field set to LOAD_DLL_DEBUG_EVENT 0x6. Given this, the debugger figures out that a new module has just been loaded and that it should inspect the "LOAD_DLL_DEBUG_INFO" structure to extract the new image base, file handle, etc.

6) After extracting info. from the "LOAD_DLL_DEBUG_INFO" structure, the debugger calls the "ContinueDebugEvent" function to continue executing the thread.

Similar to LOAD_DLL_DEBUG_EVENT's, debuggers never get notified of exceptions raised in the scope of hidden threads. To ensure that let's have a look at the "DbgkForwardException" function.

As you can see in the image above, the "HideFromDebugger" bit of the "_ETHREAD" structure is queried here as well.

Conclusion: When the "HideFromDebugger" bit flag of the "_ETHREAD" structure is set, the thread will not receive any debug events.

If we look again at the "NtSetInformationThread" function in disassembly, we will see that the function call is one-way i.e. you can make this function call to hide the thread from debugger but you can not make this call to un-hide the thread from debuggers.

Let's have a look at the "ZwQueryInformationThread" function. As the name implies, we can use this function to determine if a specific thread is hidden from debuggers. See below.

And here is the 64-bit version.

As you can see from the two images above, the "ThreadInformationLength" parameter must be one for this function call to succeed. If it is one as expected, nothing surprising is seen, the function just sets the first byte pointed to by the "ThreadInformation" parameter to one if the "HideFromDebugger" bit of the "_ETHREAD" structure is set. Given this knowledge, i have created a small OllyDbg v1.10 plugin to detect any hidden thread in the process being debugged esp. if we are attaching to an active process. The plugin is called HiddenThreads. You download it from here and its source code from here.

Unfortunately, in older versions of Windows e.g. XP, the "ZwQueryInformationThread" function can't be used to detect if a thread is hidden from debuggers as the ThreadHideFromDebugger information class 0x11 is simply not implemented. The function call returns ERROR_INVALID_PARAMETER.

Now that we have seen how to hide a thread from debuggers, how this works under the hood, and how to detect if a thread is hidden from debuggers, let's try to find another way to hide the thread other than calling the "ZwSetInformationThread" function.

With the introduction of the "ZwCreateThreadEx" function e.g. Windows Vista and 7, a new flags parameter is present. This flag causes new threads to be created hidden from debuggers i.e. you don't need to call the "ZwSetInformationThread" function. If we set this parameter (the 7th parameter) to 0x4, then the new thread will be hidden from debuggers. In this case, setting the "HideFromDebugger" bit occurs in the "PspAllocateThread" function. See image below.


You can find a demo here and its source code from here.


This post was written based on debugging sessions on Windows 7 64-bit. This is why you see me switching from x86 to x64.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

SuppressDebugMsg As Anti-Debug Trick

By: walied
24 November 2012 at 22:06
In this post i will show you a new anti-debug trick that affects many debuggers e.g. WinDbg and IDA Debugger.

When you load a module into the address space of a process usually via calling e.g.  the kernel32 "LoadLibrary" function, the debugger is notified of this through the LOAD_DLL_DEBUG_EVENT event. This occurs at the point the "NtMapViewOfSection" function calls the "DbgkMapViewOfSection" function.

As we saw in the previous post, the "HideFromDebugger" flag of the "_ETHREAD" structure and the "DebugPort" field of the "_EPROCESS" structure are queried. If the "HideFromDebugger" flag is not set and the "DebugPort" field is set, the debug event is delivered to the debugger but only after the return value of the "DbgkpSuppressDbgMsg" function is checked.

If the "DbgkpSuppressDbgMsg" function returns false, the debug event is delivered to the debugger and vice versa. Now let's see the "DbgkpSuppressDbgMsg" function in disassembly.


As you can see in the image below, it checks the "SuppressDebugMsg" flag of the 64-bit TEB of the thread. If it is set, the function returns true and the debug event is not delivered to the debugger.

Also, the "SuppressDebugMsg" field of the 32-bit TEB is queried, if the "Wow64Process" field of the "_EPROCESS" structure is set.

Notes:
1) Each Wow64 process has two Process Environment Blocks (PEBs), a 64-bit one and a 32-bit one.

2) Each thread in a Wow64 process has two Thread Information Blocks (TEBs), a 64-bit one and a 32-bit one. The 64-bit TEB is of size 2 pages and the 32-bit TEB is of size 1 page. The 32-bit TEB always follows the 64-bit TEB.

3) If the "Wow64Process" field of the "_EPROCESS" structure is set, then it is a Wow64 process (32-bit process running on 64-bit system). This field holds the address of the process's 32-bit PEB.
In WinDbg and IDA debugger, if our process loads a module e.g. walied.dll via calling e.g. the "LoadLibrary" function, the debugger receives the LOAD_DLL_DEBUG_EVENT event and caches the "hFile" field of the "LOAD_DLL_DEBUG_INFO" structure. It uses the "hFile" field to ReadFile info. e.g. debug info. from walied.dll

The problem here is that WinDbg and IDA debugger don't CloseHandle(hFile) until the UNLOAD_DLL_DEBUG_EVENT event for walied.dll is received. So, if we set the "SuppressDebugMsg" bit of TEB and then call FreeLibrary("walied.dll"), then the debugger will not receive the UNLOAD_DLL_DEBUG_EVENT for walied.dll. Any subsequent attempt to acquire an exclusive access to walied.dll via calling the "CreateFile" function will definitely fail which is a very sign of debugger existence.

A demo can be found here and its source code from here.

The trick mentioned above affects WinDbg and IDA debugger. OllyDbg v1.10 is affected but in a slightly different way. OllyDbg v1.10 does not CloseHandle(hFile) even if the corresponding UNLOAD_DLL_DEBUG_EVENT event is received.

N.B. OllyDbg v2.x is not affected since it immediately CloseHandle the "hFile" field of the "LOAD_DLL_DEBUG_INFO" structure once it receives the LOAD_DLL_DEBUG_EVENT event.

Conclusion:
Setting the "SuppressDebugMsg" bit of thread's TEB prevents the attached debugger from receiving UN/LOAD_DLL_DEBUG_EVENT's from this thread.

For debuggers to be immune to this trick, they should use the "hFile" field to read info. and close this handle immediately.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Windows Internals: SkipThreadAttach

By: walied
7 December 2012 at 14:32
In this post i will not present any new tricks but i will instead discuss a new issue introduced in later versions of Windows regarding thread creation.
In a previous post, i quickly explained the ntdll "NtCreateThreadEx" function and its flag HideFromDebugger 0x4 that when passed to the function causes the new thread to be created hidden from debuggers.

In this post we will see another interesting flag that i prefer to call it SuppressDllMains 0x2. Let's see this in disassembly.

As we can see in the image above, the "PspAllocateThread" function inspects the "Flags" parameter. If the SuppressDllMains 0x2 flag is set, then the function sets the "SkipThreadAttach 0x8" bit flag in the new thread's TEB.

Similarly for the 64-bit version of the function. If the "SuppressDllMains" flag is passed, then the "SkipThreadAttach 0x8" bit flag is set in both the 32-bit TEB and 64-bit TEB of the new thread.

N.B. The bit flags are at offset 0xFCA in 32-bit TEB's and at offset 0x17EE in 64-bit TEB's.

Now let's see what the "SkipThreadAttach" bit flag does. To track this, we will have to shift to user-mode.

In OllyDbg, search for the "\xCA\x0F" (0xFCA) in ntdll.dll and see which functions make use of the "SkipThreadAttach 0x8" bit flag.

The ntdll "RtlIsCurrentThreadAttachExempt" function was among the results i found.
This function returns false if the "SkipThreadAttach" bit flag is not set.
If the "SkipThreadAttach" bit flag is set, another bit flag "RanProcessInit 0x20" is tested. If not set, the function returns true. Otherwise, the function returns false. In C code it looks something like below.

Searching for all references to the "RtlIsCurrentThreadAttachExempt" function, i found one interesting place in ntdll.dll where this function is called, that is LdrpInitializeThread. 

The "LdrpInitializeThread" function is for calling the DllMain's of loaded dlls ( and TLS callbacks as well) each time a thread is initializing (with the "fdwReason" parameter set to DLL_THREAD_ATTACH) or is exiting (with the "fdwReason" parameter set to DLL_THREAD_DETACH).

Taking the "LdrpInitializeThread" function in disassembly, we can see that if  the ntdll "RtlIsCurrentThreadAttachExempt" function returns true e.g. due to the "NtCreateThreadEx" function being called with the "Flags" parameter set to SuppressDllMains 0x2, the DllMains and TLS callbacks of loaded modules will not be called in the context of the new thread. See image below.


A good example for this is the "DbgUiIssueRemoteBreakin" function in ntdll.dll of Windows 7. This function is called by the "DebugActiveProcess" function to create the attaching thread in the context of the process to be debugged.
In Windows XP, the thread created by the "DbgUiIssueRemoteBreakin" function caused the DllMains and TLS callbacks of loaded modules to be called, presenting another layer of protection against attaching.
In Windows 7, since the "DbgUiIssueRemoteBreakin" function ends up calling the "NtCreateThreadEx" function with the "Flags" parameter set to 0x2 (SuppressDllMains), no DllMain's or TLS callbacks are called for the debugger thread.

You can download the demo of this post from here and source code from here.

You can follow me on Twitter @waleedassar

Any comments or ideas are very welcome.

Call64, Bypassing Wow64 Emulation Layer

By: walied
14 January 2013 at 12:48
In this post i will discuss a piece of code that i wrote to ease the process of issuing 64-bit system calls without passing through the Wow64 emulation layer implemented in Wow64cpu.dll, Wow64.dll, and Wow64win.dll.


I implemented it in a function called "Call64()". Since some arguments in 64-bit system calls are 64 bits long, the "Call64()" function expects its arguments in the form of pointers to LARGE_INTEGER structures. Also, the return value is in the form of a pointer to a LARGE_INTEGER structure.


Let's take the implementation of this function step by step.

The first argument Call64 takes is a pointer to a LARGE_INTEGER structure which will receive the return value (RAX) of this system call. It is the caller's responsibility to allocate this structure. Also, it is the caller's responsibility to type-cast the value returned in it.

The second argument the function takes is the system call number or ordinal e.g. The "ZwWaitForSingleObject" function in Windows 7 has a system call number of 0x1.

This argument is later used to formulate the shellcode used to issue the 64-bit system call.


Since this function is supposed to make 64-bit system calls with different number of arguments, the function is implemented as variadic (A function with an indefinite number of arguments) with the third argument being the number of arguments the system call expects. The next arguments are all in the form of pointers to LARGE_INTEGER structures.

The function prototype is like below:

After we have looked at how the arguments look like, let's see how the function works.

First, given the number of arguments, it calculated the stack space needed and commits it using the "_alloca" function. The newly-allocated stack space is initialized to zero.

The function takes the first four arguments and stores them in RCX, RDX, R8, and R9 respectively. Extra arguments are stored on stack. Also, shadow space is taken care of.

Using the value of the 64-bit mode Code Segment selector, the function makes a Far Call to a 64-bit shellcode responsible for issuing the system call.


Suppose that we want to make a call to the "ZwClose" function using the "Call64" function, what you should do is allocate two LARGE_INTEGER structure, one to hold the value of the "Handle" parameter and the other to receive the return value (RAX). It looks like below.

Other example is the "ProcessConsoleHostProcess" class of the "ZwSetInformationProcess" function. If we trace into this call, we will find that the Wow64 emulation layer implemented in Wow64.dll prevents Wow64 processes from making such call and thus preventing them from changing their console host processes. See implementation of the "wow64!whNtSetInformationProcess" function.

The sole solution to this is to directly make the system call without passing through the Wow64 emulation layer. The call using the "Call64" function is like below.


N.B. You should bear in mind that some system calls expect pointer arguments to be aligned by 8 and this is why we should align them by using e.g. the "_aligned_malloc" function.


Source code and examples can be found here. The function has also been implemented in a Dynamic Link Library, you can find it and its header and .lib files here.

GitHub project from here.

Any comments, ideas, or bug reports are more than welcome.

You can follow me on Twitter @waleedassar

A Real Random VirtualAlloc

By: walied
18 January 2013 at 00:26
In this post i will discuss one disadvantage of using the "VirtualAlloc" function to allocate memory and also suggest a trick to play around this disadvantage.

If you ever used the "VirtualAlloc" function  to allocate memory, you must have noticed that addresses returned are almost the same over instances of the same process. This is due to the "ZwAllocateVirtualMemory" function doing nothing to ensure the randomness of the base address returned, at least in Windows 7.

N.B. VirtualAlloc is just a wrap up of the "VirtualAllocEx" function which is a wrap up of the ntdll "ZwAllocateVirtualMemory" function.

To test that fact, we will create a small application that does almost nothing but calling the "ZwAllocateVirtualMemory" function and printing the base address at which memory has been allocated.
The source code looks like below.

N.B. Even though the ASLR has nothing to do with randomizing the base address of memory returned by ZwAllocateVirtualMemory, we just set the "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE" bit field for testing purposes.

Compile the above source code and run the application several times. See image below.

As you can see in the image above, the base address is the same (0x30000) across all instances of the process and this poses a security issue.

It seems that Microsoft has taken care of this issue while allocating stack memory for threads. Now in later versions of Windows e.g. Windows 7, the "RtlCreateUserStack" function which is responsible for reserving and committing the memory for threads is calling the "NtSetInformationProcess" function with a new information class to reserve the stack memory at a random address. The new process information class is ProcessThreadStackAllocation 0x29.

Now let's see how this new information class reserves memory.

Looking at the disassembly we can see that the function checks the "StackRandomizationDisabled" flag of the "_EPROCESS" structure. We can also see the function trying to randomize some variable by using the "SystemTime" field of the "SharedUserData" page, and the "RDTSC" instruction.

The function then calls the "MiScanUserAddressSpace" and "ZwAllocateVirtualMemory" functions to reserve memory at a random base address.


Now let's try to test the "ZwSetInformationProcess" function and see if addresses returned are really random. So, we compile the code in the image below and see.
N.B. Setting the "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE" bit field is necessary for the "StackRandomizationDisabled" bit flag of the "_EPROCESS" structure to be unset.



As you can see in the two images above, in each time we invoked the application we got a random address for the memory allocated.


Conclusion:
using the "ProcessThreadStackAllocation" class of the "ZwSetInformationProcess" function, we can guarantee a random address for memory we allocate which can be considered a security enhancement.

Code and examples for this post can be found here.

You can follow me on Twitter @waleedassar

Wow64Log

By: walied
26 January 2013 at 21:38
In this post i will discuss an interesting functionality that i discovered while reversing Wow64.dll and specifically the "wow64!ProcessInit" function. Now let's take the function into assembly and see how it looks like.

The first thing the function does is open a registry key by calling the "ZwOpenKey" function with the "ObjectAttributes" parameter having the "ObjectName" member set to "REGISTRY\MACHINE\SOFTWARE\Microsoft\WOW64". So our first conclusion here is that the function tries to open the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Wow64" to retrieve specific information that may affect the Wow64 process throughout its lifetime. Usually, the key does not exist, at least on my machine (Windows 7 SP1 64-bit).


Next, if the key was successfully opened, the "Wow64GetWow64ImageOption" function is then called with the first parameter set to the opened key handle, the second parameter pointing at the wide string "Wow64ExecuteFlags", the third parameter set to 0x4 (REG_DWORD), the fourth parameter pointing at the variable that will receive the returned value, and the fifth parameter set to 0x4 (output size).

The "Wow64GetWow64ImageOption" function opens the IFEO registry key and queries the registry value whose name is the string pointed to by the second parameter and on error, it queries the same registry value under the registry key whose handle was given in the first parameter.

The extracted flags are then used to initialize three Wow64 global variables, Wow64!Wow64CommittedStackSize, Wow64!Wow64MaximumStackSize, and Wow64InfoPtr.


Whether the registry key was successfully opened or not and whether the "Wow64ExecuteFlags" value was successfully extracted or not, the "ProcessInit" function then directly jumps to the code you see in the image below.


As you can see it is trying to load a library called "Wow64Log.dll" residing in the "System32" directory by calling the ntdll "LdrLoadDll" function. Usually, this module does not exist.

N.B. Since this code is x64, the library Wow64Log.dll must be 64-bit.

Then comes some code that tries to resolve certain function addresses from Wow64Log.Dll by calling the "LdrGetProcedureAddress" function.

N.B. The kernel32 "GetProcAddress" function is just a wrap up of the ntdll "LdrGetProcedureAddress" function.

The code we see in the image above tries to resolve addresses of the "Wow64LogInitialize", "Wow64LogSystemService", "Wow64LogMessageArgList", and "Wow64LogTerminate"  functions from the Wow64Log.dll.  If any of the functions' addresses could not be resolved, the function fails and Wow64Log.dll is unloaded from the address space by calling the ntdll "LdrUnloadDll" function.

Assuming Wow64Log.dll was found in NtSystemRoot\\System32 and the above mentioned functions were found to be exported from it, we will have the global wow64.dll function pointers, "pfnWow64LogInitialize", "pfnWow64LogSystemService", "pfnWow64LogMessageArgList", and "pfnWow64LogTerminate" holding the address of the "Wow64LogInitialize", "Wow64LogSystemService", "Wow64LogMessageArgList", and "Wow64LogTerminate" functions respectively. The "Wow64LogInitialize" function will then be immediately called.

The "Wow64LogSystemService" function will be called every time the "Wow64!Wow64SystemServiceEx" function is called i.e. called with every system call being issued. This can be used for system call logging.

The "Wow64LogMessageArgList" function is called by the "Wow64!Wow64LogPrint" function to log certain events, more likely errors.

The "Wow64LogTerminate" function is called upon process termination by the "Wow64!whNtTerminateProcess" function.


The above mentioned topic can be used as a simple method for injecting 64-bit Dll's into Wow64 (32-Bit) processes by dropping Wow64Log.dll into system32.

Here is a simple Wow64Log.dll that i wrote as a demo.


You can follow me on Twitter @waleedassar

Wow64-Specific Anti-Debug Trick

By: walied
29 January 2013 at 07:17
In this post i will show you an anti-debug trick that i have recently found. The trick is specific to Wow64 processes. It rely on the fact that 32-bit debuggers e.g. OllyDbg, IDA Pro Debugger, and WinDbg_x86 don't receive debug events for certain exceptions originating from 64-bit code. One example of these exceptions is EXCEPTION_BREAKPOINT 0x80000003.

N.B. In a Wow64 process in Windows 7, its 32-bit code is executing in CS=0x23, while its 64-bit code is executing in CS=0x33.

Let's take for example the ntdll "DbgPrompt" function in Windows 7 64-bit.  I chose DbgPrompt for two reasons:
1) Calls to it end up with executing the INT 0x2D instruction, which raises an EXCEPTION_BREAKPOINT.
2) The 32-bit version of it (in 32-bit version of ntdll.dll) calls the 64-bit version of it (in 64-bit version of ntdll.dll).

N.B. The ntdll "DbgPrompt" function wraps up calls to the non-exported "DebugPrompt" function.

So, now if we call the "DbgPrompt" function from within our 32-bit code, we know that the call will end up with an EXCEPTION_BREAKPOINT raised from 64-bit mode.

The interesting thing here is that if you call the function without a debugger, the exception will be raised and its exception handler will be called. One the other hand, if a debugger is present, no exceptions are raised and the instruction following INT 2D will be executed.

Given the above knowledge, i wrote a simple demo for that Wow64-specific anti-debug trick. You can download the demo from here and its source code from here.







To bypass this trick, you have to use a 64-bit debugger where the exception will be raised and seen by the debugger.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Kernel Bug #0 ThreadIoPriority

By: walied
12 February 2013 at 19:24
This post is the first in a series of posts that will discuss several kernel bugs that i find in Windows Kernel. This post is about a bug found in the kernel of Windows 7 SP1 (64-bit).

Description:
With the "ThreadInformationClass" parameter set to ThreadIoPriority 0x16, passing certain signed values e.g.  0xFF3FFF3C or  0xFF3FFFFC in the variable pointed to by the "ThreadInformation" parameter to the ntdll "ZwSetInformationThread" function can be abused to arbitrarily set certain bit flags of the corresponding "_ETHREAD" structure e.g. ThreadIoPriority:3, ThreadPagePriority:3, RundownFail:1, or NeedsWorkingSetAging:1.

Bug Type:
This is due to a signedness error in the "nt!NtSetInformationThread" function.

32-Bit kernel:
64-bit kernel:


Impact:
1) The signed value leads to bypassing the check for the "SeIncreaseBasePriorityPrivilege" privilege that is required to set the thread's IO priority to HIGH.

2) An unprivileged thread can use certain calculated signed values to escalate its IO priority and memory priority to maximum values e.g. Raise IO priority to CRITICAL or Page priority to 7.

3) Also, certain bit flags of the corresponding "_ETHREAD" structure can be set e.g. RundownFail and NeedsWorkingSetAging.


POC:
https://www.dropbox.com/s/x7zzx5r62h0k4uz/PriorityCheckBypass.exe

Code:
http://pastebin.com/TanNzkn9

Status:
Reported to the vendor and rejected for not being a security issue.

Any comments or ideas are very welcome. You can also follow me on Twitter @waleedassar

Kernel Bug #1 ProcessIoPriority

By: walied
12 February 2013 at 19:29
In this post i will show you the second kernel bug that i found in the  Kernel of Windows 7 SP1 (64-bit). This one is in the "nt!NtSetInformationProcess" function.

Description:
With the "ProcessInformationClass" parameter set to ProcessIoPriority 0x21, passing certain signed values e.g.  0xFFFFFFFF or 0x8000F129 in the variable pointed to by the "ProcessInformation" parameter to the ntdll "ZwSetInformationProcess" function can be abused to arbitrarily set certain bit flags of the corresponding "_EPROCESS" structure e.g. DefaultIoPriority: Pos 27, ProcessSelfDelete : Pos 30, or SetTimerResolutionLink: Pos 31.

Bug Type:
This is due to a signedness error in the "nt!NtSetInformationProcess" function.


32-Bit kernel:


64-bit kernel:

 Impact:
1) The signed value leads to bypassing the check for the "SeIncreaseBasePriorityPrivilege" privilege that is required to set the process's IO priority to HIGH.


2) The signed value leads to bypassing the check for disallowed values for the process's IO priority e.g. the bug can be abused to set the process's IO priority to CRITICAL.

3) Setting the "ProcessSelfDelete" flag, which makes the target process non-killable by conventional methods.

4) Setting the "SetTimerResolutionLink" flag, which causes a BSOD (Bug check code of 0x3B)  if we terminate the process due to a null pointer dereference bug.

Poc:

Non-Killable Process

BSOD

Code:
http://pastebin.com/QejGQXib

Status:
Reported to the vendor.

Any comments or ideas are very welcome. You can also follow me on Twitter @waleedassar

PE TimeDateStamp Viewer

By: walied
23 February 2014 at 19:49
In this this post, i will share with you a tiny tool that i wrote to discover all occurrences of TimeDateStamps in a PE executable. The tool simply traverses the PE header and specifically the following structures/fields:

1) The "TimeDateStamp" field of the "_IMAGE_FILE_HEADER" structure.

This is the most notorious field that is always a target for both malware authors and forensic guys.

N.B. Certain versions of Delphi linkers always emit a fixed TimeDateStamp of 0x2A425E19, Sat Jun 20 01:22:17 1992. In this case you should not rely on this field and continue looking in other fields.

2) The "TimeDateStamp" field of the "_IMAGE_EXPORT_DIRECTORY" structure.

It is usually the same as or very close to the "TimeDateStamp" field of the the "_IMAGE_FILE_HEADER" structure".

N.B. Not all linkers fill this field, but Microsoft Visual Studio linkers do fill it for both DLL's and EXE's.

3) The "TimeDateStamp" field of the "_IMAGE_IMPORT_DESCRIPTOR" structure.

Unlike what the name implies, this field is a bit useless if you are trying to determine when the executable was built. It is -1 if the executable/dll is bound (see #8) and zero if not. So, it is not implemented in my tool.

4) The "TimeDateStamp" field of the "_IMAGE_RESOURCE_DIRECTORY" structure.

Usually Microsoft Visual Studio linkers don't set it (I have tested with linker versions of 6.0, 8.0, 9.0,  and 10.0).

Borland C and Delphi set this field for the main _IMAGE_RESOURCE_DIRECTORY and its subdirectories.

Sometimes spoofers forget to forge this field for subdirectories.

5) The "TimeDateStamp" of the "_IMAGE_DEBUG_DIRECTORY" structures.

Microsoft Visual Studio linkers emitting debug info. in the final PE always set this field. Spoofers may forge the field in the first "_IMAGE_DEBUG_DIRECTORY" structure and forget the following ones.

N.B. Debug info as pointed to by Debug Data Directory is an array of  "_IMAGE_DEBUG_DIRECTORY" structures, each representing debug info of different type e.g. COFF, CodeView, etc.

6) If  "_IMAGE_DEBUG_DIRECTORY" has the "Type" field set to 0x2 (IMAGE_DEBUG_TYPE_CODEVIEW), then by following the "PointerToRawData" field we can find another occurrence of TimeDateStamp ( only if the PDB format is PDB 2.0 i.e when "Signature" field is set to "NB10" )


7) The "TimeDateStamp" field of the "_IMAGE_LOAD_CONFIG_DIRECTORY" structure.

I have not seen it being used before. However, it is  implemented in the tool.

8) The "TimeDateStamp" field of the "_IMAGE_BOUND_IMPORT_DESCRIPTOR" structures.

It is the TimeDateStamp of the DLL that the executable is bound to. We can't use this field to know when the executable was build, but we can use it to determine on which Windows version/Service pack the file was built/bound. It is not implemented in the tool.

The tool has a very simple command line. See below.

You download the tool from here. For any bugs or suggestions, please don't hesitate to leave me a comment or contant me @waleedassar.

GitHub Project here.

ShareCount As Anti-Debugging Trick

By: walied
24 June 2014 at 21:04
In this post i will share with you an Anti-Debugging trick that is very similar to the "PAGE_EXECUTE_WRITECOPY" trick mentioned here, where we had to flag code section as writeable such that any memory write to its page(s) would force OS to change the page protection from PAGE_EXECUTE_WRITECOPY to PAGE_EXECUTE_READWRITE. But in this case we don't have to make any modifications to the code section's page protection. We will just query the process for its current working set info. Among the stuff we receive querying the working set of a process are two fields, "Shared" and "ShareCount".

By default the OS assumes the memory pages of code section (Non-writable sections) should share physical memory across all process instances. This is true till one process instance commits a memory-write to the shared page. At this point the page becomes no longer shared. Thus, querying the working set of the process and inspecting the "Shared" and/or "ShareCount" fields for our Code section pages would reveal the presence  of  debugger, only if the debugger uses INT3 for breakpoints.


To implement the trick, all you have to do is call the "QueryWorkingSet" or "QueryWorkingSetEx" functions.

N.B. You can also use the "ZwQueryVirtualMemory" function with the "MemoryInformationClass" parameter set to MemoryWorkingSetList for more portable code.

Code from here and demo from here. Tested on Windows 7.

For any suggestions, leave me a comment or drop me a mail [email protected].

You can also follow me on Twitter @waleedassar

OkayToCloseProcedure callback kernel hook

Hi ,

During the last few weeks I was busy exploring the internal working of Handles under Windows , by disassembling and decompiling certain kernel (ntoskrnl.exe) functions under my Windows 7 32-bit machine.In the current time I am preparing a paper to describe and explain what I learned about Handles. But today I’m here to discuss an interesting function pointer hook that I found while decompiling and exploring the ObpCloseHandleEntry function. (Source codes below).

A function pointer hook consists of overwriting a callback function pointer so when a kernel routine will call the callback function, the hook function will be called instead . The function pointer that we will be hooking in this article is the OkayToCloseProcedure callback that exists in the _OBJECT_TYPE_INITIALIZER structure which is an element of the OBJECT_TYPE struct.

Every object in Windows has an OBJECT_TYPE structure which specifies the object type name , number of opened handles to this object type ...etc OBJECT_TYPE also stores a type info structure (_OBJECT_TYPE_INITIALIZER) that has a group of callback functions (OpenProcedure ,CloseProcedure…) . All OBJECT_TYPE structures pointers are stored in the unexported ObTypeIndexTable array.

As I said earlier , the OkayToCloseProcedure is called inside ObpCloseHandleEntry function.In general this function (if the supplied handle is not protected from being closed) frees the handle table entry , decrements the object’s handle count and reference count.
Another case when the handle will not be closed is if the OkayToCloseProcedure returned 0 , in this case the ObpCloseHandleTableEntry returns STATUS_HANDLE_NOT_CLOSABLE.
I will discuss handles in more details in my future blog posts.

So how the OkayToCloseProcedure is called ?

ObpCloseHandleTableEntry function actually gets the Object (which the handle is opened to) header (_OBJECT_HEADER). A pointer to the object type structure (_OBJECT_TYPE) is then obtained by accessing the ObTypeIndexTable array using the Object Type Index from the object header (ObTypeIndexTable[ObjectHeader->TypeIndex]).

The function will access the OkayToCloseProcedure field and check if it’s NULL , if that’s true the function will proceed to other checks (check if the handle is protected from being closed). If the OkayToCloseProcedure field isn’t NULL , the function will proceed to call the callback function. If the callback function returns 0 the handle cannot be closed and ObpCloseHandleTableEntry will return STATUS_HANDLE_NOT_CLOSABLE. If it returns a value other than 0 we will proceed to the other checks as it happens when the OkayToCloseProcedure is NULL.

An additional point is that the OkayToCloseProcedure must always run within the context of the process that opened the handle in the first place (a call to KeStackAttachProcess). I don’t think that this would be a problem if ObpCloseHandleTableEntry is called as a result of calling ZwClose from usermode because we’ll be running in the context of the process that opened the handle.However, if ZwClose was called from kernel land and was supplied a kernel handle KeStackAttachProcess will attach the thread to the system process. The reason behind that is that we always want to access the right handle table (each process has a different handle table, and for the kernel we have the system handle table).

So if ObpCloseHandleTableEntry is called from another process context and is trying to close another process’s handle, the OkayToCloseProcedure must run in that process context. That’s why ObpCloseHandleTableEntry takes a pointer to the process object (owner of the handle) as a parameter.

Applying the hook :

Now after we had a quick overview of what’s happening , let’s try and apply the hook on the OBJECT_TYPE_INITIALIZER’s OkayToCloseProcedure field.
I applied the hook on the Process object type , we can obtain a pointer to the process object type by taking advantage of the exported PsProcessType , it’s actually a pointer to a pointer to the process’s object type.

Here’s a list containing the exported object types :
POBJECT_TYPE *ExEventObjectType;
POBJECT_TYPE *ExSemaphoreObjectType;
POBJECT_TYPE *IoFileObjectType;
POBJECT_TYPE *PsThreadType;
POBJECT_TYPE *SeTokenObjectType;
POBJECT_TYPE *PsProcessType;
POBJECT_TYPE *TmEnlistmentObjectType;
POBJECT_TYPE *TmResourceManagerObjectType;
POBJECT_TYPE *TmTransactionManagerObjectType;
POBJECT_TYPE *TmTransactionObjectType;


A second way to get an object’s type is by getting an existing object’s pointer and then pass it to the exported kernel function ObGetObjectType which will return a pointer to the object’s type.

A third way is to get a pointer to the ObTypeIndexTable array, it’s unexported by the kernel but there are multiple functions using it including the exported ObGetObjectType function.So the address can be extracted from the function's opcodes , but that will introduce another compatibility problem. After getting the pointer to the ObTypeIndexTable you'll have to walk through the whole table and preform a string comparison to the target's object type name ("Process","Thread" ...etc) against the Name field in each _OBJECT_TYPE structure.

In my case I hooked the Process object type , and I introduced in my code the 1st and the 2nd methods (second one commented).
My hook isn’t executing any malicious code !! it’s just telling us (using DbgPrint) that an attempt to close an open handle to a process was made.
“An attempt” means that we’re not sure "yet" if the handle will be closed or not because other checks are made after a successful call to the callback.And by a successful call , I mean that the callback must return a value different than 0 that’s why the hook function is returning 1. I said earlier that the ObpCloseHandleTableEntry will proceed to check if the handle is protected from being closed  (after returning from the callback) if the OkayToCloseProcedure is null or if it exists and returns 1 , that's why it’s crucial that our hook returns 1.One more thing , I’ve done a small check to see if the object type’s OkayToCloseProcedure is already NULL before hooking it (avoiding issues).

Example :
For example when closing a handle to a process opened by OpenProcess a debug message will display the handle value and the process who opened the handle.
As you can see "TestOpenProcess.exe" just closed a handle "0x1c" to a process that it opened using OpenProcess().


P.S : The hook is version specific.


Source codes :
Decompiled ObpCloseHandleTableEntry : http://pastebin.com/QL0uaCtJ
Driver Source Codehttp://pastebin.com/Z2zucYGZ


Your comments are welcome.

Souhail Hammou.

@Dark_Puzzle

Windows Internals - Quantum end context switching

30 August 2014 at 21:12
Hello,
Lately I decided to start sharing the notes I gather , almost daily , while reverse engineering and studying Windows. As I focused in the last couple of days on studying context switching , I was able to decompile the most involved functions and study them alongside with noting the important stuff. The result of this whole process was a flowchart.

Before getting to the flowchart let's start by putting ourselves in the main plot :
As you might know, each thread runs for a period of time before another thread is scheduled to run, excluding the cases where the thread is preempted ,entering a wait state or terminated. This time period is called a quantum. Everytime a clock interval ends (mostly 15 ms) the system clock issues an interrupt.While dispatching the interrupt, the thread current cycle count is verified against its cycle count target (quantum target) to see if it has reached or exceeded its quantum so the context would be switched the next thread scheduled to run.
Note that a context-switch in Windows doesn't happen only when a thread has exceeded its quantum, it also happens when a thread enters a wait state or when a higher priority thread is ready to run and thus preempts the current thread.

As it will take some time to organize my detailed notes and share them here as an article (maybe for later),consider the previous explanation as a small introduction into the topic. However ,the flowchart goes through the details involved in quantum end context switching.

Please consider downloading the pdf  to be able to zoom as much as you like under your PDF reader because GoogleDocs doesn't provide enough zooming functionality to read the chart.

Preview (unreadable) :


PDF full size Download  : GoogleDocs Link

P.S :
- As always , this article is based is on : Windows 7 32-bit
- Note that details concerning the routine that does the context switching (SwapContext) aren't included in the chart and are left it for a next post.

See you again soon.

-Souhail.

Windows Internals - A look into SwapContext routine

5 September 2014 at 14:24
Hi,
 
Here I am really taking advantage of my summer vacations and back again with a second part of the Windows thread scheduling articles. In the previous blog post I discussed the internals of quantum end context switching (a flowchart). However, the routine responsible for context switching itself wasn't discussed in detail and that's why I'm here today.

Here are some notes that'll help us through this post :
 1 - The routine which contains code that does context switching is SwapContext and it's called internally by KiSwapContext. There are some routines that prefer to call SwapContext directly and do the housekeeping that KiSwapContext does themselves.
 2 - The routines above (KiSwapContext and SwapContext) are implemented in ALL context switches that are performed no matter what is the reason of the context switch (preemption,wait state,termination...).
 3 - SwapContext is originally written in assembly and it doesn't have any prologue or epilogue that are normally seen in ordinary conventions, imagine it like a naked function.
 4 - Neither SwapContext or KiSwapContext is responsible for setting the CurrentThread and NextThread fields of the current KPRCB. It is the responsibility of the caller to store the new thread's KTHREAD pointer into pPrcb->CurrentThread and queue the current thread (we're still running in its context) in the ready queue before calling KiSwapContext or SwapContext which will actually perform the context-switch.
 Usually before calling KiSwapContext, the old irql (before raising it to DISPATCH_LEVEL) is stored in CurrentThread->WaitIrql , but there's an exception discussed later in this article.

So buckle up and let's get started :
Before digging through SwapContext let's first start by examining what its callers supply to it as arguments.
SwapContext expects the following arguments:
- ESI : (PKTHREAD) A pointer to the New Thread's structure.
- EDI : (PKTHREAD) A pointer to the old thread's structure.
- EBX : (PKPCR) A pointer to PCR (Processor control region) structure of the current processor.
- ECX : (KIRQL) The IRQL in which the thread was running before raising it to DISPATCH_LEVEL.
By callers, I mean the KiSwapContext routine and some routines that call SwapContext directly (ex : KiDispatchInterrupt).
Let's start by seeing what's happening inside KiSwapContext :
This routine expects 2 arguments the Current thread and New thread KTHREAD pointers in ECX and EDX respectively (__fastcall).
Before storing both argument in EDI and ESI, It first proceeds to save these and other registers in the current thread's (old thread soon) stack:
EBP : The stack frame base pointer (SwapContext only updates ESP).
EDI : The caller might be using EDI for something else ,save it.
ESI : The caller might be using ESI for something else ,save it too.
EBX : The caller might be using EBX for something else ,save it too.
Note that these registers will be popped from this same thread's stack when the context will be switched from another thread to this thread again at a later time (when it will be rescheduled to run).
After pushing the registers, KiSwapContext stores the self pointer to the PCR in EBX (fs:[1Ch]).Then it stores the CurrentThread->WaitIrql value in ECX, now that everything is set up KiSwapContext is ready to call SwapContext.

Again, before going through SwapContext let me talk about routines that actually call SwapContext directly and exactly the KiDispatchInterrupt routine that was referenced in my previous post.
Why doesn't KiDispatchInterrupt call KiSwapContext ?
Simply because it just needs to push EBP,EDI and ESI onto the current thread's stack as it already uses EBX as a pointer to PCR.

Here, we can see a really great advantage of software context switching where we just save the registers that we really need to save, not all registers.

Now , we can get to SwapContext and explain what it does in detail.
The return type of SwapContext is a boolean value that tells the caller (in the new thread's stack) whether the new thread has any APCs to deliver or not.

Let's see what SwapContext does in these 15 steps:

1 - The first thing that SwapContext does is verify that the new thread isn't actually running , this is only right when dealing with a multiprocessor system where another processor might be actually running the thread.If the new thread is running SwapContext just loops until the thread stops running. The boolean value checked is NewThread->Running and after getting out of the loop, the Running boolean is immediately set to TRUE.

2 - The next thing SwapContext does is pushing the IRQL value supplied in ECX. To spoil a bit of what's coming in the next steps (step 13) SwapContext itself pops ECX later, but after the context switch. As a result we'll be popping the new thread's pushed IRQL value (stack switched).

3 - Interrupts are disabled, and PRCB cycle time fields are updated with the value of the time-stamp counter. After the update, Interrupts are enabled again.

4 - increment the count of context switches in the PCR (Pcr->ContextSwitches++;) , and push Pcr->Used_ExceptionList which is the first element of PCR (fs:[0]). fs:[0] is actually a pointer to the last registered exception handling frame which contains a pointer to the next frame and also a pointer to the handling routine (similar to usermode), a singly linked list simply. Saving the exception list is important as each thread has its own stack and thus its own exception handling list.

5 - OldThread->NpxState is tested, if it's non-NULL, SwapContext proceeds to saving the floating-points registers and FPU related data using fxsave instruction. The location where this data is saved is in the initial stack,and exactly at (Initial stack pointer - 528 bytes) The fxsave output is 512 bytes long , so it's like pushing 512 bytes onto the initial stack , the other 16 bytes are for stack-alignment I suppose.The Initial stack is discussed later during step 8.

6 - Stack Swapping : Save the stack pointer in OldThread->KernelStack and load NewThread->KernelStack into ESP. We're now running in the new thread's stack, from now on every value that we'll pop was previously pushed the last time when the new thread was preparing for a context-switch.

7 - Virtual Address Space Swapping : The old thread process is compared with the new thread's process if they're different CR3 register (Page directory pointer table register) is updated with the value of : NewThread->ApcState.Process->DirectoryTableBase. As a result, the new thread will have access to a valid virtual address space. If the process is the same, CR3 is kept unchanged. The local descriptor table is also changed if the threads' processes are different.

8 -  TSS Esp0 Switching : Even-though I'll dedicate a future post to discuss TSS (task state segment) in detail under Windows , a brief explanation is needed here. Windows only uses one TSS per processor and uses only (another field is also used but it is out of the scope of this article) ESP0 and SS0 fields which stand for the kernel stack pointer and the kernel stack segment respectively. When a usermode to kernelmode transition must be done as a result of an interrupt,exception or system service call... as part of the transition ESP must be changed to point to the kernel stack, this kernel stack pointer is taken from TSS's ESP0 field. Logically speaking, ESP0 field of the TSS must be changed on every context-switch to the kernel stack pointer of the new thread. In order to do so, SwapContext takes the kernel stack pointer at NewThread->InitialStack (InitialStack = StackBase - 0x30) ,it substrats the space that it has used to save the floating-point registers using fxsave instruction and another additional 16 bytes for stack alignment, then it stores the resulted stack pointer in the TSS's Esp0 field : pPcr->TssCopy.Esp0 (TSS can be also accessed using the TR segment register).

9 - We've completed the context-switch now and the old thread can be finally marked as "stopped running" by setting the previously discussed boolean value "Running" to FALSE. OldThread->Running = FALSE.

10 - If fxsave was previously executed by the new thread (the last time its context was switched), the data (floating-point registers...) saved by it is loaded again using xrstor instruction.

11 - Next the TEB (Thread environment block) pointer is updated in the PCR :
pPcr->Used_Self = NewThread->Teb . So the Used_Self field of the PCR points always to the current thread's TEB.

12 - The New thread's context switches count is incremented (NewThread->ContextSwitches++).

13 - It's finally the time to pop the 2 values that SwapContext pushed , the pointer to the exception list and the IRQL from the new thread's stack. the saved IRQL value is restored in ECX and the exception list pointer is popped into its field in the PCR.

14 - A check is done to see if the context-switch was performed from a DPC routine (Entering a wait state for example) which is prohibited. If pPrcb->DpcRoutineActive boolean is TRUE this means that the current processor is currently executing a DPC routine and SwapContext will immediately call KeBugCheck which will show a BSOD : ATTEMPTED_SWITCH_FROM_DPC.

15 - This is the step where the IRQL (NewThread->WaitIrql) value stored in ECX comes to use. As mentionned earlier SwapContext returns a boolean value telling the caller if it has to deliver any pending APCs. During this step SwapContext will check the new thread's ApcState to see if there are any kernel APCs pending. If there are : a second check is performed to see if special kernel APCs are disabled , if they're not disabled ECX is tested to see if it's PASSIVE_LEVEL, if it is above PASSIVE_LEVEL an APC_LEVEL software interrupt is requested and the function returns FALSE. Actually the only case that SwapContext returns TRUE is if ECX is equal to PASSIVE_LEVEL so the caller will proceed to lowering IRQL to APC_LEVEL first to call KiDeliverApc and then lower it to PASSIVE_LEVEL afterwards.

Special Case :
This special case is actually about the IRQL value supplied to SwapContext in ECX. The nature of this value depends on the caller in such way that if the caller will lower the IRQL immediately upon returning from SwapContext or not.
Let's take 2 examples : KiQuantumEnd and KiExitDispatcher routines. (KiQuantumEnd is the special case)

If you disassemble KiExitDispatcher you'll notice that before calling KiSwapContext it stores the OldIrql (before it was raised to DISPATCH_LEVEL) in the WaitIrql of the old thread so when the thread gains execution again at a later time SwapContext will decide whether there any APCs to deliver or not. KiExitDispatcher makes use of the return value of KiSwapContext (KiSwapContext returns the same value returned by SwapContext) to lower the IRQL. (see step 15 last sentence).
However, by disassembling KiQuantumEnd you'll see that it's storing APC_LEVEL at the old thread's WaitIrql without even caring about in which IRQL the thread was running before. If you refer back to my flowchart in the previous article you'll see that KiQuantumEnd always insures that SwapContext returns FALSE , first of all because KiQuantumEnd was called as a result of calling KiDispatchInterrupt which is meant to be called when a DISPATCH_LEVEL software interrupt was requested.Thus, KiDispatchInterrupt was called by HalpDispatchSoftwareInterrupt which is normally called by HalpCheckForSoftwareInterrupt. HalpDispatchSoftwareInterrupt is the function responsible for raising the IRQL to the software interrupt level (APC_LEVEL or DISPATCH_LEVEL) and upon returning from it HalpCheckForSoftwareInterrupt recovers back the IRQL to its original value (OldIrql). So the reason why KiQuantumEnd doesn't care about KiSwapContext return value because it won't proceed to lowering the IRQL (not its responsibility) nor to deliver any APCs that's why it's supplying APC_LEVEL as an old IRQL value to SwapContext so that it will return FALSE. However, a software interrupt might be requested by SwapContext if there are any pending APCs.
KiDispatchInterrupt which calls SwapContext directly uses the same approach as KiQuantumEnd, instead of storing the value at OldThread->WaitIrql it just moves it into ECX.

Post notes :
- Based on Windows 7 32 bit :>
- For any questions or suggestions feel free to leave a comment below or send me an email : [email protected]

See you again soon :)

-Souhail

NoConName 2014 - inBINcible Reversing 400 Writeup

15 September 2014 at 00:48
Hello,
We (Spiderz) have finished 26th at the NCN CTF this year with 2200 points and we really did enjoy playing. I was able to solve both cannaBINoid (300p) and (inBINcible 400p) .I have actually found 2 solutions for inBINcible that I'll describe separately later in this write-up.

We were given an 32-bit ELF binary compiled from Golang (GO Programming Language).
The main function is "text" (also called main.main) and it is where interesting stuff happens. While digging through this routine the os_args array caught my attention, around address 0x08048DB3 it will access the memory location pointed by os_args+4 and compare its content to 2. This value is nothing but the number of command line arguments given to the executable (argc in C), so the binary is expecting a command line argument which is in fact the key.
By looking at the next lines , I saw an interesting check :
.text:08048EFE                 mov     ebx, dword ptr ds:os_args
.text:08048F04                 add     ebx, 8
.text:08048F07                 mov     esi, [ebx+4]
.text:08048F0A                 mov     ebx, [esp+0C0h+var_54]
.text:08048F0E                 mov     ebp, [ebx+4]
.text:08048F11                 cmp     esi, ebp
.text:08048F13                 jz      loc_8049048
As it's the first time I encounter GOlang I though that it was better to use GDB alongside with IDA so I fired up my linux machine , put a breakpoint on  0x08048F11 , gave the binary a 4 bytes long command line argument (./inbincible abcd) and then I examined both ESI and EBP .
esi = 0x4
ebp = 0x10
You can easily notice tha abcd length is 4 and the right length that should be given is 16 , we can safely assume now that the flag length is 16 bytes.
Note :
The instruction which initializes the flag length to 0x10 is this one (examine instructions before it)
.text:08048D05                 mov     [ebx+4], edi

If the length check is true runtime_makechan function will be called (0x08049073) which creates a channel as its name implies. After that we'll enter immediately a loop that will get to initializing some structure fields then calling  runtime_newproc for each character in the flag (16 in total). One of the variables is initialized with  "main_func_001" and it can be seen as the handler that is called when chanrecv is called.
After breaking out of the loop, ECX is set to 1 and then we'll enter another loop (0x080490DD). This loop calls chanrecv for each character in the input (under certain circumstances). chanrecv is supplied the current index of the input and a pointer to a local variable which I named success_bool. Basically our main routine will supply a pointer to success_bool to the channel which will assign another thread (probably the one created using runtime_newproc) executing the main_func_001 to do some checks then write a TRUE or FALSE value into the variable. After returning from chanrecv the boolean will be checked. If it's TRUE ecx will keep its value ( 1 ) and we'll move to the next character. However if main_func_001 has set the boolean to false ecx will be zeroed and the other characters of the user input won't be checked (fail).
I have actually found 2 methods to approach this challenge (with bruteforcing and without bruteforcing) :

1 - Getting the flag using bruteforce :

This solution consists of automating the debugger (GDB) to supply a 16 bytes length string as an argument , the current character (we basically start with the first character) will be changed during each iteration using a charset and the next character will be left unchanged until we've found the right current character of the key and so on. To find the right character we must break after returning from chanrecv then read the local variable (boolean) value , if it's 1 then we've got the right character and we shall save it then move to the next one, else we'll keep looking until finding the right one.

Here's a python GDB script that explains it better :


flag : G0w1n!C0ngr4t5!!

2 - Getting the key by analyzing main_func_001 :


As main_func_001 is the one responsible for setting the boolean value, analyzing this routine will give us the possibility to get the flag without any bruteforcing. Let's see what it does :
main_func_001 expects the boolean variable pointer and the index of the character to be tested. This index , as mentionned earlier, is the iterator of the loop which has called chanrecv.
For the purpose of checking each character main_func_001 uses 2 arrays , I called the first 5 bytes sized array Values_Array. The second array size is 16 bytes , same length as the password.
So here's how we can get the flag using the 2 arrays :

flag : G0w1n!C0ngr4t5!!

The final key to validate the challenge is NcN_sha1(G0w1n!C0ngr4t5!!)

Binary download : Here

Follow me on Twitter : Here

See you soon :>

CSAW CTF 2014 - Ish Exploitation 300 Write-up

22 September 2014 at 00:36
Hi,
This time with a quick writeup . Well , I took some time to reverse the binary under IDA and I soon discovered that the vulnerability was a memory leak which leaks 16 bytes from the stack and the vulnerable function was cmd_lotto, here's the full exploit :

I'll publish a writeup for exploitation 400 ( saturn ) as soon as possible.

Download binary : Here
Follow me on Twitter : Here

See you soon :).

- Souhail

CSAW CTF 2014 - "saturn" Exploitation 400 Write-up

22 September 2014 at 16:29
Hi,

The description for this task was :

    You have stolen the checking program for the CSAW Challenge-Response-Authentication-Protocol system. Unfortunately you forgot to grab the challenge-response keygen algorithm (libchallengeresponse.so). Can you still manage to bypass the secure system and read the flag?

    nc 54.85.89.65 8888

I grabbed the binary , threw it in IDA and then started looking at the main routine. The first function that was called in main was _fillChallengeResponse and it takes two arguments . I named them : fill_arg0 and fill_arg4.
A quick check reveals that this function is imported from an external library (the one we 'forgot' to grab). Also by checking the arguments passed to the function they appear to be pointers , each pointer points to a 32 bytes array in the bss section.We can also see that the first array is directly followed by the next one.


As fillChallengeResponse is given 2 pointers , we can safely guess that its mission is to fill them with the right data.

Let's carry on :


Next, we will enter this loop. Its was previously initialized to 0 and we'll quit the loop only if the iterator is strictly above 0. In this loop, we are first prompted to supply an input in which only the first byte is read , the byte is saved at [esp+1Bh] and the switch statement only uses the highest order nibble of the read byte.
If the switch statement was supplied 0xA0 , it will lead to retrieving the original read byte (0xA2 for example) and then call a function that will access the Array1 and print the dword at the index described by the lowest order nibble of the read byte multiplied by 4 ((0xA2 & 0xF)*4 = 8 for example).
If the switch statement was supplied 0xB0 , the executed block of code will retrieve the original read byte and then call a function that will wait for user input and then compare that input to the dword indexed by the lowest orded nibble of the original byte multiplied by 4 in Array2. If the 2 values are equal another 8 sized array of bytes will be accessed and 1 is written into the same index indicated by the lowest order nibble.
If the switch statement was supplied 0x80 , it will call a function that walk through the array of bytes checking if all the elements are equal to 1. If it's the case , the function will print the contents of "flag.txt".

The trick here is to take advantage of the read_array1 function , to make it print the Array2 and then pass each dword read from Array2 to the check_array2 function. As we already know Array1 and Array2 are sticked to each other and each ones size is 16 bytes this means that supplying 0xA8 will make us read the first dword of the Array2 . So all we need to do is supply 0xA8 as an input , save the printed value from read_array1 function , supply 0xE0 as an input (switch) then supply the saved printed value as a input (in check_array2) , this will result in setting the first byte of the 8 bytes sized array to 1.
We have to  basically repeat the same 8 times , 0xA8 -> 0xAF and 0xE0 -> 0xE8. When done , we'll supply 0x80 as an input and the "target" function will print the flag for us.
Here's an automated python script which prints the flag :

Binary download : Here

Follow me on twitter : Here
Bye.
- Souhail

❌
❌