RSS Security

πŸ”’
❌ About FreshRSS
There are new articles available, click to refresh the page.
Before yesterdaywaliedassar

Native x86 User-mode System Calls Hooking

27 July 2012 at 20:41
By: walied
In this post i am going to explain how to implement system call hooking from user-mode for native x86 processes (i here refer to 32-bit processes running in 32-bit versions of Windows XP SP2 and SP3).

Let's have a look at the "ZwOpenProcess" function of Windows XP SP2 and of Windows XP SP3.

1) XP SP2


2) XP SP3

As you can see in the images above, EAX is set to 0x7A, the system call ordinal and EDX is made to point at 0x7FFE0300 in the _KUSER_SHARED_DATA page. Then comes a CALL instruction which jumps to the "KiFastSystemCall" function whose address is stored in 0x7FFE0300 (_KUSER_SHARED_DATA::SystemCall).

One difference we can see is that SYSENTER of XP SP2 is followed by 5 NOPs while in XP SP3 SYSENTER is directly followed by the RET of the "KiFastSystemCallRet" function.
Β 
The first thing one may think of to implement the user-mode system call hook in Windows XP SP3/SP2 is to overwrite the "_KUSER_SHARED_DATA::SystemCall" and "_KUSER_SHARED_DATA::SystemCallRet" fields. Unfortunately, this is not possible since the page is not writable and any attempt to change its memory protection constant always fails.

So, we should now turn to the "KiFastSystemCall" function and try to overwrite its very first instruction with a JMP instruction. Is this all? Let's see.

For XP SP2, it is okay to write a near jmp instruction (5-byte long) since we have enough space (filled with 5 NOPs) and this does not hurt the RET instruction of the "KiFastSystemCallRet" function. But for XP SP3, any attempt to write the near jmp instruction will hurt the "KiFastSystemCallRet" function. Any common method for both XP SP2 and SP3?

I thought about that and came up with something that worked for both service packs. If we allocate a memory page at an address which when converted from absolute to relative gives 0xC3 as the fifth byte of the new JMP instruction. For example, if we allocate a memory page at 0x3F910000, given that the "KiFastSystemCall" function is at 0x7C90E510, we get the new JMP instruction as a sequence of
Β "\xE9\xEB\x1A\x00\xC3". You can check the source code of InjectHookLib for more information.

N.B. We can still use a short JMP by searching for any vacant 5 bytes in the range of -128 to +127 from the address of the "KiFastSystemCall" function. LEA ESP,[ESP] seems to be okay for both service packs.

N.B. With certain processors or under certain conditions e.g. disabled VT-x/AMD-V if using VirtualBox, the "KiFastSystemCall" function is not used at all and the "KiIntSystemCall" is used instead. In these cases, you can safely overwrite the first instructions of "KiIntSystemCall" function with a near JMP instruction as long as the code you hook to takes care of that.


Any ideas or suggestions are always very welcome.

You can follow me @waleedassar

Major / MinorSubsystemVersion

5 August 2012 at 21:00
By: walied
If you are still using Windows 2000, you must have noticed that certain executables refuse to run. Actually, this is due to the executables being built with Microsoft Visual Studio 2010 which sets the MajorSubSystemVersion and MinorSubsystemVersion in the PE header to 5 and 1. In other words, it creates executables to run on Windows XP (5.1) and above. This causes Windows 2000 (5.0) to refuse to load these executables.

Now, let's see where the check occurs and how to bypass it. The first place to check must be the kernel32 "CreateProcess" function.


If we start at address 0x7C4F1ECE, we can see a call to the ntdll "ZwQuerySection" function with the "InformationClass" parameter set to 1 (SectionImageInformation). After the "ZwQuerySection" function has returned successfully, the "SECTION_IMAGE_INFORMATION" structure should be filled with some useful data. Among the data returned are the executable's subsystem type and minor and major versions.

Then comes a check for the subsystem type. The subsystem type must be either GUI (IMAGE_SUBSYSTEM_WINDOWS_GUI) or console (IMAGE_SUBSYSTEM_WINDOWS_CUI). If it is not any of these two types, the "CreateProcess" function fails.


As you can see in the image above, at address 0x7C4F1F91, the major and minor subsystem versions extracted from the PE header via the "ZwQuerySection" function are passed to the "CheckSubSystem" function. If the "CheckSubSystem" function returns TRUE, the "CreateProcess" function proceeds and if it returns FALSE, the "CreateProcess" function fails as such. Now, let's check this function.


As you can see in the disassembly and C-code in the three images above, if the subsystem versions extracted from the PE header are less than 3.10, the "CheckSubsystem" function returns FALSE. Then comes the important part, if the "MajorSubsystemVersion" extracted from the PE header is greater than the value of the "NtMajorVersion" field (The field is at offset 0x26C from the _KUSER_SHARED_DATA page), the function fails. The same applies for "MinorSubSystemVersion" if "MajorSubsystemVersion" and "NtMajorVersion" are equal.

N.B. NtMajorVersion and NtMinorVersion are usually the same as the OS version info. returned by the kernel32 "GetVersion" or "GetVersionEx" functions.

As a developer, bypassing the check can easily be done by using Platform Toolset v9 in microsoft visual studio (thanks @skier_t) or by directly editing the PE header of the executable using any PE Editor.Β 

Imagine the scenario where the executable in question has CRC check upon its PE header as part of the implemented protection scheme. In this case, as a user, you won't be able to run the executable since any attempt to edit the PE header will cause the CRC check to fail. This leads us to find a system-wide solution. Yes, patching.

Speaking of patching, we have two options:

1) The first is to patch a couple of addresses inside the "CheckSubSystem" function (Actually, i don't recommend patching the return value check).

To implement the check bypass, i createdΒ  a dynamic link library, hooksubsystem.dll that once injected into a process bypasses the subsystem version check.

You can find the source code of hooksubsystem.dll here.
You can find hooksubsystem.dll here.

One drawback of this method is that it is Service pack-specific since the "CheckSubSystem" function is not exported by kernel32.dll.

2) The second is to patch the "ZwQuerySection" function such that we can manipulate the data returned in the "SECTION_IMAGE_INFORMATION" structure before being used by "CheckSubSystem" function.

To implement this method, i created another version of hooksubsystem.dll. You can find it here and its source code from here.

I also created a small application, BypassSubSystem.exe, which installs a system-wide hook of the type provided in the command line arguments. It can be used in the way you see in the image below.
BypassSubSystem.exe can be downloaded from here and its source code from here.

In a future post i will go deeper into this topic.Β 


You can follow me on Twitter @waleedassar

Anti-Dumping - Part 3

8 September 2012 at 13:46
By: walied
In this post i will share with you a couple of small tricks that can be deployed to harden or defeat memory dumping attempts. As i have just mentioned they are small tricks, so don't flame at me.

The first trick briefly involves appending a special section header to the section table of your executable. The new section header is to be set with a huge virtual size. Don't worry, this is not going to affect the file size (on disk) since we can set the raw size of the new section to zero (completely virtual section).

This results in the the "SizeOfImage" field of the IMAGE_OPTIONAL_HEADER structure being huge as well.

Unlike old anti-dumping tricks, we don't have to forge the "SizeOfImage" field of PEB.LoaderData or that in the PE Header memory page. Here, we give the dumping tools a huge value that they are very likely to fail to allocate using e.g. the "VirtualAlloc" function or its likes. Of course, this trick does not defeat dumping tools that read the memory of processes page by page.

Since the raw size of this huge section is zero, then the new section will be zero-initialized and the OS memory manager will throw it away making the memory usage of such process as smooth as possible.

It is now obvious that the new section should be left as it is. Your code should never read, write, or execute it. As any attempt to e.g. write to it results in the OS memory manager restoring the whole section into memory.

Here you can find a demo.

The second trick was first mentioned by Kris Kaspersky. The trick is very nice and simple. If we set the memory protection of one section as PAGE_GUARD, then the "ReadProcessMemory" function will fail usually with the system error code ERROR_PARTIAL_COPY, 0x12B. To defeat this trick, dumping tools are now using the "VirtualProtectEx" function to remove the PAGE_GUARD attribute, then read the section, and finally restore the PAGE_GUARD attribute.

To enhance this trick, i have created a watching thread that infinitely calls the "VirtualQuery" function and once it detects that PAGE_GUARD is removed from the section's memory protection attributes, it just terminates the process. Here is the code and here is a demo.

N.B. For the second trick to be effective, you should place the sections you want to protect after the PAGE_GUARD section so that the process terminates before them being dumped.

N.B. The second trick theoretically has better chances to work on multi-processor systems than on single-processor ones.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

PAGE_EXECUTE_WRITECOPY As Anti-Debug Trick

28 September 2012 at 14:48
By: walied
In this post, i will share with you a poorly discussed anti-debug trick that i may be the first one to discover or disclose.

Now let's start with a quick introduction. If a memory page with the "PAGE_EXECUTE_READWRITE" access protection attributes is requested from the OS, then a page with the "PAGE_EXECUTE_WRITECOPY" attributes, not the "PAGE_EXECUTE_READWRITE" attributes is given.Β Β Β 

The reason for that behavior is so simple, that is, the OS memory manager wants to physically share the page between all the process instances (since it is guaranteed to be the same in all the process instances before any write).

Once you make the first write to the new page, the OS assigns a private copy of the page to the process in which the write occurrs and the page attributes change to PAGE_EXECUTE_READWRITE.

N.B. The same applies to pages requested with the PAGE_READWRITE attributes. They are initially given the "PAGE_WRITECOPY" attributes and after the first write, they turn into PAGE_READWRITE.

N.B. PAGE_EXECUTE_WRITECOPY and PAGE_WRITECOPY are not valid parameters to the "VirtualAlloc" or "VirtualAllocEx" function.

Now if you have a section in your executable with the read, write, and execute access attributes (See section xyz in the image below), then the abovementioned applies to it.
The access protection attributes given to section xyz causes its memory page to be mapped with the "PAGE_EXECUTE_WRITECOPY" attributes. See image below.
If we design section xyz in a way that it is never written to (e.g. does not contain self-modifying code) throughout the whole lifetime of the process, then the page will always be PAGE_EXECUTE_WRITECOPY even at process exit.

If the attributes change to PAGE_EXECUTE_READWRITE, that means the page must have been written to e.g. when another process, mostly a debugger, had called the "WriteProcessMemory" function while stepping-over, tracing-over, or placing software breakpoints. That definitely means the process is being debugged. See images below.

Now our executable of question can call the "VirtualQuery" function to check the page protection attributes of section xyz. If it is something other than PAGE_EXECUTE_WRITECOPY, then a debugger is present and the process should quit.

The good thing about this trick is that, unlike the 0xCC-scanning trick, it can detect software breakpoints even if there are no longer active (removed by the debugger).

Also, most debuggers in their default settings are used to place software breakpoints on modules' entry points, which means the page protection attributes change even before the reverse engineer starts to debug the module.

A common way to bypass this trick for stepping-over and tracing-over is to use hardware breakpoints which is an available option in OllyDbg v1.10 and OllyDbg v2.01 (alpha 4).

A simple demo can be found here and its source code from here.

Any ideas or comments are very welcome.

You can follow me on Twitter @waleedassar

Virtual PC Machine Reset

26 October 2012 at 00:58
By: walied
While playing with Virtual PC 2007, i came up with an interesting trick not only to detect Virtual PC 2007 but also to reset (restart) the Virtual Machine.

The trick is so simple that all you need to do in your code is execute "\x0F\xC7\xC8\x05\x00".Β 

Executing that x86 instruction sequence causes the following message to pop up.
A POC can be found here and its source from here.

N.B. Other x86 instruction sequences can cause the same result.

Any comments or ideas are welcome.
You can follow me on Twitter @waleedassar

Virtual PC vs. Resume Flag

27 October 2012 at 19:17
By: walied
In this post i will show you another weird behavior of Virtual PC 2007. I encountered this weird behavior while playing with Virtual PC 2007 with Windows XP SP3 installed inside. The behavior is all about how a Windows XP Virtual PC virtual machine handles the Resume Flag.

For those who don't know, the Resume Flag (Flag no. 16 in the EFLAGS register) is used to temporarily disable Hardware Breakpoints exceptions for one instruction. Without it, a Hardware-Breakpoint-On-Execution would infinitely trigger an EXCEPTION_SINGLE_STEP exception.

According to @osxreverser, Windows XP does not support the Resume Flag (RF). I was also amazed to see that also WinDbg and OllyDbg v1.10 don't use the resume flag. They use the Trap Flag (TF) instead.

Running a simple executable that on purpose makes use of the Resume Flag inside an XP Virtual PC Virtual Machine, i found out that execution flows normally as if XP supports the resume flag.

Given the finding above, i created a small executable that tries to detect if it is running inside Virtual PC 2007.
You can find it here and its source code from here.

I guess the finding above only applies if the host operating system itself supports the resume flag e.g. Windows 7 or later.

N.B. This topic is still under research.

Please don't hesitate to leave a comment.
You can also follow me on Twitter @waleedassar

Virtual PC vs. DR7

29 October 2012 at 17:07
By: walied
In this post i will show you another weird behavior of Virtual PC 2007. This time the trick is about how Virtual PC handles the debug register DR7 known as Debug Control register.

For those who don't know, DR7 is used to specify the conditions under which the EXCEPTION_SINGLE_STEP exception is triggered for addresses held in DR0-DR3.
If we want to dissect DR7, it would be as follows:
Bit 0Β Β Β Β  ---> DR0 is locally enabled.
Bit 1Β Β Β Β  ---> DR0 is globally enabled.
Bit 2 Β  Β  ---> DR1 is locally enabled.
Bit 3Β Β Β Β  ---> DR1 is globally enabled.
Bit 4Β Β Β Β  ---> DR2 is locally enabled.
Bit 5Β Β Β Β  ---> DR2 is globally enabled.
Bit 6 Β  Β  ---> DR3 is locally enabled.
Bit 7Β Β Β Β  ---> DR3 is globally enabled.

Bit 8 Β  Β  ---> The "Local Enable Bit". Also for "Last Branch" tracing.
Bit 9 Β  Β  ---> The "Global Enable Bit". Also for "Last Branch" tracing.
Bit 10 Β  ---> Reserved.
Bit 11Β  ----> Reserved.
Bit 12 -----> IR
Bit 13 -----> GD
Bit 14 -----> TB
Bit 15 -----> TT

Bit 16 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> When DR0 is triggered.
Bit 17 -----
Bit 18 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> Size of DR0's trigger condition.
Bit 19 -----
Bit 20 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> When DR1 is triggered.
Bit 21 -----
Bit 22 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> Size of DR1's trigger condition.
Bit 23 -----

Bit 24 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> When DR2 is triggered.
Bit 25 -----
Bit 26 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> Size of DR2's trigger condition.
Bit 27 -----
Bit 28 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> When DR3 is triggered.
Bit 29 -----
Bit 30 -----
Β  Β  Β  Β  Β  Β  Β  Β  Β  | ----> Size of DR3's trigger condition.
Bit 31 -----

For example:
Imagine we want to place a Hardware-Breakpoint-On-Execution for an instruction at 0x401000. See image below.

What the debugger does in this case is:
1) Sets DR0 to 0x401000.
2) Sets bit 0 of DR7 to 1.
3) Sets bit 8 of DR7 to 1 (for backward compatibility).
4) Sets bits 16 and 17 of DR7 to 00 (00 means On-Execution).

And if we then want to place a Hardware-Breakpoint-On-Write-Four for memory at 0x10000. See image below.
What the debugger does in this case is:
1) Sets DR1 to 0x10000.
2) Sets bit 2 of DR7 to 1.
3) Sets bit 8 of DR7 to 1 (for backward compatibility).
4) Sets bitsΒ  20 and 21 of DR7 to 01 (01 means On-Write).
5) Sets bitsΒ  22 and 23 of DR7 to 11 (11 for the size of trigger condition means to watch four bytes).

Now let's try to get back to the main topic of this post.

Hereafter, i will call the second byte of DR7 (byte 0xBB of 0xDDCCBBAA) the flags byte, just for brevity.

On Windows XP, if we set the flags byte to any value ranging from 0x00 to 0xFF, the breakpoint is always active and the exception is always raised whenever the trigger condition is met e.g. if we set DR7 to 0x0000FF01 (a hardware breakpoint On-Execution with Local enable, global enable, reserved, reserved, IR, GD, TB, and TT bits set), the exception is raised whenever the address in DR0 executes.
The same applies for Windows 7.

What about Virtual PC 2007?Β 

In Virtual PC 2007 with Windows XP installed inside, with certain flags set in DR7 e.g. 0x00003F01, the breakpoint is sometimes not activated.

So, i created simple executable that brute-forces the DR7's flag byte and based on the number of times the exception is raised it determines whether it is running inside Virtual PC 2007.

You can download the demo from here and its source code from here.
Β 
N.B. It has been tested with Windows XP SP2 and SP3.
N.B. VirtualBox is also affected, but i will leave this for a future post.


Any comments or ideas are very welcome. You can also follow me on Twitter @waleedassar

Virtual PC vs. CPUID

30 October 2012 at 18:14
By: walied
In this post i will show another weird behavior of Virtual PC 2007. This time it is about the CPUID instruction. As most of you already know well what the CPUID is for and how it works, i will directly jump into the main topic.

In Virtual PC, executing CPUID disables interrupts for one instruction. Oh, wait, how is that?

Imagine we want to trace a sequence of x86 instruction. What the debugger does in that situation is as follows:
1) Calls the "GetThreadContext" function to extract the current context of the thread executing this sequence of instructions.
2) Modifies the "EFLAGS" register of the "CONTEXT" structure such that the Trap flag (TF) is set. EFLAGS is situated at offset 0xC0 from the start of the structure for the x86 version. TF is bit number 8 (0x100).
3) Calls the "SetThreadContext" and "ContinueDebugEvent" functions to continue execution.

When the trap flag is set, after executing an x86 instruction, an exception EXCEPTION_SINGLE_STEP is raised and trap flag is cleared.

The debugger receives the exception and resets the trap flag as shown above and so on.

Disable interrupts, what does that mean?
Executing certain instructions when the trap flag is set, no EXCEPTION_SINGLE_STEP exception is raised. The exception is raised after executing the instruction following them. One example instruction that disables interrupts is POP SS. POP SS has been used for a long time as an anti-tracing trick. Since it disables interrupts for one instruction, dumping the EFLAGS register to stack via . PUSHFD reveals the Trap Flag.


Executing CPUID in Virtual PC 2007, i found out that it has the same effect as POP SS. CPUID disables interrupts for one instruction.

I created a simple demo that exploits this bug to detect whether it is running inside Virtual PC 2007. It has been tested on Windows XP SP2 running inside Virtual PC 2007.

Reason for that is still under research but it seems to be due to the Virtualized CPUID (Intel FlexMigration) hardware support since the trick only works if Hardware Virtualization is enabled.

You can download the demo from here and its source code from here.

N.B. VirtualBox v4.1.22 r80657 is also affected by this bug.

N.B. Parallels Desktop is reportedly affected by this bug.

You can follow me on Twitter @waleedassar.

SizeOfStackReserve As Anti-Attaching Trick

6 November 2012 at 01:08
By: walied
In this post i will show you a new anti-attaching trick that has been tested on Windows 7. It does not work on Windows XP due to the changes Microsoft introduced in the way threads are created.

Let's first see how thread creation in Windows 7 is different from that of Windows XP.

In Windows XP, whenever you call the kernel32 "CreateRemoteThread" or the ntdll "RtlCreateUserThread" function to create a new thread, the following occurs underneath:

The kernel32 "BaseCreateStack" or ntdll "RtlpCreateStack" function is called in case ofΒ  "CreateRemoteThread" or "RtlCreateUserThread" successively to allocate space for the new thread's stack in the address space of the target process.

N.B. The kernel32 "CreateThread" function is only a call to the kernel32 "CreateRemoteThread" function with the "hProcess" parameter set to -1.

Since there is no big difference between the "BaseCreateStack" and "RtlpCreateStack" functions, it is enough for us to take the "BaseCreateStack" function in disassembly in this post.

The "BaseCreateStack" function takes four parameters, only three of them are of interest. The first parameter is the handle to the process in which we are about to allocate user stack memory. The second parameter is the size in bytes of user stack memory to COMMIT into the target process's address space. The third parameter is the size in bytes of user stack memory to RESERVE into the target process's address space. Hereafter, i will refer to them as hProcess, CommitSize, and ReserveSize.

N.B. If you call the "CreateRemoteThread" function with the "dwStackSize" parameter set to e.g. 0x10000, then BaseCreateStack commits 0x10000 bytes. On the other side, if the "CreateRemoteThread" function is called with the "dwCreationFlags" parameter having the "STACK_SIZE_PARAM_IS_A_RESERVATION" flag set, then BaseCreateStack Reserves 0x10000.

Now, let's dive into the "BaseCreateStack" function and see what is going on inside.

1) It extracts the value of ImageBase from the PEB of the process in which it is called, the value is then passed to the "RtlImageNtHeader" function. If the "RtlImageNtHeader" function fails an error ERROR_BAD_EXE_FORMAT is returned.


2)
If the "ReserveSize" parameter passed to it is zero, it uses the value of the "SizeOfStackReserve" field of the IMAGE_OPTIONAL_HEADER structure.



3) Similarly, If the "CommitSize" parameter passed to it is zero, it uses the value of the "SizeOfStackCommit" field of the IMAGE_OPTIONAL_HEADER structure. Please remember that the values are extracted from the PE header of the main executable of the process that is calling the "CreateRemoteThread" function, not the target process.



4) It then makes some sanitization checks on the ReserveSize and CommitSize, for example to ensure that the commit size is never greater than the reserve size. It also checks to ensure that the commit size is never lower than the value of the "MinimumStackCommit" field of PEB.




5) It calls the "ZwAllocateVirtualMemory" function to reserve memory of size ReserveSize into the address space of the target process with the PAGE_READWRITE protection attribute.


6) It calls the "ZwAllocateVirtualMemory" function to commit CommitSize+0x1000 of the memory reserved in the previous step.



7) The extra page committed in the previous step is then given the PAGE_GUARD protection attribute.


Here is a similar reversed code of the "BaseCreateStack" function. From here.


The reason why a PAGE_GUARD page always exists at the end of committed stack is for the kernel to be notified each time the stack needs to be expanded. For example, if a thread tries to touch its stack's PAGE_GUARD page, an STATUS_GUARD_PAGE_VIOLATION exception is raised and swallowed by the kernel and it automatically commits one more page.

N.B. If a thread tries to touch the PAGE_GUARD page of another thread's stack, the exception is passed to the application or the debugger.

After the stack has been allocated in the target process's address space, the "CreateRemoteThread" function formulates a CONTEXT structure for the new thread. After the previous steps have completed successfully, the "ZwCreateThread" function is called to initiate the new remote thread.

Now let's see how threads are created in Windows 7.

In Windows 7, if we take the "CreateRemoteThread" or "RtlCreateUserThread" function into disassembly, we will see that the "dwStackSize" is directly passed to the "ZwCreateThreadEx" function.
So, our first assumption here is that stack allocation is now forwarded to the kernel. Also, we can note that now in later versions of Windows than XP, the "ZwCreateThreadEx" function is by default used for thread creation instead of the "ZwCreateThread" function.

Now let's check the "NtCreateThreadEx" function in ntoskrnl.exe.

We can easily see in "NtCreateThreadEx" a call to the "PspCreateThread" function.
The "PspCreateThread" function calls the "PspAllocateThread" function which calls "RtlCreateUserStack" function.


The "RtlCreateUserStack" function is called after attaching to the target process's address space. Now let's look at the "RtlCreateUserStack" function in disassembly.

Now it is easy to see that it reads the PE header from the main executable of the process in which the remote thread is being created unlike XP where information was extracted from the main executable of the process that creates the thread. Yeah, it seems Microsoft fixed a very minor issue.


From the image above, it is also easy to conclude that if we forced the "RtlImageNtHeader" function to fail, we can prevent any foreign process including the debugger from attaching to our process. The easiest way to accomplish that is by erasing the PE header at runtime.Β  Any call to ZwCreateThreadEx as part of calling the "DebugActiveprocess" function (Used for attaching to a running process) would fail. For more information and examples, please refer to my previous post.

N.B. DebugActiveProcess calls DbgUiIssueRemoteBreakin which calls ~RtlCreateUserThread which calls "ZwCreateThreadEx".

One may say, "Erasing the whole PE header may render many APIs which read from the PE header useless e.g. FindResource or GetProcAddress". My answer will be "Yes, you are right".

So, we should find a smarter way to do it.

Okay, let's continue disassembling the "RtlCreateUserStack" function.


As you can see in the image above if the size of stack commit argument passed to it is zero, it takes the value of the "SizeOfStackCommit" field from the PE header. The same measure is taken if the size of stack reserve passed is zero. It is also noteworthy that if both the size of stack commit argument passed and "SizeOfStackCommit" of the PE header are zero, the commit size becomes 0x4000 (The default commit size is 0x4000).

The function then checks the size of stack commit against the size of stack reserve. If the size of stack commit happens to be greater, then the size of stack reserve is adjusted to be greater.

The function then ensures that the size to be committed is not less than the "MinimumStackCommit" field ofΒ  the process's PEB. If it is less, the size to be committed is adjusted.


The function then calls the "ZwSetInformationProcess" function with the "ProcessInformationClass" parameter set to 0x29 (ProcessThreadStackAllocation). The size to be reserved is passed in the 4th member of the structure passed in the "ProcessInformation" parameter.

Now let's quickly have a look at the "NtSetInformationProcess" function.

As you can see in the two images above, the value of the 4th member of the structure passed to the "ZwSetInformationProcess" function is used as the "RegionSize" parameter passed to the "ZwAllocateVirtualMemory" function.

Given this knowledge, if we at runtime change the value of the "SizeOfStackReserve" field of the PE header to a huge value, then we can cause the "ZwAllocateVirtualMemory", "ZwSetInformationProcess", "RtlCreateUserThread", "PspAllocateThread", "PspCreateThread", and "NtCreateThreadEx" functions to successively fail preventing any foreign processes including debuggers from creating any thread in our process.

A demo can be found here and its source code from here.

Any comments or ideas are more than welcome.

You can follow me on Twitter @waleedassar

Defeating Memory Breakpoints

12 November 2012 at 21:20
By: walied
In this post i will show you a couple of tricks that can be used to defeat memory breakpoints. First i should explain what memory breakpoints are and how they work.

Anyone who has spent some time in the field of software protection and debuggers must have heard of Memory breakpoints. Actually, memory breakpoints were not extensively used in the past but since more and more protection schemes implement anti-INT3 and anti-Hardware breakpoints tricks, reverse engineers started to use memory breakpoints to avoid detection.

The idea of memory breakpoints is so simple. Imagine that we want to place a memory breakpoint at address 0x402005 (On-Execution), what the debugger theoretically does is as follows:

1) Marks the memory page which the address 0x402005 belongs to (page 0x402000) as guarded via calling the "VirtualProtectEx" or "ZwProtectVirtualMemory" function with the "flNewProtect" parameter having the "PAGE_GUARD" protection attribute set. In this case page 0x402000 is originally PAGE_EXECUTE_READ 0x20 and after placing the memory breakpoint it becomes PAGE_EXECUTE_READ|PAGE_GUARD 0x120.

2) Each time the guarded page is touched whether read from, written to, or executes, then an exception STATUS_GUARD_PAGE_VIOLATION 0x80000001 is raised and the debugger receives a debug event of typeΒ  EXCEPTION_DEBUG_EVENT.

3) The debugger then inspects various fields in the "EXCEPTION_RECORD" structure of the "DEBUG_EVENT" structure to determine the reason why the exception was raised.
If the following conditions are met, then the debugger figures out that instruction at 0x402005 is about to execute i.e. breakpoint reached and that it should break accordingly.
a) The "ExceptionCode" field is set to STATUS_GUARD_PAGE_VIOLATION 0x80000001. b) The "NumberParameters" field is greater than or equal to 2. c) The "ExceptionInformation[0]" field is set to 8. d) The "ExceptionInformation[1]" field is set to 0x402005. The image below represents something very similar.


If any of the above mentioned conditions is not met, then the debugger figures out it is not the breakpoint. Whether the breakpoint is hit or not, the debugger resets the "PAGE_GUARD" protection attribute.

Surprisingly, even though this is the typical way debuggers should implement memory breakpoints, OllyDbg and many other user-mode debuggers implement memory breakpoints in a slightly different way.

Let's first take OllyDbg v1.10 and see how it implements memory breakpoints.

If you already use OllyDbg v1.10, you should already know that it has only two kinds of memory breakpoints, On-Access and On-Write. On-Access memory breakpoints trigger anytime the page is touched and On-Write memory breakpoints trigger anytime the page is written to.

Trying to reverse OllyDbg v1.10 to see how it implements each type, i found out that:

1) For On-Access memory breakpoints, they are implemented by marking the page that the breakpoint address belongs to as PAGE_NOACESS. PAGE_NOACCESS means that anytime the page is touched, an exception STATUS_ACCESS_VIOLATION is raised. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above.

2) For On-Write memory breakpoints, they are implemented by depriving the page which the breakpoint address belongs to of the write access right via setting the "flNewProtect" parameter passed to the "VirtualProtectEx" function to PAGE_EXECUTE_READ. Every time the page is written to, an exception STATUS_ACCESS_VIOLATION is received. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above. Here lies a bug in OllyDbg v1.10 since it assumes that the memory protection of any single page in the process address space can be turned into PAGE_EXECUTE_READ while this is not true for example memory page at 0x10000 can never be executable (Windows 7).

After we have seen how memory breakpoints are implemented, i will show you two tricks that can be used as anti-memory-breakpoints.

Trick 1)

Given the knowledge above, we can conclude that in order to defeat memory breakpoints esp. those of type On-Execution, we should cause the "VirtualProtectEx" function to fail. How is that possible?
By copying our code to a dynamically-allocated memory page whose page protection attributes can be executable and in the same time can not be guarded or no-access. This type of memory pages does really exist. For every thread you create, the kernel allocates one page (three pages in case of Wow64 processes) for the TEB. The TEB page(s) can't be non-writable and can't be assigned the "PAGE_GUARD" protection attribute. How can this be implemented?
All you have to do to implement this trick is call the "CreateThread" function with the "dwCreationFlags" parameter set to CREATE_SUSPENDED. At this point, we have the new thread's TEB with the page protection attributes set to PAGE_READWRITE. The next thing we should do is make the TEB page executable by calling the "VirtualProtect" function with the "flNewProtect" parameter set to PAGE_EXECUTE_READWRITE.

You can use this demo to test this trick.

N.B. For more stealthy way to conceal the point at which the page protection is changed to executable, use the "VirtualAlloc" function instead of "VirtualProtect". The allocation type in this case must be MEM_COMMIT only.

Trick 2)

This trick can easily detect memory breakpoints. It relies on the fact that the "ReadProcessMemory" function returns false if you try to read guarded or no-access memory. To use this trick, all you have to do is call the "ReadProcessMemory" function with the "Handle" parameter set to 0xFFFFFFFF, the "lpBaseAddress" parameter set to the image base, and the "nSize" parameter set to the size of image. If it returns false, then at least one memory breakpoint is present.

You can use this demo to test this trick.

N.B. Certain executables have gap inaccessible pages e.g. those pages intended for anti-dumping described in a previous post. So you have to take care of that if implementing this trick.

N.B. ReadProcessMemory has also been used as a stealthy way to read memory without triggering Hardware Breakpoints.


Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar





OllyDbg RaiseException Bug

12 November 2012 at 22:48
By: walied
In this post i will show you a bug in OllyDbg that can be used to detect its presence. The trick is so easy that all you have to do is call the "RaiseException" function with the "dwExceptionCode" parameter set to EXCEPTION_BREAKPOINT 0x80000003. The response depends on the OllyDbg version used. If it is v1.10, then the exception is going to be silently swallowed by the debugger and the registered exception handler is not called. In v2.01 (alpha 4), several message boxes pop up and the exception handler is not called either. Only v2.01 (beta 2) is immune to this bug.



The reason behind this bug is OllyDbg trying to read the x86 instruction pointed to by the "ExceptionAddress" field of the "EXCEPTION_RECORD" structure to ensure it is 0xCC or 0x03. In case of EXCEPTION_BREAKPOINT exceptions raised by explicitly calling the "RaiseException" function, the instructions at ExceptionAddress is definitely not 0xCC or 0x03.


You can find a demo here and its source code from here.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

VirtualBox CPUID Discrepancy

12 November 2012 at 23:53
By: walied
In this post i will show you a weird issue i have lately found in VirtualBox. This issue is seen only if VirtualBox is running without hardware virtualization support (VT-x/AMD-V).

For example, when Windows XP is running in VirtualBox with no hardware virtualization support, it is forced to use INT 2E to make system calls instead of SYSENTER. This is because SYSENTER is apparently not supported by VirtualBox. The problem here is that in this case the CPUID instruction still detects supported SYSENTER/SYSEXIT instructions.

We can use this discrepancy to detect VirtualBox (only if running with no hardware virtualization). All we have to do is execute CPUID (Leaf 1) and if we have bit 0x800 of EDX set, then execute SYSENTER in the form of any system call e.g. ZwDelayExecution. If an EXCEPTION_ILLEGAL_INSTRUCTION 0xC000001D is raised, then VirtualBox is present.


You can find a demo here and source code from here.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Hidding Threads From Debuggers

23 November 2012 at 05:05
By: walied
In this post i will take into discussion an old anti-debug trick that many of us know well. The trick is the ability of our code to hide specific threads from debuggers. This is usually achieved by calling the ntdll "ZwSetInformationThread" function with the "ThreadInformationClass" parameter set to ThreadHideFromDebugger 0x11. Sample code for this trick can be found here.

If we take the "ZwSetInformationThread" function into disassembly, we can easily see that the "ThreadInformationLength" parameter must be zero for the function call to succeed, otherwise ERROR_BAD_LENGTH is returned. See image below.

Β And here is the 64-bit version

As you can see from the two images above, the whole function call ends up setting the "HideFromDebugger" bit of the "_ETHREAD" structure. Once this flag has been set, the kernel guarantees that the debugger will never receive any debug events from the corresponding thread.

For example, let's take the LOAD_DLL_DEBUG_EVENT events. As you know, any time a module is loaded into the address space of specific process, the debugger is notified of this action through the LOAD_DLL_DEBUG_EVENT events.The debugger then inspects various interesting fields in the "LOAD_DLL_DEBUG_INFO" structure e.g. ImageBase. Depending on the debugger configuration, the debugger notifies you of that or not. You can see this if you instruct OllyDbg to break on new module.

The two images above show how OllyDbg acts if a normal (not hidden) thread loads a new DLL. It is as follows:
1) Thread Loads a new DLL via calling e.g. the "LoadLibrary" function.

2) The "LoadLibrary" function wraps up a call to the ntdll "ZwMapViewOfSection" function.

3) The kernel mode part of ZwMapViewOfSection calls the "DbgkMapViewOfSection" function.

4) The "DbgkMapViewOfSection" function queries both the "HideFromDebugger" bit of the "_ETHREAD" structure and the value of the "DebugPort" field of the "_EPROCESS" structure. If the "HideFromDebugger" bit is not set and the "DebugPort" field is set, then the function builds the "LOAD_DLL_DEBUG_INFO" structure and calls the "DbgkpSendApiMessage" function which is responsible for delivering the debug event to the attached debugger.
On the other side, if the "HideFromDebugger" bit is set, DbgkMapViewOfSection returns immediately without delivering the debug event. See images below.


N.B. Regarding the UN/LOAD_DLL_DEBUG_EVENT's, there are other factors that determine whether or not the debug event is going to be delivered to debugger e.g. the "SuppressDebugMsg" bit of the Thread Environment Block (TEB).

5)Β  In the debugger, the "WaitForDebugEvent" function returns with the "dwDebugEventCode" field set to LOAD_DLL_DEBUG_EVENT 0x6. Given this, the debugger figures out that a new module has just been loaded and that it should inspect the "LOAD_DLL_DEBUG_INFO" structure to extract the new image base, file handle, etc.

6) After extracting info. from the "LOAD_DLL_DEBUG_INFO" structure, the debugger calls the "ContinueDebugEvent" function to continue executing the thread.

Similar to LOAD_DLL_DEBUG_EVENT's, debuggers never get notified of exceptions raised in the scope of hidden threads. To ensure that let's have a look at the "DbgkForwardException" function.

As you can see in the image above, the "HideFromDebugger" bit of the "_ETHREAD" structure is queried here as well.

Conclusion: When the "HideFromDebugger" bit flag of the "_ETHREAD" structure is set, the thread will not receive any debug events.

If we look again at the "NtSetInformationThread" function in disassembly, we will see that the function call is one-way i.e. you can make this function call to hide the thread from debugger but you can not make this call to un-hide the thread from debuggers.

Let's have a look at the "ZwQueryInformationThread" function. As the name implies, we can use this function to determine if a specific thread is hidden from debuggers. See below.

And here is the 64-bit version.

As you can see from the two images above, the "ThreadInformationLength" parameter must be one for this function call to succeed. If it is one as expected, nothing surprising is seen, the function just sets the first byte pointed to by the "ThreadInformation" parameter to one if the "HideFromDebugger" bit of the "_ETHREAD" structure is set. Given this knowledge, i have created a small OllyDbg v1.10 plugin to detect any hidden thread in the process being debugged esp. if we are attaching to an active process. The plugin is called HiddenThreads. You download it from here and its source code from here.

Unfortunately, in older versions of Windows e.g. XP, the "ZwQueryInformationThread" function can't be used to detect if a thread is hidden from debuggers as the ThreadHideFromDebugger information class 0x11 is simply not implemented. The function call returns ERROR_INVALID_PARAMETER.

Now that we have seen how to hide a thread from debuggers, how this works under the hood, and how to detect if a thread is hidden from debuggers, let's try to find another way to hide the thread other than calling the "ZwSetInformationThread" function.

With the introduction of the "ZwCreateThreadEx" function e.g. Windows Vista and 7, a new flags parameter is present. This flag causes new threads to be created hidden from debuggers i.e. you don't need to call the "ZwSetInformationThread" function. If we set this parameter (the 7th parameter) to 0x4, then the new thread will be hidden from debuggers. In this case, setting the "HideFromDebugger" bit occurs in the "PspAllocateThread" function. See image below.


You can find a demo here and its source code from here.


This post was written based on debugging sessions on Windows 7 64-bit. This is why you see me switching from x86 to x64.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

SuppressDebugMsg As Anti-Debug Trick

24 November 2012 at 22:06
By: walied
In this post i will show you a new anti-debug trick that affects many debuggers e.g. WinDbg and IDA Debugger.

When you load a module into the address space of a process usually via calling e.g.Β  the kernel32 "LoadLibrary" function, the debugger is notified of this through the LOAD_DLL_DEBUG_EVENT event. This occurs at the point the "NtMapViewOfSection" function calls the "DbgkMapViewOfSection" function.

As we saw in the previous post, the "HideFromDebugger" flag of the "_ETHREAD" structure and the "DebugPort" field of the "_EPROCESS" structure are queried. If the "HideFromDebugger" flag is not set and the "DebugPort" field is set, the debug event is delivered to the debugger but only after the return value of the "DbgkpSuppressDbgMsg" function is checked.

If the "DbgkpSuppressDbgMsg" function returns false, the debug event is delivered to the debugger and vice versa. Now let's see the "DbgkpSuppressDbgMsg" function in disassembly.


As you can see in the image below, it checks the "SuppressDebugMsg" flag of the 64-bit TEB of the thread. If it is set, the function returns true and the debug event is not delivered to the debugger.

Also, the "SuppressDebugMsg" field of the 32-bit TEB is queried, if the "Wow64Process" field of the "_EPROCESS" structure is set.

Notes:
1) Each Wow64 process has two Process Environment Blocks (PEBs), a 64-bit one and a 32-bit one.

2) Each thread in a Wow64 process has two Thread Information Blocks (TEBs), a 64-bit one and a 32-bit one. The 64-bit TEB is of size 2 pages and the 32-bit TEB is of size 1 page. The 32-bit TEB always follows the 64-bit TEB.

3) If the "Wow64Process" field of the "_EPROCESS" structure is set, then it is a Wow64 process (32-bit process running on 64-bit system). This field holds the address of the process's 32-bit PEB.
In WinDbg and IDA debugger, if our process loads a module e.g. walied.dll via calling e.g. the "LoadLibrary" function, the debugger receives the LOAD_DLL_DEBUG_EVENT event and caches the "hFile" field of the "LOAD_DLL_DEBUG_INFO" structure. It uses the "hFile" field to ReadFile info. e.g. debug info. from walied.dll

The problem here is that WinDbg and IDA debugger don't CloseHandle(hFile) until the UNLOAD_DLL_DEBUG_EVENT event for walied.dll is received. So, if we set the "SuppressDebugMsg" bit of TEB and then call FreeLibrary("walied.dll"), then the debugger will not receive the UNLOAD_DLL_DEBUG_EVENT for walied.dll. Any subsequent attempt to acquire an exclusive access to walied.dll via calling the "CreateFile" function will definitely fail which is a very sign of debugger existence.

A demo can be found here and its source code from here.

The trick mentioned above affects WinDbg and IDA debugger. OllyDbg v1.10 is affected but in a slightly different way. OllyDbg v1.10 does not CloseHandle(hFile) even if the corresponding UNLOAD_DLL_DEBUG_EVENT event is received.

N.B. OllyDbg v2.x is not affected since it immediately CloseHandle the "hFile" field of the "LOAD_DLL_DEBUG_INFO" structure once it receives the LOAD_DLL_DEBUG_EVENT event.

Conclusion:
Setting the "SuppressDebugMsg" bit of thread's TEB prevents the attached debugger from receiving UN/LOAD_DLL_DEBUG_EVENT's from this thread.

For debuggers to be immune to this trick, they should use the "hFile" field to read info. and close this handle immediately.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Windows Internals: SkipThreadAttach

7 December 2012 at 14:32
By: walied
In this post i will not present any new tricks but i will instead discuss a new issue introduced in later versions of Windows regarding thread creation.
In a previous post, i quickly explained the ntdll "NtCreateThreadEx" function and its flag HideFromDebugger 0x4 that when passed to the function causes the new thread to be created hidden from debuggers.

In this post we will see another interesting flag that i prefer to call it SuppressDllMains 0x2. Let's see this in disassembly.

As we can see in the image above, the "PspAllocateThread" function inspects the "Flags" parameter. If the SuppressDllMains 0x2 flag is set, then the function sets the "SkipThreadAttach 0x8" bit flag in the new thread's TEB.

Similarly for the 64-bit version of the function. If the "SuppressDllMains" flag is passed, then the "SkipThreadAttach 0x8" bit flag is set in both the 32-bit TEB and 64-bit TEB of the new thread.

N.B. The bit flags are at offset 0xFCA in 32-bit TEB's and at offset 0x17EE in 64-bit TEB's.

Now let's see what the "SkipThreadAttach" bit flag does. To track this, we will have to shift to user-mode.

In OllyDbg, search for the "\xCA\x0F" (0xFCA) in ntdll.dll and see which functions make use of the "SkipThreadAttach 0x8" bit flag.

The ntdll "RtlIsCurrentThreadAttachExempt" function was among the results i found.
This function returns false if the "SkipThreadAttach" bit flag is not set.
If the "SkipThreadAttach" bit flag is set, another bit flag "RanProcessInit 0x20" is tested. If not set, the function returns true. Otherwise, the function returns false. In C code it looks something like below.

Searching for all references to the "RtlIsCurrentThreadAttachExempt" function, i found one interesting place in ntdll.dll where this function is called, that is LdrpInitializeThread.Β 

The "LdrpInitializeThread" function is for calling the DllMain's of loaded dlls ( and TLS callbacks as well) each time a thread is initializing (with the "fdwReason" parameter set to DLL_THREAD_ATTACH) or is exiting (with the "fdwReason" parameter set to DLL_THREAD_DETACH).

Taking the "LdrpInitializeThread" function in disassembly, we can see that ifΒ  the ntdll "RtlIsCurrentThreadAttachExempt" function returns true e.g. due to the "NtCreateThreadEx" function being called with the "Flags" parameter set to SuppressDllMains 0x2, the DllMains and TLS callbacks of loaded modules will not be called in the context of the new thread. See image below.


A good example for this is the "DbgUiIssueRemoteBreakin" function in ntdll.dll of Windows 7. This function is called by the "DebugActiveProcess" function to create the attaching thread in the context of the process to be debugged.
In Windows XP, the thread created by the "DbgUiIssueRemoteBreakin" function caused the DllMains and TLS callbacks of loaded modules to be called, presenting another layer of protection against attaching.
In Windows 7, since the "DbgUiIssueRemoteBreakin" function ends up calling the "NtCreateThreadEx" function with the "Flags" parameter set to 0x2 (SuppressDllMains), no DllMain's or TLS callbacks are called for the debugger thread.

You can download the demo of this post from here and source code from here.

You can follow me on Twitter @waleedassar

Any comments or ideas are very welcome.

Call64, Bypassing Wow64 Emulation Layer

14 January 2013 at 12:48
By: walied
In this post i will discuss a piece of code that i wrote to ease the process of issuing 64-bit system calls without passing through the Wow64 emulation layer implemented in Wow64cpu.dll, Wow64.dll, and Wow64win.dll.


I implemented it in a function called "Call64()". Since some arguments in 64-bit system calls are 64 bits long, the "Call64()" function expects its arguments in the form of pointers to LARGE_INTEGER structures. Also, the return value is in the form of a pointer to a LARGE_INTEGER structure.


Let's take the implementation of this function step by step.

The first argument Call64 takes is a pointer to a LARGE_INTEGER structure which will receive the return value (RAX) of this system call. It is the caller's responsibility to allocate this structure. Also, it is the caller's responsibility to type-cast the value returned in it.

The second argument the function takes is the system call number or ordinal e.g. The "ZwWaitForSingleObject" function in Windows 7 has a system call number of 0x1.

This argument is later used to formulate the shellcode used to issue the 64-bit system call.


Since this function is supposed to make 64-bit system calls with different number of arguments, the function is implemented as variadic (A function with an indefinite number of arguments) with the third argument being the number of arguments the system call expects. The next arguments are all in the form of pointers to LARGE_INTEGER structures.

The function prototype is like below:

After we have looked at how the arguments look like, let's see how the function works.

First, given the number of arguments, it calculated the stack space needed and commits it using the "_alloca" function. The newly-allocated stack space is initialized to zero.

The function takes the first four arguments and stores them in RCX, RDX, R8, and R9 respectively. Extra arguments are stored on stack. Also, shadow space is taken care of.

Using the value of the 64-bit mode Code Segment selector, the function makes a Far Call to a 64-bit shellcode responsible for issuing the system call.


Suppose that we want to make a call to the "ZwClose" function using the "Call64" function, what you should do is allocate two LARGE_INTEGER structure, one to hold the value of the "Handle" parameter and the other to receive the return value (RAX). It looks like below.

Other example is the "ProcessConsoleHostProcess" class of the "ZwSetInformationProcess" function. If we trace into this call, we will find that the Wow64 emulation layer implemented in Wow64.dll prevents Wow64 processes from making such call and thus preventing them from changing their console host processes. See implementation of the "wow64!whNtSetInformationProcess" function.

The sole solution to this is to directly make the system call without passing through the Wow64 emulation layer. The call using the "Call64" function is like below.


N.B. You should bear in mind that some system calls expect pointer arguments to be aligned by 8 and this is why we should align them by using e.g. the "_aligned_malloc" function.


Source code and examples can be found here. The function has also been implemented in a Dynamic Link Library, you can find it and its header and .lib files here.

GitHub project from here.

Any comments, ideas, or bug reports are more than welcome.

You can follow me on Twitter @waleedassar

A Real Random VirtualAlloc

18 January 2013 at 00:26
By: walied
In this post i will discuss one disadvantage of using the "VirtualAlloc" function to allocate memory and also suggest a trick to play around this disadvantage.

If you ever used the "VirtualAlloc" functionΒ  to allocate memory, you must have noticed that addresses returned are almost the same over instances of the same process. This is due to the "ZwAllocateVirtualMemory" function doing nothing to ensure the randomness of the base address returned, at least in Windows 7.

N.B. VirtualAlloc is just a wrap up of the "VirtualAllocEx" function which is a wrap up of the ntdll "ZwAllocateVirtualMemory" function.

To test that fact, we will create a small application that does almost nothing but calling the "ZwAllocateVirtualMemory" function and printing the base address at which memory has been allocated.
The source code looks like below.

N.B. Even though the ASLR has nothing to do with randomizing the base address of memory returned by ZwAllocateVirtualMemory, we just set the "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE" bit field for testing purposes.

Compile the above source code and run the application several times. See image below.

As you can see in the image above, the base address is the same (0x30000) across all instances of the process and this poses a security issue.

It seems that Microsoft has taken care of this issue while allocating stack memory for threads. Now in later versions of Windows e.g. Windows 7, the "RtlCreateUserStack" function which is responsible for reserving and committing the memory for threads is calling the "NtSetInformationProcess" function with a new information class to reserve the stack memory at a random address. The new process information class is ProcessThreadStackAllocation 0x29.

Now let's see how this new information class reserves memory.

Looking at the disassembly we can see that the function checks the "StackRandomizationDisabled" flag of the "_EPROCESS" structure. We can also see the function trying to randomize some variable by using the "SystemTime" field of the "SharedUserData" page, and the "RDTSC" instruction.

The function then calls the "MiScanUserAddressSpace" and "ZwAllocateVirtualMemory" functions to reserve memory at a random base address.


Now let's try to test the "ZwSetInformationProcess" function and see if addresses returned are really random. So, we compile the code in the image below and see.
N.B. Setting the "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE" bit field is necessary for the "StackRandomizationDisabled" bit flag of the "_EPROCESS" structure to be unset.



As you can see in the two images above, in each time we invoked the application we got a random address for the memory allocated.


Conclusion:
using the "ProcessThreadStackAllocation" class of the "ZwSetInformationProcess" function, we can guarantee a random address for memory we allocate which can be considered a security enhancement.

Code and examples for this post can be found here.

You can follow me on Twitter @waleedassar

Wow64Log

26 January 2013 at 21:38
By: walied
In this post i will discuss an interesting functionality that i discovered while reversing Wow64.dll and specifically the "wow64!ProcessInit" function. Now let's take the function into assembly and see how it looks like.

The first thing the function does is open a registry key by calling the "ZwOpenKey" function with the "ObjectAttributes" parameter having the "ObjectName" member set to "REGISTRY\MACHINE\SOFTWARE\Microsoft\WOW64". So our first conclusion here is that the function tries to open the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Wow64" to retrieve specific information that may affect the Wow64 process throughout its lifetime. Usually, the key does not exist, at least on my machine (Windows 7 SP1 64-bit).


Next, if the key was successfully opened, the "Wow64GetWow64ImageOption" function is then called with the first parameter set to the opened key handle, the second parameter pointing at the wide string "Wow64ExecuteFlags", the third parameter set to 0x4 (REG_DWORD), the fourth parameter pointing at the variable that will receive the returned value, and the fifth parameter set to 0x4 (output size).

The "Wow64GetWow64ImageOption" function opens the IFEO registry key and queries the registry value whose name is the string pointed to by the second parameter and on error, it queries the same registry value under the registry key whose handle was given in the first parameter.

The extracted flags are then used to initialize three Wow64 global variables, Wow64!Wow64CommittedStackSize, Wow64!Wow64MaximumStackSize, and Wow64InfoPtr.


Whether the registry key was successfully opened or not and whether the "Wow64ExecuteFlags" value was successfully extracted or not, the "ProcessInit" function then directly jumps to the code you see in the image below.


As you can see it is trying to load a library called "Wow64Log.dll" residing in the "System32" directory by calling the ntdll "LdrLoadDll" function. Usually, this module does not exist.

N.B. Since this code is x64, the library Wow64Log.dll must be 64-bit.

Then comes some code that tries to resolve certain function addresses from Wow64Log.Dll by calling the "LdrGetProcedureAddress" function.

N.B. The kernel32 "GetProcAddress" function is just a wrap up of the ntdll "LdrGetProcedureAddress" function.

The code we see in the image above tries to resolve addresses of the "Wow64LogInitialize", "Wow64LogSystemService", "Wow64LogMessageArgList", and "Wow64LogTerminate"Β  functions from the Wow64Log.dll.Β  If any of the functions' addresses could not be resolved, the function fails and Wow64Log.dll is unloaded from the address space by calling the ntdll "LdrUnloadDll" function.

Assuming Wow64Log.dll was found in NtSystemRoot\\System32 and the above mentioned functions were found to be exported from it, we will have the global wow64.dll function pointers, "pfnWow64LogInitialize", "pfnWow64LogSystemService", "pfnWow64LogMessageArgList", and "pfnWow64LogTerminate" holding the address of the "Wow64LogInitialize", "Wow64LogSystemService", "Wow64LogMessageArgList", and "Wow64LogTerminate" functions respectively. The "Wow64LogInitialize" function will then be immediately called.

The "Wow64LogSystemService" function will be called every time the "Wow64!Wow64SystemServiceEx" function is called i.e. called with every system call being issued. This can be used for system call logging.

The "Wow64LogMessageArgList" function is called by the "Wow64!Wow64LogPrint" function to log certain events, more likely errors.

The "Wow64LogTerminate" function is called upon process termination by the "Wow64!whNtTerminateProcess" function.


The above mentioned topic can be used as a simple method for injecting 64-bit Dll's into Wow64 (32-Bit) processes by dropping Wow64Log.dll into system32.

Here is a simple Wow64Log.dll that i wrote as a demo.


You can follow me on Twitter @waleedassar

Wow64-Specific Anti-Debug Trick

29 January 2013 at 07:17
By: walied
In this post i will show you an anti-debug trick that i have recently found. The trick is specific to Wow64 processes. It rely on the fact that 32-bit debuggers e.g. OllyDbg, IDA Pro Debugger, and WinDbg_x86 don't receive debug events for certain exceptions originating from 64-bit code. One example of these exceptions is EXCEPTION_BREAKPOINT 0x80000003.

N.B. In a Wow64 process in Windows 7, its 32-bit code is executing in CS=0x23, while its 64-bit code is executing in CS=0x33.

Let's take for example the ntdll "DbgPrompt" function in Windows 7 64-bit.Β  I chose DbgPrompt for two reasons:
1) Calls to it end up with executing the INT 0x2D instruction, which raises an EXCEPTION_BREAKPOINT.
2) The 32-bit version of it (in 32-bit version of ntdll.dll) calls the 64-bit version of it (in 64-bit version of ntdll.dll).

N.B. The ntdll "DbgPrompt" function wraps up calls to the non-exported "DebugPrompt" function.

So, now if we call the "DbgPrompt" function from within our 32-bit code, we know that the call will end up with an EXCEPTION_BREAKPOINT raised from 64-bit mode.

The interesting thing here is that if you call the function without a debugger, the exception will be raised and its exception handler will be called. One the other hand, if a debugger is present, no exceptions are raised and the instruction following INT 2D will be executed.

Given the above knowledge, i wrote a simple demo for that Wow64-specific anti-debug trick. You can download the demo from here and its source code from here.







To bypass this trick, you have to use a 64-bit debugger where the exception will be raised and seen by the debugger.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar
❌