Normal view

There are new articles available, click to refresh the page.
Before yesterdayInclude Security Research Blog

Reverse Engineering Windows Printer Drivers (Part 2)

In our blog last post (Part 1), we discussed how you can find and extract drivers from executables and other packages and the general methodology for confirming that drivers are loaded and ready. We also highlighted the Windows driver architecture. In this post, we’ll focus on an introduction to the driver architecture, basics of reverse engineering drivers, and inspect a bug found in the drivers we’re analyzing as part of this series.

We will start with the Dot4.sys driver, as it is the largest of the three and probably contains the most functionality. First, we will identify the type of driver, go over how to figure out areas where user input is received, and then we will talk about the general architecture of the driver.

Identifying the Type of Driver

Immediately after loading the driver onto the system, we can check that the driver is a WDM or a WDF driver by checking its imports and by filtering for functions with the name Wdf. If any functions with the Wdf prefix are found, then it’s assumed that it is a WDF driver, otherwise we assume it’s a WDM driver.

Tips and Tricks

There are several publicly available data structures that you will commonly encounter when reverse engineering drivers. We’re using Ghidra for reverse engineering, but unfortunately it does not come with these data structures by default.

To set up Windows kernel data structures, we used the .gdt (Ghidra Data Type) found here. To install and use these types once a project has been loaded, go to Window -> Ghidra Datatype Manager. Once the datatype manager tab is visible, open the gdts (Open File Archive) as shown in the image below:

To verify that the types have been successfully loaded, use the search functionality to search for a Windows driver type (e.g. _DRIVER_OBJECT), as shown in the figure below:

Note: Ghidra has trouble when trying to decompile structures with unions and structures inside of these unions. For more references, check out this GitHub issue.

Finding the DriverEntry

Just like every program starts with main, drivers have an exported entrypoint known as either DriverEntry or DriverInitialize. We will use DriverEntry in this example.

DriverEntry can be found in the Ghidra symbol tree as DriverEntry. It’s an exported function, because the Windows kernel must call DriverEntry when it finishes loading the driver in memory. In Ghidra, often the entrypoint of a binary is just called entry.

The signature for the driver entry is shown below:

Analyzing DriverEntry

NTSTATUS DriverInitialize(
  _DRIVER_OBJECT *DriverObject,
  PUNICODE_STRING RegistryPath
)
{...}

The types of parameters for DriverEntry are shown in the function prototype above. In order to match this function signature in Ghidra, we can change the types of input parameters by right clicking on the current types of the parameters in DriverEntry and using the function Retype variable.

Major Functions

The driver accepts a _DRIVER_OBJECT* in its entry. This pointer contains an array to a table of function pointers to be populated by the driver called MajorFunction. The indexes to this array of functions correspond to specific events.

When reverse engineering Windows drivers, this array is commonly populated in DriverEntry . In Dot4.sys, this array is populated in DriverEntry, as shown in the figure below:

Each index in the MajorFunction table represents an IRP_MAJOR_CODE. These correspond to an action/event. For example:

MajorFunction[0] = IoCreate, 

The IRP code equivalent to 0 is IRP_MJ_CREATE. This means that whenever a user-mode function opens a handle to the device using WinAPI (CreateFile()), the function the driver assigned to this event is called. MajorFunction table callbacks all share the same function prototype.

It is important to know that these MajorFunction dispatches can happen in parallel with each other. This is important to note, because race conditions can be the source of bugs when there is no appropriate locking. 

While there are no locks, some internal I/O manager reference counters stop something like IRP_MJ_CLOSE happening at the same time as an IRP_MJ_DEVICE_CONTROL.

The most important IRP Major Function code is IRP_MJ_DEVICE_CONTROL. This is the portal through which user mode processes ask kernel mode drivers to perform actions using the WinAPI (DeviceIoControl), optionally sending user input and optionally receiving output.

Here is documentation to read more on the topic:

An important concept to understand is that each feature/command/action that your driver implements is represented by an ioctl code. This IOCTL is not an arbitrary number but is defined by a C macro, which is supplied with information that determines the transfer method, access control and other properties.

The output is an encoded ioctl code that has all these pieces of information inside of a single four byte number. The various methods of data transferred in ioctls are crucial for different needs, whether it be a light request passing in tens of bytes or super high speed hardware device capable of doing gigabytes of processing a second. The listing below shows a general dissection of an ioctl code.



Types of Data Transfer from User Mode to Kernel Mode



METHOD_BUFFERED is the simplest method, copying memory from user mode into an allocated kernel buffer, this type results in the least bugs, but it’s the most costly because when interacting with high speed hardware devices, constantly copying memory between user mode and kernel mode is slow.

METHOD_IN/OUT_DIRECT is more targeted for performance while still attempting to be safe. This method pre-initializes an MDL (Memory Descriptor List), which is a kernel mode reflection of a piece of memory in user mode. Using this method, copying and context switching is avoided but this can also be prone to certain types of bugs.

METHOD_NEITHER is the most flexible. It passes the user mode address of the buffer to the kernel mode driver. The driver developer determines how they want to transfer this memory and is the most difficult to correctly implement without introducing bugs.

A very handy tool capable of decoding these encoded ioctls is the osr online ioctl decoders.

Analyzing MajorFunctions

Now it’s time to explore where user input comes from in drivers. The function type of the MajorFunctions are:

NTSTATUS DriverDispatch(
  _DEVICE_OBJECT *DeviceObject,
  _IRP *Irp
)

All of these functions receive a _DEVICE_OBJECT that represents the device and an IRP (I/O Request Packet). The structure of the _IRP can be found here. It contains many members but for accessing information about the user input such as the input/output buffer sizes, the most important member is the IO_STACK_LOCATION* CurrentStackLocation. The input and out lengths of the buffers are always stored at this location.

inBufLength = CurrentStackLocation->Parameters.DeviceIoControl.InputBufferLength;

outBufLength = CurrentStackLocation->Parameters.DeviceIoControl.OutputBufferLength;

Warning:

Due to the bug mentioned earlier when working with Ghidra, Ghidra will not properly be able to find the IoStackLocation struct. Instead it is shown as offset 0x40 into the Tail parameter. Here is the original code:

PIO_STACK_LOCATION irpSp = IoGetCurrentIrpStackLocation( Irp );
ULONG IoControlCode =  irpSp->Parameters.DeviceIoControl.IoControlCode

And here is what it looks like in Ghidra:

IoCode = *(uint *)(*(longlong *)&(Irp->Tail).field_0x40 + 0x18);

Here we can see that field_0x40 is equivalent to getting the PIO_STACK_LOCATION and offset 0x18 into that pointer results in the IoCode. It is important to remember this because you will see it often.

Finding the Input

The actual member to access to reach the user buffer is different depending on the ioctl data method transfer type:

METHOD_BUFFERED:

The user data is stored in Irp->AssociatedIrp.SystemBuffer, there are no driver specific bugs, just common overflows or maybe miscalculations with userdata.

METHOD_IN/OUT_DIRECT:

The user data can be stored both in Irp->AssociatedIrp.SystemBuffer or the Irp->MdlAddress.

When using MdlAddress, the driver must call MmGetSystemAddressForMdlSafe to obtain the buffer address. MmGetSystemAddressForMdlSafe is a macro, so it’s not visible as an API in Ghidra. Instead, it can be identified with the following common pattern:

There is another type of bug that can occur when using data from MDLs. It’s known as a double-fetch, and this will be discussed later.

METHOD_NEITHER

This is the most complicated wait to retrieve and output data from/to user mode, as it is the most flexible but requires calling numerous APIs. Here is an example provided by Microsoft.

The virtual address of the User Buffer is located at CurrentStackLocation->Parameters.DeviceIoControl.Type3InputBuffer;

This type of IOCTL is not in Dot4.sys, therefore this will not be covered.

Our First Finding

Immediately opening the driver, there are several functions called in DriverEntry. Decompiling one of the functions results in the following output:

  int iVar1;
  int iVar2;
  int iVar3;
  RTL_QUERY_REGISTRY_TABLE local_b8 [2];
  
  memset(local_b8,0,0xa8);
  iVar2 = DAT_000334bc;
  iVar1 = DAT_000334b8;
  local_b8[0].Name = L"gTrace";
  local_b8[0].EntryContext = &DAT_000334b8;
  local_b8[0].DefaultData = &DAT_000334b8;
  local_b8[1].Name = L"gBreak";
  local_b8[0].DefaultType = 4;
  local_b8[0].DefaultLength = 4;
  local_b8[1].DefaultType = 4;
  local_b8[1].DefaultLength = 4;
  local_b8[1].EntryContext = &DAT_000334bc;
  local_b8[1].DefaultData = &DAT_000334bc;
  local_b8[0].Flags = 0x20;
  local_b8[1].Flags = 0x20;
  iVar3 = RtlQueryRegistryValues(0x80000001,L"Dot4",local_b8,0,0);
  if (iVar3 < 0) {
    DbgPrint("DOT4: call to RtlQueryRegistryValues failed - status = %x\n",iVar3);
  }
  else {
    if (iVar1 != DAT_000334b8) {
      DbgPrint("DOT4: gTrace changed from %x to %x\n",iVar1);
    }
    if (iVar2 != DAT_000334bc) {
      DbgPrint("DOT4: gBreak changed from %x to %x\n",iVar2);
    }
  }
  return;
}

Immediately visible in the function above are the calls to DbgPrint(), which print debug messages. It provides evidence that it changes global variables, indicated by the &DAT_ prefix in Ghidra. It uses RtlQueryRegistryValues() to query two registry keys named gTrace and gBreak, and two global variables at 0x334b8 and 0x334bc. We will refer to these global variables as gTrace and gBreak respectively.

It is important to note that the first argument of RtlQueryRegistryValues() is RTL_REGISTRY_SERVICES | RTL_REGISTRY_OPTIONAL (0x80000001)

Only admins can modify these keys, because only admins have access to the service registries, meaning that the bug we found in this fragment could not be used for local privilege escalation. However, we chose to continue exploring it as an exercise in analyzing Windows drivers.

When looking at the uses of gTrace and gBreak outside of this function, the messages indicate they are used to set breakpoints for debugging DriverEntry and to enable debug printing and logging.

gBreak

if ( (gBreak & 1) != 0 )
  {
    DbgPrint("DOT4: Breakpoint Requested - DriverEntry - gBreak=%x - pRegistryPath=<%wZ>\n", gBreak, SerivceName);
    DbgBreakPoint();
  }

gTrace

case 0x3A2006u:
      if ( (gTrace & 8) != 0 )
        result = DbgPrint("DOT4: Dispatch - IOCTL_DOT4_OPEN_CHANNEL - DevObj= %x\n", a1);
      break;

The API RtlQueryRegistryValues() is designed for retrieving multiple values from the Registry. MSDN has several remarks to make about this API:

“Caution: If you use the RTL_QUERY_REGISTRY_DIRECT flag, an untrusted user-mode application may be able to cause a buffer overflow. A buffer overflow can occur if a driver uses this flag to read a registry value to which the wrong type is assigned. In all cases, a driver that uses the RTL_QUERY_REGISTRY_DIRECT flag should additionally use the RTL_QUERY_REGISTRY_TYPECHECK flag to prevent such overflows.”

The flags for both of these fetches are 0x20 (RTL_QUERY_REGISTRY_DIRECT), and the bitflag for the RTL_QUERY_REGISTRY_TYPECHECK is #define RTL_QUERY_REGISTRY_TYPECHECK 0x00000100

What size variables are these registry checks expecting?

local_b8[0].DefaultLength = 4;

They are querying for a registry key with a 4 byte length (a DWORD), so we could possibly overflow these global variables by creating a registry key with the same name but of type REG_SZ.

The area that would be overflowed is DAT_000334b8. This would not be a stack overflow but a global buffer overflow. Global variables are typically stored alongside other global variables rather than on the stack, and as a result exploitation can be more difficult. For example, return addresses aren’t typically present so an attacker must rely on the application’s use of these variables to gain control of execution.

Testing the Bug

First, change the gTrace key at \Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\dot4 to a 40 character long string value of the letter A using REGEDIT (Registry Editor)

The changes are not immediate. We must unload and reload the driver to make sure that the entrypoint is called again. We can do this by unplugging and replugging the USB cable to the printer to which we are connected.

Now to confirm a finding ideally a VM setup would be used, but we can use WinDBG local kernel debugging to confirm our findings as all we need is data introspection.

Setting Up Debugging


In the Microsoft Store, search for WinDBG Preview, and install it.

In a administrator prompt, enable debugging by running the following command

bcdedit /debug on

and restart. Then, start debugging kernel live. There are certain constraints such as the inability to deal with breakpoints. WinDBG is limited to the actions it can retrieve from a program DUMP.

Once loaded, we attach a printer so that we can check for the presence of dot4 in the loaded modules list, using the following command:

lkd> lmDvm dot4
Browse full module list
start             end                 module name

Notice from the output that it’s not there. When this type of trouble arises, the number one thing to is to run is:

.reload

After checking .reload, the problem was fixed as seen in the output below:

lkd> lmDvm dot4
Browse full module list
start             end                 module name
fffff806`b7ed0000 fffff806`b7ef8000   Dot4       (deferred)             
    Image path: \SystemRoot\system32\DRIVERS\Dot4.sys
    Image name: Dot4.sys
    Browse all global symbols  functions  data
    Timestamp:        Mon Aug  6 19:01:00 2012 (501FF84C)
    CheckSum:         0002B6FE
    ImageSize:        00028000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:

Debugging

Now that we’ve set up debugging, let’s calculate where the overflow would occur. Ghidra loads the driver at 

The address of the global variable gTrace overflowed is at 0x0334b8. The Relative Virtual Address from the base would be (0x334b8-0x10000) = 0x234b8.

kd> dq dot4+234b8
fffff806`b7fc34b8  00000000`00520050 ffffbb87`302c5290
fffff806`b7fc34c8  00000000`00000000 00000000`00680068
fffff806`b7fc34d8  ffff908f`ff709b10 00000000`00000000
fffff806`b7fc34e8  00000000`00000000 00000000`00000000
fffff806`b7fc34f8  00000000`00000000 00000000`00000000

Using the DQ command, we can check the qword value of the memory at dot4+234b8. The value of this variable has been set to 0x520050 (remember little endian) and gBreak is 0, but the value in front of QWORD at dot4+234b8 looks like a kernel address (visible from the prefixed fffff).

Checking the byte content (db) shows a very interesting hex dump:

lkd> db ffffbb87`302c5290
ffffbb87`302c5290  41 00 41 00 41 00 41 00-41 00 41 00 41 00 41 00  A.A.A.A.A.A.A.A.
ffffbb87`302c52a0  41 00 41 00 41 00 41 00-41 00 41 00 41 00 41 00  A.A.A.A.A.A.A.A.
ffffbb87`302c52b0  41 00 41 00 41 00 41 00-41 00 41 00 41 00 41 00  A.A.A.A.A.A.A.A.
ffffbb87`302c52c0  41 00 41 00 41 00 41 00-41 00 41 00 41 00 41 00  A.A.A.A.A.A.A.A.
ffffbb87`302c52d0  41 00 41 00 41 00 41 00-41 00 41 00 41 00 41 00  A.A.A.A.A.A.A.A.
ffffbb87`302c52e0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  .......

The above is a UNICODE_STRING, the structure in the kernel used for describing a string. The value 0x00520050 in front of the string describes two things:

  • The total length 0x0052 (82)
  • The string length 0x0050 (80)

Why are these larger than 40 letters? Because a UTF16 character occupies 2 bytes. It’s important not to forget that there’s a null byte at the end of the string (null byte is actually 2 bytes with UTF16). Meaning that the total length is 82 bytes and the actual string content length is 80 bytes

Exploitability

Following the 2 global variables comes 16 bytes of padding for alignment purposes, meaning that the only bytes we can overflow into are random bytes that do nothing, making this unexploitable.

Driver Architecture

Since the DOT4 driver is actually a default Microsoft driver, the IOCTL codes for it are open source for applications that want to implement a protocol over DOT4. In the Windows 10 SDK we were able to find all the publicly exposed IOCTLs:

#define FILE_DEVICE_DOT4         0x3a
#define IOCTL_DOT4_USER_BASE     2049
#define IOCTL_DOT4_LAST          IOCTL_DOT4_USER_BASE + 9
 
#define IOCTL_DOT4_CREATE_SOCKET                 CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  7, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_DESTROY_SOCKET                CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  9, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_CREATE_SOCKET CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE + 7, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_WAIT_FOR_CHANNEL              CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  8, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_OPEN_CHANNEL                  CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  0, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_CLOSE_CHANNEL                 CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  1, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DOT4_READ                          CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  2, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_WRITE                         CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  3, METHOD_IN_DIRECT, FILE_ANY_ACCESS)
#define IOCTL_DOT4_ADD_ACTIVITY_BROADCAST        CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  4, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DOT4_REMOVE_ACTIVITY_BROADCAST     CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  5, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_DOT4_WAIT_ACTIVITY_BROADCAST       CTL_CODE(FILE_DEVICE_DOT4, IOCTL_DOT4_USER_BASE +  6, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)

The general control flow of the drivers follows the listing below:

The diagram shows that a thread is created and then it waits for an event. A user mode program calls the driver IOCTL then the I/O manager calls the MajorFunction. The MajorFunction then inputs the IOCTL into the IRP queue and then sets the event. Setting the event tells the WorkerThread waiting for IRPs to process that there are IRPs in the IRP queue. The worker thread clears the event and then starts processing the IRPs. When the worker thread finishes processing the IRP it calls IoCompleteRequest, which signals to the I/O manager that it can now return back to user mode.

It is important to understand that these device drivers try to do everything synchronously by using queues. This design eliminates race conditions, as at no moment are two threads ever working parallel to user requests.

IoCompleteRequest is the API that every driver must call to tell the I/O Manager that processing has been completed and that the user mode program can now inspect their input/output buffers.

The Attack Surface

Out of the IOCTLs we listed before, few take user input and those that do, don’t have very complicated inputs:

  • IOCTL_DOT4_DESTROY_SOCKET takes in 1 byte, the socket/channel index, which selects the DOT4 socket/channel to destroy.
  • IOCTL_DOT4_WAIT_FOR_CHANNEL takes in 1 bytes and uses it as a channel index.
  • IOCTL_DOT4_OPEN_CHANNEL takes in 1 byte and uses it as a channel index.
  • IOCTL_DOT4_READ – takes a quantity and an offset.
  • IOCTL_DOT4_WRITE – takes a quantity and an offset.

IOCTL_DOT4_CREATE_SOCKET is the most interesting ioctl as it takes in 56 bytes in the form of a certain structure, described by the code snippet below:

typedef struct _DOT4_DC_CREATE_DATA
{
    // This or service name sent
    unsigned char bPsid;
 
    CHAR pServiceName[MAX_SERVICE_LENGTH + 1];
 
    // Type (stream or packet) of channels on socket
    unsigned char bType;
 
    // Size of read buffer for channels on socket
    ULONG ulBufferSize;
 
    USHORT usMaxHtoPPacketSize;
 
    USHORT usMaxPtoHPacketSize;
 
    // Host socket id returned
    unsigned char bHsid;
 
} DOT4_DC_CREATE_DATA, *PDOT4_DC_CREATE_DATA;

This structure serves mainly for settings options when creating a new socket and assigning a name to the socket.

Common Bugs (Specific to Windows Drivers)

Double Fetch

The reason drivers use METHOD_NEITHER/METHOD_OUT/IN_DIRECT is because it allows them to read and write data from the user mode program without copying the data over to kernel mode. This is done by creating an MDL or receiving an MDL (Irp->MdlAddress).

This means that changes to the mapped user mode memory reflect to the mapped memory in KernelMode, therefore a value can change after passing certain checks.. Let’s take a piece of example code:

    int* buffer = MmGetSystemAddressForMdlSafe( mdl, NormalPagePriority | MdlMappingNoExecute );
 
    char LocalInformation[0x100] = { 0 };
    
    // Max Size we can handle is 0x100
    if (*buffer <= 0x100) {
        memcpy(LocalInformation, buffer, *buffer);
    } else {
        FailAndErrorOut();
    }

If *buffer was a stable variable and didn’t change, then this code would be valid and bug-free, but after the check happens, if a user mode thread decides to change the value of *buffer in their address space, this change would reflect into the kernel address space and then suddenly *buffer could be much larger than 0x100 after the check, introducing memory corruption.

The way to avoid these type of bugs is to store the information in a local variable that cannot change:

            int* buffer = MmGetSystemAddressForMdlSafe( mdl, NormalPagePriority | MdlMappingNoExecute );
 
            char LocalInformation[0x100] = { 0 };
            
            int Size = *buffer;
            
            // Max Size we can handle is 100
            if (Size <= 0x100) {
                memcpy(LocalInformation, buffer, Size);
            } else {
                FailAndErrorOut();
            }

In dot4.sys, every IOCTL that uses MDLs copies over the content before using it, except for IOCTL_DOT4_READ and IOCTL_DOT4_WRITE, which use the MDLs in copy operations to READ/WRITE once, so there’s no possibility of double fetch.

IoStatus.Information

When using METHOD_BUFFERED, the IO Manager determines how much information to copy over by looking at IoStatus.information, which is set by the driver dispatch routine called. It’s important to know that a driver should fill this member not to the size of the buffer that they want to return, but to the amount of content they actual wrote into the buffer, because the IO Manager does not initialize the buffer returned and if you haven’t actually used all the buffer, you may leak information.

Other Resources About Windows Driver Exploitation

We highly recommend Ilja Van Sprundel’s amazing talk about common driver mistakes:

Possible Exploration and Future Targets

This driver must parse commands that arrive from the printer, and this seems like an interesting candidate for fuzzing and could possibly open up printer->driver RCEs/LPEs. A quick look at the strings with search for reply brings up: 

It seems that there are nine Reply packet handlers, which are not enough for fuzzing, but maybe manually auditing. This, however, is out of the scope of this guide, as it is out of reach of a user mode program.

In this analysis, we audited one of the three drivers. The other two drivers are lower-level drivers in the device stack responsible for implementing interaction with the USB port to read and write messages. A potential area for more bugs would be cancellable IRPs, though we suspect that it uses cancel-safe IRP queues, but there could be cancellable IRP problems when passing IRPs down the device stack onto lower level drivers, etc.

In this analysis we focused on specific printer drivers that were publicly distributed, but the methodology used in this blog post is applicable to most drivers and is hopefully of use to those who are planning to audit a driver. Using these strategies and methodologies, you can find driver-specific bugs and possible vulnerabilities that could open up security concerns for your infrastructure and user devices.

The post Reverse Engineering Windows Printer Drivers (Part 2) appeared first on Include Security Research Blog.

Reverse Engineering Windows Printer Drivers (Part 1)

Note: This is Part 1 in a series of posts discussing security analysis of printer drivers extracted and installed from public resources. This part explains how we located publicly available drivers distributed by WeWork and conducted initial analysis. Part 2 come shortly after and will cover our exploration with in-depth technical details about how Windows kernel drivers work and the techniques we used to discover bugs in these drivers.

Almost every large organization uses printers, and while the printer market is fairly distributed, it is still heavily dominated by only a few players. Printers need kernel mode drivers to work so that they can communicate through USB and other means, though this is not always the case as modern operating systems are pivoting to user-mode drivers to ensure safety. A vulnerability in a kernel mode printer driver could result in Local Privilege Escalation (LPE) if exploited successfully.

In this two-part series, we’ll discuss the steps we took to analyze these drivers. We’ll also discuss some helpful background information for beginning analysis of Windows kernel-mode drivers.

Step 1. Find Driver Documentation or Public Resources

Since most of the public uses a search engine to find drivers, we will emulate the way a WeWork user would find print drivers so that we can also discuss the implications of using unofficial sources to find installers. The first step we took was to search for documentation and driver downloads in the same way as a user. The drivers found will be used in our analysis. 

What printers does WeWork use?

A quick online search provides these links: 


According to the setup documents, WeWork uses HP, Kyocera and Konica printers. Though this instructional manual seems to be from a non-official source, an attempt to run these installers will be unsuccessful as they expect to be connected to a printer. A search through WeWork’s publicly available documentation shows that for Russian and Chinese WeWork spaces, only the WeWork_HP_installer.exe is documented. It seems that either the other printers are much rarer, or WeWork does not publish documentation publicly.

Step 2. Unpacking Resources

Unpacking Windows Resources

With a bit of web crawling for “WeWork_Installer_HP.exe”, the HP installer executable can be found at https://s3.amazonaws.com/it-assets/printing/wework_installer_HP.exe.

Since this executable contains no digital signature, its origin from WeWork cannot be verified. VirusTotal shows that it is not flagged by any antivirus engines, but they advise to continue on a virtual machine (VM).

The installer does not display a prompt to select where files are stored similar to most common software installers, but we used ProcMon to identify where files are placed on the local machine. Typically, you would first check C:\Program Files or C:\Program Files (x86) for changes. In this analysis, a folder named WeWork_printer_drivers was found in C:\Program Files (x86), which contained two executable files: HP_UPD.exe and win_39754.exe. The files have the following icons displayed in Windows Explorer:

These executables are self-extracting 7-Zip executables and can be opened with the 7-Zip application.

Opening win_39754.exe shows some references to a printer client software known as Papercut, but this does not contain any driver.

Opening HP_UPD.exe (which presumably stands for HP Universal Printer Driver), points to a file directory that contains .inf files for these printers and their properties. See the following documentation for more information on .inf files:

https://docs.microsoft.com/en-us/windows-hardware/drivers/install/overview-of-inf-files


Exploring the files further, there are directories with the name drivers, with each directory containing a subdirectory named either WINXP, WIN2000, AMD64. These directories contain drivers. Out of the directory names, AMD64 is the one most modern architecture for modern day windows operating systems.

Extracting the drivers in this folder, there are 5 drivers:

  • HPZid412.sys
  • HPZisc12.sys
  • HPZipr12.sys
  • HPZius12.sys
  • HPZs2k12.sys

These files all have additional information about them in their properties. Their properties can be viewed by right-clicking on them and selecting Properties->Details, where their descriptions and their original file names are shown.

They seem to be used for implementing the DOT4 (IEEE 1284.4) multiplexing data channel protocol over USB. In fact, the original filenames are references to Microsoft default DOT4 protocol drivers, and the strings of the original Dot4 Microsoft drivers are extremely similar to the HP drivers, almost exact. For more confirmation, BinDiff could be used to check the similarity of the two binaries. 

Unpacking MacOS resources

After an attempt to find the package described in the public facing documents, we settled with the file in the MacOSPrinterSetup instructions, which provides a DMG file.

Opening the DMG file in 7-Zip presents the following directory structure:

Immediately, the most interesting place to find drivers would be the .pkg file that contains the packages which contain binaries. Opening in 7-Zip provides folders:

From the above list of files, the most relevant to kernel drivers would be a KEXT (Kernel Extension), and it seems there is only one relevant package with kext in its name: com.hp.print.ps.kext.pkg. Opening it in 7-Zip results in the files below:

The directory contains these files, the most important of which is the Payload file which contains the actual binaries. We can open this file in 7-Zip and it contains numerous empty path folders which just hold other folders. KEXTs are folders that contain plists (files that describe the KEXT) and the MACH-O binaries. The path to the KEXT in the Payload file is shown below:

Payload\System\Library\Extensions\hp_io_printerclassdriver_enabler.kext\Contents

This is the path inside the payload to the KEXT contents folder. It provides the directory structure below:

CodeSignature is a directory of signatures for verifying the file. The Info.plist file describes the properties of the KEXT and Version.plist contains version numbers, but where is the binary?

As it turns out, this KEXT is a Codeless Kernel Extension, which can be verified by looking in the Info.plist file containing properties in an XML format. Specifically, KEXTs with binaries contain the CFBundleExecutable property. Inspecting the Info.plist of this KEXT, we find no CFBundleExecutable property.

The purpose of this KEXT is to point the operating system to the subsystems which this hardware device (the printer) uses, and direct it to the NON-KERNEL driver responsible for handling the hardware (IOKit). The XML keys responsible for telling us which pkg is responsible for handling this printer is the USB Printing Class

<key>IOProviderClass</key>
    <string>IOUSBInterface</string>
<key>IOProviderMergeProperties</key>
    <dict>
        <key>ClassicMustNotSeize</key>
            <true/>
        <key>USB Printing Class</key>
            <string>/Library/Printers/hp/Frameworks/HPDeviceModel.framework/Runtime/HPIOPrinterClassDriver.plugin</string>

In the string above, we see a path to a user mode plugin. A word in this path provides a clue into which package contains this plugin. HPDeviceModel, the process used to inspect this plugin, can also be used for the IOKit user mode driver (com.hp.DeviceModel4.pkg / HPIOPrinterClassDriver.plugin). 

Note: Unpacking these macOS driver packages confirms that these drivers are user-mode drivers and not kernel-mode drivers. We did not pursue further analysis on the macOS drivers as the value from attacking them is far less than kernel-mode drivers.

Step 3. Confirmation 

Note: For this step, we will use Windows as it is the only one with Kernel Drivers.

With our research, we now know that the HP drivers are the Dot4 default drivers. This theory can be tested by connecting a printer that supports Dot4 to your computer via USB,and then using a tool like WinObjEx64, which can inspect loaded drivers. 

Browsing the loaded drivers shows:

From the image above, you can confirm that three drivers are loaded: dot4, Dot4Print and dot4usb. The loaded drivers indicate that the operating system is ready to interact with the printer. Despite the fact that there were 5 drivers, it seems (from analysis) that only three drivers are loaded on modern systems. The three files unpacked are: 

  • dot4.sys -> HPZid412.sys
  • dot4prt.sys -> HPZipr12.sys
  • dot4usb.sys -> HPZius12.sys

The binaries for these default dot4 drivers can be found at C:\Windows\System32\Drivers once they have been loaded for the first time.

Devices listed on the system are show in the image below:

While drivers show that the operating system is ready to interact with the printer, it is ultimately up to a user-mode application to initiate a printing sequence. The application can initiate a printing sequence if the drivers present an interactable device to the user-mode application. In the image above, a dot4 device that allows for interaction between user-mode and the driver exists on the system.

Step 4. Architecture and Research

The Windows operating system is massive. It hosts a variety of subsystems, so we focused our research on Windows during analysis. For this research, the goal was to study the different types of drivers and how they affect security. 

Types of Windows Drivers

It’s important to understand that there are several types of Windows drivers and frameworks: 

WDM – The first type of drivers that were created were WDM (Windows Driver Models). This driver is a raw driver and manages resources and devices. When it came to device drivers this seemed to be almost an impossible task due to the endless amount of state that had to be managed, this issue is discussed in depth in old Microsoft archives that can be found here.

https://channel9.msdn.com/Shows/Going+Deep/Doron-Holan-Kernel-Mode-Driver-Framework?term=kernel&lang-en=true

KMDF – The Kernel Mode Driver Framework (KMDF) was invented to relieve some of the difficulties developing device drivers, giving developers APIs that would handle edge cases. It implements state machines for PnP, I/O, and others.

UMDF – The User Mode Driver Framework (UMDF) is the user-mode equivalent of KMDF.

WDF – The Windows Driver Framework (WDF) is a term that encompasses KMDF and UMDF.

Conclusion

For this first post in our WeWork printer analysis series, we found resources online and unpacked them. The analysis covered in this post is the initial step in identifying WeWork printer drivers so that we can research further into their security. In the next post, we will look into reverse engineering and attempting to discover exploitable bugs in these drivers. 

The post Reverse Engineering Windows Printer Drivers (Part 1) appeared first on Include Security Research Blog.

Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks

We customize and use Semgrep a lot during our security assessments at IncludeSec because it helps us quickly locate potential areas of concern within large codebases. Static analysis tools (SAST) such as Semgrep are great for aiding our vulnerability hunting efforts and usually can be tied into Continuous Integration (CI) pipelines to help developers catch potential vulnerabilities early in the development process.  In a previous post, we compared two static analysis tools: Brakeman vs. Semgrep. A key takeaway from that post is that when it comes to custom rules, we found that Semgrep was easy to use.

The lovely developers of Semgrep, as well as the general open source community provide pre-written rules for many frameworks that can be used with extreme ease–all it requires is a command line switch and it works. For example:

semgrep --config "p/flask"

Running this on its own can catch bad practices and mistakes. However, writing custom rules can expand Semgrep’s out-of-the-box functionality significantly and is done by advanced security assessors who understand code level security concerns. Whether you want to add rules that look for more specific problems or similar rules with a bigger scope, it’s up to the end-user rule writer to expand in whichever direction they want.

In this post, we walk through some scenarios to write custom Semgrep rules for two popular Python frameworks: Django and Flask.

Why Write Custom Rules for Frameworks?

We see a lot of applications built on top of frameworks like Django and Flask and wanted to prevent duplicative manual effort to identify similar patterns of security concerns on every assessment. While the default community rules are very good in Semgrep, at IncludeSec we needed more than that. Making use of Semgrep’s powerful rules system makes it possible to extend these to cover even more sources of bugs related to framework usage, such as:

  • vulnerabilities caused by use of specific deprecated APIs
  • vulnerabilities caused by lack of error checking in specific patterns
  • vulnerabilities introduced due to lack of locking/mutexes
  • specific combinations of API calls that can cause inefficiencies or loss of performance, or even introduce race conditions

If any of these issues occur frequently on specific APIs then Semgrep is ideal since a one time investment will pay off dividends in future development process.

Making Use of Frameworks 

For developers, using frameworks like Django and Flask make coding easier and more secure. But they aren’t foolproof. If you use them incorrectly, it is still possible to make mistakes. And for each framework, these mistakes tend to follow common patterns.

SAST tools like Semgrep offer the possibility of automating checks for some of these patterns of mistakes to find vulnerabilities that may be common within a framework. 

An analogy for SAST tooling is a compiler whose warnings/errors you can configure extremely easily. This makes it a perfect fit when programming specific frameworks, as you can catch potentially dangerous usages of APIs & unsafe operations before code is ever committed. For auditors it is extremely helpful when working with large codebases, which can be daunting at first due to the sheer amount of code. SAST tooling can locate security “codesmells”, and where there is codesmell, there are often leads to possible security concerns.

Step 1. Find patterns of mistakes

In order to write custom rules for a framework, you first have to do some research to identify where in the framework mistakes might occur.

The first place to look when identifying bad habits is the official documentation — often one can find big blocks of formatting with the words WARNING, ERROR, MISTAKE. These blocks can often clue you into common problems with examples, avoiding time wasted searching forums/Stack Overflow posts for common bugs.

The next place to search where one can find real world practical examples would be bug bounty platforms, such as HackerOne, BugCrowd, etc. Searching these platforms can result in quite niche but severe mistakes that might not be in official documentation but can occur in live production applications.

Finally, intentionally vulnerable “hack me” applications such as django.nV, which explain common vulnerabilities that might occur. With concise, straightforward exercises that one can do to learn and also hammer in the impact of the bugs at hand.

For example, in the Flask documentation for logins https://flask-login.readthedocs.io/en/latest/#login-example , a warning block mentions that 

Warning: You MUST validate the value of the next parameter. If you do not, your application will be vulnerable to open redirects. For an example implementation of is_safe_url see this Flask Snippet.

This block warns us about open redirects in the specific login situation it presents, we’ll use something similar for our vulnerable code example: an open redirect where the redirect parameter comes from a url encoded GET request.

Step 2. Identify the pieces of information and the markers in your code

When writing rules, we have to identify the pieces of information that the specific code encodes. This way we can ensure that the patterns we write will be as accurate as possible. Let’s look at an example from Flask:

from flask import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

In this example code, we can see a piece of Flask code that contains an open redirect vulnerability. We can dissect it into its various properties and see how we can match this in Semgrep. First we’ll mention the specific semantics of this function and what exactly we want to match.

Properties:

1. @app.route("/redirect/") – Already on the first line we see that our target functions have a route decorator that tells us that this function is used to handle a request, or that it directly receives user input by virtue of being an endpoint handler. Matching route/endpoint handlers is effective because input to an endpoint handler is unsanitized and could be a potential area of concern: 

from flask import redirect 
 
def do_redirect(uri):
    if is_logging_enabled():
        log(uri)
    
    return redirect(uri)
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    
    if unsafe_uri(uri):
        return redirect_to_index()
    
    return do_redirect(uri)

In the listing above if we were to match every function that includes do_redirect instead of only route handlers that include do_redirect we could end up with false positives where an input to a function has already been sanitized. Already here we have some added complexity that does not bode well with other static analysis tools. In this case we would match do_redirect even though the URI it receives has already been sanitized in the function unsafe_uri(uri). This brings us to our first constraint: we need to match route handlers. 

2.    def handle_request(uri):here it’s important that we match a function right below the function decorator, and that this function takes in a parameter. We could match any function that has a route decorator which also contains a redirect, but then we could possibly match a function where the redirect input is constant or comes from sanitized storage. Matching a route handler with a parameter guarantees that it receives unsanitized user input. We can be sure of this because Flask does not do any URL sanitization. Specifying this results in more accurate matching and finer detection and brings us to our second constraint: that we need to match route handlers with 1 or more parameters

3.    return redirect(uri)here it may seem obvious, all we have to do is match redirect, right? Sadly, it is not that easy. Many APIs can have generic names that may collide with other modules using a generic text/regex search, this can be especially problematic in languages that support function overloading, where a specific overloaded instance of a function may have problems, but other overloaded instances are fine. Not accounting for these may result in many false positives. For example, consider the following snippet:

from robot import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

If we only matched redirect, we would match the redirect function from a module named robot which could be a false positive. An even more horrifying scenario to match would be an API or module that is imported under another name, e.g.:

from flask import redirect as rd

Thankfully, specifying the origin of the function allows Semgrep to handle all these cases and we’ll go more into detail on this when developing the patterns.

What does a good pattern account for?

A good pattern depends on your goals and how you use rules: finding performance bottlenecks, enforcing better programming practices, or finding security concerns as an auditor, everyone’s needs are different.

For a security assessment, it is important to find potential areas of concern, for example often areas that do not include sanitization are potentially dangerous. Ideally we want to eliminate as many false positives as possible and we can do this by excluding functions with sanitization. This brings us to our final constraint: we don’t want to match any functions containing sanitization keywords.

The Constraints

So far we have the following constraints:

  • match a route handler
  • match a function that takes in 1 or more parameters
  • match a redirect in the function that takes in a parameter from the function
  • IDEALLY: don’t match a function containing sanitization keywords

Step 3. Developing The Pattern

Now that we know all the constraints, and the semantics of the code we want to match we can finally start writing the pattern. I’ll put the end pattern for display, and we’ll dissect it together. Semgrep takes YAML files that describe multiple rules. Each rule contains a specific pattern to match.

 rules:
- id: my_pattern_id
  languages:
  - python
  message: found open redirect
  severity: ERROR
  patterns:
  - pattern-inside: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify) 

rules: – Every Semgrep rule file has to start with the rules tag, this is an array of rules as a Semgrep rule file may contain multiple rules.

- id: my_pattern_id Every Semgrep rule in the rules array has an id, this is essentially the name of the rule and must be unique.

languages: 
  - python

The language this rule works with. This determines how it parses the pattern & which files it checks.

message: found open redirect the message displayed when the Semgrep search matches a pattern, you can think of this like a compiler warning message.

severity: ERROR determines the color and other aspects of the messages upon a successful match. You can think of this as a compiler error, except it’s just a more severe warning, this is good for filtering through different levels of matches with Semgrep, or to cut down on time by searching only for erroneous patterns.

patterns:
  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify)

This is the final part of the rule and contains the actual logic of the pattern, a rule has to contain a top-level pattern element. In order for a match to be successful the final result of all the logic has to be true. In this case the top level element is a patterns, which only returns true if all the elements underneath it return true.

  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)

This pattern searches for code that satisfies the first 3 constraints, with the ellipsis representing anything. @app.route(...) will match any call to that function with any number of arguments (including none).

def $X(..., $URI_VAR, ...):

matches any function, and stores its name in the variable $X. It then matches any argument in this function, whether it be in the middle or at the end and stores it in $URI_VAR.

The Ellipsis following matches any code in this function until the next statement in the pattern which in this case is flask.redirect($URI_VAR) which matches redirect only if its arguments come from the function variable $URI_VAR. If these constraints are all satisfied, it then passes the text it matches onto the next pattern and it returns true.

One amazing feature of Semgrep is its ability to match fully qualified function names, even when they are imported with an alias. In this case, matching flask.redirect($URI_VAR) would match only redirects from flask, even if they are imported with another name (such as redir or rd).

- pattern-not-regex: (sanitize|validate|safe|check|verify)

This pattern is responsible for eliminating potential false positives. It’s very simple: it runs a regex against the matched text and if the regex comes back with any matches, it returns false otherwise it returns true. With this we’re checking if likely sanitization elements exist in the function code. The text that is used to check for these sanitization elements is obviously not perfect, but it can be tailored to the project you are working on and can always be extended to include more possible keywords. Alternatively it can be removed completely when considering the false positives vs. missed true positives balance.

Step 4. Testing & Debugging

Now that we’ve made our pattern, we can test it on the online Semgrep playground to see if it works. Here we can make small changes and get instant feedback in order to improve our patterns. Below is an example of the rules at work matching the unsanitized open redirect and ignoring the safe redirect.

https://semgrep.dev/s/65lY

Trade Offs, Quantity vs Quality

When designing these patterns, it’s possible to spend all your time trying to write the best pattern that catches every situation, filters out all the false-positives and what not, but this is an almost futile endeavor and can lead into rabbit holes. Also, overly precise rules may filter things that weren’t even meant to be filtered. The dilemma always comes down to how many false positives are you willing to handle–this tradeoff is up to Semgrep users to decide for themselves. When absolutely critical it may be better to have more false positives but to catch everything, whereas from an auditor’s perspective it may be better to have a more precise ruleset to start with a good lead and to be efficient, and then audit unmatched code later. Or perhaps a graduated approach where higher false positive rules are enabled for subsequent runs of SAST tooling.

Return on Investment

When it comes to analysis tools, it’s important to understand how much you need to set up & maintain to truly get value back. If they are complicated to update and maintain sometimes it’s just not worth it. The great upside to Semgrep is the ease of use–one can start developing patterns after doing the 20 minute tutorial and make a significant amount of rules in a day, and the benefits can be felt immediately. It requires no fiddling with versions or complicated compiler setup, and once a ruleset has been developed it’ll work on any supported languages. 

Showcase – Django.nV

Django.nV is a very well-made intentionally vulnerable application that uses the Django framework to introduce a variety of bugs for learning framework-specific penetration testing, from XSS to more framework specific bugs. Thanks to nVisium for making a great training application open source!

We used Django.nV to test IncludeSec’s inhouse rules and came up with 4 new instances of vulnerabilities that the community rulesets missed:

django.nV/taskManager/settings.py
severity:warning rule:MD5Hasher for password: use a more secure hashing algorithm for password
124:PASSWORD_HASHERS = ['django.contrib.auth.hashers.MD5PasswordHasher']
 
django.nV/taskManager/templates/taskManager/base_backend.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
58:                        <span class="username"><i class="fa fa-user fa-fw"></i> {{ user.username|safe }}</span>
 
django.nV/taskManager/templates/taskManager/tutorials/base.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
54:                        <span class="username">{{ user.username|safe }}</span>
 
django.nV/taskManager/views.py
severity:warning rule:django open redirect: unvalidated open redirect
394:    return redirect(request.GET.get('redirect', '/taskManager/'))

MD5Hashing – detects that the MD5Hasher has been used for passwords, which is cryptographically insecure.

Unsafe template usage in HTML – detects the use of user parameters with the safe keyword in html, which could introduce XSS.

Open redirect – very similar to the example patterns we already discussed. It detects an open redirect in the logout view.

We’ve collaborated with the kind developers of Semgrep and the people over at returntocorp (r2c) to get certain rules in the default Django Semgrep rule repository.

Conclusion

In conclusion, Semgrep makes it relatively painless to write custom static analysis rules to audit applications. Improper usage of framework APIs can be a common source of bugs, and we at IncludeSec found that a small amount of up front investment learning the syntax paid dividends when auditing applications using these frameworks.

The post Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks appeared first on Include Security Research Blog.

Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks

We customize and use Semgrep a lot during our security assessments at IncludeSec because it helps us quickly locate potential areas of concern within large codebases. Static analysis tools (SAST) such as Semgrep are great for aiding our vulnerability hunting efforts and usually can be tied into Continuous Integration (CI) pipelines to help developers catch potential vulnerabilities early in the development process.  In a previous post, we compared two static analysis tools: Brakeman vs. Semgrep. A key takeaway from that post is that when it comes to custom rules, we found that Semgrep was easy to use.

The lovely developers of Semgrep, as well as the general open source community provide pre-written rules for many frameworks that can be used with extreme ease–all it requires is a command line switch and it works. For example:

semgrep --config "p/flask"

Running this on its own can catch bad practices and mistakes. However, writing custom rules can expand Semgrep’s out-of-the-box functionality significantly and is done by advanced security assessors who understand code level security concerns. Whether you want to add rules that look for more specific problems or similar rules with a bigger scope, it’s up to the end-user rule writer to expand in whichever direction they want.

In this post, we walk through some scenarios to write custom Semgrep rules for two popular Python frameworks: Django and Flask.

Why Write Custom Rules for Frameworks?

We see a lot of applications built on top of frameworks like Django and Flask and wanted to prevent duplicative manual effort to identify similar patterns of security concerns on every assessment. While the default community rules are very good in Semgrep, at IncludeSec we needed more than that. Making use of Semgrep’s powerful rules system makes it possible to extend these to cover even more sources of bugs related to framework usage, such as:

  • vulnerabilities caused by use of specific deprecated APIs
  • vulnerabilities caused by lack of error checking in specific patterns
  • vulnerabilities introduced due to lack of locking/mutexes
  • specific combinations of API calls that can cause inefficiencies or loss of performance, or even introduce race conditions

If any of these issues occur frequently on specific APIs, Semgrep is ideal as a one time investment will pay off dividends in the development process.

Making Use of Frameworks 

For developers, using frameworks like Django and Flask make coding easier and more secure. But they aren’t foolproof. If you use them incorrectly, it is still possible to make mistakes. And for each framework, these mistakes tend to follow common patterns.

SAST tools like Semgrep offer the possibility of automating checks for some of these patterns of mistakes to find vulnerabilities that may be common within a framework. 

An analogy for SAST tooling is a compiler whose warnings/errors you can configure extremely easily. This makes it a perfect fit when programming specific frameworks, as you can catch potentially dangerous usages of APIs & unsafe operations before code is ever committed. For auditors it is extremely helpful when working with large codebases, which can be daunting at first due to the sheer amount of code. SAST tooling can locate security “codesmells”, and where there is codesmell, there are often leads to possible security issues.

Step 1. Find patterns of mistakes

In order to write custom rules for a framework, you first have to do some research to identify where in the framework mistakes might occur.

The first place to look when identifying bad habits is the official documentation — often one can find big blocks of formatting with the words WARNING, ERROR, MISTAKE. These blocks can often clue you into common problems with examples, avoiding time wasted searching forums/Stack Overflow posts for common bugs.

The next place to search where one can find real world practical examples would be bug bounty platforms, such as HackerOne, BugCrowd, etc. Searching these platforms can result in quite niche but severe mistakes that might not be in official documentation but can occur in live production applications.

Finally, intentionally vulnerable “hack me” applications such as django.nV, which explain common vulnerabilities that might occur. With concise, straightforward exercises that one can do to learn and also hammer in the impact of the bugs at hand.

For example, in the Flask documentation for logins https://flask-login.readthedocs.io/en/latest/#login-example , a warning block mentions that 

Warning: You MUST validate the value of the next parameter. If you do not, your application will be vulnerable to open redirects. For an example implementation of is_safe_url see this Flask Snippet.

This block warns us about open redirects in the specific login situation it presents, we’ll use something similar for our vulnerable code example: an open redirect where the redirect parameter comes from a url encoded GET request.

Step 2. Identify the pieces of information and the markers in your code

When writing rules, we have to identify the pieces of information that the specific code encodes. This way we can ensure that the patterns we write will be as accurate as possible. Let’s look at an example from Flask:

from flask import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

In this example code, we can see a piece of Flask code that contains an open redirect vulnerability. We can dissect it into its various properties and see how we can match this in Semgrep. First we’ll mention the specific semantics of this function and what exactly we want to match.

Properties:

1. @app.route("/redirect/") – Already on the first line we see that our target functions have a route decorator that tells us that this function is used to handle a request, or that it directly receives user input by virtue of being an endpoint handler. Matching route/endpoint handlers is effective because input to an endpoint handler is unsanitized and could be a potential area of concern: 

from flask import redirect 
 
def do_redirect(uri):
    if is_logging_enabled():
        log(uri)
    
    return redirect(uri)
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    
    if unsafe_uri(uri):
        return redirect_to_index()
    
    return do_redirect(uri)

In the listing above if we were to match every function that includes do_redirect instead of only route handlers that include do_redirect we could end up with false positives where an input to a function has already been sanitized. Already here we have some added complexity that does not bode well with other static analysis tools. In this case we would match do_redirect even though the URI it receives has already been sanitized in the function unsafe_uri(uri). This brings us to our first constraint: we need to match route handlers. 

2.    def handle_request(uri):here it’s important that we match a function right below the function decorator, and that this function takes in a parameter. We could match any function that has a route decorator which also contains a redirect, but then we could possibly match a function where the redirect input is constant or comes from sanitized storage. Matching a route handler with a parameter guarantees that it receives unsanitized user input. We can be sure of this because Flask does not do any URL sanitization. Specifying this results in more accurate matching and finer detection and brings us to our second constraint: that we need to match route handlers with 1 or more parameters

3.    return redirect(uri)here it may seem obvious, all we have to do is match redirect, right? Sadly, it is not that easy. Many APIs can have generic names that may collide with other modules using a generic text/regex search, this can be especially problematic in languages that support function overloading, where a specific overloaded instance of a function may have problems, but other overloaded instances are fine. Not accounting for these may result in many false positives. For example, consider the following snippet:

from robot import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

If we only matched redirect, we would match the redirect function from a module named robot which could be a false positive. An even more horrifying scenario to match would be an API or module that is imported under another name, e.g.:

from flask import redirect as rd

Thankfully, specifying the origin of the function allows Semgrep to handle all these cases and we’ll go more into detail on this when developing the patterns.

What does a good pattern account for?

A good pattern depends on your goals and how you use rules: finding performance bottlenecks, enforcing better programming practices, or finding security concerns as an auditor, everyone’s needs are different.

For a security assessment, it is important to find potential areas of concern, for example often areas that do not include sanitization are potentially dangerous. Ideally we want to eliminate as many false positives as possible and we can do this by excluding functions with sanitization. This brings us to our final constraint: we don’t want to match any functions containing sanitization keywords.

The Constraints

So far we have the following constraints:

  • match a route handler
  • match a function that takes in 1 or more parameters
  • match a redirect in the function that takes in a parameter from the function
  • IDEALLY: don’t match a function containing sanitization keywords

Step 3. Developing The Pattern

Now that we know all the constraints, and the semantics of the code we want to match we can finally start writing the pattern. I’ll put the end pattern for display, and we’ll dissect it together. Semgrep takes YAML files that describe multiple rules. Each rule contains a specific pattern to match.

 rules:
- id: my_pattern_id
  languages:
  - python
  message: found open redirect
  severity: ERROR
  patterns:
  - pattern-inside: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify) 

rules: – Every Semgrep rule file has to start with the rules tag, this is an array of rules as a Semgrep rule file may contain multiple rules.

- id: my_pattern_id Every Semgrep rule in the rules array has an id, this is essentially the name of the rule and must be unique.

languages: 
  - python

The language this rule works with. This determines how it parses the pattern & which files it checks.

message: found open redirect the message displayed when the Semgrep search matches a pattern, you can think of this like a compiler warning message.

severity: ERROR determines the color and other aspects of the messages upon a successful match. You can think of this as a compiler error, except it’s just a more severe warning, this is good for filtering through different levels of matches with Semgrep, or to cut down on time by searching only for erroneous patterns.

patterns:
  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify)

This is the final part of the rule and contains the actual logic of the pattern, a rule has to contain a top-level pattern element. In order for a match to be successful the final result of all the logic has to be true. In this case the top level element is a patterns, which only returns true if all the elements underneath it return true.

  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)

This pattern searches for code that satisfies the first 3 constraints, with the ellipsis representing anything. @app.route(...) will match any call to that function with any number of arguments (including none).

def $X(..., $URI_VAR, ...):

matches any function, and stores its name in the variable $X. It then matches any argument in this function, whether it be in the middle or at the end and stores it in $URI_VAR.

The Ellipsis following matches any code in this function until the next statement in the pattern which in this case is flask.redirect($URI_VAR) which matches redirect only if its arguments come from the function variable $URI_VAR. If these constraints are all satisfied, it then passes the text it matches onto the next pattern and it returns true.

One amazing feature of Semgrep is its ability to match fully qualified function names, even when they are imported with an alias. In this case, matching flask.redirect($URI_VAR) would match only redirects from flask, even if they are imported with another name (such as redir or rd).

- pattern-not-regex: (sanitize|validate|safe|check|verify)

This pattern is responsible for eliminating potential false positives. It’s very simple: it runs a regex against the matched text and if the regex comes back with any matches, it returns false otherwise it returns true. This element is responsible for checking if sanitization elements exist in the function code. The text that is used to check for these sanitization elements is obviously not perfect, but it can be tailored to the project you are working on and can always be extended to include more possible keywords.

Step 4. Testing & Debugging

Now that we’ve made our pattern, we can test it on the online Semgrep playground to see if it works. Here we can make small changes and get instant feedback in order to improve our patterns. Below is an example of the rules at work matching the unsanitized open redirect and ignoring the safe redirect.

https://semgrep.dev/s/65lY

Trade Offs, Quantity vs Quality

When designing these patterns, it’s possible to spend all your time trying to write the best pattern that catches every situation, filters out all the false-positives and what not, but this is an almost futile endeavor and can lead into rabbit holes. Also, overly precise rules may filter things that weren’t even meant to be filtered. The dilemma always comes down to how many false positives are you willing to handle–this tradeoff is up to Semgrep users to decide for themselves. When absolutely critical it may be better to have more false positives but to catch everything, whereas from an auditor’s perspective it may be better to have a more precise ruleset to start with a good lead and to be efficient, and then audit unmatched code later. Or perhaps a graduated approach where higher false positive rules are enabled for subsequent runs of SAST tooling.

Return on Investment

When it comes to analysis tools, it’s important to understand how much you need to set up & maintain to truly get value back. If they are complicated to update and maintain sometimes it’s just not worth it. The great upside to Semgrep is the ease of use–one can start developing patterns after doing the 20 minute tutorial and make a significant amount of rules in a day, and the benefits can be felt immediately. It requires no fiddling with versions or complicated compiler setup, and once a ruleset has been developed it’ll work on any supported languages. 

Showcase – Django.nV

Django.nV is a very well-made intentionally vulnerable application that uses the Django framework to introduce a variety of bugs for learning framework-specific penetration testing, from XSS to more framework specific bugs. Thanks to nVisium for making a great training application open source!

We used Django.nV to test IncludeSec’s inhouse rules and came up with 4 new instances of vulnerabilities that the community rulesets missed:

django.nV/taskManager/settings.py
severity:warning rule:MD5Hasher for password: use a more secure hashing algorithm for password
124:PASSWORD_HASHERS = ['django.contrib.auth.hashers.MD5PasswordHasher']
 
django.nV/taskManager/templates/taskManager/base_backend.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
58:                        <span class="username"><i class="fa fa-user fa-fw"></i> {{ user.username|safe }}</span>
 
django.nV/taskManager/templates/taskManager/tutorials/base.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
54:                        <span class="username">{{ user.username|safe }}</span>
 
django.nV/taskManager/views.py
severity:warning rule:django open redirect: unvalidated open redirect
394:    return redirect(request.GET.get('redirect', '/taskManager/'))

MD5Hashing – detects that the MD5Hasher has been used for passwords, which is cryptographically insecure.

Unsafe template usage in HTML – detects the use of user parameters with the safe keyword in html, which could introduce XSS.

Open redirect – very similar to the example patterns we already discussed. It detects an open redirect in the logout view.

We’ve collaborated with the kind developers of Semgrep and the people over at returntocorp (ret2c) to get certain rules in the default Django Semgrep rule repository.

Conclusion

In conclusion, Semgrep makes it relatively painless to write custom static analysis rules to audit applications. Improper usage of framework APIs can be a common source of bugs, and we at IncludeSec found that a small amount of up front investment learning the syntax paid dividends when auditing applications using these frameworks.

The post Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks appeared first on Include Security Research Blog.

Customizing Semgrep Rules for Flask, Django, and Other Popular Web Frameworks

We customize and use Semgrep a lot during our security assessments at IncludeSec because it helps us quickly locate potential areas of concern within large codebases. Static analysis tools (SAST) such as Semgrep are great for aiding our vulnerability hunting efforts and usually can be tied into Continuous Integration (CI) pipelines to help developers catch potential vulnerabilities early in the development process.  In a previous post, we compared two static analysis tools: Brakeman vs. Semgrep. A key takeaway from that post is that when it comes to custom rules, we found that Semgrep was easy to use.

The lovely developers of Semgrep, as well as the general open source community provide pre-written rules for many frameworks that can be used with extreme ease–all it requires is a command line switch and it works. For example:

semgrep --config "p/flask"

Running this on its own can catch bad practices and mistakes. However, writing custom rules can expand Semgrep’s out-of-the-box functionality significantly and is done by advanced security assessors who understand code level security concerns. Whether you want to add rules that look for more specific problems or similar rules with a bigger scope, it’s up to the end-user rule writer to expand in whichever direction they want.

In this post, we walk through some scenarios to write custom Semgrep rules for two popular Python frameworks: Django and Flask.

Why Write Custom Rules for Frameworks?

We see a lot of applications built on top of frameworks like Django and Flask and wanted to prevent duplicative manual effort to identify similar patterns of security concerns on every assessment. While the default community rules are very good in Semgrep, at IncludeSec we needed more than that. Making use of Semgrep’s powerful rules system makes it possible to extend these to cover even more sources of bugs related to framework usage, such as:

  • vulnerabilities caused by use of specific deprecated APIs
  • vulnerabilities caused by lack of error checking in specific patterns
  • vulnerabilities introduced due to lack of locking/mutexes
  • specific combinations of API calls that can cause inefficiencies or loss of performance, or even introduce race conditions

If any of these issues occur frequently on specific APIs, Semgrep is ideal as a one time investment will pay off dividends in the development process.

Making Use of Frameworks 

For developers, using frameworks like Django and Flask make coding easier and more secure. But they aren’t foolproof. If you use them incorrectly, it is still possible to make mistakes. And for each framework, these mistakes tend to follow common patterns.

SAST tools like Semgrep offer the possibility of automating checks for some of these patterns of mistakes to find vulnerabilities that may be common within a framework. 

An analogy for SAST tooling is a compiler whose warnings/errors you can configure extremely easily. This makes it a perfect fit when programming specific frameworks, as you can catch potentially dangerous usages of APIs & unsafe operations before code is ever committed. For auditors it is extremely helpful when working with large codebases, which can be daunting at first due to the sheer amount of code. SAST tooling can locate security “codesmells”, and where there is codesmell, there are often leads to possible security issues.

Step 1. Find patterns of mistakes

In order to write custom rules for a framework, you first have to do some research to identify where in the framework mistakes might occur.

The first place to look when identifying bad habits is the official documentation — often one can find big blocks of formatting with the words WARNING, ERROR, MISTAKE. These blocks can often clue you into common problems with examples, avoiding time wasted searching forums/Stack Overflow posts for common bugs.

The next place to search where one can find real world practical examples would be bug bounty platforms, such as HackerOne, BugCrowd, etc. Searching these platforms can result in quite niche but severe mistakes that might not be in official documentation but can occur in live production applications.

Finally, intentionally vulnerable “hack me” applications such as django.nV, which explain common vulnerabilities that might occur. With concise, straightforward exercises that one can do to learn and also hammer in the impact of the bugs at hand.

For example, in the Flask documentation for logins https://flask-login.readthedocs.io/en/latest/#login-example , a warning block mentions that 

Warning: You MUST validate the value of the next parameter. If you do not, your application will be vulnerable to open redirects. For an example implementation of is_safe_url see this Flask Snippet.

This block warns us about open redirects in the specific login situation it presents, we’ll use something similar for our vulnerable code example: an open redirect where the redirect parameter comes from a url encoded GET request.

Step 2. Identify the pieces of information and the markers in your code

When writing rules, we have to identify the pieces of information that the specific code encodes. This way we can ensure that the patterns we write will be as accurate as possible. Let’s look at an example from Flask:

from flask import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

In this example code, we can see a piece of Flask code that contains an open redirect vulnerability. We can dissect it into its various properties and see how we can match this in Semgrep. First we’ll mention the specific semantics of this function and what exactly we want to match.

Properties:

1. @app.route("/redirect/") – Already on the first line we see that our target functions have a route decorator that tells us that this function is used to handle a request, or that it directly receives user input by virtue of being an endpoint handler. Matching route/endpoint handlers is effective because input to an endpoint handler is unsanitized and could be a potential area of concern: 

from flask import redirect 
 
def do_redirect(uri):
    if is_logging_enabled():
        log(uri)
    
    return redirect(uri)
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    
    if unsafe_uri(uri):
        return redirect_to_index()
    
    return do_redirect(uri)

In the listing above if we were to match every function that includes do_redirect instead of only route handlers that include do_redirect we could end up with false positives where an input to a function has already been sanitized. Already here we have some added complexity that does not bode well with other static analysis tools. In this case we would match do_redirect even though the URI it receives has already been sanitized in the function unsafe_uri(uri). This brings us to our first constraint: we need to match route handlers. 

2.    def handle_request(uri):here it’s important that we match a function right below the function decorator, and that this function takes in a parameter. We could match any function that has a route decorator which also contains a redirect, but then we could possibly match a function where the redirect input is constant or comes from sanitized storage. Matching a route handler with a parameter guarantees that it receives unsanitized user input. We can be sure of this because Flask does not do any URL sanitization. Specifying this results in more accurate matching and finer detection and brings us to our second constraint: that we need to match route handlers with 1 or more parameters

3.    return redirect(uri)here it may seem obvious, all we have to do is match redirect, right? Sadly, it is not that easy. Many APIs can have generic names that may collide with other modules using a generic text/regex search, this can be especially problematic in languages that support function overloading, where a specific overloaded instance of a function may have problems, but other overloaded instances are fine. Not accounting for these may result in many false positives. For example, consider the following snippet:

from robot import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

If we only matched redirect, we would match the redirect function from a module named robot which could be a false positive. An even more horrifying scenario to match would be an API or module that is imported under another name, e.g.:

from flask import redirect as rd

Thankfully, specifying the origin of the function allows Semgrep to handle all these cases and we’ll go more into detail on this when developing the patterns.

What does a good pattern account for?

A good pattern depends on your goals and how you use rules: finding performance bottlenecks, enforcing better programming practices, or finding security concerns as an auditor, everyone’s needs are different.

For a security assessment, it is important to find potential areas of concern, for example often areas that do not include sanitization are potentially dangerous. Ideally we want to eliminate as many false positives as possible and we can do this by excluding functions with sanitization. This brings us to our final constraint: we don’t want to match any functions containing sanitization keywords.

The Constraints

So far we have the following constraints:

  • match a route handler
  • match a function that takes in 1 or more parameters
  • match a redirect in the function that takes in a parameter from the function
  • IDEALLY: don’t match a function containing sanitization keywords

Step 3. Developing The Pattern

Now that we know all the constraints, and the semantics of the code we want to match we can finally start writing the pattern. I’ll put the end pattern for display, and we’ll dissect it together. Semgrep takes YAML files that describe multiple rules. Each rule contains a specific pattern to match.

 rules:
- id: my_pattern_id
  languages:
  - python
  message: found open redirect
  severity: ERROR
  patterns:
  - pattern-inside: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify) 

rules: – Every Semgrep rule file has to start with the rules tag, this is an array of rules as a Semgrep rule file may contain multiple rules.

- id: my_pattern_id Every Semgrep rule in the rules array has an id, this is essentially the name of the rule and must be unique.

languages: 
  - python

The language this rule works with. This determines how it parses the pattern & which files it checks.

message: found open redirect the message displayed when the Semgrep search matches a pattern, you can think of this like a compiler warning message.

severity: ERROR determines the color and other aspects of the messages upon a successful match. You can think of this as a compiler error, except it’s just a more severe warning, this is good for filtering through different levels of matches with Semgrep, or to cut down on time by searching only for erroneous patterns.

patterns:
  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify)

This is the final part of the rule and contains the actual logic of the pattern, a rule has to contain a top-level pattern element. In order for a match to be successful the final result of all the logic has to be true. In this case the top level element is a patterns, which only returns true if all the elements underneath it return true.

  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)

This pattern searches for code that satisfies the first 3 constraints, with the ellipsis representing anything. @app.route(...) will match any call to that function with any number of arguments (including none).

def $X(..., $URI_VAR, ...):

matches any function, and stores its name in the variable $X. It then matches any argument in this function, whether it be in the middle or at the end and stores it in $URI_VAR.

The Ellipsis following matches any code in this function until the next statement in the pattern which in this case is flask.redirect($URI_VAR) which matches redirect only if its arguments come from the function variable $URI_VAR. If these constraints are all satisfied, it then passes the text it matches onto the next pattern and it returns true.

One amazing feature of Semgrep is its ability to match fully qualified function names, even when they are imported with an alias. In this case, matching flask.redirect($URI_VAR) would match only redirects from flask, even if they are imported with another name (such as redir or rd).

- pattern-not-regex: (sanitize|validate|safe|check|verify)

This pattern is responsible for eliminating potential false positives. It’s very simple: it runs a regex against the matched text and if the regex comes back with any matches, it returns false otherwise it returns true. This element is responsible for checking if sanitization elements exist in the function code. The text that is used to check for these sanitization elements is obviously not perfect, but it can be tailored to the project you are working on and can always be extended to include more possible keywords.

Step 4. Testing & Debugging

Now that we’ve made our pattern, we can test it on the online Semgrep playground to see if it works. Here we can make small changes and get instant feedback in order to improve our patterns. Below is an example of the rules at work matching the unsanitized open redirect and ignoring the safe redirect.

https://semgrep.dev/s/65lY

Trade Offs, Quantity vs Quality

When designing these patterns, it’s possible to spend all your time trying to write the best pattern that catches every situation, filters out all the false-positives and what not, but this is an almost futile endeavor and can lead into rabbit holes. Also, overly precise rules may filter things that weren’t even meant to be filtered. The dilemma always comes down to how many false positives are you willing to handle–this tradeoff is up to Semgrep users to decide for themselves. When absolutely critical it may be better to have more false positives but to catch everything, whereas from an auditor’s perspective it may be better to have a more precise ruleset to start with a good lead and to be efficient, and then audit unmatched code later. Or perhaps a graduated approach where higher false positive rules are enabled for subsequent runs of SAST tooling.

Return on Investment

When it comes to analysis tools, it’s important to understand how much you need to set up & maintain to truly get value back. If they are complicated to update and maintain sometimes it’s just not worth it. The great upside to Semgrep is the ease of use–one can start developing patterns after doing the 20 minute tutorial and make a significant amount of rules in a day, and the benefits can be felt immediately. It requires no fiddling with versions or complicated compiler setup, and once a ruleset has been developed it’ll work on any supported languages. 

Showcase – Django.nV

Django.nV is a very well-made intentionally vulnerable application that uses the Django framework to introduce a variety of bugs for learning framework-specific penetration testing, from XSS to more framework specific bugs. Thanks to nVisium for making a great training application open source!

We used Django.nV to test IncludeSec’s inhouse rules and came up with 4 new instances of vulnerabilities that the community rulesets missed:

django.nV/taskManager/settings.py
severity:warning rule:MD5Hasher for password: use a more secure hashing algorithm for password
124:PASSWORD_HASHERS = ['django.contrib.auth.hashers.MD5PasswordHasher']
 
django.nV/taskManager/templates/taskManager/base_backend.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
58:                        <span class="username"><i class="fa fa-user fa-fw"></i> {{ user.username|safe }}</span>
 
django.nV/taskManager/templates/taskManager/tutorials/base.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
54:                        <span class="username">{{ user.username|safe }}</span>
 
django.nV/taskManager/views.py
severity:warning rule:django open redirect: unvalidated open redirect
394:    return redirect(request.GET.get('redirect', '/taskManager/'))

MD5Hashing – detects that the MD5Hasher has been used for passwords, which is cryptographically insecure.

Unsafe template usage in HTML – detects the use of user parameters with the safe keyword in html, which could introduce XSS.

Open redirect – very similar to the example patterns we already discussed. It detects an open redirect in the logout view.

We’ve collaborated with the kind developers of Semgrep and the people over at returntocorp (ret2c) to get certain rules in the default Django Semgrep rule repository.

Conclusion

In conclusion, Semgrep makes it relatively painless to write custom static analysis rules to audit applications. Improper usage of framework APIs can be a common source of bugs, and we at IncludeSec found that a small amount of up front investment learning the syntax paid dividends when auditing applications using these frameworks.

The post Customizing Semgrep Rules for Flask, Django, and Other Popular Web Frameworks appeared first on Include Security Research Blog.

❌
❌