Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks

We customize and use Semgrep a lot during our security assessments at IncludeSec because it helps us quickly locate potential areas of concern within large codebases. Static analysis tools (SAST) such as Semgrep are great for aiding our vulnerability hunting efforts and usually can be tied into Continuous Integration (CI) pipelines to help developers catch potential vulnerabilities early in the development process.  In a previous post, we compared two static analysis tools: Brakeman vs. Semgrep. A key takeaway from that post is that when it comes to custom rules, we found that Semgrep was easy to use.

The lovely developers of Semgrep, as well as the general open source community, provide pre-written rules for many frameworks that can be used with extreme ease–all it takes is a command line switch and it works. For example:

semgrep --config "p/flask"

Running this on its own can catch bad practices and mistakes. However, writing custom rules can expand Semgrep’s out-of-the-box functionality significantly, and is typically done by advanced security assessors who understand code-level security concerns. Whether you want to add rules that look for more specific problems or similar rules with a bigger scope, it’s up to the end-user rule writer to expand in whichever direction they want.

In this post, we walk through some scenarios to write custom Semgrep rules for two popular Python frameworks: Django and Flask.

Why Write Custom Rules for Frameworks?

We see a lot of applications built on top of frameworks like Django and Flask, and we wanted to prevent duplicative manual effort to identify similar patterns of security concerns on every assessment. While Semgrep’s default community rules are very good, at IncludeSec we needed more than that. Semgrep’s powerful rules system makes it possible to extend these to cover even more sources of bugs related to framework usage, such as:

  • vulnerabilities caused by use of specific deprecated APIs
  • vulnerabilities caused by lack of error checking in specific patterns
  • vulnerabilities introduced due to lack of locking/mutexes
  • specific combinations of API calls that can cause inefficiencies or loss of performance, or even introduce race conditions

If any of these issues occur frequently with specific APIs, then Semgrep is ideal: a one-time investment pays dividends throughout the future development process.

Making Use of Frameworks 

For developers, using frameworks like Django and Flask makes coding easier and more secure. But they aren’t foolproof. If you use them incorrectly, it is still possible to make mistakes. And for each framework, these mistakes tend to follow common patterns.

SAST tools like Semgrep offer the possibility of automating checks for some of these patterns of mistakes to find vulnerabilities that may be common within a framework. 

An analogy for SAST tooling is a compiler whose warnings/errors you can configure extremely easily. This makes it a perfect fit when programming with specific frameworks, as you can catch potentially dangerous usages of APIs & unsafe operations before code is ever committed. For auditors it is extremely helpful when working with large codebases, which can be daunting at first due to the sheer amount of code. SAST tooling can locate security “code smells”, and where there is a code smell, there are often leads to possible security concerns.

Step 1. Find patterns of mistakes

In order to write custom rules for a framework, you first have to do some research to identify where in the framework mistakes might occur.

The first place to look when identifying bad habits is the official documentation — often one can find big callout blocks labeled WARNING, ERROR, or MISTAKE. These blocks can clue you into common problems, with examples, and save time otherwise wasted searching forums/Stack Overflow posts for common bugs.

The next place to search for real-world practical examples is bug bounty platforms, such as HackerOne, BugCrowd, etc. Searching these platforms can surface quite niche but severe mistakes that might not be in the official documentation but can occur in live production applications.

Finally, there are intentionally vulnerable “hack me” applications such as django.nV, which explain common vulnerabilities that might occur, with concise, straightforward exercises that one can work through to learn and to hammer home the impact of the bugs at hand.

For example, in the Flask-Login documentation’s login example (https://flask-login.readthedocs.io/en/latest/#login-example), a warning block mentions that:

Warning: You MUST validate the value of the next parameter. If you do not, your application will be vulnerable to open redirects. For an example implementation of is_safe_url see this Flask Snippet.

This block warns us about open redirects in the specific login situation it presents. We’ll use something similar for our vulnerable code example: an open redirect where the redirect parameter comes from a URL-encoded GET request.
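
To make this concrete, here is a minimal sketch of our own (the route and view names are hypothetical, not taken from the Flask-Login docs) of a login view that trusts the next parameter and therefore contains exactly the open redirect the warning describes:

from flask import Flask, request, redirect, url_for

app = Flask(__name__)

@app.route("/")
def index():
    return "home"

@app.route("/login", methods=["GET", "POST"])
def login():
    # ... authentication would happen here ...
    target = request.args.get("next", url_for("index"))
    # Vulnerable: "next" is attacker-controlled and never validated, so a URL
    # like https://evil.example can be supplied as the redirect target.
    return redirect(target)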

Step 2. Identify the pieces of information and the markers in your code

When writing rules, we have to identify the pieces of information that the specific code encodes. This way we can ensure that the patterns we write will be as accurate as possible. Let’s look at an example from Flask:

from flask import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

In this example code, we can see a piece of Flask code that contains an open redirect vulnerability. We can dissect it into its various properties and see how we can match this in Semgrep. First we’ll mention the specific semantics of this function and what exactly we want to match.

Properties:

1. @app.route("/redirect/<uri>") – Already on the first line we see that our target functions have a route decorator, which tells us that this function is used to handle a request and that it directly receives user input by virtue of being an endpoint handler. Matching route/endpoint handlers is effective because input to an endpoint handler is unsanitized and could be a potential area of concern: 

from flask import redirect 
 
def do_redirect(uri):
    if is_logging_enabled():
        log(uri)
    
    return redirect(uri)
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    
    if unsafe_uri(uri):
        return redirect_to_index()
    
    return do_redirect(uri)

In the listing above, if we were to match every function that includes do_redirect instead of only route handlers that include do_redirect, we could end up with false positives where the input to a function has already been sanitized. Already here we have some added complexity that simpler static analysis tools do not handle well: we would match do_redirect even though the URI it receives has already been checked by unsafe_uri(uri) in the route handler. This brings us to our first constraint: we need to match route handlers. 

2. def handle_request(uri): – here it’s important that we match a function right below the route decorator, and that this function takes in a parameter. We could match any function that has a route decorator and also contains a redirect, but then we could match a function where the redirect input is constant or comes from sanitized storage. Matching a route handler with a parameter guarantees that it receives unsanitized user input; we can be sure of this because Flask does not do any URL sanitization. Specifying this results in more accurate matching and finer detection, and brings us to our second constraint: we need to match route handlers with 1 or more parameters.
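
For contrast, here is a short sketch (the handler name is hypothetical) of a route handler with no parameters; its redirect target is a constant, which is exactly the kind of function this constraint is meant to exclude:

from flask import Flask, redirect

app = Flask(__name__)

@app.route("/home")
def go_home():
    # No handler parameters: the redirect target is a hard-coded constant,
    # so there is no user-controlled input to turn into an open redirect.
    return redirect("/index")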

3. return redirect(uri) – here it may seem obvious: all we have to do is match redirect, right? Sadly, it is not that easy. Many APIs have generic names that collide with other modules under a plain text/regex search. This can be especially problematic in languages that support function overloading, where a specific overloaded instance of a function may have problems while other overloaded instances are fine. Not accounting for these cases may result in many false positives. For example, consider the following snippet:

from robot import redirect
 
@app.route("/redirect/<uri>")
def handle_request(uri):
    #unsafe open_redirect
    return redirect(uri)

If we only matched redirect, we would match the redirect function from a module named robot which could be a false positive. An even more horrifying scenario to match would be an API or module that is imported under another name, e.g.:

from flask import redirect as rd

Thankfully, specifying the origin of the function allows Semgrep to handle all these cases, and we’ll go into more detail on this when developing the patterns.

What does a good pattern account for?

A good pattern depends on your goals and how you use rules: finding performance bottlenecks, enforcing better programming practices, or finding security concerns as an auditor; everyone’s needs are different.

For a security assessment, it is important to find potential areas of concern; for example, areas that do not include sanitization are often potentially dangerous. Ideally we want to eliminate as many false positives as possible, and we can do this by excluding functions with sanitization. This brings us to our final constraint: we don’t want to match any functions containing sanitization keywords.
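
For illustration, the sketch below (our own code, using a hypothetical allow-list) shows what a sanitized route handler might look like; because it validates the target before redirecting, it is the kind of function we would like our rule to skip:

from urllib.parse import urlparse

from flask import Flask, redirect

app = Flask(__name__)

ALLOWED_HOSTS = {"example.com"}  # hypothetical allow-list

@app.route("/redirect/<path:uri>")
def safe_handle_request(uri):
    # Only follow redirects to relative paths or to hosts on the allow-list;
    # anything else falls back to the index page.
    parsed = urlparse(uri)
    if parsed.netloc and parsed.netloc not in ALLOWED_HOSTS:
        return redirect("/index")
    return redirect(uri)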

The Constraints

So far we have the following constraints:

  • match a route handler
  • match a function that takes in 1 or more parameters
  • match a redirect inside that function whose argument is one of the function’s parameters
  • IDEALLY: don’t match a function containing sanitization keywords

Step 3. Developing The Pattern

Now that we know all the constraints and the semantics of the code we want to match, we can finally start writing the pattern. I’ll show the final pattern first, and then we’ll dissect it together. Semgrep takes YAML files that describe one or more rules; each rule contains a specific pattern to match.

rules:
- id: my_pattern_id
  languages:
  - python
  message: found open redirect
  severity: ERROR
  patterns:
  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify)

rules: – Every Semgrep rule file has to start with the rules tag. It is an array of rules, as a single rule file may contain multiple rules.

- id: my_pattern_id – Every Semgrep rule in the rules array has an id; this is essentially the name of the rule and must be unique.

languages: 
  - python

The languages this rule applies to. This determines how Semgrep parses the pattern and which files it checks.

message: found open redirect – the message displayed when Semgrep matches the pattern; you can think of this like a compiler warning message.

severity: ERROR – determines the color and other aspects of the message upon a successful match. You can think of this as a compiler error, except it’s really just a more severe warning. Severity is useful for filtering through different levels of matches with Semgrep, or for cutting down on time by searching only for the most severe patterns.

patterns:
  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)
  - pattern-not-regex: (sanitize|validate|safe|check|verify)

This is the final part of the rule and contains the actual logic of the pattern; a rule has to contain a top-level pattern element. In order for a match to be successful, the final result of all the logic has to be true. In this case the top-level element is patterns, which only returns true if all the elements underneath it return true.

  - pattern: |
      @app.route(...)
      def $X(..., $URI_VAR, ...):
        ...
        flask.redirect($URI_VAR)

This pattern searches for code that satisfies the first 3 constraints, with the ellipsis representing anything. @app.route(...) will match any call to that function with any number of arguments (including none).

def $X(..., $URI_VAR, ...):

matches any function and stores its name in the metavariable $X. It then matches any one parameter of this function, whether it is at the start, in the middle, or at the end, and stores it in $URI_VAR.

The ellipsis that follows matches any code in this function until the next statement in the pattern, which in this case is flask.redirect($URI_VAR); this matches redirect only if its argument is the captured parameter $URI_VAR. If these constraints are all satisfied, the matched text is passed on to the next pattern and this element returns true.

One amazing feature of Semgrep is its ability to match fully qualified function names, even when they are imported with an alias. In this case, matching flask.redirect($URI_VAR) would match only redirects from flask, even if they are imported with another name (such as redir or rd).
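
As a quick illustration of that aliasing behavior (a sketch we wrote for this post, not the playground example), the handler below imports redirect under the name rd; because Semgrep resolves the import, the flask.redirect($URI_VAR) pattern would still be expected to flag the call:

from flask import Flask
from flask import redirect as rd

app = Flask(__name__)

@app.route("/goto/<path:uri>")
def goto(uri):
    # The call site says rd(uri), but Semgrep resolves the alias back to
    # flask.redirect, so the pattern still matches this open redirect.
    return rd(uri)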

- pattern-not-regex: (sanitize|validate|safe|check|verify)

This pattern is responsible for eliminating potential false positives. It’s very simple: it runs a regex against the matched text, and if the regex comes back with any matches it returns false; otherwise it returns true. With this we’re checking whether likely sanitization keywords exist in the function code. The keyword list is obviously not perfect, but it can be tailored to the project you are working on and can always be extended with more possible keywords. Alternatively it can be removed completely, depending on how you weigh false positives against missed true positives.
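
As an example of why the keyword filter is imperfect, consider this sketch (the check_analytics helper is hypothetical): the function body contains the substring "check" without doing any sanitization, so the pattern-not-regex clause would suppress what is actually a true positive:

from flask import Flask, redirect

app = Flask(__name__)

def check_analytics(uri):
    # Hypothetical helper: despite its name, it only records the URI and
    # performs no validation at all.
    print("redirecting to", uri)

@app.route("/redirect/<path:uri>")
def handle_request(uri):
    check_analytics(uri)  # "check" matches the exclusion regex, hiding the finding
    return redirect(uri)  # still an open redirect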

Step 4. Testing & Debugging

Now that we’ve made our pattern, we can test it on the online Semgrep playground to see if it works. Here we can make small changes and get instant feedback in order to improve our patterns. Below is an example of the rules at work matching the unsanitized open redirect and ignoring the safe redirect.

https://semgrep.dev/s/65lY

Trade Offs, Quantity vs Quality

When designing these patterns, it’s possible to spend all your time trying to write the perfect pattern that catches every situation and filters out all the false positives, but this is an almost futile endeavor and can lead you down rabbit holes. Overly precise rules may also filter out things that weren’t meant to be filtered. The dilemma always comes down to how many false positives you are willing to handle, and this tradeoff is up to Semgrep users to decide for themselves. When a finding is absolutely critical it may be better to accept more false positives and catch everything, whereas from an auditor’s perspective it may be better to start with a more precise ruleset to get good leads efficiently, and then audit unmatched code later. Another option is a graduated approach where higher false-positive rules are enabled in subsequent runs of the SAST tooling.

Return on Investment

When it comes to analysis tools, it’s important to understand how much you need to set up and maintain to truly get value back. If a tool is complicated to update and maintain, sometimes it’s just not worth it. The great upside to Semgrep is its ease of use–one can start developing patterns after doing the 20-minute tutorial and write a significant number of rules in a day, and the benefits can be felt immediately. It requires no fiddling with versions or complicated compiler setup, and once a ruleset has been developed it will work on any codebase in a supported language.

Showcase – Django.nV

Django.nV is a very well-made intentionally vulnerable application that uses the Django framework to introduce a variety of bugs for learning framework-specific penetration testing, from XSS to issues specific to Django itself. Thanks to nVisium for making a great training application open source!

We used Django.nV to test IncludeSec’s in-house rules and came up with 4 new instances of vulnerabilities that the community rulesets missed:

django.nV/taskManager/settings.py
severity:warning rule:MD5Hasher for password: use a more secure hashing algorithm for password
124:PASSWORD_HASHERS = ['django.contrib.auth.hashers.MD5PasswordHasher']
 
django.nV/taskManager/templates/taskManager/base_backend.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
58:                        <span class="username"><i class="fa fa-user fa-fw"></i> {{ user.username|safe }}</span>
 
django.nV/taskManager/templates/taskManager/tutorials/base.html
severity:error rule:Unsafe XSS usage: unsafe template usage in html,
54:                        <span class="username">{{ user.username|safe }}</span>
 
django.nV/taskManager/views.py
severity:warning rule:django open redirect: unvalidated open redirect
394:    return redirect(request.GET.get('redirect', '/taskManager/'))

MD5Hashing – detects that the MD5Hasher has been used for passwords, which is cryptographically insecure.
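
For reference, a sketch of a safer settings.py configuration is shown below; the exact hasher preference order is a project decision, but Django ships Argon2 and PBKDF2 hashers that are far stronger choices than MD5 (the Argon2 hasher requires the argon2-cffi package):

# settings.py (sketch): prefer modern, slow hashers over MD5.
PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
]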

Unsafe template usage in HTML – detects the use of user parameters with the safe keyword in html, which could introduce XSS.

Open redirect – very similar to the example patterns we already discussed. It detects an open redirect in the logout view.
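
A sketch of how such a view could validate the target on recent Django versions (where django.utils.http.url_has_allowed_host_and_scheme is available) might look like this; the view name and fallback path are illustrative, not Django.nV's actual code:

from django.shortcuts import redirect
from django.utils.http import url_has_allowed_host_and_scheme

def logout_view(request):
    target = request.GET.get("redirect", "/taskManager/")
    # Only follow redirects that stay on the current host; otherwise fall back.
    if not url_has_allowed_host_and_scheme(target, allowed_hosts={request.get_host()}):
        target = "/taskManager/"
    return redirect(target)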

We’ve collaborated with the kind developers of Semgrep and the people over at returntocorp (r2c) to get certain rules into the default Django Semgrep rule repository.

Conclusion

In conclusion, Semgrep makes it relatively painless to write custom static analysis rules to audit applications. Improper usage of framework APIs can be a common source of bugs, and we at IncludeSec found that a small amount of up front investment learning the syntax paid dividends when auditing applications using these frameworks.

The post Customizing Semgrep Rules for Flask/Django and Other Popular Web Frameworks appeared first on Include Security Research Blog.

Announcing RTSPhuzz — An RTSP Server Fuzzer

15 June 2020 at 14:00

There are many ways software is tested for faults; some of those faults turn out to originate from exploitable memory corruption situations and are labeled vulnerabilities. One popular method used to identify these types of faults in software is runtime fuzzing.

When developing servers that implement an RFC defined protocol, dynamically mutating the inputs and messages sent to the server is a good strategy for fuzzing. The Mozilla security team has used fuzzing internally to great effect on their systems and applications over the years. One area that Mozilla wanted to see more open source work in was fuzzing of streaming media protocols, specifically RTSP.

Towards that goal, IncludeSec is today releasing https://github.com/IncludeSecurity/RTSPhuzz. We’re also excited to announce that the initial development of the tool was sponsored by the Mozilla Open Source Support (MOSS) awards program. RTSPhuzz is provided as free and open unsupported software for the greater good of the maintainers and authors of RTSP services — FOSS and COTS alike!

RTSPhuzz is based on the boofuzz framework; it connects as a client to target RTSP servers and fuzzes RTSP messages or sequences of messages. In the rest of this post we’ll cover some of the important bits to know about it. If you have an RTSP server, go ahead and jump right into our repo, and shoot us a note to say hi if it ends up being useful to you.

Existing Approaches

We are aware of two existing RTSP fuzzers, StreamFUZZ and RtspFuzzer.

RtspFuzzer uses the Peach fuzzing framework to fuzz RTSP responses; however, it targets RTSP client implementations, whereas our fuzzer targets RTSP servers.

StreamFUZZ is a Python script that does not utilize a fuzzing framework. Similar to our fuzzer, it fuzzes different parts of RTSP messages and sends them to a server. However, it is more simplistic; it doesn’t fuzz as many messages or header fields as our fuzzer, it does not account for the types of the fields it fuzzes, and it does not keep track of sessions for fuzzing sequences of messages.

Approach to Fuzzer Creation

The general approach for RTSPhuzz was to first review the RTSP RFC carefully, then define each of the client-to-server message types as boofuzz messages. RTSP headers were then distributed among the boofuzz messages in such a way that each is mutated by the boofuzz engine in at least one message, and boofuzz messages are connected in a graph to reasonably simulate RTSP sessions. Header values and message bodies were given initial reasonable default values to allow successful fuzzing of later messages in a sequence of messages. Special processing is done for several headers so that they conform to the protocol when different parts of messages are being mutated. The boofuzz fuzzing framework gives us the advantage of being able to leverage its built-in mutations, logging, and web interface.
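
To give a feel for what that looks like in boofuzz, here is a minimal sketch of our own (not an excerpt from RTSPhuzz; the host, port, and media path are placeholders) that defines a single OPTIONS request as a boofuzz message and wires it into a session graph:

from boofuzz import Session, Target, TCPSocketConnection, s_get, s_initialize, s_static, s_string

# Target RTSP server (placeholder host and port).
session = Session(target=Target(connection=TCPSocketConnection("127.0.0.1", 554)))

# One client-to-server message type defined as a boofuzz message: the method
# token and CSeq value are fuzzable, while the protocol framing stays static.
s_initialize("options")
s_string("OPTIONS", name="method")
s_static(" rtsp://127.0.0.1:554/test/media/file.mp3 RTSP/1.0\r\n")
s_static("CSeq: ")
s_string("1", name="cseq")
s_static("\r\n\r\n")

# Connect messages into a graph; a fuller fuzzer would chain DESCRIBE, SETUP,
# PLAY, and so on off this node to simulate RTSP sessions.
session.connect(s_get("options"))

if __name__ == "__main__":
    session.fuzz()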

Using RTSPhuzz

You can grab the code from github. Then, specify the server host, server port, and RTSP path to a media file on the target server:

RTSPhuzz.py --host target.server.host --port 554 --path test/media/file.mp3

Once RTSPhuzz is started, the boofuzz framework will open the default web interface on localhost port 26000, and will record results locally in a boofuzz-results/ directory. The web interface can be re-opened for the database from a previous run with boofuzz’s boo tool:

boo open <run-*.db>

See the RTSPhuzz readme for more detailed options and ways to run RTSPhuzz, and boofuzz’s documentation for more information on boofuzz.

Open Source and Continued Development

This is RTSPhuzz’s initial release for open use by all. We encourage you to try it out and share ways to improve the tool. We will review and accept PRs, address bugs where we can, and also would love to hear any shout-outs for any bugs you find with this tool (@includesecurity).

The post Announcing RTSPhuzz — An RTSP Server Fuzzer appeared first on Include Security Research Blog.

Introducing: SafeURL – A set of SSRF Protection Libraries

At Include Security, we believe that a reactive approach to security can fall short when it’s not backed by proactive roots. We see new offensive tools for pen-testing and vulnerability analysis being created and released all the time. In regards to SSRF vulnerabilities, we saw an opportunity to release code for developers to assist in protecting against these sorts of security issues. So we’re releasing a new set of language specific libraries to help developers effectively protect against SSRF issues. In this blog post, we’ll introduce the concept of SafeURL; with details about how it works, as well as how developers can use it, and our plans for rewarding those who find vulnerabilities in it!

Preface: Server Side Request Forgery

Server Side Request Forgery (SSRF) is a vulnerability that gives an attacker the ability to create requests from a vulnerable server. SSRF attacks are commonly used to target not only the host server itself, but also hosts on the internal network that would normally be inaccessible due to firewalls.
SSRF allows an attacker to:

  • Scan and attack systems from the internal network that are not normally accessible
  • Enumerate and attack services that are running on these hosts
  • Exploit host-based authentication services

As is the case with many web application vulnerabilities, SSRF is possible because of a lack of user input validation. For example, a web application that accepts a URL input in order to go fetch that resource from the internet can be given a valid URL such as http://google.com. But the application may also accept URLs that point at internal hosts, loopback addresses, or local resources rather than public websites. When those kinds of inputs are not validated, attackers are able to access internal resources that are not intended to be public.
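
The vulnerable shape being described is the classic "fetch a user-supplied URL" helper; a minimal sketch (the function name and use of the requests library are our own, purely illustrative) looks like this:

import requests

def fetch_preview(user_supplied_url):
    # No validation of scheme, host, or IP range: a URL pointing at the
    # loopback interface, an internal host, or a cloud metadata endpoint
    # is fetched just as readily as a public website.
    response = requests.get(user_supplied_url, timeout=5)
    return response.text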

Our Proposed Solution

SafeURL is a library, originally conceptualized as “SafeCURL” by Jack Whitton (aka @fin1te), that protects against SSRF by validating each part of the URL against a white or black list before making the request. SafeURL can also be used to validate URLs. SafeURL intends to be a simple replacement for libcurl methods in PHP and Python as well as java.net.URLConnection in Scala.
The source for the libraries are available on our Github:

  1. SafeURL for PHP – Primarily developed by @fin1te
  2. SafeURL for Python – Ported by @nicolasrod
  3. SafeURL for Scala – Ported by @saelo

Other Mitigation Techniques

Our approach is focused on protection on the application layer. Other techniques used by some Silicon Valley companies to combat SSRF include:

  • Setting up wrappers for HTTP client calls that forward requests to a single-purpose proxy, which firewall rules prevent from talking to any internal hosts as the HTTP requests are proxied (a minimal sketch of this approach follows this list)
  • At the application server layer, hijacking all socket connections to ensure they meet a developer-configured policy, either by enforcing iptables rules or through more advanced interactions with the app server’s networking layer
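
A minimal sketch of the first technique, assuming a hypothetical egress proxy host and the Python requests library (neither is part of SafeURL), could look like this:

import requests

# Hypothetical single-purpose egress proxy; its firewall rules, not the
# application, decide which destinations are reachable.
EGRESS_PROXY = "http://egress-proxy.internal:3128"

def proxied_get(url, **kwargs):
    # Every outbound HTTP(S) request is forced through the egress proxy, so
    # direct connections to internal hosts never originate from the app itself.
    proxies = {"http": EGRESS_PROXY, "https": EGRESS_PROXY}
    return requests.get(url, proxies=proxies, **kwargs)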

Installation

PHP

SafeURL can be included in any PHP project by cloning the repository on our Github and importing it into your project.

Python

SafeURL can be used in Python apps by cloning the repository on our Github and importing it like this:

from safeurl import safeurl

Scala

To use SafeURL in Scala applications, clone the repository, store it in the app/ folder of your Play application, and import it:

import com.includesecurity.safeurl._

Usage

PHP

SafeURL is designed to be a drop-in replacement for the curl_exec() function in PHP. It can simply be replaced with SafeURL::execute() wrapped in a try {} catch {} block.

try {
    $url = "http://www.google.com";

    $curlHandle = curl_init();
    //Your usual cURL options
    curl_setopt($curlHandle, CURLOPT_USERAGENT, "Mozilla/5.0 (SafeURL)");

    //Execute using SafeURL
    $response = SafeURL::execute($url, $curlHandle);
} catch (Exception $e) {
    //URL wasn't safe
}

Options such as white and black lists can be modified. For example:

$options = new Options();
$options->addToList("blacklist", "domain", "(.*)\.fin1te\.net");
$options->addToList("whitelist", "scheme", "ftp");

//This will now throw an InvalidDomainException
$response = SafeURL::execute("http://example.com", $curlHandle, $options);

//Whilst this will be allowed, and return the response
$response = SafeURL::execute("ftp://example.com", $curlHandle, $options);

Python

SafeURL serves as a replacement for PyCurl in Python.

try:
  su = safeurl.SafeURL()
  res = su.execute("https://example.com")
except:
  print "Unexpected error:", sys.exc_info()

Example of modifying options:

try:
    su = safeurl.SafeURL()

    opt = safeurl.Options()
    opt.clearList("whitelist")
    opt.clearList("blacklist")
    opt.setList("whitelist", [ 
    "google.com" , "youtube.com"], "domain")

    su.setOptions(opt)
    res = su.execute("http://www.youtube.com")
except:
    print "Unexpected error:", sys.exc_info()

Scala

SafeURL replaces the JVM Class URLConnection that is normally used in Scala.

try {
  val resp = SafeURL.fetch("http://google.com")
  val r = Await.result(resp, 500 millis)
} catch {
  case e: Exception => //URL wasn't safe
}

Options:

SafeURL.defaultConfiguration.lists.ip.blacklist ::= "12.34.0.0/16"
SafeURL.defaultConfiguration.lists.domain.blacklist ::= "example.com"

Demo, Bug Bounty Contest, and Further Contributions

An important question to ask is: Is SafeURL really safe? Don’t take our word for it. Try to hack it yourself! We’re hosting live demo apps in each language for anyone to try and bypass SafeURL and perform a successful SSRF attack. On each site there is a file called key.txt on the server’s local filesystem with the following .htaccess policy:

<Files key.txt>
Order deny,allow
Deny from all
Allow from 127.0.0.1

ErrorDocument 403 /oops.html
</Files>

If you can read the contents of the file through a flaw in SafeURL and tell us how you did it (patch plz?), we will contact you about your reward. As a thank you to the community, we’re going to reward up to one Bitcoin for any security issues. If you find a non-security bug in the source of any of our libraries, please contact us as well; you’ll have our thanks and a shout-out.
The challenges are being hosted at the following URLs:
PHP: safeurl-php.excludesecurity.com
Python: safeurl-python.excludesecurity.com
Scala: safeurl-scala.excludesecurity.com

If you can contribute a Pull Request and port the SafeURL concept to other languages (such as Java, Ruby, C#, etc.), we could throw you some Bitcoin as a thank you.

Good luck and thanks for helping us improve SafeURL!

The post Introducing: SafeURL – A set of SSRF Protection Libraries appeared first on Include Security Research Blog.
